open-tamil version 1.1 release

open-tamil version 1.1 is released today. You can get it with the command below:

$ python3 -m pip install open-tamil --upgrade

What is new in this release is a set of functions for getting day and date details in Tamil. Thanks to Arunmozhi (@techolic), who contributed them.

  1. date module: in the v1.1 release, Arunmozhi (Techolic) updated this module, adding a datetime class with strftime-style formatting in Tamil (strftime_ta() in the example below) and tamil_weekday(). Example usage:
>>> import datetime as pydatetime  # standard library, aliased to avoid clashing with tamil.date.datetime
>>> from tamil.date import datetime
>>> n = pydatetime.datetime.now()
>>> d = datetime(n.year, n.month, n.day, n.hour, n.minute)
>>> d.strftime_ta("%a %d, %b %Y") 
'வியாழன் 26, மே 2022'
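
To make the behaviour concrete, here is a minimal standalone sketch of the same idea using only the Python standard library. It is not open-tamil's implementation: the function name format_tamil_date and the two name tables are assumptions made for illustration, and the library's own tables or spellings may differ.

```python
from datetime import datetime

# Assumed name tables for this sketch; open-tamil's own tables may differ.
# Index 0 is Monday, matching datetime.weekday().
TAMIL_WEEKDAYS = [
    "திங்கள்", "செவ்வாய்", "புதன்", "வியாழன்", "வெள்ளி", "சனி", "ஞாயிறு",
]
# Gregorian month names written in Tamil, January first.
TAMIL_MONTHS = [
    "ஜனவரி", "பிப்ரவரி", "மார்ச்", "ஏப்ரல்", "மே", "ஜூன்",
    "ஜூலை", "ஆகஸ்ட்", "செப்டம்பர்", "அக்டோபர்", "நவம்பர்", "டிசம்பர்",
]


def format_tamil_date(d: datetime) -> str:
    """Hypothetical helper: mimic '%a %d, %b %Y' with Tamil names."""
    weekday = TAMIL_WEEKDAYS[d.weekday()]   # Monday is 0
    month = TAMIL_MONTHS[d.month - 1]       # months are 1-based
    return f"{weekday} {d.day:02d}, {month} {d.year}"


if __name__ == "__main__":
    print(format_tamil_date(datetime(2022, 5, 26, 10, 30)))
```

For 26 May 2022, a Thursday, this produces a string of the same shape as the strftime_ta() result shown above.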

The full release notes are available here: https://pypi.org/project/Open-Tamil/1.1/

Thanks,

Muthu

California, USA.

உளியருவி – Tamil tools for AI/ML

Motivation

In 2022 we are reaching a point where more Tamil datasets are available than Tamil tools (arunthamizh, அருந்தமிழ்). However, fully trained models are far less accessible, and providing pre-trained models is much harder and still requires domain expertise in hardware and software. Personally, I have published some small Jupyter notebooks (see here) and some simple articles, but they remain inadequate to cover the breadth of Tamil computing needs in the AI world, which include:

  1. NLP – Text Classification, Recommendation, Spell Checking, Correction tasks
  2. TTS – speech synthesis tasks
  3. ASR – speech recognition

Sufficient data already exist for task 1. For tasks 2 and 3, the private corpora for speech tasks (see the அருந்தமிழ் list) and the public 300-hour voice dataset recently published by Mozilla Common Voice (University of Toronto Scarborough, Canada, is leading the Tamil effort here) have closed the data gap to a large degree.

Ultimately, such tooling would make it possible to quickly compose AI services for the Tamil space from open-source tools, using existing compute environments to host services and devices.

Proposal

My proposal is the following:

  1. Develop an open-source toolbox for pre-training and task-specific training
  2. Identify good components on which to base the effort
  3. Contribute engineering effort, testing, and validation
    1. R&D – DataScience, Infra, AI framework
    2. Engineering Validation – DataScience, Tamil language expertise
    3. Engineering – packaging, documentation, distribution
    4. Project management
  4. Library to be liberally licensed MIT/BSD
  5. Open-Source license for developed models
  6. Find hardware resources for AI model pre-training etc.
  7. Managed by a steering committee / nominated BDFL
  8. Scope – decade time frame
  9. TBD – and many more.

Summary

Let’s build a PyTorch Lightning-like API for Tamil AI tasks across NLP, TTS, and ASR.
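
As a rough illustration of what such an API could feel like, here is a small pure-Python sketch. Everything in it is hypothetical: the names TamilTask, Trainer, and ToyTextClassification are invented for this example, no such toolbox exists yet, and the toy task only learns the most frequent label so that the sketch runs without any ML framework. The point is the split familiar from PyTorch Lightning: a task owns its data and its training step, while a generic trainer owns the loop.

```python
from abc import ABC, abstractmethod
from collections import Counter
from typing import Iterable, List, Tuple

Example = Tuple[str, str]  # (text, label)


class TamilTask(ABC):
    """Hypothetical base class: a task bundles its data and its training step."""

    @abstractmethod
    def train_batches(self) -> Iterable[List[Example]]:
        """Yield batches of training examples."""

    @abstractmethod
    def training_step(self, batch: List[Example]) -> float:
        """Update the task's state from one batch and return a loss value."""


class Trainer:
    """Hypothetical trainer: owns the epoch loop, knows nothing about the task."""

    def __init__(self, max_epochs: int = 1) -> None:
        self.max_epochs = max_epochs

    def fit(self, task: TamilTask) -> List[float]:
        history = []
        for _ in range(self.max_epochs):
            losses = [task.training_step(b) for b in task.train_batches()]
            # Mean loss per epoch; guard against a task that yields no batches.
            history.append(sum(losses) / max(len(losses), 1))
        return history


class ToyTextClassification(TamilTask):
    """Toy stand-in for an NLP task: it only learns the most frequent label."""

    def __init__(self, examples: List[Example], batch_size: int = 2) -> None:
        self.examples = examples
        self.batch_size = batch_size
        self.label_counts: Counter = Counter()

    def train_batches(self) -> Iterable[List[Example]]:
        for i in range(0, len(self.examples), self.batch_size):
            yield self.examples[i:i + self.batch_size]

    def training_step(self, batch: List[Example]) -> float:
        self.label_counts.update(label for _, label in batch)
        guess = self.predict("")
        # "Loss" here is simply the fraction of the batch the guess gets wrong.
        return sum(1 for _, label in batch if label != guess) / len(batch)

    def predict(self, text: str) -> str:
        # The toy model ignores the text and returns the majority label.
        return self.label_counts.most_common(1)[0][0]


if __name__ == "__main__":
    data = [("நல்ல படம்", "pos"), ("மோசமான படம்", "neg"),
            ("அருமை", "pos"), ("சிறப்பு", "pos")]
    task = ToyTextClassification(data)
    print(Trainer(max_epochs=2).fit(task))
    print(task.predict("புதிய படம்"))
```

A real toolbox would replace the toy task with TTS, ASR, or NLP tasks backed by actual models, but the same task/trainer split could let contributors add tasks without touching the training loop.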

Leave your thoughts by email at ezhillang -at- gmail -dot- com, or in the comments section.