May 2, 2024

SMC Monthly Newsletter: April 2024

Launch of samam

samam is the online 4-language Dravidian dictionary (Malayalam, Tamil, Telugu and Kannada). Its content was compiled by the veteran Njattyela Sreedharan (ഞാറ്റ്യേല ശ്രീധരൻ) over a span of 25 years.

SMC Members meeting Njattyela Sreedharan on Aug 11, 2022

SMC is proud to have contributed technically to its digitization efforts. Subin Siby has documented the process in detail in his blog post.

The release event was held at the Bangalore International Convention Centre, co-hosted by the Indic Digital Archive Foundation and the Karnataka Chapter of Malayalam Mission. During the event, the integration of E. K. Kurup's English-Malayalam Thesaurus into Olam was announced.

From the samam release event

Many active members of SMC, including Subin Siby, Akshay S Dinesh, Joice Joseph, Santhosh Thottingal, Hiran Venugopalan and Manoj Karingamadathil, participated in the event.


Malayalam Text Corpus

SMC's Malayalam corpus is being expanded. Text from Sayahna books, Deshabhimani and Sarvavijnanakosam was recently added. All of these new content sources are Creative Commons licensed. The content is semi-automatically cleaned for common mistakes and normalized.
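For readers curious what "normalized" can mean in practice, here is a minimal Python sketch. It is not SMC's actual cleaning pipeline, which is not described here; the function name `normalize_text` and the choice of steps (Unicode NFC normalization plus whitespace tidying) are assumptions made purely for illustration.

```python
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Illustrative cleanup only (hypothetical helper, not SMC's pipeline)."""
    # NFC composes canonically equivalent sequences, e.g. the two-part
    # Malayalam vowel sign U+0D46 U+0D3E becomes the single sign U+0D4A.
    text = unicodedata.normalize("NFC", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Trim each line and drop lines that end up empty.
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    # The same syllable written with a split and a composed vowel sign.
    sample = "\u0d15\u0d46\u0d3e   \u0d15\u0d4a"
    cleaned = normalize_text(sample)
    print(cleaned)
    print([hex(ord(ch)) for ch in cleaned])
```

A real corpus pipeline would likely add language-specific rules on top of this; the sketch only shows why two visually identical strings can need normalization before they compare as equal.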

Mozilla Foundation's latest release, Common Voice 17.0, is now available on the Hugging Face Hub. It has a total of 11 hours of Malayalam speech by 134 speakers. Of these, 4 hours of speech have been manually validated for correctness.

Metapost Sandbox

The Metapost sandbox by Santhosh Thottingal lets you experiment with designing letters using MetaPost.

Its latest release has many new features. You can now log in using GitHub, save your fork, fork somebody else's work and list all your works, as in this example.



Malayalam morphology analyser version 1.3.8 released.


The Malayalam font Nupuram has new alpha releases.


The Malini Malayalam font has a new alpha version, v1.000-alpha.14.

SFST Python Binding

SFST 1.5.8 has been released. It now supports Python 3.12.

Other News and Events

  1. The Open Data Kerala team built a demo website to illustrate the power of open data and free-software-based language computing tools. It provides an easy search engine interface for the encyclopedic content maintained by the Kerala State Institute of Encyclopaedic Publications, including transliteration search, fuzzy matching and similar useful features.
  2. Vishnu Prasad J collated a massive Malayalam text corpus dataset, which includes the SMC text corpus dataset, the AI4Bharat dataset and the CulturaX dataset. This data can be used for Malayalam LLM pre-training, tokenizer experiments etc.
  3. Kavya Manohar presented a talk on "How Computers Understand Languages?". It was an introductory natural language processing talk for the schoolchildren who are members of SNV Library, Peringammala, Thiruvananthapuram, held on 28th April, 2024.
  4. The International Centre For Free and Open Source Solutions (ICFOSS) hosted the "National Level Open Hardware-IoT Geospatial Hackathon" on April 29, 2024. Jaisen Nedumpala, an active member of SMC and OSM, attended the event.
  5. Kurian Benoy, in his paper reading session on the Vistaar paper by the AI4Bharat team, briefly mentioned the issues with the Whisper Text Normalizer and the work done by Kavya Manohar and himself to identify and rectify them.
  6. A shared task to finetune machine translation systems for Indic languages is being hosted as part of the Ninth Conference on Machine Translation (WMT24).
  7. Manoj Karingamadathil presented a talk on the "Novel trends and possibilities in the field of Online encyclopedia" at the Kerala Encyclopedic institute.
  8. Zendalona is participating in GSoC this year, and 4 new contributors will be working on 4 projects related to FOSS accessibility solutions for the visually impaired. Read the program page for more details. Akshay S Dinesh is one of the mentors.
  9. Malayalam Wikipedia is hosting an Editathon in connection with the 2024 Indian General Election from April 15 to June 15, 2024.