Samam.net is an online four-language Dravidian dictionary (Malayalam, Tamil, Telugu and Kannada). Its content was compiled by the veteran Njattyela Sreedharan (ഞാറ്റ്യേല ശ്രീധരൻ) over a span of 25 years.
SMC is proud to have technically contributed to its digitization efforts. Subin Siby has documented the process in detail in his blog post.
The release event was held at the Bangalore International Convention Centre, co-hosted by the Indic Digital Archive Foundation and the Karnataka Chapter of Malayalam Mission. During the event, the integration of E. K. Kurup's English-Malayalam Thesaurus into Olam was announced.
Many active members of SMC, including Subin Siby, Akshay S Dinesh, Joice Joseph, Santhosh Thottingal, Hiran Venugopalan and Manoj Karingamadathil, participated in the event.
Malayalam Text Corpus
SMC's Malayalam corpus is being expanded. Text from Sayahna books, Deshabhimani and Sarvavijnanakosam was recently added. All of these new content sources are Creative Commons licensed. The content is semi-automatically cleaned of common mistakes and normalized.
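As a rough illustration of what such cleaning and normalization can involve, here is a minimal Python sketch. It is not SMC's actual pipeline: the function name `normalize_malayalam` and the rule table `LEGACY_CHILLU` are hypothetical, and the rules shown (Unicode NFC, rewriting legacy chillu sequences to atomic chillu code points, collapsing whitespace) are only assumed examples of "common mistakes" in Malayalam text.

```python
import re
import unicodedata

# Hypothetical rule table (an assumption, not SMC's actual rules):
# legacy chillu sequences (consonant + virama + ZWJ) mapped to the
# atomic chillu code points introduced in Unicode 5.1.
LEGACY_CHILLU = {
    "\u0d23\u0d4d\u200d": "\u0d7a",  # chillu NN
    "\u0d28\u0d4d\u200d": "\u0d7b",  # chillu N
    "\u0d30\u0d4d\u200d": "\u0d7c",  # chillu RR
    "\u0d32\u0d4d\u200d": "\u0d7d",  # chillu L
    "\u0d33\u0d4d\u200d": "\u0d7e",  # chillu LL
    "\u0d15\u0d4d\u200d": "\u0d7f",  # chillu K
}


def normalize_malayalam(text: str) -> str:
    """Apply a few illustrative normalization rules to Malayalam text."""
    # Canonical composition, so equivalent sequences compare equal.
    text = unicodedata.normalize("NFC", text)
    # Rewrite legacy chillu sequences to their atomic forms.
    for legacy, atomic in LEGACY_CHILLU.items():
        text = text.replace(legacy, atomic)
    # Collapse runs of whitespace (including newlines) into one space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


if __name__ == "__main__":
    legacy_word = "\u0d05\u0d35\u0d28\u0d4d\u200d"  # legacy spelling of the word "avan"
    print(normalize_malayalam(legacy_word) == "\u0d05\u0d35\u0d7b")
```

Rules like these are mechanical, but deciding which ones are safe for a given source usually needs a human check, which is presumably why the process is described as semi-automatic.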
The Mozilla Foundation's latest release, Common Voice 17.0, is now available on the Hugging Face Hub. It has a total of 11 hours of Malayalam speech by 134 speakers. Of these, 4 hours of speech have been manually validated for correctness.
Metapost Sandbox
The MetaPost sandbox by Santhosh Thottingal allows you to experiment with designing letters using MetaPost.
Its latest release has many new features. You can now log in using GitHub, save your work, fork somebody else's work and list all your works, as in this example.
The Open Data Kerala team built a demo website to illustrate the power of open data and free-software-based language computing tools. It provides an easy search engine interface for the encyclopedic content maintained by the Kerala State Institute of Encyclopaedic Publications, including transliteration search, fuzzy matching and similar useful features.
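To give a feel for what fuzzy matching means in a search interface, here is a small Python sketch using only the standard library. It does not reflect how the demo website is implemented: the helper `search_titles` and the sample titles are hypothetical, and `difflib` is simply one readily available way to tolerate typos in a query.

```python
from difflib import get_close_matches


def search_titles(query, titles, limit=3, cutoff=0.6):
    """Return up to `limit` titles that approximately match `query`.

    Matching is case-insensitive; `cutoff` is difflib's similarity
    threshold between 0 and 1.
    """
    # Map case-folded titles back to their original spelling.
    folded = {title.casefold(): title for title in titles}
    hits = get_close_matches(query.casefold(), list(folded), n=limit, cutoff=cutoff)
    return [folded[hit] for hit in hits]


if __name__ == "__main__":
    # Hypothetical entry titles, for illustration only.
    titles = ["Encyclopedia", "Dictionary", "Thesaurus"]
    print(search_titles("encylopedia", titles))  # misspelled query
```

A production search engine would more likely use an index with edit-distance or n-gram matching, but the idea is the same: a slightly misspelled query still finds the intended entry.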
Kavya Manohar presented a talk on "How Computers Understand Languages?". It was an introductory Natural Language Processing talk for the schoolchildren who are members of SNV Library, Peringammala, Thiruvananthapuram, on 28th April, 2024.
Kurian Benoy, in his paper-reading session on the Vistaar paper by the AI4Bharat team, briefly mentioned the issues with the Whisper Text Normalizer and the work that he and Kavya Manohar have done to identify and rectify them.
A shared task on fine-tuning machine translation systems for Indic languages is being hosted as part of the Ninth Conference on Machine Translation (WMT24).
Manoj Karingamadathil presented a talk on "Novel trends and possibilities in the field of Online encyclopedia" at the Kerala Encyclopedic Institute.
Zendalona is participating in GSoC this year, and 4 new contributors will be working on 4 projects related to FOSS accessibility solutions for the visually impaired. Read the program page for more details. Akshay S Dinesh is one of the mentors.
Malayalam Wikipedia is hosting an edit-a-thon in connection with the 2024 Indian General Election from April 15 to June 15, 2024.