SMC Monthly Newsletter: March 2024
Releases
Introducing Indic Subtitler
Indic Subtitler is an open-source subtitle generator for Indic languages. It is backed by SeamlessM4T, faster-whisper, WhisperX and Vegam-Whisper, which support almost 12 Indic languages by default.
Read more about Indic Subtitler: https://indicsubtitler.in/about
Announcing fontra.thottingal.in
Fontra (https://fontra.xyz/) is an upcoming browser-based font editor from Google. It is not yet completely ready for serious usage, but it already has several useful features. Santhosh Thottingal has made a public instance of it available at https://fontra.thottingal.in. People interested in typeface design can use this tool to load the listed fonts, inspect glyph designs, and, for variable fonts, inspect the masters and their interpolation. All features are read-only, so you cannot make edits there.
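For readers new to variable fonts, here is a rough sketch of what "interpolation between masters" means. It is not Fontra's code: the function name, the two-master setup, and the point coordinates are all made up for illustration, and real variable fonts can have more masters and axes. The idea is only that each point of an in-between design is a weighted mix of the matching points in the masters.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def interpolate_glyph(light: List[Point], bold: List[Point], t: float) -> List[Point]:
    """Linearly interpolate between two compatible master outlines.

    t = 0.0 gives the light master, t = 1.0 gives the bold master.
    Both masters must have the same number of points in the same order.
    """
    if len(light) != len(bold):
        raise ValueError("masters must have the same number of points")
    return [
        (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
        for (x0, y0), (x1, y1) in zip(light, bold)
    ]


# Hypothetical outlines of a vertical stem in two masters (font units).
light_stem = [(100.0, 0.0), (140.0, 0.0), (140.0, 700.0), (100.0, 700.0)]
bold_stem = [(80.0, 0.0), (200.0, 0.0), (200.0, 700.0), (80.0, 700.0)]

# A design halfway between the two masters.
medium_stem = interpolate_glyph(light_stem, bold_stem, 0.5)
print(medium_stem)
```

Inspecting the masters in a tool like Fontra is a way to see the two ends of such a blend and to check that the in-between shapes still look right.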
Varnam IBus Engine v1.6.3
Varnam IBus Engine had a new release at #FOSSMeet24. It now works on Wayland and supports Nepali. Thanks to mistersmee and khumnath for the contributions! (link to release)
Poorna Web Editor Release
Poorna, the complete Malayalam keyboard layout, has introduced a web-based editor that lets users explore all the keyboard layouts offered by Poorna without installing anything. Users can also upload TXT files and edit them on the go.
See release blog
Updates to Metapost Sandbox
The MetaPost sandbox application, designed for quickly trying out MetaPost-based designs and their output, got some feature updates. You can now save your samples and share them with others. Saved samples can be retrieved and edited later by their author. Others can work on a sample by creating a copy. No login is required.
Malini Alpha Releases
The ongoing Malini typeface design project had more alpha releases this month. See https://gitlab.com/smc/fonts/Malini.
malayalam-fonts npm package 1.0.6-beta.3
The malayalam-fonts npm package has a new version, 1.0.6-beta.3 (npm package).
LLMs in News
- Tamil LLM finetuned: https://huggingface.co/mervinpraison/tamil-large-language-model-7b-v1.0
Tweet link: https://twitter.com/MervinPraison/status/1766796706438426961
- Navarasa model
Code: https://github.com/TeluguLLMLabs/Indic-gemma-7b-Navarasa
Models: https://huggingface.co/Telugu-LLM-Labs
- Surya looks really competitive when benchmarked against Google OCR, with really good performance for Malayalam.
https://twitter.com/VikParuchuri/status/1765440195124691339
- Two Kerala-origin open-source developers, Shahul ES and Jithin James, hit a home run with their YC-funded company, ragas_io. Y Combinator announced ragas_io as an open-source evaluation and testing infrastructure for LLM applications, built around model-graded evaluations and testing techniques.
Tweet Link: https://twitter.com/ycombinator/status/1765497664240759133
- AI4Bharat published indicllm-suite (https://ai4bharat.iitm.ac.in/blog/indicllm-suite/), a blueprint for creating pre-training and fine-tuning datasets for Indic languages. It relies heavily on a synthetic corpus (machine-translated training data from other languages) to expand the training data for Indic languages.
The researchers behind AI4Bharat also published a paper titled "Do Not Worry if You Do Not Have Data: Building Pretrained Language Models Using Translationese", which evaluates the approach of using machine-translated training data.
Events
OpenDataDay Hackathon and Datathon
As part of OpenData Day, Sahya Digital Conservation Foundation and the OpenDataKerala community ran a 3-day residential hackathon and datathon at Vythiri, Wayanad. Participants created and improved projects and tools related to the open data ecosystem, and also ran a datathon on OpenStreetMap and Wikidata.
#OpenData Day 2024 by @opendatakerala at Vythiri, Wayanad, Kerala with a weekend Hackathon and Datathon with @asdofindia, @Athul111, @rajaneesh_river, @jinoytomjacob and expecting more peoples to join.#OpenStreetMap #Wikidata #Biodiversity pic.twitter.com/DNUjfSsbLs
— Manoj Karingamadathil (@manojkmohan) March 2, 2024
Online talk on Malayalam Digital Aesthetics
We organized an online talk session on Malayalam Digital Aesthetics ("മലയാളത്തിന്റെ ഡിജിറ്റൽ സൗന്ദര്യം") on March 9th at 7 PM. The talk was about digital Malayalam typography.
You can see the live stream here:
The slides are available at https://santhoshtr.github.io/malayalam-digital-aesthetics/
SMC@FOSSMeet'24
Some SMC volunteers participated in FOSSMeet'24 at NIT Calicut from March 22 to 24. We will write a separate blog post about FOSSMeet.
Others
- The new Keyman update includes Poorna: https://blog.keyman.com/2024/03/keyman-update-for-29-march-2024/
- The Kerala WordPress Photo Festival was a global online photography competition organized by the WordPress Community of Kerala, India. It was held from February 3rd to February 9th, 6 PM IST, and participants showcased their photographs to the world by uploading them to the official WordPress Photo Directory. In total, 1544 photos from 160+ contributors were submitted during the event.
- The Manjari typeface has been spotted in Manjummel Boys, a blockbuster Malayalam film. Manjari is used for the Malayalam subtitles in scenes where Tamil is spoken. Manjari has been used in other movies before as well, for example Godha (2017).
- Kuttippencil is a short film directed by Hena Chandran.
It features SMC fonts designed by Santhosh Thottingal: Nupuram Calligraphy in the title and Chilanka for the credits.