We are happy to present the first ever monthly report on the activities of Swathanthra Malayalam Computing (SMC). We plan to prepare these reports every month from now on. During our last meet-up, we agreed that we need to document our scattered activities and projects. Hence this report. We hope you find it useful.
SMC developers gathered at the entri.me office in Kochi on 31 March and 1 April 2018. Ashik Salahudeen proposed the meet-up on the mailing list in January. A very enthusiastic group of 15 people from the developer community poured in their ideas for improving the activities of SMC.
There were vibrant discussions on revamping the activities of SMC as a free software developer organisation. Hiran, Ashik and Hrishikesh did a fantastic job of collecting the participants' ideas and turning them into action plans.
Publishing a monthly report on SMC activities was an important suggestion that came up in the meeting. This post is the first of these reports.
Participants stressed the need to document the projects and products owned by SMC, in enough detail to serve users as well as contributors.
Following the discussions at the meet-up, Ashik prepared a list of organisational and project TODOs here. Members are actively working on completing the tasks. We are more than happy to welcome anybody who wants to join!
There is wonderful news for designers who use Adobe Creative Cloud products: Adobe Typekit has started hosting SMC's fonts. As a font foundry, SMC maintains about a dozen Malayalam fonts. Creative Cloud products list most of these fonts, so designers can now pick them easily.
These fonts can also be used as webfonts and perform well on Adobe's geographically distributed content distribution network.
Balu did an awesome job of migrating the SMC servers and their services, including the blog, wiki and planet.
The SMC blog brings together news and research in Malayalam computational linguistics. SMC's projects, products and activities are documented on our wiki. The planet collects the blogs of group members.
Balu is actively helping us use GitLab features to run our websites and our projects' continuous integration pipelines.
The server is sponsored by Manu Krishnan T V (Stackplex Web Solutions).
SMC's font source code repositories have moved to a new sub-group in GitLab. All font repositories now have proper version release tags.
Kavya worked on the repository tagging and Balu did the regrouping. He is also working on automating the generation of font release tarballs and the corresponding website updates. The latest fonts can be downloaded from here.
CORS headers are now enabled for the webfonts published by SMC. This means any website that wants to embed SMC's fonts in its pages can simply include the CSS file https://smc.org.in/fonts/css/fonts.css and refer to the fonts by their family names. The fonts will be served from smc.org.in. Initially only smc.org.in and its subdomains could use these webfonts; that restriction has now been removed. See an example of embedding at https://codepen.io/santhoshtr/pen/VXVPdj
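Why the header matters: browsers fetch `@font-face` fonts in CORS mode, so a page on another site can only use a font from smc.org.in if the font response carries an `Access-Control-Allow-Origin` header that admits that page. The Python sketch below models the old and new policies described above. It is a hypothetical illustration, not SMC's actual server configuration; `FONT_HOST`, `is_smc_origin` and `cors_headers` are names made up for this example.

```python
from typing import Dict
from urllib.parse import urlsplit

FONT_HOST = "smc.org.in"  # host serving the webfonts, as stated in the post


def is_smc_origin(origin: str) -> bool:
    """True if the page origin is smc.org.in itself or one of its subdomains."""
    host = urlsplit(origin).hostname or ""
    return host == FONT_HOST or host.endswith("." + FONT_HOST)


def cors_headers(origin: str, restricted: bool) -> Dict[str, str]:
    """Hypothetical CORS headers for a font request coming from `origin`.

    restricted=True models the old policy (smc.org.in and subdomains only);
    restricted=False models the current policy (any website may embed).
    """
    if not restricted:
        # A wildcard lets every origin use the font.
        return {"Access-Control-Allow-Origin": "*"}
    if is_smc_origin(origin):
        # Echo the allowed origin; Vary tells caches the answer depends on Origin.
        return {"Access-Control-Allow-Origin": origin, "Vary": "Origin"}
    # No header at all: the browser refuses to apply the cross-origin font.
    return {}


if __name__ == "__main__":
    for page in ("https://blog.smc.org.in", "https://example.com"):
        print(page, "old:", cors_headers(page, restricted=True))
        print(page, "new:", cors_headers(page, restricted=False))
```

A wildcard is the simplest choice for publicly licensed fonts: the response is identical for every site, so it caches well, whereas echoing a specific origin is only needed when access has to be limited.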
The Swanalekha input method is getting an update after several years. It will support inputting characters newly added to Unicode. It will also disable the suggestions UI by default, while retaining the alternate keys feature. Have a look at the new website being developed for the project.
Anoop Panavalappil, Jishnu Mohan and Santhosh Thottingal are preparing a browser extension for Swanalekha. See the repo. It will work in browsers like Chrome and Firefox, and possibly in Edge.
Note that Chrome and Chromium still have a bug affecting the Swanalekha input method.
Ubuntu 18.04 "Bionic Beaver" has been released. It comes with all of SMC's fonts packaged. This is a significant improvement, since earlier Ubuntu releases shipped outdated fonts.
A video illustrating how to configure fonts and input methods in Ubuntu 18.04 is available at https://www.youtube.com/watch?v=hlkty9s5t30
Even though the fonts are preinstalled, input methods still need manual installation. Install the m17n-db packages, then restart ibus using the ibus restart command (or just log out and log in again).
Ubuntu 18.04 comes with VLC 3.0, which adds complex script rendering support for subtitles.
Here is a perfect Malayalam rendering from the film 'Helvetica', subtitled by K H Hussain.
You can read an earlier blog post by Rajeesh on complex script rendering support in VLC here.
The ICANN initiative to define Internationalized Domain Name (IDN) rules for Malayalam and other languages is showing some progress. The proposal for a Malayalam script root zone label generation ruleset has now been drafted.
Santhosh Thottingal and Anivar Aravind are members of the panel and gave feedback on the draft.
See this blog post for background on this topic.
A paper by Santhosh Thottingal and Kavya Manohar, "Malayalam Orthographic Reforms: Impact on Language and Popular Culture", has been accepted at the conference Graphemics in the 21st Century. The conference will be held at Pôle numérique Brest Iroise in Brest, France, on June 14-15, 2018.
In March, Santhosh Thottingal and Kavya Manohar presented a paper on "Spiral splines in typeface design - A case study of Manjari Malayalam typeface" at the Typoday 2018 conference in Mumbai. The paper is available here.
Mozilla's Internet Health Report 2018 highlights the work of Santhosh Thottingal with the Wikimedia Foundation and Swathanthra Malayalam Computing in its Digital Inclusion section. Read "Building a multilingual Internet" here.
Determined to bring his native language fully into the digital world, Thottingal taught himself computational linguistics and worked with Swathanthra Malayalam Computing, a free software collective.
“We developed input tools, fixed rendering issues, designed and developed new fonts and defined and implemented many computational algorithms for Malayalam like collation and hyphenation. And we continued to work on more complex projects,” says Thottingal. “The community project became a larger group of volunteers.”
Wired reported on how Mozilla diagnoses the health of the global internet.
The report also highlights the work of a number of people trying to make the web a better place, like Santhosh Thottingal, who is working to create a truly multilingual internet, as well as Holly Jacobs, who founded the Cyber Civil Rights Initiative, which offers support to thousands of victims of revenge porn.
Santhosh Thottingal is quoted in this Indian Express report on 'The language challenge of World Wide Web':
Santhosh Thottingal, who builds local language technology in India, says that building for Indian scripts is particularly difficult. “They have ligatures — shapes formed by fusing more than one letters. Sometimes vowel signs get attached to consonants. Consonants get stacked, fused and so on.”
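The point about fused shapes and attached vowel signs is easy to check with Python's standard `unicodedata` module: what a reader sees as one letter is often several code points that the font has to shape together. The three sample clusters and the helper name `describe` are our own choices for illustration, not examples taken from the quoted report.

```python
import unicodedata
from typing import List


def describe(cluster: str) -> List[str]:
    """Name each Unicode code point behind one visible Malayalam cluster."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in cluster]


# Escapes keep the exact code points unambiguous in the source file.
EXAMPLES = {
    "\u0d15\u0d4d\u0d37": "KA + virama + SSA are fused into one conjunct shape",
    "\u0d15\u0d3f": "vowel sign I is attached to the consonant KA",
    "\u0d15\u0d4a": "vowel sign O is drawn on both sides of KA",
}

if __name__ == "__main__":
    for text, note in EXAMPLES.items():
        print(f"{text}: {len(text)} code points, {note}")
        for line in describe(text):
            print("    " + line)
    # The two-part vowel sign O canonically decomposes into signs E and AA.
    print(unicodedata.decomposition("\u0d4a"))  # 0D46 0D3E
```

A renderer therefore cannot map code points to glyphs one by one; a shaping engine has to reorder and combine them, which is exactly the kind of support the VLC 3.0 note above refers to.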
In an Economic Times report on Facebook's fake news clean-up algorithm, Anivar Aravind is quoted as saying:
“I don’t know exactly what FB claimed. But understanding local languages, Indian languages, is still an unsolved problem — either in non-free software or free software,” says Anivar Aravind, executive director of Indic Project, a nonprofit initiative working on language engineering and digital rights of native-language users. Interestingly, the two Facebook-owned platforms: WhatsApp and FB have become the preferred social medium to spread false information and largely through regional languages.
That's all! Thanks for reading. We hope we have covered all the events and updates. If we missed anything, let us know.