December 27, 2018

Malayalam Phonetic Analyser: Python library v1.0.0 Released

A detailed note by Kavya Manohar.

In the previous post, I shared a work-in-progress version of a finite state transducer based Malayalam phonetic analyser. A phonetic analyser analyses the written form of a text and reports the phonetic characteristics of its grapheme sequence.

Understanding the phonetic characteristics of a word is helpful in many computational linguistics problems. For instance, converting a word into its phonetic representation is a necessary step in a text-to-speech (TTS) system. The phonetic representation is also helpful for transliterating the word into a different script. It is even more useful if the phonetic representation can be converted back to the grapheme sequence.

The first version of the mlphon project is now released. It is packaged as a Python library on PyPI. You can install it with

pip install mlphon

It has built-in methods for bidirectional grapheme-to-phoneme conversion, IPA mapping and a syllablizer. These three functions have command line tools as well. Try them out for yourself.

Examples

Syllablizer

$ mlsyllablizer

For the input

സഫലമീയാത്ര

the output would be

<BoS>സ<EoS><BoS>ഫ<EoS><BoS>ല<EoS><BoS>മീ<EoS><BoS>യാ<EoS><BoS>ത്ര<EoS>

['സ', 'ഫ', 'ല', 'മീ', 'യാ', 'ത്ര']

<BoS> indicates the beginning of a syllable and <EoS> the end of a syllable.
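If you only have the tagged string and want the list form back, the boundary tags are easy to process. The following is a minimal sketch using only the Python standard library; `syllables_from_tagged` is a hypothetical helper written for this note, not a function of mlphon.

```python
import re
from typing import List


def syllables_from_tagged(tagged: str) -> List[str]:
    """Return the text found between each <BoS> ... <EoS> pair."""
    # Non-greedy match so that each syllable is captured separately.
    return re.findall(r"<BoS>(.*?)<EoS>", tagged)


# Tagged output copied from the syllablizer example above.
tagged = "<BoS>സ<EoS><BoS>ഫ<EoS><BoS>ല<EoS><BoS>മീ<EoS><BoS>യാ<EoS><BoS>ത്ര<EoS>"
syllables = syllables_from_tagged(tagged)
print(syllables)
```

Joining the recovered syllables gives back the original word, since the tags only mark boundaries and do not alter the graphemes.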

G2P analysis and synthesis

$ mlg2p -a

Give the input

കാവ്യ

It will give you the result of g2p analysis as:

<BoS>k<plosive><voiceless><unaspirated><velar>aː<v_sign><EoS><BoS>ʋ<approximant><labiodental><virama>j<glide><palatal>a<schwa><EoS>

The details of each phoneme are given in angle brackets. The  operation is bidirectional. You can retrieve the graphemes from the  analysis string.
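To work with the analysis string programmatically, it can be split into syllables and then into symbols with their tags. The sketch below uses only the standard library; `parse_analysis` is a hypothetical helper, and it assumes that every run of angle-bracket tags belongs to the symbol immediately before it, which is a reading of the output format and not something the library guarantees.

```python
import re
from typing import List, Tuple

# One phoneme: its IPA symbol and the tags that follow it.
Phoneme = Tuple[str, List[str]]


def parse_analysis(analysis: str) -> List[List[Phoneme]]:
    """Split an analysis string into syllables of (symbol, tags) pairs."""
    syllables = []
    for body in re.findall(r"<BoS>(.*?)<EoS>", analysis):
        phonemes = []
        # A symbol is any text outside angle brackets; the tags that
        # directly follow it are attached to it (assumption).
        for match in re.finditer(r"([^<>]+)((?:<[^<>]+>)*)", body):
            symbol = match.group(1)
            tags = re.findall(r"<([^<>]+)>", match.group(2))
            phonemes.append((symbol, tags))
        syllables.append(phonemes)
    return syllables


# Analysis string copied from the g2p example above.
analysis = (
    "<BoS>k<plosive><voiceless><unaspirated><velar>aː<v_sign><EoS>"
    "<BoS>ʋ<approximant><labiodental><virama>j<glide><palatal>a<schwa><EoS>"
)
parsed = parse_analysis(analysis)
for syllable in parsed:
    print(syllable)

# The bare symbol sequence, with all tags dropped.
symbols = "".join(symbol for syllable in parsed for symbol, _ in syllable)
print(symbols)
```

Note that this parsing is for inspection only. Retrieving the graphemes from an analysis string is the job of the analyser itself.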

IPA analysis and synthesis

If the phonetic detailing is not relevant to you, a minimal mapping of the graphemes to IPA can be obtained by

$ mlipa -a

For the input

കൽക്കണ്ടം

The output would be

kal<chil>kkaɳʈam<anuswara>

Certain tags like <chil>, <anuswara> and <visaraga> are retained so that bidirectional analysis and generation remain unambiguous.
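If you need a plain IPA string for display, the retained tags can be removed in a post-processing step. This is a small sketch with the standard library only; `strip_tags` is a hypothetical helper, not part of mlphon. The result is lossy: once the tags are gone, the string is no longer the tagged form shown above, so keep the original if you plan to generate graphemes from it.

```python
import re


def strip_tags(ipa_with_tags: str) -> str:
    """Remove every <...> tag, leaving only the IPA symbols."""
    return re.sub(r"<[^<>]+>", "", ipa_with_tags)


# Output copied from the mlipa example above.
tagged_ipa = "kal<chil>kkaɳʈam<anuswara>"
plain_ipa = strip_tags(tagged_ipa)
print(plain_ipa)
```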

More details on its usage are available in the PyPI documentation as well as in the README of the mlphon repository.

I will post progress updates here. For a quick web demo of what mlphon does, check out https://phon.smc.org.in/.

References

  1. Malayalam Phonetic Analyser git repository
  2. PyPI repository of mlphon
  3. Malayalam morphological analyser using finite state transducers
  4. The Festvox Indic Frontend for Grapheme-to-Phoneme Conversion
  5. Malayalam Phonetic Archive by Thunchath Ezhuthachan Malayalam University
  6. IPA and sounds

This article was originally written by Kavya Manohar and published at kavyamanohar.com