Summary/Abstract
This article examines in depth the sandhi (joining) rules of Malayalam, the language of Kerala, which play a central role in forming inflected and agglutinated word forms and their compounds. It reports significant progress toward a systematic method for generating an annotated data set of Malayalam words, a resource useful for the many Natural Language Processing tasks that require Malayalam preprocessing. The article presents the results of, and the issues encountered in, developing a word-splitting tool for Malayalam. The main application is improving the alignment of parallel texts, which form a core resource for Machine Translation.