Performing adaptive morphological analysis using internet resources

Authors: M. Trabalka and M. Bieliková

Reference: In Proc. of TSD'99 - Text, Speech and Dialogue, V. Matoušek, P. Mautner, J. Ocelíková, P. Sojka (Eds.), Springer Verlag, Czech Republic, September 13-17, pages 66-71, 1999.


Abstract: In this paper, we describe an approach to adaptive morphological analysis based on a lexical corpus acquired from the Internet. We focus on automating the categorization of words into morphological paradigms in inflective languages. This is done by inducing the possible word forms from a morphological knowledge base and then looking up the word forms of each candidate inflection in a morphological lexicon.
We developed a prototype system based on the proposed approach. The system is general with respect to language, although it performs better on inflective languages. We tested the system on the Slovak language. The system's lexicon is built by browsing Internet pages. Parsed texts that are recognized as being written in Slovak are used to build a database of Slovak words together with their frequencies in the texts.
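
The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paradigm names, the suffix lists, the function names (`build_lexicon`, `candidate_stems`, `rank_paradigms`) and the coverage score are all assumptions made for illustration, and the two paradigms are heavily simplified toy suffix sets rather than a real Slovak morphological knowledge base. The sketch builds a word-frequency lexicon from raw texts, induces the forms a word would have under each candidate paradigm, and ranks the paradigms by how many of those forms are attested in the lexicon.

```python
import re
from collections import Counter

# Hypothetical, heavily simplified knowledge base: paradigm name -> suffixes.
# Real paradigms have more forms and stem alternations; this is a toy subset.
PARADIGMS = {
    "zena": ["a", "y", "e", "u", "ou"],
    "mesto": ["o", "a", "u", "e", "om"],
}


def build_lexicon(texts):
    """Count lowercase word occurrences across a collection of texts."""
    counts = Counter()
    for text in texts:
        # [^\W\d_]+ matches runs of letters, including accented ones.
        counts.update(re.findall(r"[^\W\d_]+", text.lower()))
    return counts


def candidate_stems(word, suffixes):
    """Return every stem obtained by stripping a matching paradigm suffix."""
    return {
        word[: len(word) - len(suffix)]
        for suffix in suffixes
        if word.endswith(suffix) and len(word) > len(suffix)
    }


def rank_paradigms(word, lexicon, paradigms=PARADIGMS):
    """Rank (coverage, paradigm, stem) candidates for a word, best first.

    Coverage is the fraction of induced forms attested in the lexicon.
    This score is an assumption of the sketch, not the paper's measure.
    """
    results = []
    for name, suffixes in paradigms.items():
        for stem in candidate_stems(word, suffixes):
            forms = [stem + suffix for suffix in suffixes]
            attested = [form for form in forms if lexicon.get(form, 0) > 0]
            results.append((len(attested) / len(forms), name, stem))
    results.sort(reverse=True)
    return results


if __name__ == "__main__":
    sample_texts = ["Kniha je tu. Knihy a knihu, s knihou."]
    lexicon = build_lexicon(sample_texts)
    for coverage, name, stem in rank_paradigms("knihu", lexicon):
        print(f"{name}: stem={stem!r}, coverage={coverage:.2f}")
```

In this toy run, the form "knihu" is compatible with both paradigms, but more of the forms induced under "zena" occur in the sample text, so that paradigm ranks first. A fuller version might weight attested forms by their frequency instead of simply counting them.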

