Performing adaptive morphological analysis using internet resources
Authors: M. Trabalka and M. Bieliková
Reference: In Proc. of TSD'99 - Text, Speech and Dialog, V. Matoušek, P. Mautner, J. Ocelíková, P. Sojka (Eds.), Springer Verlag, Czech Republic, September 13-17, pages 66-71, 1999.
- Method of Adaptive Morphological Analysis
- Acquiring Vocabulary from the Internet
Abstract: In this paper, we describe an approach to adaptive morphological analysis based on a lexicon corpus acquired from the Internet. We focus on automating the categorization of words into morphological paradigms in flexive (inflectional) languages. This is done by inducing the possible word forms of a word using a morphological knowledge base and by searching a morphological lexicon for the word forms of each possible inflection.
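The categorization step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Paradigm` type, the four simplified Slovak noun paradigms (named after their model words chlap, dub, žena and mesto, with shortened ending lists), and the scoring rule (the fraction of induced forms attested in the lexicon) are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Paradigm:
    """A hypothetical, simplified inflectional paradigm."""
    name: str                 # model word naming the paradigm
    base_ending: str          # ending of the base (dictionary) form
    endings: tuple[str, ...]  # endings appended to the stem to build inflected forms


# Illustrative Slovak noun paradigms; the ending lists are simplified, not complete.
PARADIGMS = (
    Paradigm("chlap", "", ("", "a", "ovi", "om", "i", "ov", "och", "mi")),
    Paradigm("dub", "", ("", "a", "u", "om", "y", "ov", "och", "mi")),
    Paradigm("žena", "a", ("a", "y", "e", "u", "ou", "ám", "ách", "ami")),
    Paradigm("mesto", "o", ("o", "a", "u", "e", "om", "ám", "ách", "ami")),
)


def induce_forms(word: str, paradigm: Paradigm) -> set[str] | None:
    """Return all forms the paradigm predicts for `word`, or None if it cannot apply."""
    if not word.endswith(paradigm.base_ending):
        return None
    stem = word[: len(word) - len(paradigm.base_ending)]
    if not stem:
        return None
    return {stem + ending for ending in paradigm.endings}


def categorize(word: str, lexicon: dict[str, int]) -> list[tuple[str, float]]:
    """Rank applicable paradigms by the share of their induced forms found in the lexicon."""
    ranking = []
    for paradigm in PARADIGMS:
        forms = induce_forms(word, paradigm)
        if forms is None:
            continue
        attested = sum(1 for form in forms if lexicon.get(form, 0) > 0)
        ranking.append((paradigm.name, attested / len(forms)))
    # Highest attestation ratio first; sort is stable, so ties keep PARADIGMS order.
    ranking.sort(key=lambda item: item[1], reverse=True)
    return ranking


# Toy frequency lexicon standing in for the corpus gathered from the Web.
lexicon = {"dub": 5, "duba": 2, "dubu": 3, "dubom": 1, "duby": 4, "dubov": 2,
           "žena": 7, "ženy": 3, "ženu": 2, "ženou": 1}

print(categorize("dub", lexicon))  # [('dub', 0.75), ('chlap', 0.5)]
```

For the toy lexicon, the dub paradigm ranks first for "dub" (6 of its 8 induced forms are attested) ahead of chlap (4 of 8); the žena and mesto paradigms are skipped because "dub" does not end in -a or -o. A real system would also weigh the frequencies stored in the lexicon rather than only checking whether a form occurs.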
We developed a prototype system based on the proposed approach. The system is general (it is not tied to a particular language), but it performs better on flexive languages. We tested it on Slovak. The system's lexicon is built by browsing Internet pages: parsed texts recognized as written in Slovak are used to build a database of Slovak words together with their frequencies in those texts.
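The lexicon-building step can be sketched as below. The paper does not say how language recognition is performed, so the marker-word test here (the hypothetical `SLOVAK_MARKERS` set and the 5% `threshold`) is an assumed stand-in; page fetching and HTML parsing are left out, and the `pages` list stands for texts already extracted from browsed pages.

```python
import re
from collections import Counter

# Hypothetical marker words: common in Slovak, but absent or rare in Czech and English.
SLOVAK_MARKERS = frozenset(
    {"sa", "aj", "som", "pre", "ako", "alebo", "nie", "ktorý", "ktorá", "ktoré"}
)

# A run of letters (accented letters included); digits and underscores split words.
WORD_RE = re.compile(r"[^\W\d_]+")


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens."""
    return [word.lower() for word in WORD_RE.findall(text)]


def looks_slovak(tokens: list[str], threshold: float = 0.05) -> bool:
    """Crude language check: enough tokens must be Slovak marker words."""
    if not tokens:
        return False
    hits = sum(1 for token in tokens if token in SLOVAK_MARKERS)
    return hits / len(tokens) >= threshold


def build_lexicon(texts: list[str], threshold: float = 0.05) -> Counter:
    """Count word frequencies over the texts that pass the language check."""
    lexicon: Counter = Counter()
    for text in texts:
        tokens = tokenize(text)
        if looks_slovak(tokens, threshold):
            lexicon.update(tokens)
    return lexicon


pages = [
    "Dub je strom, ktorý rastie aj v meste.",
    "Starý dub sa nám páči.",
    "The oak is a tree that grows in the city.",
]
lexicon = build_lexicon(pages)
print(lexicon["dub"])    # 2
print("oak" in lexicon)  # False
```

The marker-word ratio is used only for brevity: it relies on function words such as sa, aj and pre, which differ from their Czech counterparts (se, i, pro). Character n-gram language models are a common, more robust alternative on real crawled data.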
A PS version (383K file, 6 pages) and a GZ version (98K file) are available.