Performing adaptive morphological analysis using internet resources

Authors: M. Trabalka and M. Bieliková

Reference: In Proc. of TSD'99 - Text, Speech and Dialogue, V. Matoušek, P. Mautner, J. Ocelíková, P. Sojka (Eds.), Springer Verlag, Czech Republic, September 13-17, pages 66-71, 1999.

Contents:

  1. Introduction
  2. Method of Adaptive Morphological Analysis
  3. Acquiring Vocabulary from Internet
  4. Experiments
  5. Conclusion
  6. References

Abstract: In this paper, we describe an approach to adaptive morphological analysis based on a lexicon corpus acquired from the Internet. We focus on automating the categorization of words into morphological paradigms in flexive languages. This is done by inducing possible word forms using a morphological knowledge base and by looking up the word forms of possible inflections in a morphological lexicon.
We developed a prototype system based on the proposed approach. The system is general: it is not tied to a particular language, although it performs better on flexive languages. We tested the system on the Slovak language. The system's lexicon is built by browsing Internet pages. Parsed texts that are recognized as written in Slovak are used to build a database of Slovak words together with their frequencies in the texts.
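
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the paradigm table, the names `PARADIGMS`, `build_lexicon`, `induce_forms` and `rank_paradigms`, and the scoring rule (the fraction of induced forms found in the lexicon) are all assumptions made for illustration. The suffix lists are simplified versions of three Slovak noun paradigms and ignore stem changes. The texts passed in are assumed to have already been recognized as Slovak.

```python
import re
from collections import Counter

# Hypothetical, simplified knowledge base: paradigm name -> (lemma suffix,
# inflectional suffixes). Stem alternations (e.g. "žien") are ignored.
PARADIGMS = {
    "zena": ("a", ["a", "y", "e", "u", "ou", "ám", "ách", "ami"]),
    "ulica": ("a", ["a", "e", "i", "u", "ou", "iam", "iach", "ami"]),
    "mesto": ("o", ["o", "a", "u", "e", "om", "á", "ám", "ách", "ami"]),
}


def build_lexicon(texts):
    """Count word frequencies over texts (assumed to be Slovak already)."""
    counts = Counter()
    for text in texts:
        # [^\W\d_]+ matches runs of Unicode letters, including diacritics.
        counts.update(re.findall(r"[^\W\d_]+", text.lower()))
    return counts


def induce_forms(word, lemma_suffix, suffixes):
    """Induce the word forms a paradigm would predict for a base form."""
    if not word.endswith(lemma_suffix):
        return []  # the paradigm cannot apply to this base form
    stem = word[: len(word) - len(lemma_suffix)]
    return [stem + suffix for suffix in suffixes]


def rank_paradigms(word, lexicon):
    """Rank paradigms by the share of induced forms attested in the lexicon.

    The scoring rule is an assumption of this sketch, not taken from the paper.
    """
    scores = {}
    for name, (lemma_suffix, suffixes) in PARADIGMS.items():
        forms = induce_forms(word, lemma_suffix, suffixes)
        if forms:
            attested = sum(1 for form in forms if form in lexicon)
            scores[name] = attested / len(forms)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling `rank_paradigms("žena", build_lexicon(texts))` on a handful of Slovak sentences returns the candidate paradigms ordered by how many of their predicted forms occur in the collected vocabulary. A larger lexicon gives more evidence for telling apart paradigms that share a lemma suffix.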

A PS version (383K file, 6 pages) and a GZ version (98K file) are available.
