Analysis of text on WWW pages using important information indicators

Vojtěch Svátek, Petr Strossa and Martin Kavalec
Department of Information and Knowledge Engineering, University of Economics

Abstract. We examine the possibility of indexing web pages using collections of words that indicate the most important places in the text. Two ways of constructing such collections are investigated. The first relies on an intellectual analysis of the domain in question: a collection of 'indicators' originally developed for automated summarisation of English texts has been adapted for the analysis of Czech web pages of commercial companies. The second is based on learning the 'indicators' from text corpora; to alleviate the burden of indexing the training data manually, an original method has been devised that reuses the previous work of the indexers of public web directories.

Keywords: information extraction, data mining, WWW directories.
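The general idea of locating important places in a text by means of indicator words can be illustrated with a minimal sketch. It is not the authors' implementation: the indicator phrases in `INDICATORS`, the function names and the naive sentence splitter are all assumptions made for illustration, and English phrases are used here although the paper analyses Czech pages.

```python
import re

# Hypothetical indicator phrases (an assumption for illustration);
# the actual collections are not listed in the abstract.
INDICATORS = {"offers", "specialises in", "produces", "supplies"}


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def indicated_sentences(text, indicators=INDICATORS):
    """Return (sentence, matched indicators) for sentences containing an indicator."""
    hits = []
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        matched = sorted(
            ind for ind in indicators
            if re.search(r"\b" + re.escape(ind) + r"\b", lowered)
        )
        if matched:
            hits.append((sentence, matched))
    return hits


if __name__ == "__main__":
    page = ("Welcome to our site. Our company produces industrial pumps. "
            "Contact us today!")
    for sentence, matched in indicated_sentences(page):
        print(matched, "->", sentence)
```

The sentences selected in this way could then serve as a source of index terms for the page; how the matched places are actually turned into an index is not specified in the abstract.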