Recommending Based on Similarity Relations

Dušan Zeleník

Master's thesis project supervised by Prof. Mária Bieliková


When readers search for an article, they would like to find something that is new, unknown, but interesting. Many recommender systems use user preferences to generate suggestions. Our motivation is to extract these preferences automatically from logs and use them to recommend appropriate articles in a news portal. We wanted to focus on both content relevancy and time relevancy. Our content-based recommender is then able to suggest articles that are new, unknown, and interesting for a specific user.


Our method can be understood as two separate abilities of the recommender:
  • finding similar articles using the content of the articles
  • finding interesting articles using logs of user activity
We use the similarity among articles to sort them hierarchically. The hierarchy is built incrementally as new articles are published, and the keywords extracted from each article are used to place it in the hierarchy. The method can then search for similar articles in this hierarchy. The search is logarithmic in the number of articles, so it remains efficient even for a large dataset.
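The incremental hierarchy can be illustrated with a minimal sketch. Everything below is an assumption, not the thesis implementation: keywords are simply the most frequent non-stop-word terms, similarity is cosine similarity over keyword counts, and the hypothetical `Hierarchy` class places each new article by descending toward the most similar child, turning a full node's most similar article into a new two-article cluster. Search follows the same greedy descent, so it costs about `branching × depth` comparisons; that is logarithmic only while the tree stays roughly balanced, which this sketch does not enforce.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the actual keyword extraction is not described here.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "on", "to", "is", "for"}


def extract_keywords(text, top_n=10):
    """Keep the top_n most frequent non-stop-word terms as a keyword-count vector."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    return Counter(dict(Counter(words).most_common(top_n)))


def cosine(a, b):
    """Cosine similarity between two sparse keyword-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class Node:
    def __init__(self, article_id=None, keywords=None):
        self.article_id = article_id             # set only for leaves (articles)
        self.keywords = keywords or Counter()    # summed keywords of the subtree
        self.children = []


class Hierarchy:
    """Hypothetical incremental hierarchy built by greedy descent on keyword similarity."""

    def __init__(self, branching=4):
        self.root = Node()
        self.branching = branching

    def insert(self, article_id, text):
        kw = extract_keywords(text)
        leaf = Node(article_id, kw)
        node = self.root
        while True:
            node.keywords.update(kw)              # keep the subtree summary up to date
            if len(node.children) < self.branching:
                node.children.append(leaf)
                return
            best = max(node.children, key=lambda c: cosine(c.keywords, kw))
            if best.article_id is not None:
                # The most similar child is an article: group both into a new cluster.
                cluster = Node(keywords=best.keywords + kw)
                cluster.children = [best, leaf]
                node.children[node.children.index(best)] = cluster
                return
            node = best                           # descend into the most similar cluster

    def most_similar(self, text):
        """Follow the most similar child at each level until an article is reached."""
        kw = extract_keywords(text)
        node = self.root
        while node.article_id is None:
            if not node.children:
                return None
            node = max(node.children, key=lambda c: cosine(c.keywords, kw))
        return node.article_id


h = Hierarchy(branching=2)
h.insert("sport1", "football match goal football league")
h.insert("pol1", "election parliament vote government")
h.insert("sport2", "football goal striker league")
h.insert("pol2", "government election minister vote")
print(h.most_similar("football striker goal"))  # sport2
```

Each article is compared only with the children along one path, rather than with the whole dataset.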
Readers are monitored through action logging: every time a user displays an article, the system stores this action. These actions represent the interests of a specific user. We map each article displayed by the user onto nodes in the hierarchy. A user normally reads articles that are close to each other; in fact, the majority of the articles a user reads are similar. This is how we locate the interests the user has already covered. Other articles that could be interesting are stored nearby in the hierarchy. The method picks a few of these articles and sorts them by time relevancy.
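Mapping a user's reading log onto the hierarchy and picking nearby articles could look roughly like the sketch below. The tree representation, the hypothetical `levels_up` parameter that defines the "nearby" neighbourhood, and ordering by publication time (newest first) as the time-relevancy criterion are all assumptions; the exact rules are not specified here.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass(eq=False)
class TreeNode:
    name: str
    published: Optional[datetime] = None                  # set only for article leaves
    children: list = field(default_factory=list)
    parent: Optional["TreeNode"] = field(default=None, repr=False)


def add_child(parent, child):
    child.parent = parent
    parent.children.append(child)
    return child


def leaves(node):
    """All article leaves below (or at) a node."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def recommend(read_log, index, k=10, levels_up=1):
    """read_log: IDs of articles the user displayed; index: article ID -> leaf node."""
    read = set(read_log)
    candidates = {}
    for article_id in read:
        node = index.get(article_id)
        if node is None:
            continue
        # Hypothetical neighbourhood: climb levels_up levels from the read article.
        for _ in range(levels_up):
            if node.parent is not None:
                node = node.parent
        for leaf in leaves(node):
            if leaf.name not in read:                     # suggest only unread articles
                candidates[leaf.name] = leaf
    # Assumed time relevancy: newest publication first.
    ranked = sorted(candidates.values(), key=lambda n: n.published, reverse=True)
    return [n.name for n in ranked[:k]]


root = TreeNode("root")
sport = add_child(root, TreeNode("sport"))
politics = add_child(root, TreeNode("politics"))
add_child(sport, TreeNode("a1", datetime(2010, 3, 1)))
add_child(sport, TreeNode("a2", datetime(2010, 3, 3)))
add_child(sport, TreeNode("a3", datetime(2010, 3, 2)))
add_child(politics, TreeNode("b1", datetime(2010, 3, 4)))
index = {leaf.name: leaf for leaf in leaves(root)}
print(recommend(["a1"], index))  # ['a2', 'a3']
```

Reading `a1` locates the user's interest in the `sport` cluster; its unread neighbours are returned newest first, while the more recent but unrelated `b1` is not suggested.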


There are several ways to evaluate our recommender and its abilities. We can evaluate the similar-article finder and the recommender itself separately to show the success of each part.
As a baseline, we used a brute-force method that searches for similar articles in our dataset by comparing each article with all the others. We used the same comparison in our hierarchical approach. Our approach was better both analytically and experimentally (logarithmic vs. linear search).
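For comparison, the brute-force baseline is a linear scan with one similarity computation per article. The sketch below assumes cosine similarity over keyword counts; the hypothetical helper `tree_comparisons` counts the comparisons a greedy descent would need in an ideally balanced tree with a given branching factor.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse keyword-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def brute_force_most_similar(query, articles):
    """Linear scan: compare the query with every article; returns (best_id, comparisons)."""
    best_id, best_sim = None, -1.0
    for article_id, vector in articles.items():
        sim = cosine(query, vector)
        if sim > best_sim:
            best_id, best_sim = article_id, sim
    return best_id, len(articles)


def tree_comparisons(n, branching):
    """Comparisons for a greedy descent in a perfectly balanced tree: branching * depth."""
    depth, capacity = 0, 1
    while capacity < n:          # smallest depth whose tree can hold n leaves
        capacity *= branching
        depth += 1
    return branching * depth


articles = {
    "a": Counter({"football": 2, "goal": 1}),
    "b": Counter({"election": 1, "vote": 1}),
}
print(brute_force_most_similar(Counter({"goal": 1}), articles))  # ('a', 2)
print(tree_comparisons(100_000, 10))                              # 50
```

For 100,000 articles the scan needs 100,000 comparisons, while a balanced tree with branching factor 10 needs 10 × 5 = 50.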
To evaluate the recommender and the quality of its recommendations, we considered two alternatives:
  • real usage of the recommender and feedback
  • simulated usage and comparison with real reading
Since the simulated usage is based on real usage data, we focused on this option; evaluating real usage with feedback would require long-term observation. Instead, we used the history of 1000 active users, selected by their activity: each had to read more than 50 articles in 5 days. The activity of the selected users is divided into two intervals. The first is used for training and the second for evaluation. Our recommender suggests 10 articles based on the activity from the first interval. These 10 recommended items are compared with the activity from the second interval using the standard precision and recall measures.
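The simulated evaluation can be sketched as follows. The log format (`(timestamp, article_id)` pairs), the sliding-window interpretation of "more than 50 articles in 5 days", and splitting each history at a single boundary timestamp are assumptions; `recommend` stands for any recommender callable returning ranked article IDs.

```python
from datetime import datetime, timedelta


def is_active(events, min_articles=50, window=timedelta(days=5)):
    """Assumed reading: more than min_articles displays inside some sliding window."""
    times = sorted(t for t, _ in events)
    start = 0
    for end in range(len(times)):
        while times[end] - times[start] > window:
            start += 1
        if end - start + 1 > min_articles:
            return True
    return False


def split_history(events, boundary):
    """Events before boundary train the recommender; later ones form the ground truth."""
    train = [article for t, article in sorted(events) if t < boundary]
    test = {article for t, article in events if t >= boundary}
    return train, test


def precision_recall(recommended, relevant):
    hits = len(set(recommended) & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def evaluate_user(events, boundary, recommend, k=10):
    """recommend: any callable mapping a training log to a ranked list of article IDs."""
    train, test = split_history(events, boundary)
    return precision_recall(recommend(train)[:k], test)


log = [(datetime(2010, 1, d), a) for d, a in [(1, "a"), (2, "b"), (4, "c"), (5, "d")]]
print(evaluate_user(log, datetime(2010, 1, 3), lambda train: ["c", "x"]))  # (0.5, 0.5)
```

One plausible way to aggregate is to average the per-user precision and recall over all selected users.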


Zeleník, D.
Recommending Based on Similarity Relations. Master's thesis, Slovak University of Technology in Bratislava, 2010. 51 p. (In Slovak)

Zeleník, D., Bieliková, M.
Dynamics in Hierarchical News Classification. In Proc. of Workshop on Intelligent and Knowledge Oriented Technologies - WIKT 2009, Herľany, Slovakia, 2009, pp. 83-87. (In Slovak)

Zeleník, D.
Representing Similarity for News Recommending. In Proc. of Student Research Conference - IIT.SRC 2010, M. Bieliková (Ed.), Bratislava, Slovakia, 2010, pp. 114-121.

Mária Bieliková, bielik [at] fiit-dot-stuba-dot-sk