Contribution to Relational Classification with Homophily Assumption

Peter Vojtek

Doctoral dissertation project supervised by prof. Mária Bieliková


Motivation and Goals

Relational classification is a set of methods employing relations between instances in a dataset as well as their attributes. Homophily is a phenomenon present in graphs which capture real-world data, e.g., social connections between humans. Homophily is defined as following: related (neighbouring) vertices are more likely to share similarities (e.g., the same class, attribute value) as non-related instances. Contemporary relational classifiers implicitly require homophily to be present in a graph (so called homophily assumption), however these methods are unable to determine the homophily of each node and take benefit of this information. Our work is at first dedicated to classification of relational classifiers. Next, impact of homophily assumption on particular branches of relational classifiers is analysed and then homophily measures are defined. According to this analysis, two new relational classifiers are designed. First method is collective inference based and involves information exchange moderation, second method belongs to simple relational methods and employs local graph ranking in order to redefine neighbourhood function. Both methods are capable to increase the quality of class assignment in networked data due to their capability to employ and measure homophily in a graph.

Results

We proposed two novel relation classification methods which utilize homophily.

  • Method based on information exchange moderation, which enhances the simple relational branch of classifiers. The main principle of our the method is based on an observation, that classified vertices share between themselves class-membership information disregarding the quality of this shared information. Our contribution is based on moderating the information interchange of a collective-based classifier. We extended the Iterative Reinforcement Categorization (IRC) with information exchange moderation measured via homophily. The new method can assign different weight to training and testing set vertices in the iterative class-membership interchange process.

    Our method was evaluated using scientific publication dataset (research project MAPEKUS). Experiment with this large scale dataset demonstrated capabilities of our method to improve classifier performance as well as the scalability of our solution.

  • Second classification method extends Simple Relational Classifier so that direct neighborhood acquisition approach is substituted by a local graph ranking. We employed spreading activation as a local graph ranking algorithm. This method provides neighborhood with higher average homophily, which fits better the optimal curve of homophily distribution required by the classifier. This neighborhood acquisition method method is more robust and is capable to absorb broader neighborhood along with weights indicating degree of proximity of vertices.

    We evaluated this method using Social network of Slovak companies (foaf.sk). The experimental evaluation exhibited that substitution of basic neighborhood with local graph ranking provides higher classifier performance, homophily distribution is closer to the optimal distribution and time/space complexity is suitable for large-scale applications which are required to respond in real-time.

Conclusion

The goal of this work was to consider and take into account homophily of the dataset in classifier design. We revealed strong connection between relational classification methods and homophily, according to which we developed following two new classification methods: method based on information exchange moderation, which enhances the simple relational branch of classifiers, and a method based on local graph ranking which extends the collective inferencing based classifiers.

Both methods employ the homophily measures we established and according to them, the classification model provides more robust results with lower misclassification rate. In order to evaluate our methods, we co-authored two large-scale datasets: digital libraries based dataset (MAPEKUS) and Social network of Slovak Companies (foaf.sk).

Selected publications

P. Vojtek
Contribution to Relational Classification with Homophily Assumption. Dissertation thesis, Slovak University of Technology in Bratislava 2010 (in Slovak). pdf
P. Vojtek, M. Bieliková
Homophily of Neighborhood in Graph Relational Classifier. In Geffert, V., Karhumaki, J., Bertoni, A., Preneel, B., Návrat, P., & Bieliková, M. (Eds.): SOFSEM 2010: 36th Conf. on Current Trends in Theory and Practice of Computer Science, Spindleruv Mlyn, Czech Republic, LNCS 5901, Springer, 2010. pp 721–730.
P. Vojtek, M. Bieliková
Moderated Class-membership Interchange in Iterative Multi-relational Graph Classifier. In Snášel, V., Szczepaniak, P.S., Abraham, A., & Kacprzyk, J. (Eds.): AWIC 2009: Proc. of the 6th Atlantic Web Intelligence Conf. Prague, Czech Republic, AISC 67, Springer, 2010. pp 229–238.
G. Frivolt, J. Suchal, R. Vesely, P. Vojtek, O. Vozár, M. Bieliková
Creation, Population and Preprocessing of Experimental Data Sets for Evaluation of Applications for the Semantic Web. In Geffert, V., Karhumaki, J., Preneel, B., Návrat, P., & Bieliková, M. (Eds.): SOFSEM 2008: 34th Conf. on Current Trends in Theory and Practice of Computer Science, Nový Smokovec, Slovakia, LNCS 4910, Springer, 2010. pp 684–695.
P. Vojtek, M. Bieliková
Increasing the Robustness of Relational Classifier in Datasets with Low Homophily (in Slovak). Návrat, P., & Vranić, V. : 3rd Workshop on Intelligent and Knowledge Oriented Technologies (WIKT 2008), Smolenice, Slovakia.

to Homepage to Teaching to the Top

Home
Research
Projects
Publications
Books
SCM
Teaching
Links
Last updated:
Mária Bieliková bielik [zavináč] fiit-dot-stuba-dot-sk
Design © 2oo1 KoXo