Contribution to Relational Classification with Homophily Assumption
Doctoral dissertation project supervised by prof. Mária Bieliková
Motivation and Goals
Relational classification is a set of methods employing relations between instances in a dataset as well as their attributes. Homophily is a phenomenon present in graphs which capture real-world data, e.g., social connections between humans. Homophily is defined as following: related (neighbouring) vertices are more likely to share similarities (e.g., the same class, attribute value) as non-related instances. Contemporary relational classifiers implicitly require homophily to be present in a graph (so called homophily assumption), however these methods are unable to determine the homophily of each node and take benefit of this information. Our work is at first dedicated to classification of relational classifiers. Next, impact of homophily assumption on particular branches of relational classifiers is analysed and then homophily measures are defined. According to this analysis, two new relational classifiers are designed. First method is collective inference based and involves information exchange moderation, second method belongs to simple relational methods and employs local graph ranking in order to redefine neighbourhood function. Both methods are capable to increase the quality of class assignment in networked data due to their capability to employ and measure homophily in a graph.
We proposed two novel relation classification methods which utilize homophily.
Method based on information exchange moderation, which enhances the simple relational branch of classifiers. The main principle of our the method is based on an observation, that classified vertices share between themselves class-membership information disregarding the quality of this shared information. Our contribution is based on moderating the information interchange of a collective-based classifier. We extended the Iterative Reinforcement Categorization (IRC) with information exchange moderation measured via homophily. The new method can assign different weight to training and testing set vertices in the iterative class-membership interchange process.
Our method was evaluated using scientific publication dataset (research project MAPEKUS). Experiment with this large scale dataset demonstrated capabilities of our method to improve classifier performance as well as the scalability of our solution.
Second classification method extends Simple Relational Classifier so that direct neighborhood acquisition approach is substituted by a local graph ranking. We employed spreading activation as a local graph ranking algorithm. This method provides neighborhood with higher average homophily, which fits better the optimal curve of homophily distribution required by the classifier. This neighborhood acquisition method method is more robust and is capable to absorb broader neighborhood along with weights indicating degree of proximity of vertices.
We evaluated this method using Social network of Slovak companies (foaf.sk).
The experimental evaluation exhibited that substitution of basic neighborhood with local graph ranking provides higher classifier performance, homophily distribution is closer to the optimal distribution and time/space complexity is suitable for large-scale applications which are required to respond in real-time.
The goal of this work was to consider and take into account homophily of the dataset in classifier design. We revealed strong connection between relational classification methods and homophily, according to which we developed following two new classification methods: method based on information exchange moderation, which enhances the simple relational branch of classifiers, and a method based on local graph ranking which extends the collective inferencing based classifiers.
Both methods employ the homophily measures we established and according to them, the classification model provides more robust results with lower misclassification rate. In order to evaluate our methods, we co-authored two large-scale datasets: digital libraries based dataset (MAPEKUS) and Social network of Slovak Companies (foaf.sk).
- P. Vojtek
- Contribution to Relational Classification with Homophily Assumption. Dissertation thesis, Slovak University of Technology in Bratislava 2010 (in Slovak). pdf
- P. Vojtek, M. Bieliková
- Homophily of Neighborhood in Graph Relational Classifier. In Geffert, V., Karhumaki, J., Bertoni, A.,
Preneel, B., Návrat, P., & Bieliková, M. (Eds.): SOFSEM 2010: 36th Conf. on Current Trends in Theory and Practice of Computer Science, Spindleruv Mlyn, Czech Republic, LNCS 5901, Springer, 2010. pp 721–730.
- P. Vojtek, M. Bieliková
- Moderated Class-membership Interchange in Iterative Multi-relational Graph Classifier. In Snášel, V., Szczepaniak, P.S., Abraham, A., & Kacprzyk, J. (Eds.): AWIC 2009: Proc. of the 6th Atlantic
Web Intelligence Conf. Prague, Czech Republic, AISC 67, Springer, 2010. pp 229–238.
- G. Frivolt, J. Suchal, R. Vesely, P. Vojtek, O. Vozár, M. Bieliková
- Creation, Population and Preprocessing of Experimental Data Sets for
Evaluation of Applications for the Semantic Web. In Geffert, V., Karhumaki, J.,
Preneel, B., Návrat, P., & Bieliková, M. (Eds.): SOFSEM 2008: 34th Conf. on Current Trends in Theory and Practice of Computer Science, Nový Smokovec, Slovakia, LNCS 4910, Springer, 2010. pp 684–695.
- P. Vojtek, M. Bieliková
- Increasing the Robustness of Relational Classifier in Datasets with
Low Homophily (in Slovak). Návrat, P., & Vranić, V. : 3rd Workshop on Intelligent and Knowledge Oriented
Technologies (WIKT 2008), Smolenice, Slovakia.
||to the Top