Automatic Estimation of Software Developer's Expertise

Eduard Kuric

Doctoral thesis project supervised by prof. Mária Bieliková


Motivation and Goals

IT managers and software development team leaders are increasingly challenged with the need to improve the efficiency and quality of software system development. Many of their decisions (e.g., planning, assigning tasks) can be supported by choosing the right persons for (development) tasks. Expert recommendation systems in software engineering can help to identify/locate and recommend individuals (experts) with a high level of expertise in a given software artifact.

A software developer's expertise in a given piece of source code can have an impact on productivity. For example, a developer who knows the functions/methods and classes to use for a target task does not need to consult the documentation nearly as much. Expert recommendation can also be aimed at identifying an expert in a part of a software system in order to assign an issue task. The time required to fix a bug can be significantly reduced if the task is assigned to a developer who knows the particular source code or has experience with that type of issue.

In this thesis we deal with the problem of how to identify and recommend an appropriate person or persons (experts) for a target (newly created) development task of a software project. Development of a software system is an incremental activity. Software developers solve tasks and create and modify (refactor) software artifacts. As a software developer works on a software system, his or her expertise in (familiarity with) a domain of the system comes to differ from that of other developers working on the same system. With the increasing size of a software project and the number of developers working on it, the problem of identifying the right persons (experts) has become increasingly important, especially in distributed teams. We identified the following open problems:

  • Insufficient separation of domain conceptualization and resources. Conceptual concerns of a software system refer to the main technical concepts that reflect the business logic or domain of the system. There is no clear distinction between the concerns representing the domain conceptualization and resources such as tasks defining developers' work, software artifacts resulting from developers' activities on the tasks, and interactions capturing and describing developers' activities with the artifacts. A lot of knowledge about a software system and its developers is contained in these resources; however, the resources and the domain model are often seen as one layer. The vocabulary used in source code differs from the one used for describing development tasks. This results in a lack of flexibility in estimating a developer's expertise at the level of the concerns and at the level of particular software artifacts.
  • No or minimal consideration of a developer's development skills. Existing approaches to estimating a developer's expertise in a part of a software system usually rely on the assumption that the developer's commits to the code reflect his or her expertise in (familiarity with) that part of the system. However, while solving tasks, developers rewrite their code, revert (undo) changes, try alternatives, familiarize themselves with surrounding code, explore the information space, etc. Estimation of a developer's expertise in source code has an impact on how quickly and successfully a development task is expected to be solved. The concept of expertise includes both knowledge and skills. To estimate a developer's expertise in a software system, both his or her knowledge and development skills should be considered. It is one thing to be aware of the existence of some functionality; it is another to be able to correct, enhance, or reduce source code effectively in terms of the effort spent.
  • Limited interconnection of tasks with the codebase in expert identification. Existing approaches to recommending an expert for a newly created development task are often based on using a repository to identify the person who has solved tasks similar to the target task. The similarity is measured by comparing the descriptions of the tasks. The time required to implement new functionality, to change existing functionality, or to fix a bug can be significantly reduced if the task is assigned to a developer who knows the corresponding source code (a particular part of a software system). When a developer's expertise is estimated only at the level of tasks, the developer's expertise in a particular part of the codebase is not sufficiently considered. The source code that a developer has to consult in the target task does not have to be the same as in the similar tasks. Looking at the software system as a web of conceptual concerns inferred from both the tasks and the codebase, we can recommend for a task the developer who best covers it in terms of his or her (estimated) expertise in the concerns included in the target task.

We aim to address these problems by devising methods for automatic domain model acquisition, expertise estimation, and expert recommendation at the level of conceptual concerns. In particular, the thesis goals are:

  • To design a domain model of a software project and a method for its acquisition that provides a clear separation between domain conceptualization and resources.
  • To propose and evaluate a method for estimating a developer's expertise at the level of conceptual concerns that covers both the developer's familiarity with tasks and the codebase, and his or her development skills, measured through development productivity reflecting the effort he or she expends while working on a particular concern.
  • To propose and evaluate a method to recommend an expert for a newly created development task at the level of conceptual concerns inferred from both the tasks and the codebase of a software system.

Results

Contributions achieved in this work are as follows:

Developer model overlaying domain model. We proposed a method for automatic domain model acquisition. The proposed domain model consists of a metadata layer and a topics layer. Both layers are abstractions of a repository layer. Metadata elements encapsulate relevant domain terms. The relevant domain terms are extracted from resources of the repository layer. Topics are inferred from a corpus of the metadata elements. Associations between metadata elements and topics are taken from the repository layer, a dependency model and an inference process.

The domain model provides a clear separation between topics representing the domain conceptualization and resources such as tasks defining developers' work, software artifacts resulting from developers' activities on the tasks, and interactions capturing and describing developers' activities with the artifacts. The developer model overlays the domain model. It captures relationships between the developer and the domain model elements. The characteristics of these relationships are stored as attributes of the relationships. A developer–topic relationship can have attributes (expertise characteristics) such as the degree of the developer's familiarity with the topic and his or her development productivity on the topic. The data stored in the repository layer are used to infer the developer's expertise characteristics.

This hierarchy provides software project-related information at different levels of abstraction. From the topics layer, through the metadata layer, we can get to particular resources. The proposed domain model is easily extensible by adding further types of resources and associations between them. For example, by extending the repository layer with an email communication archive, we can add an email metadata element consisting of, e.g., keywords. By adding associations and relationships between the elements and senders/recipients, the domain model can be extended with a communication network, i.e., a graph that models interactions between developers.
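
The layering described above can be pictured as a small data structure. The following Python sketch is only an illustration under our own assumptions: the class names, field names, and the helper `resources_for_topic` are hypothetical and are not taken from the thesis implementation.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataElement:
    """Metadata layer: relevant domain terms extracted from one resource."""
    resource_id: str                      # reference into the repository layer
    terms: list[str]                      # extracted domain terms
    topics: list[str] = field(default_factory=list)  # associated topic names


@dataclass
class DeveloperTopicRelation:
    """Overlay attributes (expertise characteristics) of one developer-topic link."""
    familiarity: float = 0.0
    productivity: float = 0.0


@dataclass
class DeveloperModel:
    """Developer model overlaying the domain model (hypothetical layout)."""
    developer: str
    relations: dict[str, DeveloperTopicRelation] = field(default_factory=dict)


def resources_for_topic(topic: str, elements: list[MetadataElement]) -> list[str]:
    """Go from the topics layer through the metadata layer to resources."""
    return [e.resource_id for e in elements if topic in e.topics]
```

Adding a new resource type, such as an email archive, would in this sketch amount to creating further `MetadataElement` instances whose `resource_id` points at emails.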

Method for estimation of developer's expertise. We proposed a novel method for estimating a developer's expertise in a topic. We claim that it is one thing to be aware of the existence of some functionality and another to be able to correct, enhance, or reduce source code effectively. Therefore, we combine two proposed metrics. The first one estimates the developer's development productivity on the topic. The second one estimates his or her degree of familiarity with the topic. Both metrics are based on the developer's previous work activities (resolved tasks) on the subject topic.

A developer's development productivity on a topic is estimated from the developer's development sessions that include the subject topic. It is calculated as the ratio of the performed interactions, multiplied by the complexity/size of the changes, to the time spent. A developer's familiarity with the topic is estimated from his or her real code contributions in the sessions that include the topic.
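
To make the two metrics more concrete, here is a minimal Python sketch. It assumes a hypothetical `Session` record and one possible reading of the ratio (interactions weighted by change size, summed over the relevant sessions and divided by the total time). Measuring familiarity as contributed lines is likewise our assumption; the thesis defines the exact formulas.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One development session (hypothetical record layout)."""
    topics: frozenset          # topics touched in the session
    interactions: int          # number of performed interactions
    change_size: float         # complexity/size of the resulting changes
    minutes: float             # time spent in the session
    contributed_lines: int     # code that actually ended up in the codebase


def productivity(sessions, topic):
    """(interactions * change size) over time spent, for sessions with the topic."""
    relevant = [s for s in sessions if topic in s.topics]
    total_time = sum(s.minutes for s in relevant)
    if total_time == 0:
        return 0.0  # no recorded work on this topic
    work = sum(s.interactions * s.change_size for s in relevant)
    return work / total_time


def familiarity(sessions, topic):
    """Assumed proxy: total contributed lines in sessions that include the topic."""
    return sum(s.contributed_lines for s in sessions if topic in s.topics)
```

Keeping the two values separate, rather than collapsing them into one number early, makes it possible to weigh knowledge against skill differently when they are combined.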

Method for expert recommendation. We proposed a method to recommend an expert for a newly created development task at the level of topics. It is based on the construction of two topic models, namely, a task topic model and a code topic model. The task topic model is created from the corpus containing metadata documents obtained from tasks. The code topic model is created from the corpus containing metadata documents obtained from source code entities. The task topic model is used to map tasks in natural language to the topics inferred from source code.
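
The ranking step can be sketched as follows. This Python example is only an illustration: it assumes the topic weights of the new task have already been inferred by the topic models, and it scores each developer by a weighted sum of per-topic expertise, which is our simplification rather than the thesis's scoring function. The name `rank_experts` and the input layout are hypothetical.

```python
def rank_experts(task_topics, expertise):
    """Rank developers by how well their expertise covers the task's topics.

    task_topics: dict mapping topic -> weight of that topic in the new task
    expertise:   dict mapping developer -> dict of topic -> expertise score
    Returns a list of (developer, score) pairs, best candidate first.
    """
    scores = {
        dev: sum(weight * per_topic.get(topic, 0.0)
                 for topic, weight in task_topics.items())
        for dev, per_topic in expertise.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Usage with made-up numbers
task = {"auth": 0.7, "ui": 0.3}
team = {
    "alice": {"auth": 0.9, "ui": 0.1},
    "bob": {"auth": 0.2, "ui": 0.8},
}
ranking = rank_experts(task, team)
```

A developer with no recorded expertise in a topic simply contributes zero for it, so a task that spans several topics favors developers who cover more of them.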

The crucial problem in expertise-finding is the lack of a clear baseline with which to compare expertise methods to each other. Without a clear baseline, it is difficult to determine which automatic expertise method best characterizes developer's expertise.

We conducted experiments on five open-source projects and performed a case study on two commercial (closed-source) software projects. Although these experiments do not sufficiently justify the generality of our approach, the overall results indicate that it can be useful in recommending experts.

Conclusions

In our approach we consider that both a developer's familiarity with the topic of a task and his or her development productivity on the topic should be taken into account when recommending developers for new tasks. We claim that this approach is the first that combines both these aspects.

Estimation of developer's expertise can be a valuable asset for a software company. We argue that automatic estimation of expertise can be beneficial in the planning of a software project, especially in assigning development tasks. The evolution or maintenance of a software system can be more effective if we assign the right person (an expert) to a given (development/maintenance) task.

The time required to implement new functionality, to change existing functionality, or to fix a bug can be significantly reduced if the task is assigned to a developer who knows the source code (a particular part of the software system). Moreover, looking at the software system as a "web" of topics, we can recommend for a task the developer who best covers it in terms of his or her expertise in the topics of the task.

The thesis extended abstract is available in the Bulletin of the ACM Slovakia.

Selected publications

Kuric, E., Bieliková, M.
Estimation of student's programming expertise. In Proc. of the 8th ACM/IEEE international symposium on empirical software engineering and measurement ESEM'14. ACM, p. 4 (2014)
Kuric, E., Bieliková, M.
Webification of software development: user feedback for developer's modeling. In Proc. of the 14th International conference on Web Engineering ICWE'14, Springer-Verlag, pp. 550-553 (2014)
Bieliková, M., Polášek, I., Barla, M., Kuric, E., Rástočný, K., Tvarožek, J., Lacko, P.
Platform independent software development monitoring: design of an architecture. In Proc. of the 40th International conference on current trends in theory and practice of computer science SOFSEM'14, Springer-Verlag, pp. 126-137 (2014)
Kuric, E., Bieliková, M.
Search in source code based on identifying popular fragments. In Proc. of the 39th international conference on current trends in theory and practice of computer science SOFSEM'13, Springer-Verlag, pp. 408-419 (2013)
Kuric, E., Bieliková, M.
ANNOR: efficient image annotation based on combining local and global features. In Computers & Graphics. Vol. 47, No. 2, pp. 1-15. (2015) [based on master thesis results]
