Motivation and Goals
IT managers and software development team leaders are increasingly challenged to improve the efficiency and quality of software system development. Many of their decisions (e.g., planning, assigning tasks) can be supported by choosing the right persons for development tasks. Expert recommendation systems in software engineering can help to locate and recommend individuals (experts) with a high level of expertise in a given software artifact.
A software developer's expertise in a given piece of source code can have an impact on productivity. For example, a developer who knows which functions, methods, and classes to use for a target task does not need to consult the documentation nearly as much. Expert recommendation can also be aimed at identifying an expert in a part of a software system when assigning an issue. The time required to fix a bug can be significantly reduced if the task is assigned to a developer who knows the particular source code or has experience with the particular type of issue.
In this thesis we deal with the problem of how to identify and recommend an appropriate person or persons (experts) for a target (newly created) development task of a software project. Development of a software system is an incremental activity. Software developers solve tasks and create and modify (refactor) software artifacts. As a software developer works on a software system, his or her expertise in (familiarity with) a domain of the system comes to differ from that of other developers working on the same system. With the increasing size of a software project and the growing number of developers working on it, the problem of identifying the right persons (experts) has become increasingly important, especially in distributed teams. We identified the following open problems:
We aim to address these problems by devising methods for automatic domain model acquisition, expertise estimation, and expert recommendation at the level of conceptual concerns. In particular, thesis goals are:
Contributions achieved in this work are as follows:
Developer model overlaying domain model. We proposed a method for automatic domain model acquisition. The proposed domain model consists of a metadata layer and a topics layer. Both layers are abstractions of a repository layer. Metadata elements encapsulate relevant domain terms. The relevant domain terms are extracted from resources of the repository layer. Topics are inferred from a corpus of the metadata elements. Associations between metadata elements and topics are taken from the repository layer, a dependency model and an inference process.
The domain model provides a clear separation between topics, which represent the domain conceptualization, and resources such as tasks defining developers' work, software artifacts resulting from developers' activities on the tasks, and interactions capturing and describing developers' activities with the artifacts. The developer model overlays the domain model. It captures relationships between the developer and the domain model elements. Characteristics of these relationships are stored as attributes of the relationships. A developer–topic relationship can have attributes (expertise characteristics) such as the degree of the developer's familiarity with the topic and his or her development productivity on the topic. The data stored in the repository layer are used to infer the developer's expertise characteristics.
This hierarchy provides software project-related information at different levels of abstraction. From the topics layer, through the metadata layer, we can get to particular resources. The proposed domain model is easily extensible by adding further types of resources and associations between them. For example, by extending the repository layer with an email communication archive, we can add an email metadata element consisting of, e.g., keywords. By adding associations and relationships between these elements and senders/recipients, the domain model can be extended with a communication network, i.e., a graph that models interactions between developers.
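The layering described above can be illustrated with a minimal Python sketch. All class and field names here (Resource, MetadataElement, Topic, DeveloperTopicRelation, resources_of) are hypothetical and chosen only for illustration; the thesis does not prescribe this representation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    """Repository layer: a task, a software artifact, or an interaction."""
    rid: str
    kind: str  # e.g. "task", "artifact", "interaction" (illustrative values)


@dataclass
class MetadataElement:
    """Metadata layer: relevant domain terms extracted from one resource."""
    resource: Resource
    terms: List[str]


@dataclass
class Topic:
    """Topics layer: a topic inferred from the corpus of metadata elements."""
    name: str
    metadata: List[MetadataElement] = field(default_factory=list)


@dataclass
class DeveloperTopicRelation:
    """Developer model overlay: expertise attributes stored on the relationship."""
    developer: str
    topic: Topic
    familiarity: float
    productivity: float


def resources_of(topic: Topic) -> List[Resource]:
    """Navigate from the topics layer through the metadata layer to resources."""
    return [element.resource for element in topic.metadata]
```

Keeping the expertise attributes on the relationship object, rather than on the developer or the topic, mirrors the overlay idea: the domain model stays unchanged when developers are added or removed.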
Method for estimation of developer's expertise. We proposed a novel method for estimating a developer's expertise in a topic. We claim that being aware of the existence of some functionality is one thing; being able to effectively correct, enhance, or reduce the source code is another. Therefore, we combine two proposed metrics. The first one estimates the developer's development productivity on the topic. The second one estimates his or her degree of familiarity with the topic. Both metrics are based on the developer's previous work activities (resolved tasks) on the subject topic.
A developer's development productivity on a topic is estimated from the developer's development sessions that include the subject topic. It is calculated as the ratio of the performed interactions, multiplied by the complexity/size of the changes, to the time spent. A developer's familiarity with the topic is estimated from his or her real code contributions in the sessions that include the topic.
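The two metrics can be sketched as follows. This is an illustrative reading of the description above, not the exact formulas of the thesis: the Session record and its fields are hypothetical, the change size is assumed to be a single number per session, and familiarity is assumed to be the developer's share of all lines contributed in sessions on the topic.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Session:
    """One development session (hypothetical record layout)."""
    developer: str
    topics: Set[str]
    interactions: int       # number of performed interactions
    change_size: float      # complexity/size of the changes, e.g. changed lines
    minutes: float          # time spent in the session
    contributed_lines: int  # real code contributions made in the session


def productivity(sessions: List[Session], developer: str, topic: str) -> float:
    """Interactions weighted by change size, divided by time spent on the topic."""
    relevant = [s for s in sessions if s.developer == developer and topic in s.topics]
    total_time = sum(s.minutes for s in relevant)
    if total_time == 0:
        return 0.0
    work = sum(s.interactions * s.change_size for s in relevant)
    return work / total_time


def familiarity(sessions: List[Session], developer: str, topic: str) -> float:
    """Assumed measure: developer's share of code contributed on the topic."""
    on_topic = [s for s in sessions if topic in s.topics]
    total = sum(s.contributed_lines for s in on_topic)
    if total == 0:
        return 0.0
    own = sum(s.contributed_lines for s in on_topic if s.developer == developer)
    return own / total
```

Both functions return 0.0 when no session touches the topic, so a developer with no recorded activity is treated as having no measurable expertise rather than causing a division by zero.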
Method for expert recommendation. We proposed a method to recommend an expert for a newly created development task at the level of topics. It is based on the construction of two topic models, namely, a task topic model and a code topic model. The task topic model is created from the corpus containing metadata documents obtained from tasks. The code topic model is created from the corpus containing metadata documents obtained from source code entities. The task topic model is used to map tasks in natural language to the topics inferred from source code.
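The final ranking step can be sketched as below, assuming the topic distribution of the new task has already been inferred and mapped onto code topics. The function name rank_experts, the input layout, and the choice to combine familiarity and productivity by multiplication are assumptions made for illustration; the source states only that both aspects are combined.

```python
from typing import Dict, List, Tuple


def rank_experts(
    task_topics: Dict[str, float],
    expertise: Dict[str, Dict[str, Tuple[float, float]]],
) -> List[Tuple[str, float]]:
    """Rank developers for a task.

    task_topics maps topic name -> weight of that topic in the task.
    expertise maps developer -> topic -> (familiarity, productivity).
    """
    scores: Dict[str, float] = {}
    for developer, per_topic in expertise.items():
        score = 0.0
        for topic, weight in task_topics.items():
            fam, prod = per_topic.get(topic, (0.0, 0.0))
            # Assumed combination: product of the two expertise characteristics.
            score += weight * fam * prod
        scores[developer] = score
    # Highest score first; the top entry is the recommended expert.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A multiplicative combination rewards developers who are both familiar and productive on a topic; a weighted sum would be an equally plausible alternative if one aspect should be able to compensate for the other.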
A crucial problem in expertise finding is the lack of a clear baseline against which expertise estimation methods can be compared. Without such a baseline, it is difficult to determine which automatic method best characterizes a developer's expertise.
We conducted experiments on five open-source projects and performed a case study on two commercial (closed-source) software projects. Although experiments on five open-source projects and two commercial projects do not provide sufficient evidence for the generality of our approach, the overall results indicate that the approach can be useful in recommending experts.
In our approach, we consider that both the developer's familiarity with the topic of a task and his or her development productivity on the topic should be taken into account when recommending developers for new tasks. We claim that this approach is the first to combine these two aspects.
Estimation of developer's expertise can be a valuable asset for a software company. We argue that automatic estimation of expertise can be beneficial in the planning of a software project, especially in assigning development tasks. The evolution or maintenance of a software system can be more effective if we assign the right person (an expert) to a given (development/maintenance) task.
The time required to implement new functionality, to change existing functionality, or to fix a bug can be significantly reduced if the task is assigned to a developer who knows the source code (a particular part of the software system). Moreover, by looking at the software system as a "web" of topics, we can recommend for a task the developer whose expertise best covers the topics of that task.
The thesis extended abstract is available in the Bulletin of the ACM Slovakia.