Software Metrics Based on Developer's Activity and Context of Software Development

Martin Konôpka

Master thesis project supervised by prof. Mária Bieliková


Motivation

Traditional software metrics, such as lines of code or the maintainability index, give feedback on software throughout its life cycle, but they are mostly based on source code contents alone. This limits their usefulness in maintenance, where we need to understand the developers’ original intents, decisions, or inspirations during development. This motivates us to use software developer activity as a source for identifying implicit source code dependencies, and to extend the existing dependency graph with information that is not explicitly stated in the source code.

Source code dependencies, as connections between pairs of source code components, traditionally reflect explicit statements in the source code, e.g., instance or type references, inheritance relationships, or call statements. Because they are explicit, these dependencies carry no information about the developer’s reasoning, which limits their usefulness when adding new functionality, fixing a bug, or refactoring the source code. Moreover, explicit source code dependencies are difficult (or even impossible) to identify in dynamically typed languages, e.g., JavaScript, Ruby, or PHP, or when components are very loosely coupled.

Our work is inspired by the research project PerConIK (Personalized Conveying of Information and Knowledge), whose goal is to bring new software metrics based on evaluating data on developer activity and the context of software development.

Implicit Source Code Dependencies

A developer’s navigation and operations on source code in an integrated development environment (IDE) reveal dependencies between the components involved, even when no explicit dependencies exist or when we are unable to identify them in the traditional way. In our work we extended the definition of the explicit dependency graph with implicit dependencies and identified the following specialized types of implicit dependencies:

  • Time-related – navigation in source code: opening, closing, and switching to another component.
  • Content-related – copy-pasting a code fragment from one source code component into another.
  • Commit-related – committing a collection of source code components to a revision control system.
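As an illustration of the time-related type, the sketch below derives candidate dependencies from a log of component visits. It is a minimal example under stated assumptions, not the method used in the thesis: the log format, the function name time_related_pairs, and the 5-minute window are all hypothetical.

```python
from datetime import datetime, timedelta


def time_related_pairs(visits, window=timedelta(minutes=5)):
    """Return (source, target, gap) for consecutive visits to different components.

    visits: list of (timestamp, component) tuples sorted by timestamp.
    window: assumed maximum gap between two visits for them to count as related.
    """
    pairs = []
    for (t1, c1), (t2, c2) in zip(visits, visits[1:]):
        gap = t2 - t1
        # Switching back to the same component is not a dependency.
        if c1 != c2 and gap <= window:
            pairs.append((c1, c2, gap))
    return pairs


if __name__ == "__main__":
    log = [
        (datetime(2014, 1, 1, 10, 0), "Order.cs"),
        (datetime(2014, 1, 1, 10, 2), "Invoice.cs"),
        (datetime(2014, 1, 1, 10, 30), "Report.cs"),
    ]
    for source, target, gap in time_related_pairs(log):
        print(source, "->", target, gap)
```

Content-related and commit-related dependencies could be collected in the same shape (source, target, evidence), which keeps later weighting and aggregation uniform across the three types.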

Identified implicit dependencies are weighted by the time window of the visited component, the content of the copied fragment, or the number of components in the commit, respectively. Implicit dependencies differ from explicit ones in that we cannot precisely validate their significance by checking the source code contents. To validate implicit dependencies, we chose to use a forgetting function that models the decay of dependencies over time.
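The text does not give the form of the forgetting function. The sketch below assumes a plain exponential decay, which is one common choice; the function name decayed_weight and the 30-day half-life are hypothetical and only illustrate how an older dependency would count for less than a recent one.

```python
import math


def decayed_weight(initial_weight, age_days, half_life_days=30.0):
    """Exponential forgetting (assumed form): the weight halves every half_life_days.

    initial_weight: weight assigned when the dependency was identified.
    age_days: time elapsed since the dependency was observed, in days.
    """
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    return initial_weight * math.exp(-math.log(2) * age_days / half_life_days)


if __name__ == "__main__":
    for age in (0, 30, 90):
        print(age, decayed_weight(1.0, age))
```

With this form, a dependency that is never reinforced by new activity fades toward zero, while a dependency observed repeatedly keeps receiving fresh, undecayed weight.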

After the identification process, we aggregate individual implicit dependencies into edges of the final dependency graph and present the graph either in our own prototype or in the Microsoft Visual Studio environment.
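The aggregation step can be sketched as follows. This is an illustrative example, not the thesis implementation: it assumes that edges are undirected and that the weights of individual dependencies between the same pair of components are summed. The name aggregate_edges is hypothetical.

```python
from collections import defaultdict


def aggregate_edges(dependencies):
    """Combine individual dependencies into weighted graph edges.

    dependencies: iterable of (source, target, weight) tuples.
    Returns a dict mapping a sorted (component, component) pair to the summed weight.
    """
    edges = defaultdict(float)
    for source, target, weight in dependencies:
        # Assumption: direction is ignored, so (A, B) and (B, A) share one edge.
        key = tuple(sorted((source, target)))
        edges[key] += weight
    return dict(edges)


if __name__ == "__main__":
    observed = [
        ("Invoice.cs", "Order.cs", 0.5),   # e.g. a time-related dependency
        ("Order.cs", "Invoice.cs", 0.25),  # e.g. a content-related dependency
        ("Order.cs", "Report.cs", 1.0),    # e.g. a commit-related dependency
    ]
    for pair, weight in aggregate_edges(observed).items():
        print(pair, weight)
```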

Dependency graph with implicit dependencies in Microsoft Visual Studio.

Evaluation

To evaluate the contribution of implicit dependencies to software development and maintenance, we used data from 5 student software projects provided by the PerConIK project. The data were gathered over one year of recording students in the master’s courses Software Engineering and Information Systems.

In our first experiment we examined how implicit dependencies overlap with explicit ones, to show whether they can substitute for explicit dependencies when those are unavailable (e.g., in dynamically typed languages). Our method covered up to 79% of all explicit dependencies with implicit dependencies. Also, 50% of all identified implicit dependencies were not included in the explicit dependency graphs for each evaluated software project.
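The two figures reported here correspond to simple set operations on the edge sets of the two graphs. The sketch below shows one way to compute them, assuming undirected edges given as component pairs; the function name coverage and the sample edges are hypothetical and do not reproduce the project data.

```python
def coverage(explicit_edges, implicit_edges):
    """Compare an explicit and an implicit dependency graph.

    Both arguments are iterables of (component, component) pairs.
    Returns (covered, novel):
      covered: share of explicit edges that also appear as implicit edges,
      novel:   share of implicit edges that have no explicit counterpart.
    """
    # frozenset makes (A, B) and (B, A) compare equal (undirected edges).
    explicit = {frozenset(edge) for edge in explicit_edges}
    implicit = {frozenset(edge) for edge in implicit_edges}
    covered = len(explicit & implicit) / len(explicit) if explicit else 0.0
    novel = len(implicit - explicit) / len(implicit) if implicit else 0.0
    return covered, novel


if __name__ == "__main__":
    explicit = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]
    implicit = [("B", "A"), ("B", "C"), ("A", "C"), ("D", "E")]
    print(coverage(explicit, implicit))
```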

In the second experiment we evaluated how implicit dependencies enrich the dependency graph with new significant connections usable during maintenance. We asked the monitored students to manually evaluate implicit edges in the graphs and achieved precision ranging from 75% to 92% across the evaluated software projects; differences in the type of project, the amount of available data, and the number of participating developers affected the results.
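Precision in this setting is the share of evaluated implicit edges that the students judged to be significant. A minimal sketch, assuming each judgment is recorded as a boolean per edge; the name precision and the sample judgments are hypothetical:

```python
def precision(judgments):
    """Share of evaluated implicit edges judged significant.

    judgments: dict mapping an edge (component pair) to True if the
    evaluator confirmed the dependency, False otherwise.
    """
    if not judgments:
        raise ValueError("no judgments to evaluate")
    confirmed = sum(1 for is_significant in judgments.values() if is_significant)
    return confirmed / len(judgments)


if __name__ == "__main__":
    sample = {
        ("A", "B"): True,
        ("A", "C"): True,
        ("B", "C"): False,
        ("C", "D"): True,
    }
    print(precision(sample))
```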

Publications

Konôpka, M.
Software Metrics Based on Developer's Activity and Context of Software Development. Master thesis, Slovak University of Technology in Bratislava, May 2014. 60 p. (in Slovak)

Konôpka, M., Bieliková, M.
Software Developer Activity as a Source for Identifying Hidden Source Code Dependencies. In SOFSEM 2015: Theory and Practice of Computer Science (to appear).

Konôpka, M.
Identifying Hidden Source Code Dependencies from Developer's Activity. In Proc. of 10th Student Research Conference in Informatics and Information Technologies Bratislava (IIT.SRC 2014). Nakladateľstvo STU, Bratislava, Slovakia, 2014, pp. 474-479.
