Automatic calculation of process metrics and their bug prediction capabilities

Gyimesi Péter: Automatic calculation of process metrics and their bug prediction capabilities. In: Acta cybernetica, (23) 2. pp. 537-559. (2017)

Abstract

Identifying fault-prone parts of the code helps developers reduce the time required to locate bugs. This is usually done by characterizing already known bugs with certain kinds of metrics and building a predictive model from the data. Software product and process metrics are the most popular choices for characterizing bugs. The calculation of product metrics is supported by many free and commercial software products. However, tools capable of computing process metrics are quite rare. In this study, we present a method of computing software process metrics in a graph database. We describe the schema of the database we created and present a way to readily obtain the process metrics from it. With this technique, process metrics can be calculated at the file, class and method levels. We used GitHub as the source of the change history and selected 5 open-source Java projects for processing. To retrieve positional information about the classes and methods, we used SourceMeter, a static source code analyzer tool. We used Neo4j as the graph database engine, and its query language, Cypher, to get the process metrics. We published the tools we created as open-source projects on GitHub. To demonstrate the utility of our tools, we selected 25 release versions of the 5 Java projects and calculated the process metrics for all of the source code elements (files, classes and methods) in these versions. Using our previously published bug database, we built bug databases for the selected projects that contain the computed process metrics and the corresponding bug numbers for files and classes. (We published these databases as an online appendix.) Then we applied 13 machine learning algorithms to the databases we created to find out whether they are suitable for bug prediction purposes. We achieved average F-measure values of around 0.7 at the class level, and slightly better values of between 0.7 and 0.75 at the file level. The best performing algorithm was RandomForest in both cases.
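
To make the notion of a process metric more concrete, the following is a minimal sketch in plain Python of how file-level metrics could be aggregated from a change history. It is not the method of the paper, which stores the history in Neo4j and queries it with Cypher. The `Change` record, the `process_metrics` function and the three metric names (`modifications`, `contributors`, `churn`) are assumptions chosen for illustration, and the sample history is made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One file touched by one commit (hypothetical record layout)."""
    commit: str   # commit identifier
    author: str   # author of the commit
    path: str     # path of the modified file
    added: int    # number of lines added
    deleted: int  # number of lines deleted


def process_metrics(changes):
    """Aggregate illustrative per-file process metrics from Change records."""
    commits = defaultdict(set)   # path -> distinct commits touching it
    authors = defaultdict(set)   # path -> distinct authors touching it
    churn = defaultdict(int)     # path -> total added + deleted lines
    for c in changes:
        commits[c.path].add(c.commit)
        authors[c.path].add(c.author)
        churn[c.path] += c.added + c.deleted
    return {
        path: {
            "modifications": len(commits[path]),
            "contributors": len(authors[path]),
            "churn": churn[path],
        }
        for path in commits
    }


# Made-up change history, for illustration only.
history = [
    Change("c1", "alice", "src/A.java", 10, 0),
    Change("c2", "bob", "src/A.java", 3, 2),
    Change("c2", "bob", "src/B.java", 7, 1),
]
metrics = process_metrics(history)
```

Class-level and method-level metrics would additionally require mapping changed line ranges to source code elements, which is the role the abstract assigns to the positional information from SourceMeter.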

Item Type: Article
Journal or Publication Title: Acta cybernetica
Date: 2017
Volume: 23
Number: 2
ISSN: 0324-721X
Page Range: pp. 537-559
Language: English
Place of Publication: Szeged
Related URLs: http://acta.bibl.u-szeged.hu/50022/
DOI: 10.14232/actacyb.23.2.2017.7
Uncontrolled Keywords: Informatics, Cybernetics, Computing, Process metrics
Additional Information: Bibliography: pp. 557-559; summary in English
Subjects: 01. Natural sciences
01. Natural sciences > 01.02. Computer and information sciences
Date Deposited: 2018. Feb. 13. 09:30
Last Modified: 2022. Jun. 20. 14:50
URI: http://acta.bibl.u-szeged.hu/id/eprint/50087
