Feature Representation Repository for Defect Prediction

Citation Author(s):
Mansi Gupta
Kumar Rajnish
Vandana Bhattacharjee
Submitted by:
Mansi Gupta
Last updated:
Mon, 08/29/2022 - 03:33
DOI:
10.21227/dxef-tx53
Data Format:
License:

Abstract 

The software engineering community is working to develop reliable metrics to improve software quality. It is estimated that understanding the source code accounts for 60% of the software maintenance effort. Cognitive informatics is important in quantifying the degree of difficulty, or the effort made by developers, to understand source code. Empirical studies conducted in 2003 assigned cognitive weights to each basic control structure of software, and these cognitive weights have since been used by several researchers to evaluate the cognitive complexity of software systems. In this paper, an effort has been made to categorize Control Flow Graph (CFG) nodes according to their node features. In our case, we extracted seven unique features from each program, and each unique feature was assigned an integer value evaluated through Cognitive Complexity Measures (CCMs). We then incorporated the CCM results as node feature values in the CFGs, which were generated from the node connectivity of each graph. To obtain the feature representation of a graph, a node vector matrix is created and passed to a Graph Convolutional Network (GCN). We prepared our data sets from the GCN output and then built Deep Neural Network Defect Prediction (DNN-DP) and Convolutional Neural Network Defect Prediction (CNN-DP) models to predict software defects. The Python programming language is used, along with Keras and TensorFlow. Three hundred twenty Python programs were written by our talented UG and PG students, and all experiments were carried out during laboratory classes. Together with three skilled lab programmers, they compiled and ran each individual program, labelled it as defective or non-defective, and categorized the programs into three classes: Simple, Medium, and Complex. Accuracy, Receiver Operating Characteristic (ROC) curves, Area Under the Curve (AUC), F-measure, and Precision, together with hyper-parameter tuning, are used to evaluate the approaches. The experimental results show that the proposed models outperformed state-of-the-art methods such as Naïve Bayes (NB), Decision Tree (DT), Support Vector Machine (SVM), and Random Forest (RF) on all evaluation criteria.
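
The evaluation criteria named above (Accuracy, ROC, AUC, F-measure, Precision) can be computed as in the minimal sketch below. The sketch assumes scikit-learn, which is not named in this dataset description, and the arrays y_true, y_score, and y_pred are hypothetical placeholders rather than the actual experimental results.

# Illustrative only: placeholder labels and scores, not the authors' data.
from sklearn.metrics import (accuracy_score, precision_score, f1_score,
                             roc_curve, roc_auc_score)

y_true = [0, 1, 1, 0, 1, 0]                # ground-truth defect labels
y_score = [0.2, 0.9, 0.7, 0.4, 0.6, 0.1]   # predicted defect probabilities
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("F-measure:", f1_score(y_true, y_pred))
print("AUC      :", roc_auc_score(y_true, y_score))
fpr, tpr, _ = roc_curve(y_true, y_score)   # points for plotting the ROC curve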

Instructions: 
  1. Three hundred twenty Python programs were prepared by our skilled UG and PG students, and all experiments were conducted during laboratory classes. Together with three skilled lab programmers, the students compiled and ran each individual program, detected whether it was defective or non-defective, and finally categorized the programs into three classes: Simple, Medium, and Complex.
  2. An effort has been made to categorize the Control Flow Graph (CFG) nodes according to their node features. In our case, we extracted seven unique features from each program (IF, FOR, WHILE, FUNCTION CALL, INPUT, OUTPUT, and EXPRESSION); each unique feature was assigned an integer value evaluated through Cognitive Complexity Measures (CCMs), and the CCM results were incorporated as node feature values in the CFGs, which were generated from the node connectivity of each graph. (A feature-tagging sketch is given after this list.)
  3. To obtain the feature representation of a graph, a node vector matrix is created and passed to a Graph Convolutional Network (GCN), which aggregates neighbourhood information and generates useful representations for the nodes of the graph. We then prepared our data sets (Simple, Medium, and Complex) from the GCN output. (See the GCN propagation sketch below.)
  4. Finally, we built Deep Neural Network Defect Prediction (DNN-DP) and Convolutional Neural Network Defect Prediction (CNN-DP) models to predict software defects. (See the model sketch below.)
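
Feature-tagging sketch for step 2 (illustrative only). It uses Python's ast module to tag the statements of a program with one of the seven node features; the integer weights are placeholders and are not the CCM values used to build this dataset.

# Placeholder feature -> weight mapping (NOT the dataset's CCM values).
import ast

FEATURE_WEIGHT = {"IF": 2, "FOR": 3, "WHILE": 3, "FUNCTION CALL": 2,
                  "INPUT": 1, "OUTPUT": 1, "EXPRESSION": 1}

def node_feature(node):
    """Map an AST node to one of the seven CFG node features, if any."""
    if isinstance(node, ast.If):
        return "IF"
    if isinstance(node, ast.For):
        return "FOR"
    if isinstance(node, ast.While):
        return "WHILE"
    if isinstance(node, ast.Call):
        name = getattr(node.func, "id", "")
        if name == "input":
            return "INPUT"
        if name == "print":
            return "OUTPUT"
        return "FUNCTION CALL"
    if isinstance(node, ast.Expr) and isinstance(node.value, ast.Call):
        return None  # the wrapped call is tagged by the Call branch above
    if isinstance(node, (ast.Assign, ast.AugAssign, ast.Expr)):
        return "EXPRESSION"
    return None

source = "x = int(input())\nfor i in range(x):\n    if i % 2 == 0:\n        print(i)\n"
features = [(f, FEATURE_WEIGHT[f])
            for n in ast.walk(ast.parse(source))
            if (f := node_feature(n)) is not None]
print(features)   # list of (feature, weight) pairs found in the toy program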
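
GCN propagation sketch for step 3 (illustrative only). The 4-node toy CFG, the one-hot node features, and the random, untrained layer weights are assumptions made for demonstration; the normalisation follows the common symmetric form D^-1/2 (A + I) D^-1/2, which may differ from the exact GCN formulation used for this dataset.

# Toy example: one graph-convolution step over a 4-node control flow graph.
import numpy as np

# Directed adjacency of a toy CFG (edges follow control flow).
A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 1],
              [0, 0, 0, 0]], dtype=float)
A = np.maximum(A, A.T)          # treat the CFG as undirected for normalisation

# Node features: columns are IF, FOR, WHILE, FUNCTION CALL, INPUT, OUTPUT, EXPRESSION,
# scaled by the same placeholder weights as in the tagging sketch above.
X = np.array([[0, 0, 0, 0, 1, 0, 0],    # node 0: INPUT
              [2, 0, 0, 0, 0, 0, 0],    # node 1: IF
              [0, 0, 0, 0, 0, 1, 0],    # node 2: OUTPUT
              [0, 0, 0, 0, 0, 0, 1]],   # node 3: EXPRESSION
             dtype=float)

A_hat = A + np.eye(A.shape[0])                       # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt             # symmetric normalisation

rng = np.random.default_rng(0)
W = rng.normal(size=(7, 4))                          # random, untrained layer weights
H = np.maximum(A_norm @ X @ W, 0.0)                  # one GCN layer with ReLU

graph_embedding = H.mean(axis=0)                     # mean-pooled graph representation
print(graph_embedding.shape)                         # (4,)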
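
Model sketch for step 4 (illustrative only). The dataset description states that the DNN-DP and CNN-DP models were built with Keras and TensorFlow but does not give their architectures, so the layer sizes and the embedding dimension below are assumptions.

# Two binary (defect / no-defect) classifiers over GCN-derived feature vectors.
import tensorflow as tf
from tensorflow.keras import layers, models

EMB_DIM = 64   # assumed length of the GCN-derived feature vector per program

# DNN-DP: fully connected binary classifier.
dnn_dp = models.Sequential([
    layers.Input(shape=(EMB_DIM,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

# CNN-DP: treats the feature vector as a 1-D sequence and convolves over it.
cnn_dp = models.Sequential([
    layers.Input(shape=(EMB_DIM, 1)),
    layers.Conv1D(32, kernel_size=3, activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    layers.Flatten(),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

for model in (dnn_dp, cnn_dp):
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy", tf.keras.metrics.AUC(name="auc")])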

Comments

very good

Submitted by tariq hadidi on Tue, 10/25/2022 - 23:58
