Standard Dataset
Software Defect
- Citation Author(s): Ruchika Malhotra, Arjun Rajpal, Dushyant Rathore
- Submitted by: Arjun Rajpal
- Last updated: Tue, 05/17/2022 - 22:17
- DOI: 10.21227/H2K078
Abstract
Our defect dataset comes from the PROMISE repository. It covers open-source Java systems such as ant, camel, ivy, jedit, log4j, lucene, poi, synapse, velocity and xerces. We selected these projects because each has at least three consecutive releases (where release i was built before release i+1). This allows us to build defect predictors from past data and then test them on a future release of the same project, which is a more practical scenario.
The original dataset contains a list of bugs, their characteristics and the classes to which they belong. The first step was to remove the values belonging to class 0, so that the remaining values belong to the defective classes. For untuned methods, release i and release i+1 were combined for training and release i+2 was used for testing. For tuned methods, release i was used for training, release i+1 for tuning and release i+2 for testing.
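The release-based split described above can be sketched as follows. This is only an illustration, not the authors' code: the function name `split_releases`, the release labels, and the use of a sliding window over every run of three consecutive releases are assumptions.

```python
from typing import Dict, List, Sequence


def split_releases(releases: Sequence[str], tuned: bool) -> List[Dict[str, List[str]]]:
    """Build one split per window of three consecutive releases (i, i+1, i+2)."""
    splits = []
    for i in range(len(releases) - 2):
        first, second, third = releases[i], releases[i + 1], releases[i + 2]
        if tuned:
            # Tuned methods: train on release i, tune on i+1, test on i+2.
            split = {"train": [first], "tune": [second], "test": [third]}
        else:
            # Untuned methods: train on releases i and i+1 combined, test on i+2.
            split = {"train": [first, second], "tune": [], "test": [third]}
        splits.append(split)
    return splits


# Hypothetical release labels, ordered from oldest to newest.
releases = ["ant-r1", "ant-r2", "ant-r3", "ant-r4"]
tuned_splits = split_releases(releases, tuned=True)
untuned_splits = split_releases(releases, tuned=False)
print(tuned_splits[0])
print(untuned_splits[0])
```

Because the test release is always the newest one in each window, no information from a later release leaks into training or tuning.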
For example, in antV0, release i (used for training) contains 20 defective classes out of 125, and release i+1 (used for tuning) contains 40 defective classes out of 178.
The dataset was processed to meet these requirements using the pandas library in Python.
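A minimal pandas sketch of the filtering step (dropping class 0 and keeping the defective classes) might look like this. The column names `name`, `loc` and `bug`, the helper `defective_only`, and the rows themselves are assumptions made up for illustration; the actual files may be laid out differently.

```python
import pandas as pd

# Made-up rows for illustration only; "bug" is assumed to hold the defect count.
release = pd.DataFrame(
    {
        "name": ["A", "B", "C", "D"],
        "loc": [120, 45, 300, 80],
        "bug": [0, 2, 0, 1],
    }
)


def defective_only(df: pd.DataFrame, label: str = "bug") -> pd.DataFrame:
    """Drop rows whose label is 0 and keep only the defective classes."""
    return df[df[label] > 0].reset_index(drop=True)


defective = defective_only(release)
print(len(defective), "defective classes out of", len(release))
```

Applying the same helper to each release yields counts in the same form as the example above, that is, defective classes out of total classes per release.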
Comments
Sir,
I will like to use this dataset for my thesis
Sir,
I wanted to use this dataset for the thesis.
Sir,
I would like to use this dataset for my thesis.
Sir/Madam,
I would like to use the software defect datasets for my phd research.
Thank you.
Sir, I would like to use this dataset for my research
Sir,
I wanted to use this dataset for the thesis.
Thank you very much
I want to use this dataset for my thesis