Comments or Issues: Where to Document Technical Debt?
- Submitted by: Jose Pires Xavi...
- Last updated: Wed, 11/24/2021 - 10:39
- DOI: 10.21227/xxqj-1w30
Abstract
Self-Admitted Technical Debt (SATD) is a form of Technical Debt in which developers document the debt using source code comments (SATD-C) or issues (SATD-I). However, the circumstances that drive developers to choose one over the other remain unclear. In this paper, we survey authors of both types of debt using a large-scale dataset containing 74K SATD-C and 20K SATD-I instances, extracted from 190 GitHub projects. As a result, we provide 13 guidelines to help developers decide when to use comments or issues to report Technical Debt.
This dataset contains:
1. overview_dataset.csv: details the studied repositories and their corresponding number of SATD-C and SATD-I instances;
2. satdc.csv: contains 74,306 SATD-C instances, mined from 182 GitHub repositories;
3. satdi.csv: contains 20,265 SATD-I instances, mined from 190 GitHub repositories;
4. SurveyAnswers.pdf: presents the anonymized answers from 59 developers responsible for creating both SATD-C and SATD-I instances in our dataset. It also includes the results of the open card sorting applied to elicit our proposed guidelines.
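As a starting point for working with the three CSV files listed above, the following is a minimal sketch that reports the column names and row count of each file. The file names come from the list above; everything else is an assumption: that the files sit together in one local directory, that they are UTF-8 encoded, and that each has a header row. The column names are not documented here, so the sketch reads them from the files instead of hard-coding them. The helper names `summarize_csv` and `summarize_dataset` are hypothetical.

```python
import csv
from pathlib import Path

# File names are taken from the dataset description.
# Their location on disk is an assumption (one shared directory).
CSV_FILES = ("overview_dataset.csv", "satdc.csv", "satdi.csv")


def summarize_csv(path):
    """Return (column names, number of data rows) for one CSV file.

    Assumes UTF-8 encoding and a header row; neither is confirmed
    by the dataset description.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        # csv.reader counts records, not physical lines, so quoted
        # fields containing line breaks count as a single row.
        reader = csv.reader(handle)
        header = next(reader, [])
        row_count = sum(1 for _ in reader)
    return header, row_count


def summarize_dataset(data_dir):
    """Summarize every expected CSV file that exists in data_dir."""
    summary = {}
    for name in CSV_FILES:
        path = Path(data_dir) / name
        if path.exists():
            summary[name] = summarize_csv(path)
    return summary


if __name__ == "__main__":
    # "." is a placeholder for wherever the files were downloaded.
    for name, (header, rows) in summarize_dataset(".").items():
        print(f"{name}: {rows} rows, columns = {header}")
```

If the assumptions hold, the row counts for satdc.csv and satdi.csv can be compared against the 74,306 and 20,265 instances stated above as a quick integrity check on the download.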