Standard Dataset
Dataset for Characterizing the Occurrence of Dockerfile Smells in Open-Source Software
- Submitted by: Yang Zhang
- Last updated: Tue, 05/17/2022 - 22:17
- DOI: 10.21227/r9v8-4f07
Abstract
Dockerfiles play an important role in Docker-based containerization, but in practice many Dockerfiles contain smells. This dataset collects 6,334 projects to help developers gain insight into the occurrence of Dockerfile smells. The projects span 10 popular programming languages: Shell, Makefile, Ruby, PHP, Python, Java, HTML, CSS, JavaScript, and Go.
For each of the 6,334 projects, the dataset records metadata (name, owner type, creation time, programming language, number of stars, and number of contributors) and details of Dockerfile smells (number of instructions, total number of smells, number of DL-smells, and number of SC-smells).
Specifically, the columns in the CSV dataset are:
- project: the project name;
- p_language: the project's programming language;
- p_contributors_team: number of project contributors (those who submitted at least one commit);
- p_created_at: the project's creation date;
- p_owner_type: type of the project owner, i.e., “Organization” or “User”;
- p_stars: number of project stars;
- p_github_age: number of days from the project's creation on GitHub until April 2018;
- d_instructions: number of instructions in a Dockerfile;
- d_smells: total number of smells in a Dockerfile;
- d_smells_dl: number of DL-smells in a Dockerfile;
- d_smells_sc: number of SC-smells in a Dockerfile.
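
As a minimal sketch of how these columns might be used, the following Python snippet computes the average smell density (d_smells divided by d_instructions) per programming language. The helper name `smell_density_by_language`, the density metric, and the sample rows are illustrative assumptions and are not part of the dataset. The CSV file name is not stated here, so the example reads from an in-memory string, and it assumes the count columns hold plain integers.

```python
import csv
import io
from collections import defaultdict


def smell_density_by_language(csv_file):
    """Return {language: mean smells per instruction} from a CSV file object.

    Assumes the columns p_language, d_instructions, and d_smells exist
    and hold integer counts; rows with zero instructions are skipped.
    """
    totals = defaultdict(float)  # sum of per-project densities
    counts = defaultdict(int)    # number of projects counted per language
    for row in csv.DictReader(csv_file):
        instructions = int(row["d_instructions"])
        if instructions == 0:
            continue  # avoid division by zero
        totals[row["p_language"]] += int(row["d_smells"]) / instructions
        counts[row["p_language"]] += 1
    return {lang: totals[lang] / counts[lang] for lang in totals}


# Made-up sample rows for illustration only (not taken from the dataset).
SAMPLE = """project,p_language,d_instructions,d_smells
demo/a,Go,10,5
demo/b,Go,20,5
demo/c,Python,8,2
"""

densities = smell_density_by_language(io.StringIO(SAMPLE))
print(densities)
```

To run this on the real data, pass an open file handle for the downloaded CSV in place of the in-memory sample.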