This dataset was extracted from GitHub and contains 172,919 Java source files written by 3,128 authors. It can be used for source code authorship attribution.