GitHub Dataset for Authorship Attribution

This dataset was extracted from GitHub and contains 172,919 Java source files written by 3,128 authors. It can be used for source code authorship attribution.

Categories: Machine Learning, Security