Miltiadis Allamanis

Code duplicates in large code corpora adversely affect the evaluation and use of machine learning models trained or evaluated on those corpora. Most existing corpora suffer from this problem to some extent. This dataset contains a duplication index for several of the existing corpora used in Big Code research. The method used to collect this dataset is described in "The Adverse Effects of Code Duplication in Machine Learning Models of Code" by Allamanis [arXiv; to appear at SPLASH 2019].
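
One way a duplication index might be used is to remove near-duplicate files from a corpus before splitting it into training and test sets. The Python sketch below assumes, hypothetically, that the index is a JSON list of clusters, each cluster being a list of file paths that are near-duplicates of one another; the actual file format of this dataset is not described here, and the function names `files_to_drop` and `deduplicate` are illustrative only. The sketch keeps the first file of each cluster and drops the rest.

```python
import json
from typing import Iterable, List, Set


def files_to_drop(clusters: Iterable[List[str]]) -> Set[str]:
    """Return every file except the first one in each duplicate cluster.

    Assumes (hypothetically) that each cluster is a list of file paths
    that are near-duplicates of one another.
    """
    drop: Set[str] = set()
    for cluster in clusters:
        # Keep cluster[0] as the representative; mark the others for removal.
        drop.update(cluster[1:])
    return drop


def deduplicate(corpus: Iterable[str], clusters: Iterable[List[str]]) -> List[str]:
    """Filter a corpus so that each duplicate cluster contributes one file."""
    drop = files_to_drop(clusters)
    return [path for path in corpus if path not in drop]


if __name__ == "__main__":
    # Hypothetical index content: two clusters of duplicated files.
    index_json = '[["a.py", "b.py", "c.py"], ["d.py", "e.py"]]'
    clusters = json.loads(index_json)
    corpus = ["a.py", "b.py", "c.py", "d.py", "e.py", "f.py"]
    print(deduplicate(corpus, clusters))  # ['a.py', 'd.py', 'f.py']
```

Keeping one representative per cluster, rather than discarding every clustered file, retains each distinct piece of code while preventing the same code from appearing in both the training and the test split.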

  • Computational Intelligence
  • Last Updated On: 
    Thu, 06/27/2019 - 11:47