Text files used for Plagiarism Checker in TAES

Citation Author(s):
Deepratna Awale
Submitted by:
Deepratna Awale
Last updated:
Sat, 05/29/2021 - 05:03
DOI:
10.21227/20gt-cg24

Abstract 

Simple text files obtained by manually scraping the web for answers to the question "What is Machine Learning?".

Each file contains the first paragraph or page of a website's answer to the question. This data is not used for commercial purposes and is available to all.

This data is used in TAES to demonstrate plagiarism checking. The text files (*.txt) contain plain text and need no preprocessing. Simply read a file and assign its contents to a string object.
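As a rough illustration of how two answer strings might be compared, the sketch below scores their similarity with Python's standard difflib module. This is an assumption made for illustration only; the source does not describe how TAES itself performs plagiarism checking, and the function name similarity is hypothetical.

```python
from difflib import SequenceMatcher


def similarity(text_a, text_b):
    """Return a similarity ratio between 0.0 and 1.0 for two strings.

    Illustrative only: this is NOT the method used by TAES, which the
    dataset description does not specify.
    """
    # Lowercasing makes the comparison case-insensitive.
    return SequenceMatcher(None, text_a.lower(), text_b.lower()).ratio()
```

A score near 1.0 suggests the two texts are nearly identical, while a score near 0.0 suggests they share little. Any cutoff for flagging plagiarism would have to be chosen by the user.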

Instructions: 

The text files are archived in a .rar file, so you will need an archive extractor such as WinRAR. The data is already preprocessed, so no further preprocessing is required on the user's side. Simply load a text file and assign its contents to a string object in your code.
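Once the archive has been extracted, loading a file could look like the minimal Python sketch below. The function name read_answer is hypothetical, and UTF-8 is an assumed encoding, since the dataset description does not state one.

```python
from pathlib import Path


def read_answer(path, encoding="utf-8"):
    """Return the full contents of one answer file as a single string."""
    # The encoding is an assumption; errors="replace" keeps the read from
    # failing if a file contains bytes that are not valid in that encoding.
    return Path(path).read_text(encoding=encoding, errors="replace")
```

For example, text = read_answer("sitename.txt") would assign the contents of that file to the string text, where sitename.txt stands for any one of the extracted files.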

Naming Convention: sitename.txt

where sitename is the website from which the answer to the question "What is Machine Learning?" was scraped.
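Because each file name carries the site name, the extracted files can be collected into a dictionary keyed by site. The following Python sketch assumes all answer files sit in one directory passed in by the caller. The function name load_answers and the exclude parameter are hypothetical, and UTF-8 is an assumed encoding.

```python
from pathlib import Path


def load_answers(directory, exclude=("Answer-Key",)):
    """Map each sitename to the text of its sitename.txt file."""
    answers = {}
    for path in sorted(Path(directory).glob("*.txt")):
        sitename = path.stem  # file name without the .txt extension
        if sitename in exclude:
            # Skip files that are not scraped answers, such as the answer
            # key, in case they share the same directory (an assumption).
            continue
        answers[sitename] = path.read_text(encoding="utf-8", errors="replace")
    return answers
```

The returned dictionary lets code look up an answer by site, for example answers["sitename"], without hard-coding individual file paths.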

Directory Structure:

Root/Answers

Key

Answer-Key.txt

sitename.txt (8 files)