Standard Dataset
Text files used for Plagiarism Checker in TAES
- Submitted by: Deepratna Awale
- Last updated: Sat, 05/29/2021 - 05:03
- DOI: 10.21227/20gt-cg24
Abstract
Simple text files obtained by manually scraping the web for answers to the question "What is Machine Learning?".
Each file contains the first paragraph or page of one website's answer to the question. This data is not used for commercial purposes and is freely available to all.
This data is used in TAES to demonstrate plagiarism checking. The text files (*.txt) are archived in a .rar file, so you will need an archive extractor such as WinRAR to unpack them. The files contain plain text and require no preprocessing: simply read each file and assign its contents to a string object in your code.
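As a minimal sketch of "read the file and assign the data to a string object", the Python below loads one extracted .txt file into a string. It also includes a simple pairwise similarity score using the standard library's `difflib.SequenceMatcher`. This is a swapped-in illustration, not the method TAES uses. The UTF-8 encoding is an assumption, and the names `load_text` and `similarity` are hypothetical helpers.

```python
from __future__ import annotations

from difflib import SequenceMatcher
from pathlib import Path


def load_text(path: str | Path) -> str:
    """Read one dataset .txt file and return its contents as a single string."""
    # Assumption: the files are UTF-8 encoded; change the encoding if they are not.
    return Path(path).read_text(encoding="utf-8")


def similarity(a: str, b: str) -> float:
    """Return a 0.0-1.0 similarity ratio between two texts.

    Uses difflib's SequenceMatcher as an illustrative stand-in;
    this is not the TAES plagiarism-checking algorithm.
    """
    return SequenceMatcher(None, a, b).ratio()
```

For example, `similarity(load_text("first.txt"), load_text("second.txt"))` compares two extracted answer files (the file names here are placeholders). Identical texts score 1.0, and texts with no characters in common score 0.0.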
Naming Convention: sitename.txt
where sitename is the website from which the answer to "What is Machine Learning?" was scraped.
Directory Structure:
Root/Answers
- Key
  - Answer-Key.txt
- sitename.txt (8 files)
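The naming convention means each file's stem is its site name, so all answers can be loaded into one lookup table. The sketch below assumes that the `Answers` folder holds the eight `sitename.txt` files directly and that `Key/Answer-Key.txt` sits inside it. The exact layout of the archive is an assumption, and `load_answers` and `load_answer_key` are hypothetical helpers.

```python
from __future__ import annotations

from pathlib import Path


def load_answers(answers_dir: str | Path) -> dict[str, str]:
    """Map each site name to the text of its sitename.txt answer file."""
    answers: dict[str, str] = {}
    # glob("*.txt") only matches files at the top level of answers_dir,
    # so Answer-Key.txt inside the Key subfolder is not included here.
    for path in sorted(Path(answers_dir).glob("*.txt")):
        # Naming convention: the file stem is the website name.
        answers[path.stem] = path.read_text(encoding="utf-8")  # UTF-8 assumed
    return answers


def load_answer_key(answers_dir: str | Path) -> str:
    """Read Key/Answer-Key.txt, assuming Key is a subfolder of Answers."""
    key_path = Path(answers_dir) / "Key" / "Answer-Key.txt"
    return key_path.read_text(encoding="utf-8")  # UTF-8 assumed
```

After extracting the archive, `load_answers("Root/Answers")` should return one entry per answer file, keyed by site name, if the layout matches the assumption above.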