Four independently built Twitter JSON files, each containing between 40,000 and 50,000 tweets that share a common hashtag, intended for use with network discovery algorithms.
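
One common way to turn such files into input for network discovery is a directed mention graph. The Python sketch below is only an illustration: it assumes one tweet per line and Twitter API v1.1 field names (`user.screen_name`, `entities.user_mentions[].screen_name`), neither of which is confirmed by the description, and the function name is hypothetical.

```python
import json
from collections import Counter


def mention_edges(lines):
    """Count directed author -> mentioned-user edges.

    Assumes one JSON tweet object per line with Twitter API v1.1 keys
    ("user.screen_name", "entities.user_mentions[].screen_name");
    adjust the keys if the files use a different schema.
    """
    edges = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines between records
        tweet = json.loads(line)
        author = tweet.get("user", {}).get("screen_name")
        if author is None:
            continue
        for mention in tweet.get("entities", {}).get("user_mentions", []):
            target = mention.get("screen_name")
            if target and target != author:  # ignore self-mentions
                edges[(author, target)] += 1
    return edges
```

The resulting weights can be passed to a graph library, for example as `(author, target, count)` triples to networkx's `DiGraph.add_weighted_edges_from`.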

This dataset contains 14,283 song lyrics together with their Carnatic raga and genre classifications. The songs were written by Annamacharya, Peda Tirumalacharya and China Tirumalacharya during the 15th century. The dataset is entirely in Telugu, encoded in Unicode, and has two target features, "Genre" and "Ragam". It was prepared from content publicly available on the TTD website (https://www.tirumala.org/AnnamacharyaSankeerthanas.aspx).

The fields are:

Instructions: 

Use any standard JSON library that supports Unicode data.
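
A minimal Python sketch of that advice, assuming (the layout is not documented here) that the file is a single JSON array of song objects carrying "Genre" and "Ragam" keys; the helper names are hypothetical. Opening the file with an explicit UTF-8 encoding avoids depending on the platform default, and `ensure_ascii=False` keeps Telugu text readable when writing JSON back out.

```python
import json
from collections import Counter


def load_songs(path):
    # Decode explicitly as UTF-8 so Telugu text loads the same on every platform.
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def save_songs(songs, path):
    # ensure_ascii=False writes Telugu characters as-is instead of \u escapes.
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(songs, fh, ensure_ascii=False, indent=2)


def label_counts(songs, field):
    # Count how often each value of a target feature ("Genre" or "Ragam") occurs.
    return Counter(song[field] for song in songs if field in song)
```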

This dataset is intended for studying how programming style and IDE usage differ between students who plagiarise their homework and students who solve it honestly. It includes homework submitted by students in two introductory programming courses (A and B) delivered over two years (2016 and 2017). Course A is taught in C, while course B is taught in C++. In addition to the homework, the dataset includes full traces of all student activity and keystrokes during homework development.

Instructions: 

The archive consists of three parts:

SOURCE CODES: The homework actually submitted by students (i.e. their source code) is stored in the folder "src". Its subfolders are named after the courses: A2016, A2017, B2016 and B2017. These in turn contain subfolders for the individual assignments. In each course, students were required to solve 16-22 assignments labeled "Z1/Z1", "Z1/Z2", "Z2/Z1", etc. Finally, each assignment folder contains the actual C or C++ files, named after the student (anonymized, so actual student names were replaced by strings of the form "student1393").

TRACES: IDE usage traces are stored in the folder "stats". Again, this folder is organized into subfolders named after the courses. These contain files named after the students (anonymized) with the extension .stats, in JSON format. The JSON format is described in the file readme.txt.

GROUND TRUTH: The ground truth lists the students and groups of students considered to have been involved in plagiarism, based on code similarity and on failure to deliver an "oral defense". There are three ground truth files: ground-truth-anon.txt contains the full list of plagiarism cases, ground-truth-static-anon.txt only those based on source code similarity, and ground-truth-dynamic-anon.txt only those based on failure to deliver an "oral defense". There is some overlap between the last two files. Each file gives a homework assignment in the format "- A2016/Z1/Z1" (dash, space, course name, slash, assignment name), followed by lists of anonymized student names (such as "student3241") or of groups of mutually plagiarising students, separated by commas.
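
The layout above can be walked programmatically. The Python sketch below is hypothetical in its helper names and rests on two assumptions worth checking against the archive: that source files use the extensions .c, .cpp, .cc or .cxx, and that in the ground truth files every non-empty line after a "- course/assignment" header is one comma-separated group of students.

```python
from pathlib import Path

# Assumed extensions for the C and C++ submissions.
SOURCE_SUFFIXES = {".c", ".cpp", ".cc", ".cxx"}


def list_submissions(src_root):
    """Yield (course, assignment, student, path) for files laid out as src/A2016/Z1/Z1/student1393.c."""
    root = Path(src_root)
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            parts = path.relative_to(root).parts
            if len(parts) == 4:  # course / Zx / Zy / file
                course, group, task, _ = parts
                yield course, f"{group}/{task}", path.stem, path


def parse_ground_truth(lines):
    """Map "course/assignment" -> list of student groups (assumed one group per line)."""
    result = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("- "):
            current = line[2:].strip()  # e.g. "A2016/Z1/Z1"
            result.setdefault(current, [])
        elif current is not None:
            group = [name.strip() for name in line.split(",") if name.strip()]
            result[current].append(group)
    return result
```

Joining the two results on the course/assignment key and the student name gives a per-submission plagiarism label.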

While social media has proved to be an exceptionally useful tool for interacting with other people and for spreading helpful information quickly and widely, its great potential has also been exploited with ill intent to distort political elections and manipulate constituents. In the paper at hand, we analyzed the presence and behavior of social bots on Twitter in the context of the November 2019 Spanish general election.

Instructions: 

Data have been exported in three formats to provide maximum flexibility:

  • MongoDB Dump BSONs
    • To import these data, please refer to the official MongoDB documentation.
  • JSON Exports
    • Both the users and the tweets collections have been exported as canonical JSON files. 
  • CSV Exports (only tweets)
    • The tweet collection has been exported as a plain CSV file with comma separators.
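
If the JSON exports were produced with `mongoexport --jsonFormat=canonical` (an assumption; the description only says "canonical JSON"), values are wrapped in MongoDB Extended JSON objects such as `{"$oid": ...}` or `{"$numberLong": ...}`. The standard-library Python sketch below unwraps the most common wrappers; the function names are hypothetical, and the file is assumed to hold either one document per line or a single JSON array (as written by `--jsonArray`).

```python
import json
from datetime import datetime, timezone


def unwrap(obj):
    """Recursively convert common canonical Extended JSON wrappers to plain values."""
    if isinstance(obj, list):
        return [unwrap(v) for v in obj]
    if not isinstance(obj, dict):
        return obj
    if len(obj) == 1:
        (key, val), = obj.items()
        if key == "$oid":
            return val  # keep the ObjectId as its hex string
        if key in ("$numberInt", "$numberLong"):
            return int(val)
        if key == "$numberDouble":
            return float(val)  # float() also accepts "Infinity" and "NaN"
        if key == "$date":
            # Canonical mode: {"$date": {"$numberLong": "<milliseconds since epoch>"}}
            return datetime.fromtimestamp(unwrap(val) / 1000, tz=timezone.utc)
    return {k: unwrap(v) for k, v in obj.items()}


def load_export(path):
    """Load an export written either one document per line or as a JSON array."""
    with open(path, encoding="utf-8") as fh:
        text = fh.read()
    if text.lstrip().startswith("["):
        docs = json.loads(text)
    else:
        docs = [json.loads(line) for line in text.splitlines() if line.strip()]
    return [unwrap(d) for d in docs]
```

The CSV export needs no such decoding and can be read with `csv.DictReader`; the BSON dump is restored with MongoDB's `mongorestore` tool.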

In an infectious disease outbreak the identification of pathogen genome sequence variants provides epidemiologists with high-resolution transmission diagnostics that can help cluster patients; identify cohorts of individuals who need testing; and identify new variants that may compromise existing vaccines, therapeutics, and low-resolution detection diagnostics.  The Oxford Nanopore MinION™ is a uniquely portable nucleic acid sequencing device that has been used in limited-resource settings for this purpose, e.g., during the 2014-2016 outbreak of Ebolavirus (EBOV) disease in Africa.  We desc

Instructions: 

Multiple README files are included in the compressed archives of this dataset. Most files are self-explanatory to biomedical researchers familiar with the analysis of variants in nucleotide sequence data.

The dataset stores multi-carpark occupancy records in JSON format.

Bitcoin is a decentralized digital currency that has attracted significant attention and grown considerably in recent years. Unlike traditional currencies, Bitcoin does not rely on a central authority to control the supply and distribution of currency or to verify the validity of transactions. Instead, it relies on a peer-to-peer (P2P) network of volunteers to distribute pending transactions and confirmed blocks, verify transactions, and collectively maintain a replicated ledger that everyone agrees on. This P2P network is at the heart of Bitcoin and many other blockchain technologies.
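
To make the relaying concrete: every message peers exchange (for example `tx`, `block` or `inv`) travels inside a 24-byte header made of a 4-byte network magic, a 12-byte null-padded ASCII command, a little-endian 4-byte payload length, and a 4-byte checksum equal to the first four bytes of SHA-256 applied twice to the payload. The Python sketch below frames and unframes such messages; it illustrates the mainnet wire format and is not part of this dataset, and the helper names are hypothetical.

```python
import hashlib
import struct

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")  # mainnet network magic as sent on the wire


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def frame_message(command: str, payload: bytes, magic: bytes = MAINNET_MAGIC) -> bytes:
    """Prefix a payload with the 24-byte P2P message header."""
    cmd = command.encode("ascii")
    if len(cmd) > 12:
        raise ValueError("command longer than 12 bytes")
    header = (magic
              + cmd.ljust(12, b"\x00")              # null-padded command name
              + struct.pack("<I", len(payload))     # little-endian payload length
              + double_sha256(payload)[:4])         # checksum
    return header + payload


def parse_message(raw: bytes, magic: bytes = MAINNET_MAGIC):
    """Return (command, payload) after checking magic, length and checksum."""
    if len(raw) < 24:
        raise ValueError("incomplete header")
    got_magic, cmd, length, checksum = struct.unpack("<4s12sI4s", raw[:24])
    if got_magic != magic:
        raise ValueError("wrong network magic")
    payload = raw[24:24 + length]
    if len(payload) != length:
        raise ValueError("incomplete payload")
    if double_sha256(payload)[:4] != checksum:
        raise ValueError("bad checksum")
    return cmd.rstrip(b"\x00").decode("ascii"), payload
```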

This dataset provides labeled fake news data that can be used for in-depth study of fake news.

10,000 rows of sample data for testing, in four formats (CSV, JSON, XML, XLSX). The fields are: seq, first, last, age, street, city, state, zip, dollar, pick, date, latitude, longitude, birthday, ccnumber, dollar, ccnumber, phone, email, sentence, paragraph
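
The field list names "dollar" and "ccnumber" twice. If the CSV header really repeats those names (not verified here), `csv.DictReader` silently keeps only the last value for each repeated name. The Python sketch below renames repeats instead; the suffix scheme (`dollar_2`) and the function name are our own choices.

```python
import csv


def read_rows(lines):
    """Read CSV rows into dicts, renaming repeated header names so no column is lost.

    `lines` may be an open file (opened with newline="") or any iterable of strings.
    The second "dollar" column becomes "dollar_2", the third "dollar_3", and so on.
    """
    reader = csv.reader(lines)
    header = next(reader)
    seen = {}
    names = []
    for name in header:
        seen[name] = seen.get(name, 0) + 1
        names.append(name if seen[name] == 1 else f"{name}_{seen[name]}")
    return [dict(zip(names, row)) for row in reader]
```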

This dataset contains the performance models, simulation and monitoring results, and analysis scripts that we used for our evaluation.
