*.csv (zip); *.json (zip); *.pickle (zip); *.npz (zip);

In this paper we use Natural Language Processing techniques to improve several machine learning approaches (Support Vector Machines (SVM), Local SVM, Random Forests) to the problem of automatic keyphrase extraction from scientific papers. For the evaluation we propose a large, high-quality dataset: 2000 ACM papers from the Computer Science domain. We evaluate by comparison with expert-assigned keyphrases.

Dataset for the paper "SynEL: A Synthetic Benchmark for Entity Linking". The dataset integrates structured information from two primary sources: DBpedia for English, representing a high-resource language environment, and the Russian Public Company Register, a challenging low-resource dataset. Each dataset includes extensive annotations and structured entity links, ensuring high relevance for real-world applications in diverse industries.

Surface electromyography (EMG) can be used to interact with and control robots via intent recognition. However, most machine learning algorithms used to decode EMG signals have been trained on small datasets with limited subjects, impacting their generalization across different users and tasks. Here we developed EMGNet, a large-scale dataset for EMG neural decoding of human movements. EMGNet combines 7 open-source datasets with processed EMG signals for 132 healthy subjects (152 GB total size).

This dataset is the outcome of an observation of millet traits under seed coating and covering. For covering, Germination Percentage (FGP), Germination Index (GI), Mean Germination Time (MGT), Seedling Length (SL), Seedling Vigour Index (SVI), and the number of abnormal seedlings were measured. In addition, the levels of the enzymes catalase and peroxidase and of malondialdehyde (MDA) were measured.

This dataset comprises Internet core network data inferred using the methodology detailed in the article titled 'Exploring Internet Evolution Through Analysis of its Core Network'.

In this study, an equatorial telescope with an aperture of 310 mm, which will be installed in Antarctica in 2024, is chosen as the research subject. The hour angle at which the telescope points is in the range t ∈ [0, 360], and the range for the declination axis is [-90, 30]. The dataset contains around 3,000 images. The overall workflow is to collect images of the telescope in various poses and then collect two of each pose from the TCS side of the telescope.

QuaN is a collection of specially designed datasets for exploring the impact of noise on quantum machine learning and other applications. The presented work focuses on the transformation of clean datasets into noisy counterparts across diverse domains, including the MNIST handwritten digits, Medical MNIST, Iris, and Mobile Health datasets. The datasets are created using noise from the classical and quantum domains.

Simulated dataset for deriving parametric constraints for Bayesian Knowledge Tracing. The classical Expectation-Maximization method can produce degenerate parameters (i.e., parameters that violate the conceptual interpretation of the model, such as by saying that a learner with no knowledge of a skill is more likely to get an answer correct than a learner with knowledge). A novel approach based on Newton's method rescues these parameters using mathematically derived constraints on the parameter space.
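The degeneracy described above can be illustrated with a minimal sketch. It assumes the standard four-parameter form of Bayesian Knowledge Tracing (prior, learn, guess, slip); the names `BKTParams`, `is_degenerate`, and `posterior_known` are hypothetical and chosen only for illustration. The sketch checks the single condition mentioned in the description and does not reproduce the constraints or the Newton's-method procedure of the underlying work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BKTParams:
    """Standard BKT parameterization (an assumption, not taken from the dataset)."""
    p_init: float   # prior probability that the skill is already known
    p_learn: float  # probability of learning the skill between attempts
    p_guess: float  # probability of a correct answer without knowledge
    p_slip: float   # probability of an incorrect answer despite knowledge


def is_degenerate(params: BKTParams) -> bool:
    """True when a learner without knowledge is more likely to answer
    correctly than a learner with knowledge: P(guess) > 1 - P(slip)."""
    p_correct_unknown = params.p_guess
    p_correct_known = 1.0 - params.p_slip
    return p_correct_unknown > p_correct_known


def posterior_known(params: BKTParams, p_known: float, correct: bool) -> float:
    """Bayes update of P(known) after observing one answer
    (before the learning transition is applied)."""
    if correct:
        known = p_known * (1.0 - params.p_slip)
        unknown = (1.0 - p_known) * params.p_guess
    else:
        known = p_known * params.p_slip
        unknown = (1.0 - p_known) * (1.0 - params.p_guess)
    return known / (known + unknown)


# Illustrative values only; they are not drawn from the dataset.
plausible = BKTParams(p_init=0.3, p_learn=0.1, p_guess=0.2, p_slip=0.1)
degenerate = BKTParams(p_init=0.3, p_learn=0.1, p_guess=0.7, p_slip=0.4)
```

With the degenerate values, a correct answer lowers the estimated probability of knowledge instead of raising it, which is the interpretive failure the description refers to.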

 

This research studies the stance classification task for parliamentary debates, with the aim of analysing how parliamentarians argue on different debate topics, what their political stance is, and the impact of homophily with respect to their party affiliation. State-level Australian Hansard data were collected, focusing on debates related to obesity and food marketing policies in Australia. The data cover 6 states and 1 territory (NT is excluded) for the period 1/1/2000 to 1/1/2022.

Visual storytelling, also known as multi-image captioning, refers to describing a set of images rather than a single image. The Visual Storytelling Task (VST) takes a set of images as input and aims to generate a coherent story relevant to the input images. In this work, we present a new dataset for expressive and coherent story creation: the Sequential Storytelling Image Dataset (SSID), consisting of open-source video frames accompanied by story-like annotations.
