IEEE DataPort’s Spring 2020 Dataset Upload Competition Entries
All contest entries will be evaluated and must meet all contest rules in order to be eligible for prizes.
Most text-simplification systems require an indicator of word complexity. The prevalent approaches to predicting word difficulty are based on manual feature engineering, while deep-learning-based models remain largely unexplored because of their comparatively poor performance. We explore the use of one such model for predicting word difficulty, treating the task as a binary classification problem. We also train traditional machine learning models and evaluate their performance on the task.
- Categories:
This dataset gives a cursory glimpse of the overall sentiment trend of the public discourse on Twitter about the COVID-19 pandemic. A live scatter plot of the dataset is available as The Overall Trend block at https://live.rlamsal.com.np. The trend graph reveals multiple peaks and drops that need further analysis; the n-grams that occur during those peaks and drops may help in better understanding the discourse.
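The n-gram analysis suggested above could start from something like the following minimal Python sketch. It assumes the tweet texts for one peak or drop have already been collected into a list; the helper names `tokenize`, `ngrams`, and `top_ngrams` are hypothetical, and the regular-expression tokenizer is deliberately simple rather than a production tweet tokenizer.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase, then keep word-like tokens; a leading # or @ is retained
    # so hashtags and mentions stay distinct from plain words.
    return re.findall(r"[#@]?[a-z0-9']+", text.lower())

def ngrams(tokens, n):
    # All runs of n consecutive tokens (bigrams when n == 2).
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_ngrams(texts, n=2, k=10):
    # Count n-grams across every text and return the k most frequent.
    counts = Counter()
    for text in texts:
        counts.update(ngrams(tokenize(text), n))
    return counts.most_common(k)

if __name__ == "__main__":
    # Made-up example tweets standing in for those posted during one peak.
    peak_tweets = [
        "Stay at home and stay safe #COVID19",
        "Please stay at home, flatten the curve",
    ]
    for gram, count in top_ngrams(peak_tweets, n=2, k=3):
        print(" ".join(gram), count)
```

Comparing the top n-grams of a peak window with those of a neighboring calm window is one straightforward way to see what changed.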
- Categories:
Annotated image dataset of household objects from the RoboFEI@Home team
This data set contains two sets of pictures of household objects, created by the RoboFEI@Home team to develop object detection systems for a domestic robot.
The first data set was created with objects from a local supermarket; the product brands are typical of Brazil. The second data set is composed of objects from the RoboCup@Home 2018 OPL competition.
- Categories:
Normalized, flattened axial particle displacement fields of surface acoustic waves (SAWs) propagating in multi-layered agar phantoms (three two-layer agar phantoms and one three-layer agar phantom), used to generate Fig. 2 in Zhou's study "A Weighted Average Phase Velocity Inversion Model for Depth-Resolved Elasticity Evaluation in Human Skin In-Vivo".
- Categories:
Representative normalized and flattened axial particle displacement fields of surface acoustic waves (SAWs) propagating in human skin in vivo at different sites, used to generate Fig. 5 in Zhou's study "A Weighted Average Phase Velocity Inversion Model for Depth-Resolved Elasticity Evaluation in Human Skin In-Vivo".
- Categories:
An Optical Character Recognition (OCR) system converts document images, whether printed or handwritten, into their electronic counterparts. Handwritten text, however, is much more challenging to process than printed text because of individuals' erratic writing styles. The problem becomes more severe when the input image is a doctor's prescription. Because a prescription contains both handwritten and printed text, which must be processed separately, the two kinds of text have to be classified before the image is fed to the OCR engine.
- Categories:
The dataset is composed of digital signals obtained from capacitive sensor electrodes immersed in water or in oil. Each signal, stored in one row, consists of 10 consecutive intensity values followed by a label in the last column. The label is +1 for a water-immersed sensor electrode and -1 for an oil-immersed sensor electrode. The dataset is intended for training a classifier that infers the material in which an electrode is immersed (water or oil) from a sample signal of 10 consecutive values.
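The row layout described above (10 values, then a +1/-1 label) can be parsed and fed to a classifier roughly as in this Python sketch. It assumes comma-separated rows, which the description does not state, and the nearest-centroid classifier is only a hypothetical baseline for illustration, not the method intended by the dataset authors.

```python
N_VALUES = 10  # consecutive intensity values per signal

def parse_row(line, sep=","):
    # Assumed layout: 10 numeric values then the label, separated by `sep`.
    fields = line.strip().split(sep)
    if len(fields) != N_VALUES + 1:
        raise ValueError(f"expected {N_VALUES + 1} fields, got {len(fields)}")
    values = [float(x) for x in fields[:N_VALUES]]
    label = int(float(fields[N_VALUES]))  # +1 = water, -1 = oil
    if label not in (1, -1):
        raise ValueError(f"unexpected label {label}")
    return values, label

def fit_centroids(rows):
    # Mean signal per class: a deliberately simple baseline.
    sums = {1: [0.0] * N_VALUES, -1: [0.0] * N_VALUES}
    counts = {1: 0, -1: 0}
    for values, label in rows:
        counts[label] += 1
        for i, v in enumerate(values):
            sums[label][i] += v
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums if counts[lab]}

def predict(centroids, values):
    # Assign the class whose centroid is nearest in squared Euclidean distance.
    def sq_dist(centroid):
        return sum((v - c) ** 2 for v, c in zip(values, centroid))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))
```

Swapping in a stronger classifier only requires replacing `fit_centroids` and `predict`; the parsing step stays the same.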
- Categories:
The dataset contains Gaussian blobs with varying numbers of samples, centers, and features. The number of samples ranges from 500 to 50,000, the number of centers from 2 to 100, and the number of features from 2 to 2048. These sets of Gaussian blobs can be used to test clustering algorithms for scalability and effectiveness. The compressed sets contain two kinds of files: files ending with "_X.csv" hold the data points, and files ending with "_y.csv" hold the corresponding class labels.
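Matching each "_X.csv" file with its "_y.csv" partner can be done by their shared filename prefix, as in this Python sketch. The prefixes are whatever precedes the suffixes; the example names in the usage note are made up, and `load_pair` assumes the files have no header row (adjust it if they do).

```python
import csv

def pair_files(names):
    # Group "<prefix>_X.csv" and "<prefix>_y.csv" names by their shared prefix;
    # prefixes missing either half are dropped.
    pairs = {}
    for name in names:
        for suffix, slot in (("_X.csv", 0), ("_y.csv", 1)):
            if name.endswith(suffix):
                prefix = name[: -len(suffix)]
                pairs.setdefault(prefix, [None, None])[slot] = name
    return {p: tuple(v) for p, v in pairs.items() if None not in v}

def load_pair(x_path, y_path):
    # Assumption: no header rows; X holds numeric features, y one label per row.
    with open(x_path, newline="") as fx:
        X = [[float(v) for v in row] for row in csv.reader(fx) if row]
    with open(y_path, newline="") as fy:
        y = [row[0] for row in csv.reader(fy) if row]  # labels kept as strings
    if len(X) != len(y):
        raise ValueError("X and y have different numbers of rows")
    return X, y
```

For example, `pair_files(os.listdir(folder))` on an extracted archive yields the prefix-to-file mapping, and each pair can then be passed to `load_pair` before clustering.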
- Categories:
This dataset contains Python/Jupyter notebooks with thermal magnitude data and loss simulations of IGBT-based converters. It also includes an implementation of the proposed model and the validation results comparing the model against simulations for three IGBT manufacturers.
- Categories:
This India-specific COVID-19 tweets dataset has been curated using the large-scale Coronavirus (COVID-19) Tweets Dataset. This dataset contains tweets originating from India during the first week of each of the four phases of nationwide lockdowns initiated by the Government of India. For more information on filtering keywords, please visit the primary dataset page.
- Categories:
The dataset provides user reviews of Abilify Oral, with ratings of the drug's satisfaction, effectiveness, and ease of use across different age groups.
- Categories:
EmoSurv is a dataset of keystroke data with emotion labels. Timing and frequency data are recorded while participants type free and fixed texts before and after specific emotions are induced. These emotions are: Anger, Happiness, Calmness, Sadness, and the Neutral state.
First, data are collected while the participant is in a neutral state. The participant then watches an eliciting video. Once the emotion has been induced, the participant types another fixed text and free text.
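Keystroke timing data of this kind is commonly summarized as dwell times (how long a key is held) and flight times (the gap between releasing one key and pressing the next). The Python sketch below computes both from a hypothetical list of press/release events in milliseconds; the `KeyEvent` structure is an assumption for illustration and does not reflect EmoSurv's actual file layout.

```python
from dataclasses import dataclass

@dataclass
class KeyEvent:
    # Hypothetical record; EmoSurv's real columns may differ.
    key: str
    press_ms: float
    release_ms: float

def dwell_times(events):
    # Dwell time: how long each key is held down.
    return [e.release_ms - e.press_ms for e in events]

def flight_times(events):
    # Flight time: release of one key to press of the next, in press order.
    # Negative values mean the next key was pressed before the previous release.
    ordered = sorted(events, key=lambda e: e.press_ms)
    return [b.press_ms - a.release_ms for a, b in zip(ordered, ordered[1:])]
```

Comparing the distributions of these values before and after the eliciting video is one way to look for emotion-related changes in typing.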
- Categories:
This dataset contains facial expressions captured from different sides. The top-level videos were shot with a Logitech C270 and the bottom ones with an LG G6. The videos are continuous 480p shots from different angles.
This is meant to serve as a dataset for facial expression recognition under different angles and poses.
- Categories:
Database for technology acceptance research
- Categories:
Dataset for research on optimizing the LogGP data transmission evaluation model
- Categories:
This dataset covers five mainstream stock market indices: XJO, DJI, IXIC, HSI, and N225, from September 2010 to August 2020.
- Categories:
This dataset is a set of eighteen directed networks that represent message exchanges among Twitter accounts during eighteen crisis events. It comprises 645,339 anonymized unique user IDs and 1,396,709 edges labeled with respect to Plutchik's basic emotions (anger, fear, sadness, disgust, joy, trust, anticipation, and surprise) or "neutral" (if a tweet conveys no emotion).
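A first look at one of these networks might tally edges per emotion label and count distinct users, as in the Python sketch below. It assumes each network can be read as (source, target, label) triples with exactly one label per edge; that layout is an assumption, not the dataset's documented format.

```python
from collections import Counter, defaultdict

# Plutchik's eight basic emotions plus the "neutral" label.
EMOTIONS = {"anger", "fear", "sadness", "disgust", "joy",
            "trust", "anticipation", "surprise", "neutral"}

def summarize(edges):
    # edges: iterable of (source_id, target_id, label), one label per edge (assumed).
    label_counts = Counter()
    out_degree = defaultdict(int)
    users = set()
    for src, dst, label in edges:
        if label not in EMOTIONS:
            raise ValueError(f"unknown label: {label}")
        label_counts[label] += 1
        out_degree[src] += 1  # directed: messages sent by src
        users.update((src, dst))
    return {
        "users": len(users),
        "edges": sum(label_counts.values()),
        "labels": label_counts,
        "out_degree": dict(out_degree),
    }
```

Running this per crisis event gives directly comparable emotion profiles across the eighteen networks.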
- Categories:
The current maturity of autonomous underwater vehicles (AUVs) has made their deployment practical and cost-effective, so many scientific, industrial, and military applications now include AUV operations. However, the logistical difficulties and high costs of operating at sea remain critical factors limiting further technology development, the benchmarking of new techniques, and the reproducibility of research results. To overcome this problem, we present a freely available dataset suitable for testing control, navigation, and sensor-processing algorithms, among other tasks.
- Categories:
The simulated InSAR building dataset contains 312 simulated SAR image pairs generated from 39 different building models, each simulated at 8 viewing angles. The training set contains 216 samples and the test set 96. Each simulated InSAR sample has three channels: the master SAR image, the slave SAR image, and the interferometric phase image. The dataset serves CVCMFF Net for semantic segmentation of buildings in InSAR images.
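The sample counts above are mutually consistent: 39 models × 8 angles = 312 = 216 + 96. The short Python check below verifies this and also notes that both subsets are multiples of 8 (27 and 12 models' worth of views). That only shows a split by whole building model is possible; the description does not say the split was made that way, and the function name is hypothetical.

```python
def check_split(models=39, angles=8, train=216, test=96):
    total = models * angles  # one sample per (building model, viewing angle)
    if train + test != total:
        raise ValueError(f"train + test = {train + test}, expected {total}")
    # Whether all views of a building stay in one subset is NOT stated; we only
    # check that the counts would allow it (each subset a multiple of `angles`).
    groupable = train % angles == 0 and test % angles == 0
    return {
        "total": total,
        "train_models": train // angles if groupable else None,
        "test_models": test // angles if groupable else None,
    }
```

When building a custom split, keeping all views of one building in the same subset avoids near-duplicate views leaking from training into testing.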
- Categories:
We chose 8 publicly available CT volumes of COVID-19-positive patients from https://doi.org/10.5281/zenodo.3757476 and used 3D Slicer to generate volumetric annotations of dimension 512 × 512 for five lung lobes, namely the right upper lobe, right middle lobe, right lower lobe, left upper lobe, and left lower lobe. These annotations were validated by a radiologist with over 15 years of experience.
- Categories:
The dataset is generated by fusing three publicly available datasets: COVID-19 CXR images (https://github.com/ieee8023/covid-chestxray-dataset), the Radiological Society of North America (RSNA) dataset (https://www.kaggle.com/c/rsna-pneumonia-detection-challenge), and the U.S. National Library of Medicine (USNLM) Montgomery County set, NLM(MC) (http
- Categories:
See our next papers.
- Categories:
This dataset was created using the Galvanic Skin Response (GSR) sensor and the electrocardiogram (ECG) sensor of the MySignals Healthcare Toolkit. The MySignals toolkit consists of an Arduino Uno board and several sensor ports. The sensors were connected to the ports of the hardware kit, which was controlled through the Arduino SDK.
- Categories:
See our next articles.
- Categories:
"The friction ridge pattern is a 3D structure which, in its natural state, is not deformed by contact with a surface." Building upon this rather trivial observation, the present work constitutes a first solid step towards a paradigm shift in fingerprint recognition from its very foundations. We explore and evaluate the feasibility of moving from current technology, which operates on 2D images of elastically deformed impressions of the ridge pattern, to a new generation of systems based on full-3D models of the natural, non-deformed ridge pattern itself.
- Categories:
There are two datasets: Drebin4000 and AMD6000.
- Categories:
The GesHome dataset consists of 18 hand gestures from 20 non-professional subjects of various ages and occupations. Each participant performed each gesture 50 times over 5 days, so GesHome contains 18,000 gesture samples in total. Using the embedded accelerometer and gyroscope, we capture 3-axis linear acceleration and 3-axis angular velocity at a sampling frequency of 25 Hz. The experiments were video-recorded so that the data could be labeled manually using the ELAN tool.
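The numbers above fit together as 18 gestures × 20 subjects × 50 repetitions = 18,000 samples, and at 25 Hz one second of motion yields 25 readings. The Python sketch below encodes that arithmetic and merges the two 3-axis streams into 6-value frames; the stream representation (lists of (x, y, z) tuples) and the function names are hypothetical, not GesHome's file format.

```python
GESTURES, SUBJECTS, REPETITIONS = 18, 20, 50
SAMPLING_HZ = 25

def total_samples(gestures=GESTURES, subjects=SUBJECTS, reps=REPETITIONS):
    # Every subject repeats every gesture `reps` times.
    return gestures * subjects * reps

def readings_in(seconds, hz=SAMPLING_HZ):
    # Number of sensor readings captured in a window of the given length.
    return round(seconds * hz)

def frames(acc, gyr):
    # Merge 3-axis acceleration and 3-axis angular velocity into
    # (ax, ay, az, gx, gy, gz) frames, assuming both streams are time-aligned.
    if len(acc) != len(gyr):
        raise ValueError("accelerometer and gyroscope streams differ in length")
    return [tuple(a) + tuple(g) for a, g in zip(acc, gyr)]
```

A gesture lasting about two seconds would therefore produce roughly 50 six-value frames, which is a convenient input size for a sequence classifier.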
- Categories:
In a cold forging process for producing bolts, pressure signals are recorded for each bolt-forming cycle. The dataset is collected from experiments on different failure modes of a forming machine. Two experiments were recorded in CSV format, covering four failure modes (core broken, cavity block, insufficient lubrication, and material out of specification) as well as one normal mode. The two experiments were performed on the same machine with different cavities and cores, and are saved in Experimental Data for Modeling and Testing.
- Categories:
This dataset was collected from the force, current, angle (magnetic rotary encoder), and inertial sensors of the NAO humanoid robot while it walked on vinyl, gravel, wood, concrete, artificial grass, and asphalt without a slope, and on vinyl, gravel, and wood with a slope of 2 degrees. In total, counting all axes and components of each sensor, we monitored 27 parameters on board the robot.
- Categories:
This heart disease dataset was curated by combining 5 popular heart disease datasets that were already available independently but had not been combined before. The 5 datasets are combined over 11 common features, which makes this the largest heart disease dataset available so far for research purposes. The five datasets used for its curation are:
- Categories:
This dataset is used to develop an algorithm for automatically segmenting the collected signals. When a workpiece is machined in a milling process, vibration signals can be recorded by a 3-axis accelerometer attached to the spindle of a CNC milling machine. To segment the recorded signals, a moving window (0.5 s) is applied to the vibration signals, and the corresponding mode of each window, i.e., dry run or milling, is labeled manually. To verify the algorithm, three types of operations are provided, recorded in CSV format.
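The 0.5 s windowing described above can be sketched in Python as follows. The sampling rate `fs` is not given in the description, so it is a parameter; the step defaults to the window length (non-overlapping windows), which is an assumption. The RMS-threshold labeler is a hypothetical baseline for comparison against the manual labels, not the segmentation algorithm the dataset was built for, and collapsing the three axes into one magnitude signal is just one possible choice.

```python
import math

def magnitude(ax, ay, az):
    # Combine the three accelerometer axes into one signal (one possible choice).
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(ax, ay, az)]

def windows(signal, fs, win_s=0.5, step_s=None):
    # Split a 1-D signal into windows of win_s seconds at sampling rate fs.
    # step_s defaults to win_s (non-overlapping); a smaller step gives overlap.
    size = int(round(win_s * fs))
    step = size if step_s is None else int(round(step_s * fs))
    if size <= 0 or step <= 0:
        raise ValueError("window and step must each cover at least one sample")
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]

def rms(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))

def label_windows(signal, fs, threshold, win_s=0.5):
    # Hypothetical baseline: a window is "milling" when its RMS exceeds threshold.
    return ["milling" if rms(w) > threshold else "dry run"
            for w in windows(signal, fs, win_s)]
```

With the manually labeled windows as ground truth, the threshold could be tuned on one operation type and checked on the other two.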
- Categories:
Recently, Temporal Information Retrieval (TIR) has attracted major attention from the information retrieval community. TIR exploits the temporal dynamics in the information retrieval process and harnesses both textual relevance and temporal relevance to fulfill the temporal information needs of a user (Ur Rehman Khan et al., 2018). The focus time of a document is an important temporal aspect, defined as the time to which the content of the document refers (Jatowt et al., 2015; Jatowt et al., 2013; Morbidoni et al., 2018; Khan et al., 2018).
- Categories:
This dataset is used to develop an algorithm for evaluating machining quality. When a workpiece is machined in a milling process, vibration signals can be recorded by a 3-axis accelerometer attached to the spindle of a CNC milling machine. To evaluate machining quality, the vibration signals can be segmented and the corresponding features extracted in the time, frequency, and time-frequency domains. With these features as inputs, a model can be developed to estimate machining quality, such as the roughness of a workpiece.
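As an illustration of the time-domain part of that feature extraction, the Python sketch below computes a few standard statistics for one vibration segment: mean, RMS, standard deviation, peak, and crest factor. The selection of features is an assumption for illustration; the description does not list which features the authors use, and frequency or time-frequency features would need an FFT or wavelet step on top of this.

```python
import math

def time_features(segment):
    # Common time-domain statistics for one vibration segment (one axis).
    n = len(segment)
    if n == 0:
        raise ValueError("empty segment")
    mean = sum(segment) / n
    rms = math.sqrt(sum(x * x for x in segment) / n)
    std = math.sqrt(sum((x - mean) ** 2 for x in segment) / n)  # population std
    peak = max(abs(x) for x in segment)
    # Crest factor = peak / RMS; undefined (NaN) for an all-zero segment.
    crest = peak / rms if rms else math.nan
    return {"mean": mean, "rms": rms, "std": std, "peak": peak, "crest_factor": crest}
```

Stacking these dictionaries row by row, one per segment and axis, gives a feature table that a regression model for roughness could be trained on.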
- Categories:
Data for the study have been retrieved from a publicly available data set of a leading European P2P lending platform, Bondora (https://www.bondora.com/en). The retrieved data are a pool of both defaulted and non-defaulted loans from the period between 1 March 2009 and 27 January 2020. The data comprise demographic and financial information about borrowers and loan transactions. In P2P lending, loans are typically uncollateralized, and lenders seek higher returns as compensation for the financial risk they take.
- Categories: