It contains data for four omic profiles (CNV, mRNA, miRNA, and protein) for BRCA, LGG, and LUAD, obtained from the TCGA project.

In addition, we provide synthetic data for a mixture of isotropic distributions.
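As a rough illustration of what a mixture of isotropic distributions can look like, the sketch below draws points from a mixture of isotropic Gaussians using only the Python standard library. The Gaussian family, the component centers, the standard deviations, the weights, and the function name are all assumptions made for illustration; they are not the parameters used to generate the synthetic data provided here.

```python
import random

def sample_isotropic_mixture(centers, sigmas, weights, n, seed=0):
    """Draw n points from a mixture of isotropic Gaussians (illustrative only).

    Component k has a center (list of floats) and a single scalar sigma,
    so its covariance is sigma**2 times the identity matrix (isotropic).
    """
    rng = random.Random(seed)
    points, labels = [], []
    for _ in range(n):
        # Pick a component according to the mixture weights.
        k = rng.choices(range(len(centers)), weights=weights)[0]
        # Same standard deviation along every axis of that component.
        point = [rng.gauss(mu, sigmas[k]) for mu in centers[k]]
        points.append(point)
        labels.append(k)
    return points, labels

# Hypothetical 2-D mixture with three components.
points, labels = sample_isotropic_mixture(
    centers=[[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]],
    sigmas=[1.0, 0.5, 2.0],
    weights=[0.5, 0.3, 0.2],
    n=300,
)
```

The returned labels record which component produced each point, which can serve as a ground truth when evaluating clustering methods.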


Dataset used in the article "The Reverse Problem of Keystroke Dynamics: Guessing Typed Text with Keystroke Timings". It includes CSV files with dataset results summaries, the evaluated sentences, detailed results, and scores. The results data contains training and evaluation ARFF files for each user, with features of synthetic and legitimate samples as described in the article. The source data comes from three free-text keystroke dynamics datasets used in previous studies: one by the authors (LSIA) and two by unrelated groups (KM and PROSODY, the latter subdivided into GAY, GUN, and REVIEW).


These datasets collect sensor information about the operation of collaborative robots. We recorded information from two kinds of robots, UR3e and UR10e. The data is used for data-driven modeling of the power consumption of cobots. The datasets contain the following information: recording time, trajectory ID, joint positions, joint velocities, motor currents, motor torques, motor voltages, end effector position, force and momentum exerted on the end effector, and current and voltage of the robot.
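Since the recordings include the robot's current and voltage along with the recording time, one simple way to derive a power-consumption signal is to multiply voltage by current and integrate over time. The sketch below assumes DC electrical power (P = V · I) and trapezoidal integration. The function names and sample values are hypothetical and do not reflect the actual column layout or contents of the files.

```python
def electrical_power(voltages, currents):
    """Instantaneous electrical power in watts, assuming P = V * I."""
    return [v * i for v, i in zip(voltages, currents)]

def energy_joules(times, powers):
    """Integrate power over time with the trapezoidal rule."""
    energy = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        energy += 0.5 * (powers[k] + powers[k - 1]) * dt
    return energy

# Made-up samples: time in seconds, voltage in volts, current in amperes.
times = [0.0, 0.5, 1.0, 1.5]
voltages = [48.0, 48.0, 47.5, 48.0]
currents = [1.0, 2.0, 2.0, 1.0]

powers = electrical_power(voltages, currents)
total_energy = energy_joules(times, powers)
```

Grouping the samples by trajectory ID before integrating would give the energy used per trajectory.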


Dataset used in the article "The Reverse Problem of Keystroke Dynamics: Guessing Typed Text with Keystroke Timings". The source data contains CSV files with dataset results summaries, false positive lists, the evaluated sentences, and their keystroke timings. The results data contains training and evaluation ARFF files for each user and sentence, with the calculated Manhattan and Euclidean distances, the R metric, and the directionality index for each challenge instance.
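For readers unfamiliar with the distance features mentioned above, the following sketch shows the standard Manhattan and Euclidean distances between two equal-length timing vectors. The example timings are invented, and how the article builds the vectors it compares is not shown here. The R metric and the directionality index are not covered.

```python
import math

def manhattan(a, b):
    """Sum of absolute differences between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """Square root of the sum of squared differences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical keystroke timings in milliseconds.
reference = [120.0, 85.0, 200.0]
candidate = [110.0, 95.0, 180.0]

d_manhattan = manhattan(reference, candidate)   # 10 + 10 + 20 = 40
d_euclidean = euclidean(reference, candidate)   # sqrt(100 + 100 + 400)
```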


Twitter is one of the most popular social networks for sentiment analysis. This dataset contains tweets related to the stock market. We collected 943,672 tweets between April 9 and July 16, 2020, using the S&P 500 tag (#SPX500), references to the top 25 companies in the S&P 500 index, and the Bloomberg tag (#stocks). Of these, 1,300 tweets were manually annotated as positive, neutral, or negative. A second independent annotator reviewed the manually annotated tweets.


Raw Twitter data was downloaded through the Twitter REST API search using the Python package Tweepy (version 3.8.0), which simplifies interaction between developers and the REST API. The Twitter REST API only retrieves data from the past seven days and allows filtering tweets by language. The retrieved tweets were restricted to English (en). Data collection ran from April 9 to July 16, 2020, using the following Twitter tags as search parameters: #SPX500, #SP500, SPX500, SP500, $SPX, #stocks, $MSFT, $AAPL, $AMZN, $FB, $BBRK.B, $GOOG, $JNJ, $JPM, $V, $PG, $MA, $INTC, $UNH, $BAC, $T, $HD, $XOM, $DIS, $VZ, $KO, $MRK, $CMCSA, $CVX, $PEP, $PFE. Because of the large volume of data in the raw files, only each tweet's content and creation date were stored.


The file tweets_labelled_09042020_16072020.csv consists of 5,000 tweets selected by random sampling from the 943,672 collected. Of those 5,000 tweets, 1,300 were manually annotated and reviewed by a second independent annotator. The file tweets_remaining_09042020_16072020.csv contains the remaining 938,672 tweets.
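The split described above (a random sample set aside, with everything else kept as the remainder) can be sketched as follows. This is a minimal illustration on toy data, not the procedure the authors actually ran; the function name, the seed, and the toy tweets are assumptions.

```python
import random

def split_sample(tweets, sample_size, seed=0):
    """Randomly pick sample_size items; return (sampled, remaining)."""
    rng = random.Random(seed)
    # Sample indices without replacement so no tweet is chosen twice.
    chosen = set(rng.sample(range(len(tweets)), sample_size))
    sampled = [t for i, t in enumerate(tweets) if i in chosen]
    remaining = [t for i, t in enumerate(tweets) if i not in chosen]
    return sampled, remaining

# Toy stand-in for the collected tweets.
tweets = [f"tweet {i}" for i in range(100)]
sampled, remaining = split_sample(tweets, sample_size=5)
```

With the real counts, the same arithmetic holds: 943,672 collected tweets minus the 5,000 sampled leaves 938,672 in the remaining file.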


This dataset builds on the GeoCOV19Tweets Dataset. The original work by Lamsal, R. runs network analysis on a similar dataset to understand the underlying relationship between countries and hashtags. That work analyzed roughly 300k [country, hashtag] relations from 190 countries and territories and 5,055 unique hashtags. This work increases the number of relationships threefold.


This dataset provides [place, hashtag] relationships in a comma-separated values (CSV) file. Each line represents one relationship. You can use the CSV file directly as your research requires.

However, if you need to change the place entity from city (the dataset currently uses the ["place"]["name"] object) to country, use the ["place"]["country"] object instead. A sample script is provided with this dataset. The script takes a list of tweet IDs from a CSV file and hydrates the IDs to extract place and hashtag relationships. The script is written for twarc.
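To make the city-versus-country switch concrete, here is a minimal sketch of how [place, hashtag] pairs could be pulled from one hydrated tweet. It is not the sample script shipped with the dataset. It assumes the hydrated tweet is a dictionary in which hashtags sit under ["entities"]["hashtags"] with a "text" field, and both the function name and the example tweet are hypothetical.

```python
def place_hashtag_pairs(tweet, place_key="name"):
    """Return (place, hashtag) pairs for one hydrated tweet dictionary.

    place_key="name" uses tweet["place"]["name"] (city level);
    place_key="country" uses tweet["place"]["country"] instead.
    """
    # "place" can be missing or null when the tweet is not geotagged.
    place = tweet.get("place") or {}
    place_value = place.get(place_key)
    if not place_value:
        return []
    hashtags = (tweet.get("entities") or {}).get("hashtags") or []
    return [(place_value, h["text"]) for h in hashtags if h.get("text")]

# Invented example tweet with only the fields used above.
tweet = {
    "place": {"name": "Kathmandu", "country": "Nepal"},
    "entities": {"hashtags": [{"text": "COVID19"}, {"text": "lockdown"}]},
}

city_pairs = place_hashtag_pairs(tweet)
country_pairs = place_hashtag_pairs(tweet, place_key="country")
```

Tweets without a place object yield an empty list, so they add no relationships.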


Dataset with diverse types of attacks on Programmable Logic Controllers:

1- Denial of Service 

  • Flooding
  • Amplification/Volumetric

2- Man in the Middle


The full documentation of the dataset is available at: 


The dataset is composed of several files covering the DoS and MitM attacks.


A sample CSV file is also provided to illustrate the contents of the collected data. Most of the data is available in pcap format.


Full instructions are available at: 


LiDAR point cloud data serves as a machine vision alternative to images. Its advantages over images and video include depth estimation and distance measurement. Low-density LiDAR point cloud data can be used for navigation, obstacle detection, and obstacle avoidance in mobile robots, autonomous vehicles, and drones. In this dataset, we scanned over 1,200 objects and classified them into 4 groups of objects, including humans, cars, and motorcyclists.


Automatic humor detection has interesting use cases in modern technologies, such as chatbots and virtual assistants. Existing humor detection datasets usually combine formal non-humorous texts and informal jokes with incompatible statistics (text length, word count, etc.). This makes it more likely that humor can be detected with simple analytical models, without understanding the underlying latent linguistic features and structures.


This dataset contains road network information of Chengdu with travel time data during four time slots: weekday peak hour, weekday off-peak hour, weekend peak hour and weekend off-peak hour.