This dataset is offered in .csv format and comprises 3 files:

- File 1: has all 1699 Arabic news headlines collected, with the corresponding emotion classification that 3 annotators agreed on with no bias

- File 2: has the dataset with BOW features extracted

- File 3: has the dataset with n-gram features extracted


This data set is related to the figures from the paper titled "Nanofabricated Low-Voltage Gated Si Field Ionization Arrays" by Girish Rughoobur, Alvaro Sahagun, Olusoji Ilori and Akintunde I. Akinwande.


This dataset provides the bibliography of the literature analyzed in the course of a systematic literature review to derive quality factors relevant for simulation modelling in information systems research. The dataset connects to the following publication: M. Auf der Landwehr, M. Trott, and C. von Viebahn, "Computer Simulation as Evaluation Tool of Information Systems: Identifying Quality Factors of Simulation Modeling," in IEEE CBI 2020: 22nd International Conference on Business Informatics: University of Antwerp in Antwerp, Belgium, June 22–24, 2020.



The dataset consists of reviews of various hotels throughout the world. The data columns range from Location and Trip Type to various review parameters, each with an individual review score. The data can be preprocessed and used for purposes such as review categorization, topic extraction, sentiment analysis, and location-based quality calculation. Trustworthy real-world data is valuable nowadays and hard to obtain, so this dataset should be a useful contribution for the research community as well as for professionals.






The .zip file contains 6 folders when unzipped. We provide the details of each folder below.


“Proteins” folder: Contains 20 protein targets organized into two folders (Benchmark and CASP) depending on the family each target belongs to. Data for each protein is provided in a subfolder named with its id. Each such subfolder contains the following 4 files.

  1. A .fasta file containing the amino-acid sequence of the protein.

  2. A .pdb file containing the native tertiary structure coordinates. Detailed format for a .pdb file can be found in

  3. A .frag3 file containing the fragments of length 3 for the protein sequence generated from

  4. A .frag9 file containing the fragments of length 9 for the protein sequence generated from


“Generation” folder: Contains the generated ensembles for the protein targets in 20 subfolders, one for each target, named with their ids. Each subfolder contains 5 files, each containing the generated ensemble for one run. Each such file contains 14 columns and each row represents one generated structure. The first column provides the Rosetta score4 energy, the second column provides the lRMSD to the native structure, and each of the remaining 12 columns provides one USR feature for the structure.
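As a rough illustration of the 14-column layout described above, the following sketch splits one ensemble file into its energy, lRMSD, and USR parts. The function name and the sample numbers are hypothetical, and the sketch assumes the files are whitespace-delimited with no header row, which the description does not state.

```python
import io
import numpy as np

def load_generated_ensemble(source):
    """Split one generated-ensemble file into energy, lRMSD, and USR arrays."""
    # Assumption: whitespace-delimited numeric columns, no header row.
    data = np.loadtxt(source, ndmin=2)
    if data.shape[1] != 14:
        raise ValueError(f"expected 14 columns, got {data.shape[1]}")
    energy = data[:, 0]  # column 1: Rosetta score4 energy
    lrmsd = data[:, 1]   # column 2: lRMSD to the native structure
    usr = data[:, 2:]    # columns 3-14: the 12 USR features
    return energy, lrmsd, usr

# Tiny synthetic example (made-up numbers, not taken from the dataset).
sample = io.StringIO(
    "-50.0 3.2 " + " ".join(["0.1"] * 12) + "\n"
    "-42.5 5.7 " + " ".join(["0.2"] * 12) + "\n"
)
energy, lrmsd, usr = load_generated_ensemble(sample)
print(energy.shape, lrmsd.shape, usr.shape)
```

A path to a real file can be passed in place of the in-memory sample, provided the delimiter assumption holds.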


“Reduced” folder: Contains the reduced ensembles for each clustering technique in separate folders. Each such folder contains 20 subfolders, one for each target, named with their ids. Each such subfolder contains 5 files, each containing the reduced ensemble for one run. Each such file contains 2 columns and each row represents one structure in the reduced ensemble. The first column provides the Rosetta score4 energy and the second column provides the lRMSD to the native structure.


“Truncation” folder: Contains the reduced ensembles via truncation for the protein targets in 20 subfolders, one for each target, named with their ids. Each such subfolder contains 5 files, each containing the reduced ensemble for one run. Each such file contains 2 columns and each row represents one structure in the reduced ensemble. The first column provides the Rosetta score4 energy and the second column provides the lRMSD to the native structure.


“Ks” folder: Contains 4 separate files, one for each clustering technique, containing the number of clusters for each run of each protein target. These files can be used to plot the distributions for the number of clusters.


“Bars” folder: Contains 3 separate subfolders containing the information needed to plot the bar charts for the minimum, average, and standard deviation of lRMSDs to the native structure for the CASP targets. Each subfolder contains 10 files, one for each target. Each file contains 6 rows that provide the lRMSD value for original ensemble, reduced ensemble for hierarchical clustering, reduced ensemble for k-means clustering, reduced ensemble for GMM clustering, reduced ensemble for gmx-cluster clustering, and reduced ensemble for truncation, respectively.
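The row order of the "Bars" files can be attached to the values with a small helper like the one below. This is only a sketch: the helper name is hypothetical, and it assumes that each of the 6 rows starts with a single numeric value.

```python
# Row order as described for each file in the "Bars" subfolders.
BAR_LABELS = [
    "original",
    "hierarchical",
    "k-means",
    "GMM",
    "gmx-cluster",
    "truncation",
]

def label_bar_values(lines):
    """Map the 6 rows of one "Bars" file to their ensemble labels."""
    # Assumption: the lRMSD value is the first token on each non-empty row.
    values = [float(line.split()[0]) for line in lines if line.strip()]
    if len(values) != len(BAR_LABELS):
        raise ValueError(f"expected {len(BAR_LABELS)} rows, got {len(values)}")
    return dict(zip(BAR_LABELS, values))

# Made-up values for illustration only.
bars = label_bar_values(["4.1", "4.3", "4.2", "4.4", "4.6", "5.0"])
print(bars)
```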


Wine has been popular with the public for centuries, and the market offers a wide variety of wines to choose from. Among all wine regions, Bordeaux, France, is considered the most famous in the world. In this paper, we try to understand Bordeaux wines made in the 21st century through a Wineinformatics study. We developed and studied two datasets: the first contains all Bordeaux wines from 2000 to 2016, and the second contains all wines listed in a famous collection of Bordeaux wines, the 1855 Bordeaux Wine Official Classification, from 2000 to 2016.


The dataset comes from Wine Spectator reviews of Bordeaux wines, written in natural language, from 2000 to 2016. A total of 14,349 wines have been collected: 4,263 wines scoring 90/100 or above and 10,086 wines scoring 89/100 or below. Detailed information is available in the paper. The reviews were processed by the Computational Wine Wheel to produce the uploaded dataset. The first attribute of the dataset is the name of the wine. The second attribute is the vintage of the wine. The third attribute is the score given to the wine by Wine Spectator. The fourth attribute is the price of the wine; $NA indicates that the price was not available at the time the wine was reviewed. The remaining attributes are characteristics describing the wine, each with a true/false value.
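The attribute layout above could be read along the following lines. This is a sketch under several assumptions that the description does not confirm: that the file has a header row, that prices are written with a leading "$", and that the true/false values are stored as the strings "true" and "false". The column names and wines in the sample are invented for illustration.

```python
import csv
import io

def parse_price(text):
    """Return the price as a float, or None for the "$NA" marker."""
    text = text.strip()
    if text == "$NA":
        return None
    return float(text.lstrip("$").replace(",", ""))

def read_wines(source):
    """Read rows laid out as name, vintage, score, price, then true/false flags."""
    reader = csv.reader(source)
    header = next(reader)  # assumption: the first row holds attribute names
    wines = []
    for row in reader:
        name, vintage, score, price = row[:4]
        wines.append({
            "name": name,
            "vintage": int(vintage),
            "score": int(score),
            "price": parse_price(price),
            "is_90_plus": int(score) >= 90,  # the two score groups described above
            # assumption: flags are stored as the strings "true"/"false"
            "flags": [value.strip().lower() == "true" for value in row[4:]],
        })
    return header, wines

# Invented sample rows; CHERRY and OAK are hypothetical attribute names.
sample = io.StringIO(
    "name,vintage,score,price,CHERRY,OAK\n"
    "Example Chateau A,2005,93,$45,true,false\n"
    "Example Chateau B,2011,87,$NA,false,true\n"
)
header, wines = read_wines(sample)
print(wines[0]["price"], wines[1]["price"], wines[0]["is_90_plus"])
```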


For publications, please cite the following papers:

Dong, Zeqing, Xiaowan Guo, Syamala Rajana, and Bernard Chen. "Understanding 21st Century Bordeaux Wines from Wine Reviews Using Naïve Bayes Classifier." Beverages 6, no. 1 (2020): 5.

Chen, Bernard, Christopher Rhodes, Aaron Crawford, and Lorri Hambuchen. "Wineinformatics: applying data mining on wine sensory reviews processed by the computational wine wheel." In 2014 IEEE International Conference on Data Mining Workshop, pp. 142-149. IEEE, 2014.

Chen, Bernard, Christopher Rhodes, Alexander Yu, and Valentin Velchev. "The Computational Wine Wheel 2.0 and the TriMax Triclustering in Wineinformatics." In Industrial Conference on Data Mining, pp. 223-238. Springer, Cham, 2016.


This RSSI Dataset is a comprehensive set of Received Signal Strength Indicator (RSSI) readings gathered from three different types of scenarios. Three wireless technologies were used:

  • Zigbee (IEEE 802.15.4),
  • Bluetooth Low Energy (BLE), and
  • WiFi (IEEE 802.11n 2.4GHz band).

The scenarios took place in three rooms with different sizes and interference levels. For the experimentation, the equipment consisted of Raspberry Pi 3 Model Bs, Gimbal Series 10 Beacons, and Series 2 Xbees with Arduino Uno microcontrollers.



A set of tests was conducted to compare the accuracy of multiple types of system designs, including Trilateration, Fingerprinting with K-Nearest Neighbor (KNN) processing, and Naive Bayes processing, while using a running average filter. For the experiments, all tests were done on tables, which allowed tests to be simulated at the height at which a user would carry a device in their pocket. Devices were also kept in the same orientation throughout all the tests in order to reduce the error in the measured RSSI values.
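For readers unfamiliar with the running average filter mentioned above, a minimal sketch follows. The trailing-window form and the window size are assumptions made for illustration; the description does not specify the filter parameters used in the experiments.

```python
from collections import deque

def running_average(readings, window=5):
    """Smooth RSSI readings with a trailing running average."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # holds only the most recent `window` readings
    smoothed = []
    for value in readings:
        recent.append(value)
        smoothed.append(sum(recent) / len(recent))
    return smoothed

# Made-up RSSI values (dBm) with one outlier at -90.
rssi = [-60, -62, -58, -90, -61]
smoothed = running_average(rssi, window=3)
print(smoothed)
```

The outlier is spread across the following window rather than appearing as a single spike, which is the usual motivation for filtering RSSI before localization.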


Three different experimental scenarios were utilized with varying conditions in order to determine how the proposed system will function according to the environmental parameters.

Scenario 1 was a 6.0 x 5.5 m meeting room. The area was cleared of all other transmitting devices to create a clear testing medium where all the devices could transmit without interference. Transmitters were placed 4 m apart from one another in the shape of a triangle. Fingerprint points were taken with a 0.5 m spacing in the center between the transmitters. This created 49 fingerprints that would comprise the database. For testing, 10 points were randomly selected.

Scenario 2 was a 5.8 x 5.3 m meeting room. This area was a high-noise environment, as additional transmitting devices were placed around it in order to create interference in the signals. There were 16 fingerprints gathered, with a larger distance selected between the points. In this scenario, 6 testing points were randomly selected to be used for comparing the algorithms.

Scenario 3 was a 10.8 x 7.3 m computer lab. This lab was a large area with a typical amount of noise due to the WiFi and BLE transmissions in the area. The large space also allowed signals to experience obstructions, reflections, and interference. Transmitters were placed so that Line-of-Sight (LoS) was available between the transmitters and the receiver. In total, 40 fingerprints were gathered in an alternating pattern. Points were taken 1.2 m apart in one direction and 0.6 m apart in the other. For testing, 16 randomly selected points were taken.


In the testing environment, fingerprints were gathered to be used in the creation of a database, while test points were selected to be used against the database for the comparison. The figures of each topology can be found inside the dataset folder. In the figures, the black dots represent the location of the transmitters and the red dots represent the locations where fingerprints and test points were gathered where appropriate. 
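To make the database-versus-test-point comparison concrete, here is a minimal KNN fingerprinting sketch. It is not the implementation from the related publication: the function name, the choice of Euclidean distance in signal space, the unweighted averaging of positions, and all sample values are assumptions for illustration.

```python
import math

def knn_locate(database, observed, k=3):
    """Estimate (x, y) as the mean position of the k closest fingerprints.

    `database` is a list of ((x, y), (rssi_a, rssi_b, rssi_c)) entries and
    `observed` is the RSSI triple measured at the unknown test point.
    """
    # Rank fingerprints by Euclidean distance between RSSI vectors.
    ranked = sorted(database, key=lambda entry: math.dist(entry[1], observed))
    nearest = ranked[:k]
    x = sum(position[0] for position, _ in nearest) / len(nearest)
    y = sum(position[1] for position, _ in nearest) / len(nearest)
    return x, y

# Made-up fingerprints on a 1 m grid with readings from transmitters A, B, C.
database = [
    ((0.0, 0.0), (-40, -60, -60)),
    ((1.0, 0.0), (-50, -50, -60)),
    ((0.0, 1.0), (-50, -60, -50)),
    ((1.0, 1.0), (-60, -50, -50)),
]
estimate = knn_locate(database, (-41, -59, -60), k=1)
print(estimate)
```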

Related Publication

S. Sadowski, P. Spachos, K. Plataniotis, "Memoryless Techniques and Wireless Technologies for Indoor Localization with the Internet of Things", IEEE Internet of Things Journal.


The RSSI dataset contains a folder for each experimental scenario and, within it, a folder for each wireless technology (i.e., Zigbee, BLE, and WiFi). Each of these folders contains three additional folders according to how the data was gathered (Pathloss, Database, and Tests). Pathloss contains 18 files measuring the RSSI at varying distances from the devices. The number of files located in Database and Tests varies based on the scenario.

For each technology, the file name corresponds to the point where the data was gathered. For specific locations, the (x,y) coordinates can be seen in the appropriate .xlsx file.

Each file in the Database and Tests folders contains approximately 300 readings. Each file in the Pathloss folder contains approximately 50 readings, all from a single node. Readings appear in the format "Node LetterValue" where:

Letter corresponds to the transmitter that the signal was sent from, represented by 'A', 'B', or 'C'.

Value is the RSSI reading.
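A reading in the "Node LetterValue" format could be parsed roughly as follows. The sketch assumes one reading per line and that Value is a signed number written directly after the letter (for example "Node A-61"); the exact spacing and number format in the files may differ, and the sample lines are invented.

```python
import re
from collections import defaultdict

# Assumption: "Node", a transmitter letter A-C, then a signed numeric RSSI value.
READING = re.compile(r"^Node\s*([ABC])\s*(-?\d+(?:\.\d+)?)\s*$")

def parse_readings(lines):
    """Group RSSI values by transmitter letter, skipping unparseable lines."""
    by_node = defaultdict(list)
    for line in lines:
        match = READING.match(line.strip())
        if match is None:
            continue
        by_node[match.group(1)].append(float(match.group(2)))
    return dict(by_node)

# Invented sample lines for illustration.
sample_lines = ["Node A-61", "Node B-70", "Node A-63", "not a reading"]
readings = parse_readings(sample_lines)
print(readings)
```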


We investigate the naïve idea of adults relating to two opposite processes shown by means of iconic stimuli. 



The Pinched Hysteresis Loop (PHL), i.e., the V-I characteristic of a memristor, is studied by applying different window functions, computed in the MATLAB environment. The behavior is also studied at different frequencies.


This dataset contains aerodynamic CFD data for a type of STT missile.