The dataset corresponds to the survey performed on students and professors of the introductory Biological Engineering course at the Department of Biological Engineering, University of the Republic, Uruguay.
The dataset is meant for purely academic and non-commercial use.
For queries, please consult the corresponding author (Parag Chatterjee, email@example.com).
This dataset contains nearly 1 million unique movie reviews from 1150 different IMDb movies spread across 17 IMDb genres: Action, Adventure, Animation, Biography, Comedy, Crime, Drama, Fantasy, History, Horror, Music, Mystery, Romance, Sci-Fi, Sport, Thriller and War. The dataset also contains movie metadata such as the release date, run length, IMDb rating, movie rating (PG-13, R, etc.), number of IMDb raters, and number of reviews per movie.
Movie details can be found in the per-genre files inside the 1_movie_per_genre folder.
Reviews of every movie can be found in the 2_reviews_per_movie_raw folder.
Note that each file name in the second folder is the movie name plus its year of release (both found in the first folder).
Vehicular networks have various characteristics that help identify relationships between vehicles. When two vehicles are moving at given speeds and a given distance apart, it is important to know whether they can communicate, since vehicles can communicate only while they are within communication range of each other. Given previous data of a road segment, our dataset can be used to identify the compatibility time between two selected vehicles. The compatibility time is defined as the time two vehicles will remain within the communication range of each other.
Each row contains characteristic information about two vehicles at time t. The features (column headings) are as follows:
- Euclidean Distance: The shortest distance between two vehicles in meters
- Relative Velocity: The velocity of the 2nd vehicle as seen from the 1st vehicle
- Direction Difference: Given the direction information of each vehicle, the direction difference feature gives the angle between the directions in which the two vehicles are moving. For instance, two vehicles going the same way on the same road have a direction difference of 0, whereas two vehicles moving in opposite directions have a difference of 180. We calculated the direction difference using: |((Direction of i - Direction of j + 180) % 360 - 180)|.
- Direction Difference Label: To ease the process for the supervised learning model, we also included a direction difference label with three possible values (0 if difference < 60, 2 if difference > 120, and 1 otherwise)
- Tendency: A label needed to distinguish whether two vehicles moving in opposite directions are approaching each other or moving away from each other.
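The direction difference formula and its label can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, directions are assumed to be headings in degrees, and the wrap-around relies on Python's `%` operator returning a non-negative result for a positive modulus.

```python
def direction_difference(direction_i: float, direction_j: float) -> float:
    """Angle in degrees (0 to 180) between two headings given in degrees."""
    # (x + 180) % 360 - 180 wraps the raw difference into [-180, 180);
    # abs() then folds it into [0, 180].
    return abs((direction_i - direction_j + 180) % 360 - 180)


def direction_difference_label(difference: float) -> int:
    """Label from the description: 0 if < 60, 2 if > 120, otherwise 1."""
    if difference < 60:
        return 0
    if difference > 120:
        return 2
    return 1


# Headings of 10 and 350 degrees differ by 20 degrees, not 340.
same_way = direction_difference(10, 350)
opposite = direction_difference(0, 180)
```

In other languages the `%` operator may return negative values for negative operands, so the formula may need an extra adjustment there.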
Target Label (Compatibility time): Our goal is to identify how long two vehicles will be in the communication range of each other. The compatibility time label takes one of five possible values:
L0 means Compatibility Time is 0
L1 means Compatibility Time is more than 2 seconds but less than 5 seconds
L2 means Compatibility Time is more than 5 seconds but less than 10 seconds
L3 means Compatibility Time is more than 10 seconds but less than 15 seconds
L4 means Compatibility Time is more than 15 seconds
One-way delay (OWD) is the transmission time of the network packet from the first to the last bit from the sender node to the receiver node. The data set presented here was obtained as a result of measurements performed for the paper “Improving the Accuracy of One-Way Delay Measurements”.
One-way delay measurements were performed using three different utilities:
* the utility from the OWAMP protocol;
* the first version of our utility, owping1; and
* the new version of our utility, owping2.
The graph shown in Figure 3 and the values in Table 2 are derived from data from files located in the Fig3andTab2 folder.
The OWAMP_chrony.csv file contains the results of measurements made on the local network with an IP packet size of 46 bytes, OWAMP as the measurement utility, and chrony as the NTP server. The file holds the numerical OWAMP measurement data in microseconds and can be opened in Excel.
The OWAMP_ntpd.csv file contains the results of measurements made on the local network with an IP packet size of 46 bytes, OWAMP as the measurement utility, and ntpd as the NTP server.
The owping2_chrony.csv file contains the results of measurements made on the local network with an IP packet size of 46 bytes, owping2 as the measurement utility, chrony as the NTP server, and UDP as the protocol.
The owping2_ntpd.csv file contains the results of measurements made on the local network with an IP packet size of 46 bytes, owping2 as the measurement utility, ntpd as the NTP server, and UDP as the protocol.
The graph displayed in Figure 5 and the values from Table 3 are derived from data from files located in the Fig5andTab3 folder.
All these files contain the results of measurements across a local network without a switch; the IP packet size is 46 bytes. The measurements in the files are presented in microseconds. They can be displayed via Excel.
In the owping1_icmp.csv file, the data is derived from owping1 utility measurements of ICMP packets.
In the owping1_udp.csv file, the data is derived from owping1 utility measurements of UDP packets.
In the owping2_icmp.csv file, the data is derived from owping2 utility measurements of ICMP packets.
In the owping2_udp.csv file, the data is derived from owping2 utility measurements of UDP packets.
The graph displayed in Figure 6 and the values in Table 4 are derived from data from a file located in the Fig6andTab4 folder.
The owamp_smr-crm_udp.csv file contains the OWD measurements across the global network, in the Samara-Crimea direction, using the OWAMP measurement utility.
Column A – represents the measurements made when the server was located in Crimea.
Column B – represents the measurements made when the server was located in Samara.
Table 5 was built using data from files located in the Tab5 folder.
The ping.csv file contains the results of RTT measurements across the global network, in the Samara-Crimea direction, using the RIPE Atlas measuring system.
The file 1 Client in Crimea.csv contains the results of OWD measurements across the Samara-Crimea section, with an IP packet size of 46 bytes and owping2 as the measurement utility. The first column holds the measurements for the route from Samara to Crimea, and the second holds the measurements for the route from Crimea to Samara. The values are in milliseconds. The file can be opened in Excel.
The file 2 Client in Crymea.csv contains the results of OWD measurements across the Crimea-Samara section, with an IP packet size of 46 bytes and owping2 as the measurement utility. The first column holds the measurements for the route from Crimea to Samara, and the second holds the measurements for the route from Samara to Crimea.
The graph displayed in Figure 7 was constructed using data from a file located in the Fig5 folder.
The owping2-owamp.csv file contains the OWD measurements for the Crimea-Samara direction. Column A contains data measured with owping2, Column B contains data measured with OWAMP.
The values shown in Table 6 were obtained using data from files located in the Tab6 folder.
OWAMP.csv contains the results of measurements across a global network in the Crimea-Samara direction (client in Crimea), where the IP packet size is 1500 bytes, and the measurement utility is OWAMP.
Column A - OWD from Crimea to Samara.
Column B - OWD from Samara to Crimea.
owping2.csv contains the results of measurements across a global network in the Crimea-Samara direction (client in Crimea), where the IP packet size is 1500 bytes, the measurement utility is owping2, and the protocol is UDP.
Column A - OWD from Crimea to Samara.
Column B - OWD from Samara to Crimea.
In addition to the data for the present paper, this set includes several additional files located in the Add folder.
The Rostov-Samara.csv file contains the results of OWD measurements in the Rostov-on-Don to Samara direction. Column A contains data for the Rostov-Samara direction, measured with owping2. Column B contains data for the return direction, Samara-Rostov.
The Rostov-Moscow.csv file contains the results of OWD measurements in the Rostov-on-Don to Moscow direction. Column A contains data for the Rostov-Moscow direction, measured with owping2. Column B contains data for the return direction, Moscow-Rostov.
The Rostov-Crimea.csv file contains the results of OWD measurements in the Rostov-on-Don to Crimea direction. Column A contains data for the Rostov-Crimea direction, measured with owping2. Column B contains data for the return direction, Crimea-Rostov.
This dataset contains the results of the simulation runs of the experiments performed to evaluate and compare the proposed spatial model for situated multi-agent systems. The model was introduced in a paper entitled "BioMASS, a spatial model for situated multiagent systems that optimizes neighborhood search". In this paper we presented a new model to implement a spatially explicit environment that supports constant-time sensory (neighborhood search) and locomotion functions for situated multiagent systems.
The dataset includes a compressed file in zip format. It contains a directory structure as shown below. Each directory corresponds to a specific experiment with a given simulation toolkit and set of parameters. Inside each directory there are 50 CSV files, one for each simulation run. Each file has a header describing the main parameters of the corresponding experiment. We used the Repast and Mason toolkits as benchmarks for the proposed BioMASS spatial model.
This is a dataset of Finite Difference Time Domain (FDTD) simulation results for 13 defective crystals and one non-defective crystal. There are 4 fields in the dataset: Real, Img, Int, and Attribute. Real holds the real part of the simulated result, Img holds the imaginary part, and Int holds the intensity, all in superimposed form. Attribute denotes the label of the simulated crystal. Label 0 is the non-defective crystal. The other 13 labels, crystal 1 to crystal 13, are assigned to the 13 different defective crystals whose simulations are studied.
Modern science is built on systematic experimentation and observation. The reproducibility and replicability of the experiments and observations are central to science. However, reproducibility and replicability are not always guaranteed, a situation sometimes referred to as the 'crisis of reproducibility'. To analyze the extent of the crisis, we conducted an online survey on the state of reproducibility in remote sensing. The answers of the respondents are saved in this dataset in full-text CSV format.
The file contains the answers to our online survey on reproducibility in remote sensing. The format is comma-separated values (CSV) with full-text answers, i.e. the answers are saved as text rather than as numeric codes, which makes them easy to understand and analyse.
The dataset also includes the report generated by the website that hosted the survey (kwiksurveys.com). This can be used for a quick overview of the results, and also to see the original questions and the possible answers.
The dataset consists of the following columns:
- gift_id: Unique ID of gift
- gift_type: Type of gift (clothes/perfumes/etc.)
- gift_category: Category to which the gift belongs under that gift type
- gift_cluster: Type of industry the gift belongs to
- instock_date: Date of arrival of stock
- stock_update_date: Date on which the stock was updated
- lsg_1 - lsg_6: Anonymized variables related to gift
- uk_date1, uk_date2: Buyer related dates
- is_discounted: Shows whether the discount is applicable on the gift
- volumes: Number of packages bought
- price: The total price
This dataset contains IDs and sentiment scores of the geo-tagged tweets related to the COVID-19 pandemic. The tweets are captured by an on-going project deployed at https://live.rlamsal.com.np. The model monitors the real-time Twitter feed for coronavirus-related tweets using 90+ different keywords and hashtags that are commonly used while referencing the pandemic. Complying with Twitter's content redistribution policy, only the tweet IDs are shared. You can re-construct the dataset by hydrating these IDs.
Each CSV file contains a list of tweet IDs. You can use these tweet IDs to download fresh data from Twitter (hydrating the tweet IDs). To give NLP researchers easy access to the sentiment of each collected tweet, the sentiment score computed by TextBlob has been appended as the second column. To hydrate the tweet IDs, you can use applications such as Hydrator (available for OS X, Windows and Linux), twarc (a Python library), or QCRI's Tweets Downloader (Java-based).
Getting the CSV files of this dataset ready for hydrating the tweet IDs:
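import pandas as pd
dataframe = pd.read_csv("april28_april29.csv", header=None)  # columns: tweet ID, sentiment score
dataframe = dataframe[0]  # keep only the tweet ID column
dataframe.to_csv("ready_april28_april29.csv", index=False, header=False)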
The above example code takes in the original CSV file (i.e., april28_april29.csv) from this dataset and exports just the tweet ID column to a new CSV file (i.e., ready_april28_april29.csv). The newly created CSV file can now be consumed by the Hydrator application for hydrating the tweet IDs. To export the tweet ID column into a TXT file, just replace ".csv" with ".txt" in the to_csv function (last line) of the above example code.
If you are not comfortable with Python and pandas, you can upload these CSV files to your Google Drive and use Google Sheets to delete the second column. Once finished with the deletion, download the edited CSV files: File > Download > Comma-separated values (.csv, current sheet). These downloaded CSV files are now ready to be used with the Hydrator app for hydrating the tweet IDs.
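If pandas is not installed, the same column extraction can be sketched with Python's standard csv module. This is only an illustration under stated assumptions: the helper name extract_tweet_ids is hypothetical, the files are assumed to have no header row, and the tweet ID is assumed to be the first column (with the sentiment score second, as described above). The sample IDs and scores are made up.

```python
import csv
import io


def extract_tweet_ids(source, destination) -> int:
    """Write the first column (tweet ID) of each CSV row to destination.

    Both arguments are open text file objects (hypothetical helper).
    Returns the number of IDs written.
    """
    count = 0
    for row in csv.reader(source):
        if not row:  # skip blank lines
            continue
        destination.write(row[0].strip() + "\n")
        count += 1
    return count


# In-memory example with made-up IDs and scores, for illustration only.
sample = io.StringIO("1111,0.25\n2222,-0.5\n")
output = io.StringIO()
written = extract_tweet_ids(sample, output)
```

With real files, the two file objects would come from open("april28_april29.csv", newline="") and open("ready_april28_april29.csv", "w") respectively.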
While social media has proven to be an exceptionally useful tool for interacting with other people and for spreading helpful information quickly and at scale, its great potential has also been leveraged with ill intent to distort political elections and manipulate constituents. In the paper at hand, we analyzed the presence and behavior of social bots on Twitter in the context of the November 2019 Spanish general election.
Data have been exported in three formats to provide maximum flexibility:
- MongoDB Dump BSONs
  - To import these data, please refer to the official MongoDB documentation.
- JSON Exports
  - Both the users and the tweets collections have been exported as canonical JSON files.
- CSV Exports (only tweets)
  - The tweet collection has been exported as a plain CSV file with comma separators.