

The distributed coordination function (DCF) is a fundamental medium access control (MAC) technique used by IEEE 802.11 wireless local area networks.


The 802.11 Distributed Coordination Function (DCF) is a protocol that uses carrier sensing together with a four-way handshake to maximize throughput while preventing packet collisions.



  • Drift types (A): gradual, incremental, recurring and sudden
  • Drift perspectives (B): time and trace
  • Noise percentage (C): 0, 5, 10, 15, 20
  • Number of cases in the stream (D): 100, 500, 1000
  • Change patterns (E): baseline, cb, cd, cf, cp, IOR, IRO, lp, OIR, pl, pm, re, RIO, ROI, rp, sw


The file name follows the pattern [A]_[B]_noise[C]_[D]_[E]
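The naming pattern above can be decoded programmatically. The following is a minimal sketch, assuming the five fields are separated by single underscores exactly as in the pattern and that any file extension can be stripped first; the names `LogName` and `parse_log_name` are hypothetical and not part of the dataset.

```python
import re
from typing import NamedTuple

# Allowed values, copied from the parameter list (A) to (E).
DRIFT_TYPES = ("gradual", "incremental", "recurring", "sudden")
PERSPECTIVES = ("time", "trace")
NOISE_LEVELS = (0, 5, 10, 15, 20)
CASE_COUNTS = (100, 500, 1000)
CHANGE_PATTERNS = ("baseline", "cb", "cd", "cf", "cp", "IOR", "IRO", "lp",
                   "OIR", "pl", "pm", "re", "RIO", "ROI", "rp", "sw")


class LogName(NamedTuple):
    """Hypothetical container for the five fields of a file name."""
    drift_type: str
    perspective: str
    noise: int
    cases: int
    pattern: str


# [A]_[B]_noise[C]_[D]_[E]
_NAME_RE = re.compile(r"^([a-z]+)_([a-z]+)_noise(\d+)_(\d+)_([A-Za-z]+)$")


def parse_log_name(name: str) -> LogName:
    """Split a file name into its fields and check them against the lists."""
    # Assumption: at most one extension, which is dropped before matching.
    stem = name.rsplit(".", 1)[0] if "." in name else name
    match = _NAME_RE.match(stem)
    if match is None:
        raise ValueError(f"name does not follow the pattern: {name!r}")
    a, b, c, d, e = match.groups()
    parsed = LogName(a, b, int(c), int(d), e)
    if (parsed.drift_type not in DRIFT_TYPES
            or parsed.perspective not in PERSPECTIVES
            or parsed.noise not in NOISE_LEVELS
            or parsed.cases not in CASE_COUNTS
            or parsed.pattern not in CHANGE_PATTERNS):
        raise ValueError(f"unexpected field value in: {name!r}")
    return parsed
```

For example, under these assumptions `parse_log_name("sudden_time_noise0_100_baseline")` yields the drift type `sudden`, the perspective `time`, 0% noise, 100 cases, and the `baseline` change pattern.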

An identical version of this dataset in the MXML format is available at:


Dockerfiles play an important role in the Docker-based containerization process, but in practice many of them contain smells. This dataset contains a collection of 6,334 projects to help developers gain insight into the occurrence of Dockerfile smells. The projects cover 10 popular programming languages: Shell, Makefile, Ruby, PHP, Python, Java, HTML, CSS, JavaScript, and Go.


Network traffic analysis, i.e. the umbrella of procedures for distilling information from network traffic, enables highly valuable profiling information and is the workhorse for several key network management tasks. While its nature is currently being revolutionized by the rising share of traffic generated by mobile and hand-held devices, existing design solutions are mainly evaluated on private traffic traces, and only a few public datasets are available, which clearly limits repeatability and further advances on the topic.


MIRAGE-2019 is a human-generated dataset for mobile traffic analysis with associated ground-truth, having the goal of advancing the state-of-the-art in mobile app traffic analysis. MIRAGE-2019 takes into consideration the traffic generated by more than 280 experimenters using 40 mobile apps via 3 devices.

A sampled version of the dataset (one app per category) is readily downloadable, whereas the complete version is available on request.

APP LIST reports the details on the apps contained in the two versions of the dataset.

If you are using the MIRAGE-2019 human-generated dataset for scientific papers, academic lectures, project reports, or technical documents, please help us increase its impact by citing the following reference:

Giuseppe Aceto, Domenico Ciuonzo, Antonio Montieri, Valerio Persico and Antonio Pescapè, "MIRAGE: Mobile-app Traffic Capture and Ground-truth Creation", 4th IEEE International Conference on Computing, Communications and Security (ICCCS 2019), October 2019, Rome (Italy).



This folder contains two CSV files and one Python (.py) file. The first CSV file contains imported NIST ground PV plant data: 902 days of raw measurements of plane-of-array (POA) irradiance, ambient temperature, inverter DC current, DC voltage, AC current, and AC voltage. The second CSV file contains user-created data. The Python program imports both CSV files and runs the four proposed corrupt-data detection methods on the NIST ground PV plant data.


Even though intelligent systems such as Siri or Google Assistant are enjoyable (and useful) dialog partners, users can only access predefined functionality. Enabling end-users to extend the functionality of intelligent systems will be the next big thing. To promote research in this area we carried out an empirical study on how laypersons teach robots new functions by means of natural language instructions. The result is a labeled corpus consisting of 3168 submissions given by 870 subjects.


The corpus consists of three datasets:

  1. The raw dataset of submissions (without labels): raw_dataset.csv
  2. The labeled dataset: labeled_dataset.csv
  3. Personal data of the participants as provided by Prolific (Caution: the information is incomplete, since registered members provide it voluntarily): personal_infomation_prolific.csv
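A quick first look at any of the three files can be taken with the Python standard library. This is only a sketch: it assumes the files are comma-separated, UTF-8 encoded, and start with a header row, none of which is stated above, and the helper name `summarize_csv` is hypothetical.

```python
import csv


def summarize_csv(path, delimiter=","):
    """Return (header, number_of_data_rows) for a delimited text file.

    Assumptions: the first row is a header and the file is UTF-8 encoded.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader, [])          # empty list if the file is empty
        row_count = sum(1 for _ in reader)  # remaining rows are data rows
    return header, row_count
```

Calling `summarize_csv("labeled_dataset.csv")` would then report the column names and the number of rows, which can be compared against the submission count given in the description.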

This pre-trained Word2Vec model has 300-dimensional vectors for more than 0.5 million Nepali words and phrases. A separate Nepali language text corpus was created using the news contents freely available in the public domain. The text corpus contained more than 90 million running words. The "Nepali Text Corpus" can be accessed freely from


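from gensim.models import KeyedVectors

# Load the vectors (plain-text word2vec format, hence binary=False)
model = KeyedVectors.load_word2vec_format('.../path/to/nepali_embeddings_word2vec.txt', binary=False)

# Find the similarity between two words
# (put Nepali words in the empty strings below; empty strings raise a KeyError)
model.similarity('', '')

# Find the most similar words to a given word
model.most_similar('')

# Try some vector arithmetic with Nepali words
model.most_similar(positive=['', ''], negative=[''], topn=1)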


This dataset comprises 4223 videos of a laser surface heat treatment process (also called laser heat treatment) applied to cylindrical steel workpieces. The purpose of the dataset is to detect anomalies in the laser heat treatment by learning a model from a set of non-anomalous videos. In the laser heat treatment, the laser beam follows a pattern similar to a figure "eight" with a frequency of 100 Hz. This pattern is sometimes modified to avoid obstacles in the workpieces. The videos are recorded with a thermal camera at 1000 frames per second.
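The two rates quoted above fix how many frames cover one pass of the laser pattern: 1000 frames per second divided by 100 cycles per second gives 10 frames per cycle. The sketch below works this out; the function names are hypothetical, and the phase calculation assumes that frame 0 coincides with the start of a pattern cycle, which the description does not state.

```python
FRAME_RATE_HZ = 1000    # camera frame rate, from the description
PATTERN_FREQ_HZ = 100   # laser pattern frequency, from the description


def frames_per_cycle(frame_rate_hz=FRAME_RATE_HZ, pattern_freq_hz=PATTERN_FREQ_HZ):
    """Number of video frames recorded during one pass of the pattern."""
    return frame_rate_hz / pattern_freq_hz


def cycle_phase(frame_index, frame_rate_hz=FRAME_RATE_HZ, pattern_freq_hz=PATTERN_FREQ_HZ):
    """Fraction of the pattern cycle (0 <= phase < 1) reached at a frame.

    Assumption: frame 0 is aligned with the start of a cycle.
    Integer arithmetic is used before the final division to avoid
    floating-point drift for large frame indices.
    """
    return (frame_index * pattern_freq_hz % frame_rate_hz) / frame_rate_hz
```

Under that alignment assumption, frames whose indices differ by a multiple of 10 show the beam at the same point of the pattern, which can be useful when comparing frames across cycles.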


See for details on the structure of the dataset.


This is the dataset for the manuscript entitled "Physics-prior Bayesian neural networks in semiconductor processing", IEEE Access