Smart speakers and voice-based virtual assistants are core components for the success of the IoT paradigm. Unfortunately, they are vulnerable to various privacy threats exploiting machine learning to analyze the generated encrypted traffic. To cope with that, deep adversarial learning approaches can be used to build black-box countermeasures altering the network traffic (e.g., via packet padding) and its statistical information.
This dataset contains several pcap files generated by the Google Home smart speaker placed under different conditions.
- Mic_on_off_8h contains two pcap files generated by keeping the microphone on (with silence) and off for 8 hours respectively.
- Mic_on_off_gquic_8h contains two pcap files generated by keeping the microphone on (with silence) and off for 8 hours respectively, excluding all network traffic not belonging to the google: gquic protocol.
- Mic_on_off_noise_3d contains three pcap files generated by holding on (with silence), off, and on (with noise) the microphone respectively for 3 days.
- Mic_on_off_noise_gquic_3d contains three pcap files generated by holding on (with silence), off, and on (with noise) the microphone respectively for 3 days. excluding all network traffic not belonging to the google protocol: gquic.
- media_pcap_anonymized contains several pcap files after the execution of queries such as "Whats' the latest news?" or "Play some music" (On each file has been stored network traffic collected after the execution of one query).
- travel_pcap_anonymized contains several pcap files after the execution of queries such as "How is the weather today?" (On each file has been stored network traffic collected after the execution of one query).
- utilities_pcap_anonymized contains several pcap files after the execution of queries such as "What's on my agenda today?" or "What time is it?" (On each file has been stored network traffic collected after the execution of one query).
This dataset is a sample of the dataset used for the paper "A network analysis on cloud gaming:Stadia, GeForce Now and PSNow" and rappresent samples of the gaming sessions.
To access further data, please contact Gianluca Perna at: email@example.com
Due to the large number of vulnerabilities in information systems and the continuous activity of attackers, techniques for malicious traffic detection are required to identify and protect against cyber-attacks. Therefore, it is important to intentionally operate a cyber environment to be invaded and compromised in order to allow security professionals to analyze the evolution of the various attacks and exploited vulnerabilities.
This dataset includes 2016, 2017 and 2018 cyber attacks in the HoneySELK environment.
PCAPs contain attacks targeting several honeypots configured with the following protocols/ports:
- SSH: 22/TCP
- HTTP: 80/TCP
- HTTPS: 443/TCP
- MYSQL: 3306/TCP
- FTP: 21/20/TCP
- DNS: 53/TCP/UDP
- NTP: 123/UDP
- TELNET: 23/TCP
- MSRPC: 135/TCP
- NETBIOS-SSN: 139/TCP
- MICROSOFT-DS: 445/TCP
Message Queuing Telemetry Transport (MQTT) protocol is one of the most used standards used in Internet of Things (IoT) machine to machine communication. The increase in the number of available IoT devices and used protocols reinforce the need for new and robust Intrusion Detection Systems (IDS). However, building IoT IDS requires the availability of datasets to process, train and evaluate these models. The dataset presented in this paper is the first to simulate an MQTT-based network. The dataset is generated using a simulated MQTT network architecture.
The dataset consists of 5 pcap files, namely, normal.pcap, sparta.pcap, scan_A.pcap, mqtt_bruteforce.pcap and scan_sU.pcap. Each file represents a recording of one scenario; normal operation, Sparta SSH brute-force, aggressive scan, MQTT brute-force and UDP scan respectively. The attack pcap files contain background normal operations. The attacker IP address is “192.168.2.5”. Basic packet features are extracted from the pcap files into CSV files with the same pcap file names. The features include flags, length, MQTT message parameters, etc. Later, unidirectional and bidirectional features are extracted. It is important to note that for the bidirectional flows, some features (pointed as *) have two values—one for forward flow and one for the backward flow. The two features are recorded and distinguished by a prefix “fwd_” for forward and “bwd_” for backward.
The dataset contains a set of voip flows (with different codecs) captured in fixed/fixed and fixed/mobile LTE-A enviroment.