This dataset consists of 8 features extracted from 70,000 monochromatic still images adapted from the Genome Project Stanford's database. The images are labeled in two classes: with LSB steganography (1) and without LSB steganography (0). The features are Kurtosis, Skewness, Standard Deviation, Range, Median, Geometric Mean, Hjorth Mobility, and Hjorth Complexity, all extracted from the histograms of the still images, including random spatial transformations. The steganographic function embeds five payload levels, from 0.1 to 0.5.

Instructions: 

This dataset consists of 8 features extracted from 70,000 monochromatic still images adapted from the Genome Project Stanford's database. The images are labeled in two classes: with (1) and without (0) LSB steganography. The training and testing datasets each contain 8 columns holding the following features as numeric quantities: Kurtosis, Skewness, Standard Deviation, Range, Median, Geometric Mean, Hjorth Mobility, and Hjorth Complexity. A ninth column gives the class of the observation: 0 for a non-steganogram and 1 for a steganogram. All the features were extracted from the histograms of the still images. The dataset can be read and processed using Pandas in Python, R or Matlab.
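As a rough illustration of how the eight features could be derived from an image histogram, here is a minimal sketch in plain Python. It is not the authors' extraction code. The moment definitions (population moments, non-excess kurtosis), the `+ 1` offset in the geometric mean (used here only to tolerate empty bins), and all function names are assumptions made for this example.

```python
import math
from statistics import median


def variance(values):
    """Population variance of a sequence of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def differences(values):
    """First-order differences between neighbouring values."""
    return [b - a for a, b in zip(values, values[1:])]


def histogram_features(pixels, levels=256):
    """Compute the eight features on the grey-level histogram of `pixels`."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    n = len(hist)
    mean = sum(hist) / n
    var = variance(hist)
    d1 = differences(hist)   # first derivative of the histogram
    d2 = differences(d1)     # second derivative of the histogram
    if var == 0 or variance(d1) == 0:
        raise ValueError("flat histogram: Hjorth parameters are undefined")

    mobility = math.sqrt(variance(d1) / var)
    complexity = math.sqrt(variance(d2) / variance(d1)) / mobility
    return {
        "kurtosis": sum((h - mean) ** 4 for h in hist) / n / var ** 2,
        "skewness": sum((h - mean) ** 3 for h in hist) / n / var ** 1.5,
        "std": math.sqrt(var),
        "range": max(hist) - min(hist),
        "median": median(hist),
        # Assumption: offset by 1 so that empty bins do not force the result to 0.
        "geometric_mean": math.exp(sum(math.log(h + 1) for h in hist) / n),
        "hjorth_mobility": mobility,
        "hjorth_complexity": complexity,
    }
```

Calling `histogram_features(pixels)` on a flat list of 8-bit grey values returns a dictionary with one entry per feature; the values in the published CSV files may follow different conventions.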

 

The steganographic function embeds five payload levels, from 0.1 to 0.5. The training dataset includes 56,000 labeled images (with and without LSB steganography), of which 5,600 images make up the subset for each payload. The testing dataset has 14,000 observations and is divided in the same proportions as the training dataset.
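For readers unfamiliar with LSB embedding, the sketch below shows one common form of it. It assumes that a payload of 0.1 to 0.5 means the fraction of pixels whose least significant bit is overwritten, and that positions and message bits are chosen pseudo-randomly; the description above does not state either detail, so treat both as assumptions. The function name `embed_lsb` is hypothetical.

```python
import random


def embed_lsb(pixels, payload, seed=0):
    """Overwrite the least significant bit of a fraction `payload` of the pixels.

    Assumption: the embedded message is a stream of pseudo-random bits and the
    pixel positions are drawn at random without replacement.
    """
    rng = random.Random(seed)
    n_embed = round(payload * len(pixels))
    positions = rng.sample(range(len(pixels)), n_embed)

    stego = list(pixels)
    for pos in positions:
        bit = rng.getrandbits(1)
        stego[pos] = (stego[pos] & ~1) | bit  # clear the LSB, then set the message bit
    return stego


cover = [i % 256 for i in range(1000)]
stego = embed_lsb(cover, payload=0.5)
changed = sum(1 for c, s in zip(cover, stego) if c != s)
print(f"{changed} of {len(cover)} pixels changed")
```

Only the lowest bit of any pixel can change, which is why the histogram-based features listed above are a plausible place to look for traces of embedding.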


This dataset is part of my PhD research on malware detection and classification using Deep Learning. It contains static analysis data: Top-1000 imported functions extracted from the 'pe_imports' elements of Cuckoo Sandbox reports. PE malware examples were downloaded from virusshare.com. PE goodware examples were downloaded from portableapps.com and from Windows 7 x86 directories.

Instructions: 

* FEATURES *

Column name: hash
Description: MD5 hash of the example
Type: 32-byte string

Column name: GetProcAddress
Description: Most imported function (1st)
Type: 0 (Not imported) or 1 (Imported)

...

Column name: LookupAccountSidW
Description: Least imported function (1000th)
Type: 0 (Not imported) or 1 (Imported)

Column name: malware
Description: Class
Type: 0 (Goodware) or 1 (Malware)
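The binary columns described above can be pictured as a lookup of each vocabulary entry in a sample's import list. The sketch below is illustrative only: `import_vector` is a hypothetical helper, and the three-entry vocabulary is a stand-in for the real Top-1000 list, of which only the first (`GetProcAddress`) and last (`LookupAccountSidW`) names are given in this description.

```python
def import_vector(imported_functions, vocabulary):
    """Return one 0/1 value per vocabulary entry: 1 if the function is imported."""
    present = set(imported_functions)
    return [1 if name in present else 0 for name in vocabulary]


# Hypothetical, shortened vocabulary; the middle entry is made up for illustration.
vocabulary = ["GetProcAddress", "LoadLibraryA", "LookupAccountSidW"]
sample_imports = ["GetProcAddress", "SomeOtherFunction"]
print(import_vector(sample_imports, vocabulary))
```

Functions outside the vocabulary are ignored, so every sample yields a vector of the same length.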

* ACKNOWLEDGMENTS *

We would like to thank: Cuckoo Sandbox for developing such an amazing dynamic analysis environment!
VirusShare! Because sharing is caring!
Universidade Nove de Julho for supporting this research.
Coordination for the Improvement of Higher Education Personnel (CAPES) for supporting this research.

* CITATIONS *

Please refer to the dataset DOI.
Please feel free to contact me for any further information.


This dataset is part of my PhD research on malware detection and classification using Deep Learning. It contains static analysis data: the raw PE byte stream rescaled to a 32 x 32 greyscale image using the Nearest Neighbor Interpolation algorithm and then flattened to a 1024-byte vector. PE malware examples were downloaded from virusshare.com. PE goodware examples were downloaded from portableapps.com and from Windows 7 x86 directories.
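The rescaling step can be sketched as follows. The description does not say how the byte stream is laid out in two dimensions before resizing, so this example assumes the bytes are arranged row by row into the smallest square image that holds them (zero-padded at the end) and then sampled with nearest-neighbour indexing. The function name `bytes_to_pixels` is hypothetical.

```python
import math


def bytes_to_pixels(data, side=32):
    """Rescale a byte stream to a side x side image and return it flattened."""
    if not data:
        raise ValueError("empty byte stream")

    # Assumption: lay the bytes out as a zero-padded square image.
    src_side = math.isqrt(len(data) - 1) + 1  # ceil(sqrt(len(data)))
    padded = data + bytes(src_side * src_side - len(data))

    pixels = []
    for row in range(side):
        src_row = row * src_side // side       # nearest source row
        for col in range(side):
            src_col = col * src_side // side   # nearest source column
            pixels.append(padded[src_row * src_side + src_col])
    return pixels


vector = bytes_to_pixels(bytes(range(256)) * 16)
print(len(vector))
```

Each element of the result is an integer between 0 and 255, matching the `pix_0` to `pix_1023` columns described below.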

Instructions: 

* FEATURES *

Column name: hash
Description: MD5 hash of the example
Type: 32-byte string

Column name: pix_0
Description: The first greyscale pixel value
Type: Integer (0-255)

Column name: pix_1023
Description: The last greyscale pixel value
Type: Integer (0-255)

Column name: malware
Description: Class
Type: 0 (Goodware) or 1 (Malware)

* ACKNOWLEDGMENTS *

We would like to thank: Cuckoo Sandbox for developing such an amazing dynamic analysis environment!
VirusShare! Because sharing is caring!
Universidade Nove de Julho for supporting this research.
Coordination for the Improvement of Higher Education Personnel (CAPES) for supporting this research.

* CITATIONS *

Please refer to the dataset DOI.
Please feel free to contact me for any further information.


This dataset is part of my PhD research on malware detection and classification using Deep Learning. It contains static analysis data (PE Section Headers of the .text, .code and CODE sections) extracted from the 'pe_sections' elements of Cuckoo Sandbox reports. PE malware examples were downloaded from virusshare.com. PE goodware examples were downloaded from portableapps.com and from Windows 7 x86 directories.

Instructions: 

* FEATURES *

Column name: hash
Description: MD5 hash of the example
Type: 32-byte string

Column name: size_of_data
Description: The size of the section on disk
Type: Integer

Column name: virtual_address
Description: Memory address of the first byte of the section relative to the image base
Type: Integer

Column name: entropy
Description: Calculated entropy of the section
Type: Float

Column name: virtual_size
Description: The size of the section when loaded into memory
Type: Integer

Column name: malware
Description: Class
Type: 0 (Goodware) or 1 (Malware)
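The entropy column is described only as the "calculated entropy of the section". A common choice for PE sections is the Shannon entropy of the byte distribution, in bits per byte; the sketch below assumes that definition, which is not confirmed by this description.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(data).values()
    )


print(shannon_entropy(b"\x00" * 64))        # constant data
print(shannon_entropy(bytes(range(256))))   # every byte value exactly once
```

Under this definition, values close to 8 indicate data that looks random, which is typical of packed or encrypted sections.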

* ACKNOWLEDGMENTS *

We would like to thank: Cuckoo Sandbox for developing such an amazing dynamic analysis environment!
VirusShare! Because sharing is caring!
Universidade Nove de Julho for supporting this research.
Coordination for the Improvement of Higher Education Personnel (CAPES) for supporting this research.

* CITATIONS *

Please refer to the dataset DOI.
Please feel free to contact me for any further information.


ASNM datasets include records consisting of many features that express various properties and characteristics of TCP communications. These features are called Advanced Security Network Metrics (ASNM) and were designed to distinguish legitimate connections from malicious ones (especially intrusions).

Instructions: 

ASNM datasets were created one by one during our long-term research. The following listing contains references to descriptions of particular datasets with their download locations:

 

  • ASNM-NPBO Dataset - contains non-payload-based obfuscation techniques applied to malicious traffic and to some legitimate traffic. It was created in 2015.
  • ASNM-TUN Dataset - contains tunneling obfuscation techniques applied to malicious traffic. It was created in 2014.
  • ASNM-CDX-2009 Dataset - contains ASNM features extracted from tcpdump captures of the CDX 2009 dataset. It lacks a few of the newer ASNM features. It was created in 2013.

This dataset is part of our research on malware detection and classification using Deep Learning. It contains 42,797 malware API call sequences and 1,079 goodware API call sequences. Each API call sequence is composed of the first 100 non-repeated consecutive API calls associated with the parent process, extracted from the 'calls' elements of Cuckoo Sandbox reports.
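One way to read "the first 100 non-repeated consecutive API calls" is that runs of the same call are collapsed to a single entry before the sequence is truncated to 100 items. The sketch below implements that reading; it is an interpretation rather than the authors' extraction code, and the function name is hypothetical.

```python
from itertools import groupby


def first_distinct_calls(calls, limit=100):
    """Collapse consecutive repeats of the same call, then keep the first `limit`."""
    collapsed = [name for name, _run in groupby(calls)]
    return collapsed[:limit]


trace = ["NtClose", "NtClose", "NtOpenFile", "NtOpenFile", "NtClose"]
print(first_distinct_calls(trace))
```

A call may still appear more than once in the result, as long as its occurrences are not adjacent.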

Instructions: 

* FEATURES *

Column name: hash
Description: MD5 hash of the example
Type: 32-byte string

Column name: t_0 ... t_99
Description: API call
Type: Integer (0-306)

Column name: malware
Description: Class
Type: Integer: 0 (Goodware) or 1 (Malware)

API Calls: ['NtOpenThread', 'ExitWindowsEx', 'FindResourceW', 'CryptExportKey', 'CreateRemoteThreadEx', 'MessageBoxTimeoutW', 'InternetCrackUrlW', 'StartServiceW', 'GetFileSize', 'GetVolumeNameForVolumeMountPointW', 'GetFileInformationByHandle', 'CryptAcquireContextW', 'RtlDecompressBuffer', 'SetWindowsHookExA', 'RegSetValueExW', 'LookupAccountSidW', 'SetUnhandledExceptionFilter', 'InternetConnectA', 'GetComputerNameW', 'RegEnumValueA', 'NtOpenFile', 'NtSaveKeyEx', 'HttpOpenRequestA', 'recv', 'GetFileSizeEx', 'LoadStringW', 'SetInformationJobObject', 'WSAConnect', 'CryptDecrypt', 'GetTimeZoneInformation', 'InternetOpenW', 'CoInitializeEx', 'CryptGenKey', 'GetAsyncKeyState', 'NtQueryInformationFile', 'GetSystemMetrics', 'NtDeleteValueKey', 'NtOpenKeyEx', 'sendto', 'IsDebuggerPresent', 'RegQueryInfoKeyW', 'NetShareEnum', 'InternetOpenUrlW', 'WSASocketA', 'CopyFileExW', 'connect', 'ShellExecuteExW', 'SearchPathW', 'GetUserNameA', 'InternetOpenUrlA', 'LdrUnloadDll', 'EnumServicesStatusW', 'EnumServicesStatusA', 'WSASend', 'CopyFileW', 'NtDeleteFile', 'CreateActCtxW', 'timeGetTime', 'MessageBoxTimeoutA', 'CreateServiceA', 'FindResourceExW', 'WSAAccept', 'InternetConnectW', 'HttpSendRequestA', 'GetVolumePathNameW', 'RegCloseKey', 'InternetGetConnectedStateExW', 'GetAdaptersInfo', 'shutdown', 'NtQueryMultipleValueKey', 'NtQueryKey', 'GetSystemWindowsDirectoryW', 'GlobalMemoryStatusEx', 'GetFileAttributesExW', 'OpenServiceW', 'getsockname', 'LoadStringA', 'UnhookWindowsHookEx', 'NtCreateUserProcess', 'Process32NextW', 'CreateThread', 'LoadResource', 'GetSystemTimeAsFileTime', 'SetStdHandle', 'CoCreateInstanceEx', 'GetSystemDirectoryA', 'NtCreateMutant', 'RegCreateKeyExW', 'IWbemServices_ExecQuery', 'NtDuplicateObject', 'Thread32First', 'OpenSCManagerW', 'CreateServiceW', 'GetFileType', 'MoveFileWithProgressW', 'NtDeviceIoControlFile', 'GetFileInformationByHandleEx', 'CopyFileA', 'NtLoadKey', 'GetNativeSystemInfo', 'NtOpenProcess', 'CryptUnprotectMemory', 
'InternetWriteFile', 'ReadProcessMemory', 'gethostbyname', 'WSASendTo', 'NtOpenSection', 'listen', 'WSAStartup', 'socket', 'OleInitialize', 'FindResourceA', 'RegOpenKeyExA', 'RegEnumKeyExA', 'NtQueryDirectoryFile', 'CertOpenSystemStoreW', 'ControlService', 'LdrGetProcedureAddress', 'GlobalMemoryStatus', 'NtSetInformationFile', 'OutputDebugStringA', 'GetAdaptersAddresses', 'CoInitializeSecurity', 'RegQueryValueExA', 'NtQueryFullAttributesFile', 'DeviceIoControl', '__anomaly__', 'DeleteFileW', 'GetShortPathNameW', 'NtGetContextThread', 'GetKeyboardState', 'RemoveDirectoryA', 'InternetSetStatusCallback', 'NtResumeThread', 'SetFileInformationByHandle', 'NtCreateSection', 'NtQueueApcThread', 'accept', 'DecryptMessage', 'GetUserNameExW', 'SizeofResource', 'RegQueryValueExW', 'SetWindowsHookExW', 'HttpOpenRequestW', 'CreateDirectoryW', 'InternetOpenA', 'GetFileVersionInfoExW', 'FindWindowA', 'closesocket', 'RtlAddVectoredExceptionHandler', 'IWbemServices_ExecMethod', 'GetDiskFreeSpaceExW', 'TaskDialog', 'WriteConsoleW', 'CryptEncrypt', 'WSARecvFrom', 'NtOpenMutant', 'CoGetClassObject', 'NtQueryValueKey', 'NtDelayExecution', 'select', 'HttpQueryInfoA', 'GetVolumePathNamesForVolumeNameW', 'RegDeleteValueW', 'InternetCrackUrlA', 'OpenServiceA', 'InternetSetOptionA', 'CreateDirectoryExW', 'bind', 'NtShutdownSystem', 'DeleteUrlCacheEntryA', 'NtMapViewOfSection', 'LdrGetDllHandle', 'NtCreateKey', 'GetKeyState', 'CreateRemoteThread', 'NtEnumerateValueKey', 'SetFileAttributesW', 'NtUnmapViewOfSection', 'RegDeleteValueA', 'CreateJobObjectW', 'send', 'NtDeleteKey', 'SetEndOfFile', 'GetUserNameExA', 'GetComputerNameA', 'URLDownloadToFileW', 'NtFreeVirtualMemory', 'recvfrom', 'NtUnloadDriver', 'NtTerminateThread', 'CryptUnprotectData', 'NtCreateThreadEx', 'DeleteService', 'GetFileAttributesW', 'GetFileVersionInfoSizeExW', 'OpenSCManagerA', 'WriteProcessMemory', 'GetSystemInfo', 'SetFilePointer', 'Module32FirstW', 'ioctlsocket', 'RegEnumKeyW', 'RtlCompressBuffer', 
'SendNotifyMessageW', 'GetAddrInfoW', 'CryptProtectData', 'Thread32Next', 'NtAllocateVirtualMemory', 'RegEnumKeyExW', 'RegSetValueExA', 'DrawTextExA', 'CreateToolhelp32Snapshot', 'FindWindowW', 'CoUninitialize', 'NtClose', 'WSARecv', 'CertOpenStore', 'InternetGetConnectedState', 'RtlAddVectoredContinueHandler', 'RegDeleteKeyW', 'SHGetSpecialFolderLocation', 'CreateProcessInternalW', 'NtCreateDirectoryObject', 'EnumWindows', 'DrawTextExW', 'RegEnumValueW', 'SendNotifyMessageA', 'NtProtectVirtualMemory', 'NetUserGetLocalGroups', 'GetUserNameW', 'WSASocketW', 'getaddrinfo', 'AssignProcessToJobObject', 'SetFileTime', 'WriteConsoleA', 'CryptDecodeObjectEx', 'EncryptMessage', 'system', 'NtSetContextThread', 'LdrLoadDll', 'InternetGetConnectedStateExA', 'RtlCreateUserThread', 'GetCursorPos', 'Module32NextW', 'RegCreateKeyExA', 'NtLoadDriver', 'NetUserGetInfo', 'SHGetFolderPathW', 'GetBestInterfaceEx', 'CertControlStore', 'StartServiceA', 'NtWriteFile', 'Process32FirstW', 'NtReadVirtualMemory', 'GetDiskFreeSpaceW', 'GetFileVersionInfoW', 'FindFirstFileExW', 'FindWindowExW', 'GetSystemWindowsDirectoryA', 'RegOpenKeyExW', 'CoCreateInstance', 'NtQuerySystemInformation', 'LookupPrivilegeValueW', 'NtReadFile', 'ReadCabinetState', 'GetForegroundWindow', 'InternetCloseHandle', 'FindWindowExA', 'ObtainUserAgentString', 'CryptCreateHash', 'GetTempPathW', 'CryptProtectMemory', 'NetGetJoinInformation', 'NtOpenKey', 'GetSystemDirectoryW', 'DnsQuery_A', 'RegQueryInfoKeyA', 'NtEnumerateKey', 'RegisterHotKey', 'RemoveDirectoryW', 'FindFirstFileExA', 'CertOpenSystemStoreA', 'NtTerminateProcess', 'NtSetValueKey', 'CryptAcquireContextA', 'SetErrorMode', 'UuidCreate', 'RtlRemoveVectoredExceptionHandler', 'RegDeleteKeyA', 'setsockopt', 'FindResourceExA', 'NtSuspendThread', 'GetFileVersionInfoSizeW', 'NtOpenDirectoryObject', 'InternetQueryOptionA', 'InternetReadFile', 'NtCreateFile', 'NtQueryAttributesFile', 'HttpSendRequestW', 'CryptHashMessage', 'CryptHashData', 'NtWriteVirtualMemory', 
'SetFilePointerEx', 'CertCreateCertificateContext', 'DeleteUrlCacheEntryW', '__exception__']

* ACKNOWLEDGMENTS *

We would like to thank: Cuckoo Sandbox for developing such an amazing dynamic analysis environment!
VirusShare! Because sharing is caring!
Universidade Nove de Julho for supporting this research.
Coordination for the Improvement of Higher Education Personnel (CAPES) for supporting this research.

* CITATIONS *

Oliveira, Angelo; Sassi, Renato José (2019): Behavioral Malware Detection Using Deep Graph Convolutional Neural Networks. TechRxiv. Preprint. https://doi.org/10.36227/techrxiv.10043099.v1
Please feel free to contact me for any further information.


Collecting and analysing heterogeneous data sources from the Internet of Things (IoT) and Industrial IoT (IIoT) is essential for training and validating the fidelity of machine learning-based cybersecurity applications. However, analysing those data sources remains a major challenge, both in reducing the high-dimensional feature space and in selecting important features and observations from the different data sources.


The Boğaziçi University DDoS dataset (BOUN DDoS) was generated at Boğaziçi University with the Hping3 traffic generator by flooding a victim with TCP SYN and UDP packets. The dataset includes attack-free user traffic as well as attack traffic and is suitable for evaluating network-based DDoS detection methods. The attacks target one victim server connected to the backbone router of the campus. Attack packets have randomly generated spoofed source IP addresses. The data trace was recorded on the backbone and included over 4000 active hosts.

Instructions: 

I. INTRODUCTION

The dataset includes two different attack scenarios. In both scenarios, randomly generated spoofed IP addresses are used in a flooding manner. For TCP flood attacks, TCP port 80 is used as the destination port. Each dataset lasts 8 minutes. In each of them, an 80-second waiting period is followed by a 20-second attack period. Different packet rates are used so that researchers can evaluate their detection methods at different packet rates.

The TCP SYN flood and UDP flood datasets include attack rates of 1000, 1500, 2000 and 2500 packets/second. The topology of the attack is given in Figure 1.

Fig. 1. BOUN DDoS attack topology.

Attack packets can be distinguished from attack-free packets using the destination IP address of packets. The victim IP address is 10.50.199.86.
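The destination-address rule above can be applied while reading the CSV files. The sketch below uses only the Python standard library. It assumes the CSV header uses the column name `Destination IP` exactly as written in the dataset structure section, which should be checked against the actual files. The two sample rows are made up for illustration.

```python
import csv
import io

VICTIM_IP = "10.50.199.86"  # victim address stated in the dataset description


def label_packets(csv_file, victim_ip=VICTIM_IP):
    """Yield each CSV row with an added boolean 'to_victim' field."""
    for row in csv.DictReader(csv_file):
        # Assumption: the header spells this column "Destination IP".
        row["to_victim"] = row["Destination IP"].strip() == victim_ip
        yield row


# Two made-up rows, for illustration only (not taken from the dataset).
sample = io.StringIO(
    "Time,Source ip,Destination IP\n"
    "0.000001,192.0.2.10,10.50.199.86\n"
    "0.000002,192.0.2.11,198.51.100.7\n"
)
labels = [row["to_victim"] for row in label_packets(sample)]
print(labels)
```

For a real file, the `io.StringIO` object would be replaced by a file opened with `open(path, newline="")`.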

II. DATASET STRUCTURE

Datasets are in comma-separated value file format, and have the following columns:

    Time: Time values start from zero and have a resolution of 0.000001 seconds. Time values are expressed in seconds.

    Frame Number: The incremental count of packets in the dataset.

    Frame length: The length of the packet in bytes.

    Source ip: Source IP address of the packet.

    Destination IP: Destination IP address of the packet.

    Source Port: Source TCP port of the packet. If it is not a TCP packet, this field is empty.

    Destination Port: Destination TCP port of the packet. If it is not a TCP packet, this field is empty.

    SYN: This value is “Set” if the packet is a TCP packet and its SYN flag is equal to one; it is “Not Set” if the packet is a TCP packet and its SYN flag is equal to zero. If the packet is not a TCP packet, this field is empty.

    ACK: This value is “Set” if the packet is a TCP packet and its ACK flag is equal to one; it is “Not Set” if the packet is a TCP packet and its ACK flag is equal to zero. If the packet is not a TCP packet, this field is empty.

    RST: This value is “Set” if the packet is a TCP packet and its RST flag is equal to one; it is “Not Set” if the packet is a TCP packet and its RST flag is equal to zero. If the packet is not a TCP packet, this field is empty.

    TTL: Time to live value of the packet.

    TCP Protocol: This value is TCP or UDP if the packet belongs to a transport layer IP protocol. Otherwise it can take other values.
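Because the flag columns take three states (“Set”, “Not Set”, or empty), it can help to decode them explicitly before computing statistics. The sketch below maps a flag field to `True`, `False` or `None` and counts SYN packets per one-second bin. It assumes rows are dictionaries keyed by the column names `Time` and `SYN` as listed above; both helper names are hypothetical.

```python
from collections import Counter


def parse_flag(value):
    """Map a flag field to True ("Set"), False ("Not Set") or None (not TCP)."""
    text = value.strip()
    if text == "Set":
        return True
    if text == "Not Set":
        return False
    return None  # empty field: the packet is not a TCP packet


def syn_per_second(rows):
    """Count packets with the SYN flag set in each one-second time bin."""
    counts = Counter()
    for row in rows:
        if parse_flag(row["SYN"]) is True:
            counts[int(float(row["Time"]))] += 1
    return counts


# Made-up rows for illustration only.
rows = [
    {"Time": "0.500000", "SYN": "Set"},
    {"Time": "0.900000", "SYN": "Not Set"},
    {"Time": "1.200000", "SYN": "Set"},
    {"Time": "1.300000", "SYN": ""},
    {"Time": "1.999999", "SYN": "Set"},
]
print(dict(syn_per_second(rows)))
```

A per-second SYN count of this kind is one simple baseline signal that can be compared against the stated attack rates of 1000 to 2500 packets/second.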


We created various types of network attacks in an Internet of Things (IoT) environment for academic purposes. Two typical smart home devices -- SKT NUGU (NU 100) and EZVIZ Wi-Fi Camera (C2C Mini O Plus 1080P) -- were used. All devices, including some laptops and smart phones, were connected to the same wireless network. The dataset consists of 42 raw network packet files (pcap) captured at different points in time.

* The packet files were captured using the monitor mode of a wireless network adapter. The wireless headers were removed with Aircrack-ng.

Instructions: 

* All attacks except those in the Mirai Botnet category were captured while simulating attacks with tools such as Nmap. For the Mirai Botnet category, the attack packets were generated on a laptop and then manipulated to make them appear to originate from the IoT device.

 

<packet file description>

benign-dec.pcap: benign-only traffic

mitm-arpspoofing-n(1~6)-dec.pcap: traffic containing benign and MITM (ARP spoofing) attack traffic

dos-synflooding-n(1~6)-dec.pcap: traffic containing benign and DoS (SYN flooding) attack traffic

scan-hostport-n(1~6)-dec.pcap: traffic containing benign and Scan (host & port scan) attack traffic

scan-portos-n(1~6)-dec.pcap: traffic containing benign and Scan (port & OS scan) attack traffic

mirai-udpflooding-n(1~4)-dec.pcap, mirai-ackflooding-n(1~4)-dec.pcap, mirai-httpflooding-n(1~4)-dec.pcap: traffic containing benign traffic and the 3 most typical attacks (UDP/ACK/HTTP flooding) of a zombie PC compromised by Mirai malware

mirai-hostbruteforce-n(1~5)-dec.pcap: traffic containing benign traffic and the initial phase of Mirai malware, including host discovery and Telnet brute-force attack
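The file patterns above add up to the 42 capture files mentioned in the description. The sketch below enumerates them as a quick completeness check. It assumes that `n(1~6)` expands to a plain number in the file name (for example `dos-synflooding-1-dec.pcap`); the real naming should be verified against the downloaded archive.

```python
# (prefix, number of captures) pairs taken from the file description above.
GROUPS = [
    ("mitm-arpspoofing", 6),
    ("dos-synflooding", 6),
    ("scan-hostport", 6),
    ("scan-portos", 6),
    ("mirai-udpflooding", 4),
    ("mirai-ackflooding", 4),
    ("mirai-httpflooding", 4),
    ("mirai-hostbruteforce", 5),
]


def expected_files():
    """List the expected capture names, assuming '<prefix>-<n>-dec.pcap'."""
    names = ["benign-dec.pcap"]
    for prefix, count in GROUPS:
        names.extend(f"{prefix}-{n}-dec.pcap" for n in range(1, count + 1))
    return names


print(len(expected_files()))
```

Comparing this list with the contents of the download directory is a simple way to spot missing or renamed captures.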


This dataset contains Cyber Threat Intelligence (CTI) data generated from public security reports and malware repositories.

The dataset is stored in a structured format (XML) and includes approximately 640,000 records from 612 security reports published from January 2008 to June 2019.

Several data types are contained in this dataset such as URL, host, IP address, e-mail account, hashes (MD5, SHA1, and SHA256), common vulnerabilities and exposures (CVE), registry, file names ending with specific extensions, and the program database (PDB) path.

Instructions: 

For more information about the dataset and the system that generates it, please see the following paper:

Daegeon Kim and Huy Kang Kim, “Automated Dataset Generation System for Collaborative Research of Cyber Threat Analysis,” Security and Communication Networks, vol. 2019, Article ID 6268476, 10 pages, 2019. https://doi.org/10.1155/2019/6268476.
