Information:
This dataset was created for research on blockchain anomaly and fraud detection and was donated to the IEEE DataPort online community.
https://github.com/epicprojects/blockchain-anomaly-detection
A directed acyclic graph is built from the Bitcoin transaction data, and metadata is extracted from it to create this dataset.
DIMENSIONS:
- tx_hash: Hash of the bitcoin transaction.
- indegree: Number of transactions that are inputs of tx_hash.
- outdegree: Number of transactions that are outputs of tx_hash.
- in_btc: Number of bitcoins on each incoming edge to tx_hash.
- out_btc: Number of bitcoins on each outgoing edge from tx_hash.
- total_btc: Net number of bitcoins flowing in and out from tx_hash.
- mean_in_btc: Average number of bitcoins flowing in for tx_hash.
- mean_out_btc: Average number of bitcoins flowing out for tx_hash.
- in-malicious: Will be 1 if the tx_hash is an input of a malicious transaction.
- out-malicious: Will be 1 if the tx_hash is an output of a malicious transaction.
- is-malicious: Will be 1 if the tx_hash is a malicious transaction.
- out_and_tx_malicious: Will be 1 if the tx_hash is a malicious transaction or an output of a malicious transaction.
- all_malicious: Will be 1 if the tx_hash is a malicious transaction, an output of a malicious transaction, or an input of a malicious transaction.
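The two combined labels follow directly from the three base flags, so they can be recomputed as a sanity check. The sketch below is a minimal illustration that assumes each record is available as a mapping keyed by the column names listed above; the function name derive_flags is hypothetical and not part of the dataset.

```python
def derive_flags(row):
    """Recompute the two combined labels from the three base flags.

    `row` is assumed to be a mapping keyed by the column names in the
    DIMENSIONS list; values may be ints or numeric strings.
    """
    in_m = int(row["in-malicious"])
    out_m = int(row["out-malicious"])
    is_m = int(row["is-malicious"])
    return {
        # malicious itself, or an output of a malicious transaction
        "out_and_tx_malicious": int(bool(is_m or out_m)),
        # malicious itself, or an output or input of a malicious transaction
        "all_malicious": int(bool(is_m or out_m or in_m)),
    }
```

Comparing these recomputed values with the stored columns is one quick way to spot rows that were damaged during loading.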
REFERENCES:
- https://arxiv.org/abs/1611.03942
- https://arxiv.org/abs/1611.03941
- https://arxiv.org/abs/1107.4524
- http://anonymity-in-bitcoin.blogspot.com/2011/09/code-datasets-and-spsn1...
- http://snap.stanford.edu/class/cs224w-2013/projects2013/cs224w-030-final...

This dataset consists of 8 features extracted from 70,000 monochromatic still images adapted from the Genome Project Stanford's database and labeled in two classes: with LSB steganography (1) and without LSB steganography (0). The features are Kurtosis, Skewness, Standard Deviation, Range, Median, Geometric Mean, Hjorth Mobility, and Hjorth Complexity, all extracted from the histograms of the still images, including random spatial transformations. The steganographic function embeds five types of payloads, from 0.1 to 0.5.
The training and testing datasets each contain 8 columns holding the following features as numeric quantities: Kurtosis, Skewness, Standard Deviation, Range, Median, Geometric Mean, Hjorth Mobility, and Hjorth Complexity. A ninth column gives the class of the observation: 0 for a non-steganogram and 1 for a steganogram. All the features were extracted from the histograms of the still images. The dataset can be read and processed with Pandas in Python, R, or Matlab.
The steganographic function embeds five types of payloads, from 0.1 to 0.5. The training dataset includes 56,000 of these labeled images (with and without LSB steganography), of which 5,600 images make up the set for each payload. The testing dataset has 14,000 observations and is divided in the same way as the training dataset.
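As a rough illustration of how the eight features could be computed from an image histogram, the sketch below uses only the Python standard library. It is not the authors' extraction code. The population (biased) moment definitions, the non-excess form of kurtosis, and the choice to skip empty bins in the geometric mean are all assumptions, and the function names are hypothetical.

```python
import math
import statistics


def histogram(pixels, levels=256):
    """Count how many pixels take each grey level 0..levels-1."""
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    return counts


def hjorth(signal):
    """Hjorth mobility and complexity of a 1-D sequence.

    Raises ZeroDivisionError for a constant sequence (zero variance).
    """
    d1 = [b - a for a, b in zip(signal, signal[1:])]  # first difference
    d2 = [b - a for a, b in zip(d1, d1[1:])]          # second difference
    var0 = statistics.pvariance(signal)
    var1 = statistics.pvariance(d1)
    var2 = statistics.pvariance(d2)
    mobility = math.sqrt(var1 / var0)
    complexity = math.sqrt(var2 / var1) / mobility
    return mobility, complexity


def histogram_features(hist):
    """Return the eight features, treating the bin counts as the sample."""
    n = len(hist)
    mean = sum(hist) / n
    std = statistics.pstdev(hist)
    m3 = sum((h - mean) ** 3 for h in hist) / n  # third central moment
    m4 = sum((h - mean) ** 4 for h in hist) / n  # fourth central moment
    positive = [h for h in hist if h > 0]  # assumption: empty bins are skipped
    mobility, complexity = hjorth(hist)
    return {
        "kurtosis": m4 / std ** 4,  # assumption: non-excess kurtosis
        "skewness": m3 / std ** 3,
        "std": std,
        "range": max(hist) - min(hist),
        "median": statistics.median(hist),
        "geometric_mean": math.exp(
            sum(math.log(h) for h in positive) / len(positive)
        ),
        "hjorth_mobility": mobility,
        "hjorth_complexity": complexity,
    }
```

A typical call would be histogram_features(histogram(pixels)), where pixels is a flat list of 8-bit grey values.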
This dataset is part of my PhD research on malware detection and classification using Deep Learning. It contains static analysis data: Top-1000 imported functions extracted from the 'pe_imports' elements of Cuckoo Sandbox reports. PE malware examples were downloaded from virusshare.com. PE goodware examples were downloaded from portableapps.com and from Windows 7 x86 directories.
* FEATURES *
Column name: hash
Description: MD5 hash of the example
Type: 32-character hexadecimal string
Column name: GetProcAddress
Description: Most imported function (1st)
Type: 0 (Not imported) or 1 (Imported)
...
Column name: LookupAccountSidW
Description: Least imported function (1000th)
Type: 0 (Not imported) or 1 (Imported)
Column name: malware
Description: Class
Type: 0 (Goodware) or 1 (Malware)
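The feature layout above amounts to a binary indicator vector over a fixed, ordered vocabulary of imported function names. The sketch below shows that encoding under stated assumptions: the three-entry vocabulary stands in for the real 1000 columns, and LoadLibraryA is included only for illustration, since the source names just the first and last columns.

```python
def imports_to_row(imported, vocabulary):
    """Return 1 for each vocabulary entry the sample imports, else 0."""
    imported = set(imported)
    return [1 if name in imported else 0 for name in vocabulary]


# Hypothetical, shortened stand-in for the 1000 function columns.
VOCABULARY = ["GetProcAddress", "LoadLibraryA", "LookupAccountSidW"]

row = imports_to_row(["LoadLibraryA", "GetProcAddress", "ExitProcess"], VOCABULARY)
print(row)
```

Functions outside the vocabulary (ExitProcess here) are simply ignored, which matches a fixed-width table of the most frequent imports.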
* ACKNOWLEDGMENTS *
We would like to thank: Cuckoo Sandbox for developing such an amazing dynamic analysis environment!
VirusShare! Because sharing is caring!
Universidade Nove de Julho for supporting this research.
Coordination for the Improvement of Higher Education Personnel (CAPES) for supporting this research.
* CITATIONS *
Please refer to the dataset DOI.
Please feel free to contact me for any further information.
This dataset is part of my PhD research on malware detection and classification using Deep Learning. It contains static analysis data: the raw PE byte stream rescaled to a 32 x 32 greyscale image using the Nearest Neighbor Interpolation algorithm and then flattened to a 1024-byte vector. PE malware examples were downloaded from virusshare.com. PE goodware examples were downloaded from portableapps.com and from Windows 7 x86 directories.
* FEATURES *
Column name: hash
Description: MD5 hash of the example
Type: 32-character hexadecimal string
Column name: pix_0
Description: The first greyscale pixel value
Type: Integer (0-255)
Column name: pix_1023
Description: The last greyscale pixel value
Type: Integer (0-255)
Column name: malware
Description: Class
Type: 0 (Goodware) or 1 (Malware)
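The exact rescaling procedure is not given here, so the following is only one plausible sketch of turning a byte stream into the 1024 pixel columns. It assumes the bytes are first laid out row by row in a zero-padded square and then sampled with nearest-neighbor interpolation; both the square layout and the function name bytes_to_pixels are assumptions.

```python
import math


def bytes_to_pixels(data, side=32):
    """Resample a byte stream to side*side greyscale values (0-255)."""
    if not data:
        raise ValueError("empty byte stream")
    # Assumption: lay the bytes out in the smallest square that holds them.
    src_side = math.ceil(math.sqrt(len(data)))
    padded = data + bytes(src_side * src_side - len(data))  # zero padding
    pixels = []
    for row in range(side):
        src_row = row * src_side // side  # nearest source row
        for col in range(side):
            src_col = col * src_side // side  # nearest source column
            pixels.append(padded[src_row * src_side + src_col])
    return pixels  # pix_0 ... pix_1023 when side == 32
```

With an input of exactly 1024 bytes the mapping is the identity, which makes the function easy to check.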
* ACKNOWLEDGMENTS *
We would like to thank: Cuckoo Sandbox for developing such an amazing dynamic analysis environment!
VirusShare! Because sharing is caring!
Universidade Nove de Julho for supporting this research.
Coordination for the Improvement of Higher Education Personnel (CAPES) for supporting this research.
* CITATIONS *
Please refer to the dataset DOI.
Please feel free to contact me for any further information.
This dataset is part of my PhD research on malware detection and classification using Deep Learning. It contains static analysis data (PE Section Headers of the .text, .code and CODE sections) extracted from the 'pe_sections' elements of Cuckoo Sandbox reports. PE malware examples were downloaded from virusshare.com. PE goodware examples were downloaded from portableapps.com and from Windows 7 x86 directories.
* FEATURES *
Column name: hash
Description: MD5 hash of the example
Type: 32-character hexadecimal string
Column name: size_of_data
Description: The size of the section on disk
Type: Integer
Column name: virtual_address
Description: Memory address of the first byte of the section relative to the image base
Type: Integer
Column name: entropy
Description: Calculated entropy of the section
Type: Float
Column name: virtual_size
Description: The size of the section when loaded into memory
Type: Integer
Column name: malware
Description: Class
Type: 0 (Goodware) or 1 (Malware)
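The entropy column is described only as "calculated entropy of the section". A common choice for PE sections is the Shannon entropy of the byte distribution, in bits per byte. The sketch below assumes that definition; it is not taken from the Cuckoo Sandbox source.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # occurrences of each byte value
    return -sum(
        (c / total) * math.log2(c / total) for c in counts.values()
    )
```

Under this definition, values close to 8 suggest compressed or encrypted content, while values close to 0 suggest highly repetitive data.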
* ACKNOWLEDGMENTS *
We would like to thank: Cuckoo Sandbox for developing such an amazing dynamic analysis environment!
VirusShare! Because sharing is caring!
Universidade Nove de Julho for supporting this research.
Coordination for the Improvement of Higher Education Personnel (CAPES) for supporting this research.
* CITATIONS *
Please refer to the dataset DOI.
Please feel free to contact me for any further information.

ASNM datasets include records consisting of many features that express various properties and characteristics of TCP communications. These features are called Advanced Security Network Metrics (ASNM) and were designed to distinguish legitimate from malicious connections (especially intrusions).
ASNM datasets were created one by one during our long-term research. The following listing contains references to descriptions of particular datasets with their download locations:
- ASNM-NPBO Dataset - contains non-payload-based obfuscation techniques applied to malicious traffic and to some legitimate traffic. It was created in 2015.
- ASNM-TUN Dataset - contains tunneling obfuscation techniques applied to malicious traffic. It was created in 2014.
- ASNM-CDX-2009 Dataset - contains ASNM features extracted from tcpdumps of the CDX 2009 dataset. It lacks a few of the newer ASNM features. It was created in 2013.
This dataset is part of our research on malware detection and classification using Deep Learning. It contains 42,797 malware API call sequences and 1,079 goodware API call sequences. Each API call sequence is composed of the first 100 non-repeated consecutive API calls associated with the parent process, extracted from the 'calls' elements of Cuckoo Sandbox reports.
* FEATURES *
Column name: hash
Description: MD5 hash of the example
Type: 32-character hexadecimal string
Column name: t_0 ... t_99
Description: API call
Type: Integer (0-306)
Column name: malware
Description: Class
Type: Integer: 0 (Goodware) or 1 (Malware)
API Calls: ['NtOpenThread', 'ExitWindowsEx', 'FindResourceW', 'CryptExportKey', 'CreateRemoteThreadEx', 'MessageBoxTimeoutW', 'InternetCrackUrlW', 'StartServiceW', 'GetFileSize', 'GetVolumeNameForVolumeMountPointW', 'GetFileInformationByHandle', 'CryptAcquireContextW', 'RtlDecompressBuffer', 'SetWindowsHookExA', 'RegSetValueExW', 'LookupAccountSidW', 'SetUnhandledExceptionFilter', 'InternetConnectA', 'GetComputerNameW', 'RegEnumValueA', 'NtOpenFile', 'NtSaveKeyEx', 'HttpOpenRequestA', 'recv', 'GetFileSizeEx', 'LoadStringW', 'SetInformationJobObject', 'WSAConnect', 'CryptDecrypt', 'GetTimeZoneInformation', 'InternetOpenW', 'CoInitializeEx', 'CryptGenKey', 'GetAsyncKeyState', 'NtQueryInformationFile', 'GetSystemMetrics', 'NtDeleteValueKey', 'NtOpenKeyEx', 'sendto', 'IsDebuggerPresent', 'RegQueryInfoKeyW', 'NetShareEnum', 'InternetOpenUrlW', 'WSASocketA', 'CopyFileExW', 'connect', 'ShellExecuteExW', 'SearchPathW', 'GetUserNameA', 'InternetOpenUrlA', 'LdrUnloadDll', 'EnumServicesStatusW', 'EnumServicesStatusA', 'WSASend', 'CopyFileW', 'NtDeleteFile', 'CreateActCtxW', 'timeGetTime', 'MessageBoxTimeoutA', 'CreateServiceA', 'FindResourceExW', 'WSAAccept', 'InternetConnectW', 'HttpSendRequestA', 'GetVolumePathNameW', 'RegCloseKey', 'InternetGetConnectedStateExW', 'GetAdaptersInfo', 'shutdown', 'NtQueryMultipleValueKey', 'NtQueryKey', 'GetSystemWindowsDirectoryW', 'GlobalMemoryStatusEx', 'GetFileAttributesExW', 'OpenServiceW', 'getsockname', 'LoadStringA', 'UnhookWindowsHookEx', 'NtCreateUserProcess', 'Process32NextW', 'CreateThread', 'LoadResource', 'GetSystemTimeAsFileTime', 'SetStdHandle', 'CoCreateInstanceEx', 'GetSystemDirectoryA', 'NtCreateMutant', 'RegCreateKeyExW', 'IWbemServices_ExecQuery', 'NtDuplicateObject', 'Thread32First', 'OpenSCManagerW', 'CreateServiceW', 'GetFileType', 'MoveFileWithProgressW', 'NtDeviceIoControlFile', 'GetFileInformationByHandleEx', 'CopyFileA', 'NtLoadKey', 'GetNativeSystemInfo', 'NtOpenProcess', 'CryptUnprotectMemory', 
'InternetWriteFile', 'ReadProcessMemory', 'gethostbyname', 'WSASendTo', 'NtOpenSection', 'listen', 'WSAStartup', 'socket', 'OleInitialize', 'FindResourceA', 'RegOpenKeyExA', 'RegEnumKeyExA', 'NtQueryDirectoryFile', 'CertOpenSystemStoreW', 'ControlService', 'LdrGetProcedureAddress', 'GlobalMemoryStatus', 'NtSetInformationFile', 'OutputDebugStringA', 'GetAdaptersAddresses', 'CoInitializeSecurity', 'RegQueryValueExA', 'NtQueryFullAttributesFile', 'DeviceIoControl', '__anomaly__', 'DeleteFileW', 'GetShortPathNameW', 'NtGetContextThread', 'GetKeyboardState', 'RemoveDirectoryA', 'InternetSetStatusCallback', 'NtResumeThread', 'SetFileInformationByHandle', 'NtCreateSection', 'NtQueueApcThread', 'accept', 'DecryptMessage', 'GetUserNameExW', 'SizeofResource', 'RegQueryValueExW', 'SetWindowsHookExW', 'HttpOpenRequestW', 'CreateDirectoryW', 'InternetOpenA', 'GetFileVersionInfoExW', 'FindWindowA', 'closesocket', 'RtlAddVectoredExceptionHandler', 'IWbemServices_ExecMethod', 'GetDiskFreeSpaceExW', 'TaskDialog', 'WriteConsoleW', 'CryptEncrypt', 'WSARecvFrom', 'NtOpenMutant', 'CoGetClassObject', 'NtQueryValueKey', 'NtDelayExecution', 'select', 'HttpQueryInfoA', 'GetVolumePathNamesForVolumeNameW', 'RegDeleteValueW', 'InternetCrackUrlA', 'OpenServiceA', 'InternetSetOptionA', 'CreateDirectoryExW', 'bind', 'NtShutdownSystem', 'DeleteUrlCacheEntryA', 'NtMapViewOfSection', 'LdrGetDllHandle', 'NtCreateKey', 'GetKeyState', 'CreateRemoteThread', 'NtEnumerateValueKey', 'SetFileAttributesW', 'NtUnmapViewOfSection', 'RegDeleteValueA', 'CreateJobObjectW', 'send', 'NtDeleteKey', 'SetEndOfFile', 'GetUserNameExA', 'GetComputerNameA', 'URLDownloadToFileW', 'NtFreeVirtualMemory', 'recvfrom', 'NtUnloadDriver', 'NtTerminateThread', 'CryptUnprotectData', 'NtCreateThreadEx', 'DeleteService', 'GetFileAttributesW', 'GetFileVersionInfoSizeExW', 'OpenSCManagerA', 'WriteProcessMemory', 'GetSystemInfo', 'SetFilePointer', 'Module32FirstW', 'ioctlsocket', 'RegEnumKeyW', 'RtlCompressBuffer', 
'SendNotifyMessageW', 'GetAddrInfoW', 'CryptProtectData', 'Thread32Next', 'NtAllocateVirtualMemory', 'RegEnumKeyExW', 'RegSetValueExA', 'DrawTextExA', 'CreateToolhelp32Snapshot', 'FindWindowW', 'CoUninitialize', 'NtClose', 'WSARecv', 'CertOpenStore', 'InternetGetConnectedState', 'RtlAddVectoredContinueHandler', 'RegDeleteKeyW', 'SHGetSpecialFolderLocation', 'CreateProcessInternalW', 'NtCreateDirectoryObject', 'EnumWindows', 'DrawTextExW', 'RegEnumValueW', 'SendNotifyMessageA', 'NtProtectVirtualMemory', 'NetUserGetLocalGroups', 'GetUserNameW', 'WSASocketW', 'getaddrinfo', 'AssignProcessToJobObject', 'SetFileTime', 'WriteConsoleA', 'CryptDecodeObjectEx', 'EncryptMessage', 'system', 'NtSetContextThread', 'LdrLoadDll', 'InternetGetConnectedStateExA', 'RtlCreateUserThread', 'GetCursorPos', 'Module32NextW', 'RegCreateKeyExA', 'NtLoadDriver', 'NetUserGetInfo', 'SHGetFolderPathW', 'GetBestInterfaceEx', 'CertControlStore', 'StartServiceA', 'NtWriteFile', 'Process32FirstW', 'NtReadVirtualMemory', 'GetDiskFreeSpaceW', 'GetFileVersionInfoW', 'FindFirstFileExW', 'FindWindowExW', 'GetSystemWindowsDirectoryA', 'RegOpenKeyExW', 'CoCreateInstance', 'NtQuerySystemInformation', 'LookupPrivilegeValueW', 'NtReadFile', 'ReadCabinetState', 'GetForegroundWindow', 'InternetCloseHandle', 'FindWindowExA', 'ObtainUserAgentString', 'CryptCreateHash', 'GetTempPathW', 'CryptProtectMemory', 'NetGetJoinInformation', 'NtOpenKey', 'GetSystemDirectoryW', 'DnsQuery_A', 'RegQueryInfoKeyA', 'NtEnumerateKey', 'RegisterHotKey', 'RemoveDirectoryW', 'FindFirstFileExA', 'CertOpenSystemStoreA', 'NtTerminateProcess', 'NtSetValueKey', 'CryptAcquireContextA', 'SetErrorMode', 'UuidCreate', 'RtlRemoveVectoredExceptionHandler', 'RegDeleteKeyA', 'setsockopt', 'FindResourceExA', 'NtSuspendThread', 'GetFileVersionInfoSizeW', 'NtOpenDirectoryObject', 'InternetQueryOptionA', 'InternetReadFile', 'NtCreateFile', 'NtQueryAttributesFile', 'HttpSendRequestW', 'CryptHashMessage', 'CryptHashData', 'NtWriteVirtualMemory', 
'SetFilePointerEx', 'CertCreateCertificateContext', 'DeleteUrlCacheEntryW', '__exception__']
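One way to read "the first 100 non-repeated consecutive API calls" is to collapse runs of the same call into a single entry and keep the first 100 results, each encoded as its position in the API call list. The sketch below implements that reading. Both the interpretation and the assumption that the integer codes are list positions are unconfirmed, and the vocabulary is truncated to the first three names for brevity.

```python
from itertools import groupby


def encode_calls(calls, vocabulary, length=100):
    """Collapse consecutive repeats, truncate, and map names to indices."""
    index = {name: i for i, name in enumerate(vocabulary)}
    collapsed = [name for name, _ in groupby(calls)]  # drop repeated runs
    return [index[name] for name in collapsed[:length]]


# First three entries of the API call list above; the real list is longer.
VOCABULARY = ["NtOpenThread", "ExitWindowsEx", "FindResourceW"]

trace = [
    "NtOpenThread", "NtOpenThread", "FindResourceW",
    "ExitWindowsEx", "ExitWindowsEx", "NtOpenThread",
]
encoded = encode_calls(trace, VOCABULARY)
print(encoded)
```

The sketch does not pad traces that are shorter than the requested length.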
* ACKNOWLEDGMENTS *
We would like to thank: Cuckoo Sandbox for developing such an amazing dynamic analysis environment!
VirusShare! Because sharing is caring!
Universidade Nove de Julho for supporting this research.
Coordination for the Improvement of Higher Education Personnel (CAPES) for supporting this research.
* CITATIONS *
"Oliveira, Angelo; Sassi, Renato José (2019): Behavioral Malware Detection Using Deep Graph Convolutional Neural Networks. TechRxiv. Preprint." at https://doi.org/10.36227/techrxiv.10043099.v1 Please feel free to contact me for any further information.

Collecting and analysing heterogeneous data sources from the Internet of Things (IoT) and Industrial IoT (IIoT) is essential for training machine-learning-based cybersecurity applications and validating their fidelity. However, analysing those data sources remains a major challenge, because it requires reducing a high-dimensional space and selecting the important features and observations from different data sources.

The Boğaziçi University DDoS dataset (BOUN DDoS) was generated at Boğaziçi University with the Hping3 traffic generator software by flooding TCP SYN and UDP packets. The dataset includes attack-free user traffic as well as attack traffic and is suitable for evaluating network-based DDoS detection methods. The attacks target one victim server connected to the backbone router of the campus. Attack packets have randomly generated spoofed source IP addresses. The data trace was recorded on the backbone and includes over 4000 active hosts.
I. INTRODUCTION
The dataset includes two different attack scenarios. In both scenarios, randomly generated spoofed IP addresses are used in a flooding manner. For TCP flood attacks, TCP port 80 is used as the destination port. Each dataset lasts 8 minutes and alternates an 80-second waiting period with a 20-second attack period. Different packet rates are used so that researchers can evaluate their detection methods at different packet rates.
The TCP SYN flood and UDP flood datasets include attack rates of 1000, 1500, 2000 and 2500 packets/second. The topology of the attack is given in Figure 1.
Fig. 1. BOUN DDoS attack topology.
Attack packets can be distinguished from attack-free packets using the destination IP address of packets. The victim IP address is 10.50.199.86.
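The rule above can be applied directly while reading a trace. The sketch below counts packets addressed to the victim per one-second bin, which should make the attack periods stand out as spikes. It assumes the CSV header uses the names "Time" and "Destination IP". The sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import Counter

VICTIM_IP = "10.50.199.86"  # victim address stated in the dataset description


def victim_packets_per_second(csv_text):
    """Count packets sent to the victim in each one-second time bin."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Destination IP"] == VICTIM_IP:
            counts[int(float(row["Time"]))] += 1
    return counts


# Made-up sample rows; the header names are an assumption.
SAMPLE = """Time,Destination IP
0.000001,10.50.199.86
0.500000,10.0.0.7
1.250000,10.50.199.86
1.750000,10.50.199.86
"""

rates = victim_packets_per_second(SAMPLE)
print(dict(rates))
```

For a real file, the text could be read with open(path, newline="") and passed to the same function.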
II. DATASET STRUCTURE
Datasets are in comma-separated value file format, and have the following columns:
Time: Time values start from zero and have a resolution of 0.000001 seconds. Time values are expressed in seconds.
Frame Number: Frame number is simply the incremental count of packets in the dataset.
Frame length: Frame length is the length of that packet in bytes.
Source ip: Source IP address of the packet.
Destination IP: Destination IP address of the packet.
Source Port: Source TCP port of the packet. If it is not a TCP packet, this field is empty.
Destination Port: Destination TCP port of the packet. If it is not a TCP packet, this field is empty.
SYN: This value is “Set” if the packet is a TCP packet and its SYN flag is equal to one, and “Not Set” if the packet is a TCP packet and its SYN flag is equal to zero. If the packet is not a TCP packet, this field is empty.
ACK: This value is “Set” if the packet is a TCP packet and its ACK flag is equal to one, and “Not Set” if the packet is a TCP packet and its ACK flag is equal to zero. If the packet is not a TCP packet, this field is empty.
RST: This value is “Set” if the packet is a TCP packet and its RST flag is equal to one, and “Not Set” if the packet is a TCP packet and its RST flag is equal to zero. If the packet is not a TCP packet, this field is empty.
TTL: Time to live value of the packet.
TCP Protocol: This value is TCP or UDP if the packet belongs to one of these transport-layer protocols. Otherwise it can take other values.
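Because the flag columns hold three states ("Set", "Not Set", or empty for non-TCP packets), it can help to map them to True, False, and None before filtering. The sketch below does that and marks SYN-only packets, the kind a SYN flood produces. It assumes the flag columns are named "SYN" and "ACK" and that the values use plain ASCII spelling; the helper names are hypothetical.

```python
def parse_flag(value):
    """Map a flag cell to True, False, or None (non-TCP packet)."""
    value = value.strip()
    if value == "Set":
        return True
    if value == "Not Set":
        return False
    if value == "":
        return None
    raise ValueError(f"unexpected flag value: {value!r}")


def is_syn_only(row):
    """True for a TCP packet with SYN set and ACK not set."""
    return parse_flag(row["SYN"]) is True and parse_flag(row["ACK"]) is False


packet = {"SYN": "Set", "ACK": "Not Set", "RST": "Not Set"}
print(is_syn_only(packet))
```

Keeping None distinct from False avoids counting UDP packets as TCP packets with cleared flags.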
We created various types of network attacks in an Internet of Things (IoT) environment for academic purposes. Two typical smart home devices -- SKT NUGU (NU 100) and EZVIZ Wi-Fi Camera (C2C Mini O Plus 1080P) -- were used. All devices, including some laptops and smartphones, were connected to the same wireless network. The dataset consists of 42 raw network packet files (pcap) captured at different points in time.
* The packet files were captured using the monitor mode of a wireless network adapter. The wireless headers were removed with Aircrack-ng.
* All attacks except those in the Mirai Botnet category are packets captured while simulating attacks with tools such as Nmap. In the case of the Mirai Botnet category, the attack packets were generated on a laptop and then manipulated to make them appear to originate from the IoT device.
<packet file description>
benign-dec.pcap: benign-only traffic
mitm-arpspoofing-n(1~6)-dec.pcap: traffic containing benign traffic and a MITM (ARP spoofing) attack
dos-synflooding-n(1~6)-dec.pcap: traffic containing benign traffic and a DoS (SYN flooding) attack
scan-hostport-n(1~6)-dec.pcap: traffic containing benign traffic and a Scan (host & port scan) attack
scan-portos-n(1~6)-dec.pcap: traffic containing benign traffic and a Scan (port & OS scan) attack
mirai-udpflooding-n(1~4)-dec.pcap: traffic containing benign traffic and the 3 most typical attacks (UDP/ACK/HTTP flooding) of a zombie PC compromised by Mirai malware
mirai-ackflooding-n(1~4)-dec.pcap
mirai-httpflooding-n(1~4)-dec.pcap
mirai-hostbruteforce-n(1~5)-dec.pcap: traffic containing benign traffic and the initial phase of Mirai malware, including host discovery and a Telnet brute-force attack
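The per-family counts in the list above add up to the 42 files mentioned in the description: 1 benign file, 4 families of 6, 3 families of 4, and 1 family of 5. The sketch below tallies them and expands the names. The expanded pattern prefix-N-dec.pcap is an assumption about how the n(1~6) notation maps to real file names.

```python
# Number of capture files per attack family, taken from the list above.
FAMILIES = {
    "mitm-arpspoofing": 6,
    "dos-synflooding": 6,
    "scan-hostport": 6,
    "scan-portos": 6,
    "mirai-udpflooding": 4,
    "mirai-ackflooding": 4,
    "mirai-httpflooding": 4,
    "mirai-hostbruteforce": 5,
}


def expected_files():
    """List the expected capture names (naming pattern is an assumption)."""
    names = ["benign-dec.pcap"]
    for prefix, count in FAMILIES.items():
        names += [f"{prefix}-{n}-dec.pcap" for n in range(1, count + 1)]
    return names


files = expected_files()
print(len(files))
```

Comparing this list with the contents of the download directory is a quick completeness check.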