Machine Learning

Malware Analysis Datasets: API Call Sequences

This dataset is part of our research on malware detection and classification using Deep Learning. It contains 42,797 malware API call sequences and 1,079 goodware API call sequences. Each API call sequence is composed of the first 100 non-repeated consecutive API calls associated with the parent process, extracted from the 'calls' elements of Cuckoo Sandbox reports.
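As a rough illustration of the sequence construction described above, the sketch below collapses runs of identical consecutive calls and keeps the first 100 entries. It assumes that "non-repeated consecutive" means adjacent duplicates are merged; the function name `first_unique_consecutive` and the sample call names are hypothetical and not taken from the dataset or from Cuckoo Sandbox.

```python
from itertools import groupby


def first_unique_consecutive(calls, limit=100):
    """Merge runs of identical adjacent calls, then keep the first `limit`.

    Assumption: "non-repeated consecutive" is read as "adjacent duplicates
    collapsed"; the dataset authors may have used a different rule.
    """
    collapsed = [name for name, _ in groupby(calls)]
    return collapsed[:limit]


# Hypothetical raw trace, for illustration only.
raw = ["NtOpenFile", "NtOpenFile", "NtReadFile", "NtClose", "NtClose", "NtOpenFile"]
sequence = first_unique_consecutive(raw)
print(sequence)  # ['NtOpenFile', 'NtReadFile', 'NtClose', 'NtOpenFile']
```

Note that a call may still appear more than once in the result; only back-to-back repeats are merged under this reading.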

Categories: Machine Learning
Security

8878 Views

Online Offline Learning for Sound-based Indoor Localization Using Low-cost Hardware Data

The dataset comprises 7200 .csv files, each containing a 10 kHz recording of a 100 Hz sound lasting 1 ms, recorded at one-centimeter steps across a 20 cm x 60 cm localization area on a table. 3600 files (3 at each of the 1200 different positions) were recorded with no obstacle between the loudspeaker and the microphone; the other 3600 RIR recordings are affected by the presence of an object (a book). The OOLA is first trained offline in batch mode on the first instance of the RIR recordings without the book. It then learns online, in incremental mode, how the book changes the RIR.
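The file counts quoted above follow from the grid dimensions. The short check below reproduces them, assuming one recording position per square centimeter and two recording conditions (with and without the book); the constant names are illustrative only.

```python
# Grid dimensions in centimeters, one position per 1 cm step (assumption).
WIDTH_CM, LENGTH_CM = 20, 60
RECORDINGS_PER_POSITION = 3
CONDITIONS = 2  # without the book, with the book

positions = WIDTH_CM * LENGTH_CM
files_per_condition = positions * RECORDINGS_PER_POSITION
total_files = files_per_condition * CONDITIONS

print(positions, files_per_condition, total_files)  # 1200 3600 7200
```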

Categories: Artificial Intelligence
IoT
Sensors
Signal Processing

695 Views

CURE-TSD: Challenging Unreal and Real Environment for Traffic Sign Detection

As one of the research directions at OLIVES Lab @ Georgia Tech, we focus on the robustness of data-driven algorithms under the diverse challenging conditions in which trained models may be deployed. To this end, we introduced a large-scale (~1.72M frames) traffic sign detection video dataset (CURE-TSD), which is among the most comprehensive datasets with controlled synthetic challenging conditions. The video sequences in the

Categories: Artificial Intelligence
Signal Processing
Machine Learning
Transportation
Image Processing
Computer Vision
Climate Change/Environmental

5426 Views

CURE-TSR: Challenging Unreal and Real Environments for Traffic Sign Recognition

Categories: Artificial Intelligence
Signal Processing
Machine Learning
Transportation
Image Processing
Computer Vision
Climate Change/Environmental

4211 Views

MIRAGE: Mobile-app Traffic Capture and Ground-truth Creation

Network traffic analysis, i.e. the umbrella of procedures for distilling information from network traffic, yields highly valuable profiling information and is the workhorse for several key network management tasks. While its nature is being revolutionized by the rising share of traffic generated by mobile and hand-held devices, existing design solutions are mainly evaluated on private traffic traces, and only a few public datasets are available, which clearly limits repeatability and further advances on the topic.

Categories: Artificial Intelligence
Communications
Computational Intelligence

1946 Views

Telugu Handwritten Vowels

A benchmark dataset is always required for any classification system. To the best of our knowledge, no benchmark dataset for handwritten characters of the Telugu Aksharaalu script has been publicly available until now. Telugu script (Telugu: తెలుగు లిపి, romanized: Telugu lipi), an abugida from the Brahmic family of scripts, is used to write the Telugu language, a Dravidian language spoken in the Indian states of Andhra Pradesh and Telangana as well as a few other neighboring states. The Telugu script is also widely used for writing Sanskrit texts.

Categories: Computer Vision
Image Processing
Machine Learning

18057 Views

Energy-efficient indoor localization WiFi-fingerprint dataset

WiFi measurements dataset for WiFi fingerprint indoor localization, collected on the first and ground floors of the Escuela Técnica Superior de Ingeniería Informática in Seville, Spain. The facility covers approximately 24,000 m², although only accessible areas were surveyed.

Categories: Communications
Sensors
Signal Processing

1713 Views

File Fragment Type (FFT) - 75 Dataset

This FFT-75 dataset contains randomly sampled, potentially overlapping file fragments from 75 popular file types (see details below). To the best of our knowledge, it is the most diverse and balanced dataset available. The dataset is labeled with class IDs and is ready for training supervised machine learning models. We distinguish 6 scenarios of differing granularity and provide variants with 512-byte and 4096-byte blocks. In each case, we sampled a balanced dataset and split the data as follows: 80% for training, 10% for testing and 10% for validation.
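The dataset is described as already split, so the sketch below is only meant to show what an 80/10/10 partition looks like in practice. It is not the authors' procedure: it shuffles with a fixed seed and ignores class balance, which a faithful reproduction would need to handle (for example by splitting each class separately). The function name `split_80_10_10` is hypothetical.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle and partition items into 80% train, 10% test, 10% validation.

    Assumption: a plain random split; no stratification by class is done.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_test = len(items) // 10
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]  # remainder goes to validation
    return train, test, val


train, test, val = split_80_10_10(range(1000))
print(len(train), len(test), len(val))  # 800 100 100
```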

Categories: Security

3602 Views

The Good, The Bad and The Fair: KPIs from Network Elements

Measurements collected from R1 for root cause analyses of the network service states defined from quality and service design perspectives

Categories: Communications

616 Views

Big Data Machine Learning Benchmark on Spark

We introduce a benchmark of distributed algorithm execution over big data. The datasets consist of metrics on the computational impact (resource usage) of eleven well-known machine learning techniques on a real computational cluster, expressed as system-resource-agnostic indicators: CPU consumption, memory usage, operating system process load, network traffic, and I/O operations. The metrics were collected every five seconds for each algorithm at five different data volume scales, totaling 275 distinct datasets.

Categories: Standards Research Data
Computational Intelligence

1890 Views
