Machine Learning

This dataset comprises three benchmarks: Digits-5, PACS, and office_caltech_10. Digits-5 is a set of handwritten digit images sampled from five domains: MNIST, MNIST-M, USPS, SynthDigits, and SVHN. All samples are images of digits from 0 to 9. PACS is composed of four different datasets, each representing a different visual domain: Photo, Art Painting, Cartoon, and Sketch. It contains 9,944 images, including 1,792 real photos, 2,048 art paintings, 2,344 cartoon images, and 2,760 sketches.

Bengaluru has been ranked the most congested city in India in terms of traffic for several years now. This hackathon is aimed at creating innovative solutions to the traffic management problem in Bengaluru, and is being co-organised by the Bengaluru Traffic Police, the Centre for Data for Public Good, and the Indian Institute of Science (IISc). The prizes are being sponsored by the IEEE Foundation.

Last Updated On: Thu, 10/17/2024 - 05:18
Citation Author(s): Raghu Krishnapuram, Rakshit Ramesh, and Arun Josephraj

Numerous studies have focused on exploring Android malware in recent years, covering areas such as malware detection and application analysis. As a result, there is a pressing need for a reliable and scalable malware dataset to support the development and evaluation of malware studies. Although several Android malware benchmark datasets are widely used in research, they have significant limitations. First, many of these datasets are outdated and do not capture current malware trends. Second, some have become obsolete or inaccessible, limiting their usefulness.

Health degradation in automotive power electronics converter systems (PECs) arises from the repetitive thermomechanical stress experienced during real-world vehicle operation. This stress, caused by heat generated during semiconductor operation within PECs, shortens the semiconductor's operating life. Estimating the power semiconductor junction temperature (Tj) is crucial for assessing semiconductor degradation in operation. Although physics-of-failure-based models can estimate Tj, they require substantial computational power.

Wild-SHARD presents a novel Human Activity Recognition (HAR) dataset collected in an uncontrolled, real-world (wild) environment to address the limitations of existing datasets, which often lack non-simulated data. Our dataset comprises time series of Activities of Daily Living (ADLs) captured using multiple smartphone models, such as the Samsung Galaxy F62, Samsung Galaxy A30s, Poco X2, OnePlus 9 Pro, and many more. Because their sensors come from varied manufacturers, these devices enhance data variability and robustness.

This dataset consists of near-infrared spectral images of eight different varieties of corn seeds, classified as FH759, JL59, JY54, JY205, LH205, XX5, ZY2207, and SY81. Each variety contains images of embryonic and endosperm surfaces, with 50 samples per image. The wavelength range is 881-1715 nm.

Hand contact data, reflecting the intricate behaviours of human hands during object operation, exhibits significant potential for analysing hand operation patterns to guide the design of hand-related sensors and robots, and predicting object properties. However, these potential applications are hindered by the constraints of low resolution and incomplete capture of the hand contact data.

Despite the existence of road image datasets, these datasets predominantly focus on European roads with less variability in traffic and road conditions. To address this limitation, we have developed an image dataset tailored to Indian road conditions, capturing the extensive variations in traffic and environment.

We present the SinOCR and SinFUND datasets, two comprehensive resources designed to advance Optical Character Recognition (OCR) and form understanding for the Sinhala language. SinOCR, the first publicly available and the most extensive dataset for Sinhala OCR to date, includes 100,000 images featuring printed text in 200 different Sinhala fonts and 1,135 images of handwritten text, capturing a wide spectrum of writing styles.

The dataset is compiled from different versions of multiple projects across six architectures (ARM-32, ARM-64, MIPS-32, MIPS-64, X86-32, X86-64) and four compilation optimization levels (O0, O1, O2, O3), totaling 36,864 binary files. Each file corresponds to a specific combination of architecture and optimization level, providing a wide range of samples for analyzing and researching the properties and characteristics of binary files.
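
The architecture and optimization-level grid described above can be enumerated in a few lines. This is only an illustrative sketch: the variable names are hypothetical, and the per-combination file count assumes the 36,864 files are split evenly across combinations, which the description does not state.

```python
from itertools import product

# Values taken from the dataset description.
ARCHITECTURES = ["ARM-32", "ARM-64", "MIPS-32", "MIPS-64", "X86-32", "X86-64"]
OPT_LEVELS = ["O0", "O1", "O2", "O3"]
TOTAL_FILES = 36_864

# Every (architecture, optimization level) pair a binary file can belong to.
combinations = list(product(ARCHITECTURES, OPT_LEVELS))
num_combinations = len(combinations)

# ASSUMPTION: files are distributed evenly across combinations.
# The description gives only the total, not the per-combination counts.
files_per_combination, remainder = divmod(TOTAL_FILES, num_combinations)

print(f"{num_combinations} combinations")
print(f"{files_per_combination} files per combination (remainder {remainder})")
```

Under the even-split assumption the total divides cleanly, so each of the 24 combinations would hold the same number of binaries. The actual distribution should be checked against the dataset files.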
