Machine Learning

The BNS (Bharatiya Nyaya Sanhita) dataset is a comprehensive collection of web-scraped legal texts. It consists of chapters and their respective sections, capturing detailed legal content from the recently introduced BNS framework in India. The data were gathered with a Python web-scraping script built on Selenium WebDriver, with the aim of capturing the text accurately and completely. Distributed in CSV format, the dataset is easy to use for legal research, natural language processing (NLP) tasks, and AI-based legal assistance applications.
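Because the dataset is organized as chapters and their sections, a typical first step is to group section rows by chapter. The following is a minimal sketch, assuming a hypothetical CSV schema with `chapter`, `section`, and `text` columns (the listing does not document the actual column names); the sample rows are placeholders, not real BNS content.

```python
import csv
import io
from collections import defaultdict

# Hypothetical schema: the listing does not document the real CSV column names.
SAMPLE_CSV = """chapter,section,text
I,1,placeholder section text
I,2,placeholder section text
II,3,placeholder section text
"""


def sections_by_chapter(rows_file) -> dict[str, list[str]]:
    """Group section identifiers under their chapter, preserving file order."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for row in csv.DictReader(rows_file):
        grouped[row["chapter"]].append(row["section"])
    return dict(grouped)


if __name__ == "__main__":
    # For the real file, pass open(path, newline="", encoding="utf-8") instead.
    print(sections_by_chapter(io.StringIO(SAMPLE_CSV)))
```

Opening the file with `newline=""` lets the `csv` module handle line breaks inside quoted fields, which long legal section texts may contain.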
The PhishFOE Dataset is a comprehensive dataset designed for phishing URL detection using machine learning techniques. The dataset contains 101,083 URLs, with labeled features extracted from both the URL structure and HTML content of webpages. It provides insights into key characteristics that distinguish phishing websites from legitimate ones.
- Total Samples: 101,063
- Label: 0 for Legitimate, 1 for Phishing
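The PhishFOE features are described as being extracted partly from URL structure. A minimal sketch of that kind of extraction with the standard library is shown below; the specific feature names (`url_length`, `num_dots`, and so on) are hypothetical illustrations and are not the dataset's documented feature set. Only the label encoding (0 for Legitimate, 1 for Phishing) comes from the listing.

```python
import ipaddress
from urllib.parse import urlparse

# Label encoding as stated in the listing.
LABELS = {0: "Legitimate", 1: "Phishing"}


def url_features(url: str) -> dict[str, int]:
    """Compute a few hypothetical structural features from a raw URL string."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    try:
        ipaddress.ip_address(host)  # raw IP hosts are a common phishing signal
        host_is_ip = 1
    except ValueError:
        host_is_ip = 0
    return {
        "url_length": len(url),
        "num_dots": url.count("."),
        "num_hyphens_in_host": host.count("-"),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parsed.scheme == "https"),
        "host_is_ip": host_is_ip,
        "path_depth": len([part for part in parsed.path.split("/") if part]),
    }


if __name__ == "__main__":
    print(url_features("https://example.com/a/b"))
    print(url_features("http://192.168.0.1/login@x"))
```

HTML-content features, which the listing also mentions, would require fetching and parsing each page and are not covered by this sketch.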
This paper presents an enhanced methodology for network anomaly detection in Industrial IoT (IIoT) systems using advanced data aggregation and Mutual Information (MI)-based feature selection. The focus is on transforming raw network traffic into meaningful aggregated forms that capture crucial temporal and statistical patterns. A refined set of 150 features, including unique IP counts, TCP acknowledgment patterns, and ICMP sequence ratios, was identified using MI to improve detection accuracy.
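MI-based feature selection scores each feature by how much information it shares with the class label, I(X;Y) = Σ p(x,y) log(p(x,y) / (p(x)p(y))), and keeps the highest-scoring ones. The sketch below assumes features that are already discrete (for example, binned aggregate counts); the function names are hypothetical, and this is not the paper's own implementation.

```python
import math
from collections import Counter


def mutual_information(xs: list, ys: list) -> float:
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))) simplifies to (c/n) * log(c*n / (count_x * count_y))
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def select_top_k(features: dict[str, list], labels: list, k: int) -> list[str]:
    """Return the names of the k features with the highest MI against the labels."""
    scores = {name: mutual_information(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    feats = {"noise": [0, 1, 0, 1], "informative": [0, 0, 1, 1]}
    print(select_top_k(feats, labels, 1))
```

For continuous aggregates, the features would first need binning; alternatively, scikit-learn's `mutual_info_classif` provides a nearest-neighbour estimator that works on continuous inputs directly.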

This dataset is intended for machine learning and was collected in several micro-environments, using an ExpoM-RF 4 meter to measure electric field strength. Four types of micro-environment were selected: urban (6 high-population-density areas in Kuala Lumpur), suburban (7 low-population-density areas in Cyberjaya), park (3 park areas), and one indoor micro-environment. Using the data from the measurement campaigns, three machine learning (ML) techniques were applied to model the electric field strength in each micro-environment.
This dataset comprises 2 million synthetic samples generated using the Variational Autoencoder-Generative Adversarial Network (VAE-GAN) technique. The dataset is designed to facilitate cardiovascular disease prediction through various demographic, physical, and health-related attributes. It contains essential physiological and behavioral indicators that contribute to cardiovascular health.
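The listing does not describe the authors' VAE-GAN architecture. As general background, the VAE half of a VAE-GAN encodes each record into a Gaussian latent distribution, samples from it with the reparameterization trick, and penalizes divergence from a standard normal prior; the decoder then doubles as the GAN generator, and an adversarial loss is added on top. A minimal plain-Python sketch of only the two VAE-specific pieces follows; the function names are hypothetical.

```python
import math
import random


def reparameterize(mu: list[float], log_var: list[float], rng: random.Random) -> list[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) and sigma = exp(log_var / 2).

    In an autodiff framework this form lets gradients reach mu and log_var;
    here it is plain Python for illustration only.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: list[float], log_var: list[float]) -> float:
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


if __name__ == "__main__":
    rng = random.Random(0)
    print(reparameterize([0.0, 1.0], [0.0, -1.0], rng))
    print(kl_to_standard_normal([1.0], [0.0]))
```

The KL term is what keeps the latent space smooth enough that decoding random prior samples yields plausible synthetic records.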
Dataset Description: The dataset consists of the following features:

The LIVE-Viasat Real-World Satellite QoE Database contains 179 videos from real-world streaming, encompassing a range of distortions. It is complemented by a study in which 54 participants provided detailed QoE feedback. Beyond a rich analysis of the determinants of subjective QoE, the work examines how various streaming impairments influence user behavior, offering a more holistic understanding of user satisfaction.
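Subjective QoE feedback of this kind is commonly summarized per video as a mean opinion score (MOS) with a confidence interval. The sketch below assumes numeric per-participant ratings (the listing does not state the rating scale) and uses a normal-approximation 95% interval; the function name is hypothetical.

```python
import math
from statistics import mean, stdev


def mos_with_ci(ratings: list[float], z: float = 1.96) -> tuple[float, float]:
    """Return (MOS, half-width of a normal-approximation confidence interval)."""
    m = mean(ratings)
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))
    return m, half_width


if __name__ == "__main__":
    # Hypothetical ratings for one video on an assumed 1-5 scale.
    print(mos_with_ci([1, 2, 3, 4, 5]))
```

With around 54 raters per video, a Student-t critical value would be slightly larger than 1.96; the `z` parameter can be adjusted accordingly.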

Large Vision-Language Models (LVLMs) struggle with distractions, particularly in the presence of irrelevant visual or textual inputs. This paper introduces the Irrelevance Robust Visual Question Answering (IR-VQA) benchmark to systematically evaluate and mitigate this "multimodal distractibility". IR-VQA targets three key paradigms: irrelevant visual contexts in image-independent questions, irrelevant textual contexts in image-dependent questions, and text-only distractions.
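One simple way to quantify distractibility across the three paradigms is to compare a model's accuracy on each question with and without the irrelevant context. The listing does not define IR-VQA's official metric, so the sketch below is a hypothetical illustration: the `ItemResult` record, the paradigm labels, and the accuracy-drop score are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ItemResult:
    paradigm: str             # hypothetical labels: "irrelevant_visual", "irrelevant_textual", "text_only"
    clean_correct: bool       # answered correctly without the irrelevant context
    distracted_correct: bool  # answered correctly with the irrelevant context added


def accuracy_drop(results: list[ItemResult]) -> dict[str, float]:
    """Per-paradigm clean accuracy minus distracted accuracy (higher means more distractible)."""
    totals: dict[str, list[int]] = {}
    for r in results:
        t = totals.setdefault(r.paradigm, [0, 0, 0])  # [clean hits, distracted hits, count]
        t[0] += r.clean_correct
        t[1] += r.distracted_correct
        t[2] += 1
    return {p: (clean - dist) / n for p, (clean, dist, n) in totals.items()}


if __name__ == "__main__":
    results = [
        ItemResult("text_only", True, False),
        ItemResult("text_only", True, True),
        ItemResult("irrelevant_visual", True, True),
    ]
    print(accuracy_drop(results))
```

Pairing each distracted item with its clean counterpart isolates the effect of the irrelevant input from the underlying difficulty of the question.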
Agriculture is the backbone of Mizoram’s economy, as the majority of its people depend on agriculture and allied sectors for their livelihood. According to the 2011 census, more than 50% of the population is still engaged in agriculture and related activities. Jhum (shifting) cultivation is the primary farming practice in the state.