Artificial Intelligence

Thyroid Nodule Ultrasound dataset

We construct the Thyroid Nodule Ultrasound (TNUS) dataset, which provides thyroid nodule position and puncture annotations that existing datasets lack. It is intended to support future research on automated detection and diagnosis, with the aim of improving diagnostic accuracy and clinical applicability. TNUS is a curated collection of thyroid nodule ultrasound (US) images designed for research in puncture position detection and nodule segmentation. It contains 4,376 images with puncture position annotations and an additional 2,626 images with thyroid and nodule masks.
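
For the segmentation subset, predicted masks are commonly compared against reference thyroid or nodule masks using the Dice coefficient. The sketch below assumes binary masks stored as nested lists of 0/1 values with identical shapes; the function name `dice_coefficient` is illustrative, and this is not stated to be the evaluation protocol used with TNUS.

```python
from typing import List

Mask = List[List[int]]  # binary mask: 1 = foreground (e.g. nodule), 0 = background


def dice_coefficient(pred: Mask, target: Mask, eps: float = 1e-7) -> float:
    """Dice = 2|A and B| / (|A| + |B|) for two binary masks of the same shape."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")
    intersection = 0
    pred_sum = 0
    target_sum = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            intersection += p * t
            pred_sum += p
            target_sum += t
    # eps avoids division by zero when both masks are empty (Dice -> 1.0)
    return (2.0 * intersection + eps) / (pred_sum + target_sum + eps)


if __name__ == "__main__":
    # Hypothetical 2x3 masks for illustration only
    pred = [[0, 1, 1],
            [0, 1, 0]]
    target = [[0, 1, 0],
              [0, 1, 1]]
    print(round(dice_coefficient(pred, target), 3))
```

In practice the masks would be loaded from the dataset's image files (for example with an imaging library) and thresholded to 0/1 before calling the function.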

Categories: Artificial Intelligence

The Sociolinguistic Experience of Meta-AI’s Vernacular Modification in Eastern Indonesian Settings Reveals Lexical Obstacles and Ethical Puzzles

This article examines the sociolinguistic challenges faced by Meta-AI on WhatsApp, analyzing its limitations in lexical adaptation and in applying appropriate ethical practices in intercultural communication. The study shows that Meta-AI fails to interpret truncated vernacular forms (“kenapa” → “enapa”) and misses region-specific slang (“puki”) used in Maluku, North Maluku, and East Nusa Tenggara, revealing fundamental limitations in its error recognition and contextual understanding.

Categories: Artificial Intelligence
Education and Learning Technologies
IoT

BNS - Indian law

The BNS (Bharatiya Nyaya Sanhita) dataset is a comprehensive collection of web-scraped legal texts. It consists of chapters and their respective sections, capturing the detailed legal content of the recently introduced BNS framework in India. The data were gathered with a Python web-scraping script built on Selenium WebDriver, with the aim of ensuring accuracy and completeness. Distributed in CSV format, the dataset is intended to ease access for legal research, natural language processing (NLP) tasks, and AI-based legal assistance applications.
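
Because the CSV is organized by chapter and section, a typical first step for NLP use is to group section texts under their chapter. The sketch below assumes hypothetical column names `chapter`, `section`, and `text` (the actual header is not documented in this listing) and uses a tiny placeholder CSV string in place of the real file.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple

# Placeholder rows standing in for the real BNS CSV; the column names
# "chapter", "section" and "text" are assumptions, not the published schema.
SAMPLE_CSV = """chapter,section,text
Chapter I,1,placeholder text for section 1
Chapter I,2,placeholder text for section 2
Chapter II,3,placeholder text for section 3
"""


def group_sections_by_chapter(csv_text: str) -> Dict[str, List[Tuple[str, str]]]:
    """Return {chapter: [(section, text), ...]} preserving file order."""
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        grouped[row["chapter"]].append((row["section"], row["text"]))
    return dict(grouped)


if __name__ == "__main__":
    chapters = group_sections_by_chapter(SAMPLE_CSV)
    for chapter, sections in chapters.items():
        print(chapter, len(sections))
```

For the real file, read it with `open(path, newline="", encoding="utf-8")` and pass the handle to `csv.DictReader` directly, adjusting the column names to the actual header.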

Categories: Artificial Intelligence
Machine Learning

FT AT AI Patent Data

This is the patent data we collected from the USPTO for the study reported in our paper. It covers patents related to the convergence of financial, assistive, and artificial intelligence technologies, all registered with the USPTO (United States Patent and Trademark Office) between 2001 and 2020. These data were used for network analysis. Further details will be uploaded after the paper is accepted.
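
The listing does not describe how the network was built. One common way to study technology convergence is a co-occurrence network in which nodes are technology domains and edge weights count the patents tagged with both domains. The sketch below assumes each patent record carries a hypothetical set of domain labels (`"FT"`, `"AT"`, `"AI"`); it illustrates the idea and is not the authors' method.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def cooccurrence_network(patents: Iterable[Set[str]]) -> Dict[Tuple[str, str], int]:
    """Count how many patents share each unordered pair of domain labels."""
    edges: Counter = Counter()
    for labels in patents:
        # sorted() gives a canonical order so ("AI", "FT") and ("FT", "AI") merge
        for a, b in combinations(sorted(labels), 2):
            edges[(a, b)] += 1
    return dict(edges)


def weighted_degree(edges: Dict[Tuple[str, str], int]) -> Dict[str, int]:
    """Sum of edge weights incident to each node (a simple centrality measure)."""
    degree: Counter = Counter()
    for (a, b), w in edges.items():
        degree[a] += w
        degree[b] += w
    return dict(degree)


if __name__ == "__main__":
    # Hypothetical label sets for four patents, not real USPTO records
    toy_patents = [{"FT", "AI"}, {"AT", "AI"}, {"FT", "AT", "AI"}, {"AI"}]
    net = cooccurrence_network(toy_patents)
    print(net)
    print(weighted_degree(net))
```

Patents with a single label contribute no edges, so the network only reflects cross-domain (convergent) patents.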

Categories: Artificial Intelligence

Near-Field Speaker Time Distortion - Anechoic and non-Anechoic

This dataset was created to explore time distortion under non-ideal near-field conditions. A 90 Hz square wave is played at 100 dBA through a bookshelf speaker with its port removed. Recordings were captured at five axial distances (from 2" to 17", following the inverse square law) and at three levels of resistive loading (no added resistance, 1.5 Ω, and 3 Ω). The DC resistance of the speaker was measured at 6.9 Ω. To avoid overtraining, captures were recorded with a moving dynamic microphone.
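
Two quantities implied by this setup can be computed directly: the level change between capture distances predicted by the inverse square law (20·log10(d2/d1) dB for an ideal point source, which near-field captures may deviate from), and the total DC resistance seen by the amplifier, assuming the added resistors are wired in series with the driver. The sketch also generates a 90 Hz square-wave test signal; the 48 kHz sample rate and all function names are assumptions, not details from the dataset.

```python
import math
from typing import List

SPEAKER_DCR_OHMS = 6.9              # measured DC resistance from the description
ADDED_LOADS_OHMS = [0.0, 1.5, 3.0]  # the three resistive-loading conditions


def inverse_square_drop_db(d1: float, d2: float) -> float:
    """Level drop in dB from distance d1 to d2 for an ideal point source."""
    return 20.0 * math.log10(d2 / d1)


def total_dc_resistance(added_ohms: float) -> float:
    """Assumes the added resistor is in series with the speaker."""
    return SPEAKER_DCR_OHMS + added_ohms


def square_wave(freq_hz: float, sample_rate: int, n_samples: int) -> List[float]:
    """Bipolar square wave (+1 for the first half of each period, -1 for the second)."""
    period = sample_rate / freq_hz
    return [1.0 if (n % period) < period / 2 else -1.0 for n in range(n_samples)]


if __name__ == "__main__":
    print(f"2 in -> 17 in: {inverse_square_drop_db(2.0, 17.0):.2f} dB")
    for r in ADDED_LOADS_OHMS:
        print(f"added {r} ohm -> total {total_dc_resistance(r):.1f} ohm")
    wave = square_wave(90.0, 48_000, 48_000)  # one second at the assumed 48 kHz
    print(len(wave), wave[0])
```

Comparing measured levels against `inverse_square_drop_db` for each pair of capture distances is one way to quantify how far the near-field recordings depart from the ideal model.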

Categories: Artificial Intelligence

Replication Data for: Retrieval-Augmented Generation for Service Discovery: Chunking Strategies and Benchmarking

Integrating multiple (sub-)systems is essential for building advanced information systems. Difficulties arise mainly when integrating dynamic environments, for example when services that do not yet exist must be integrated at design time. This has traditionally been addressed with a registry that provides the API documentation of the endpoints.
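
In a retrieval-augmented setup, the registry's API documentation must be split into chunks before retrieval. The sketch below shows one simple strategy, one chunk per endpoint, paired with a bag-of-words Jaccard score as a stand-in for the embedding-based retrieval a real RAG pipeline would use. The documentation format, the `chunk_by_endpoint` and `retrieve` names, and the scoring function are all assumptions, not the chunking strategies benchmarked in the paper.

```python
import re
from typing import List, Set, Tuple

# Hypothetical API documentation; each endpoint starts with an HTTP method line.
API_DOC = """GET /weather/current
Returns the current temperature and humidity for a city.
POST /orders
Creates a new order for a customer with a list of items.
GET /orders/{id}
Returns the status of an existing order.
"""

_ENDPOINT = re.compile(r"^(GET|POST|PUT|PATCH|DELETE) ", re.MULTILINE)


def chunk_by_endpoint(doc: str) -> List[str]:
    """Split the documentation so each chunk holds exactly one endpoint."""
    starts = [m.start() for m in _ENDPOINT.finditer(doc)]
    bounds = starts + [len(doc)]
    return [doc[s:e].strip() for s, e in zip(bounds, bounds[1:])]


def _tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, chunks: List[str], k: int = 1) -> List[Tuple[float, str]]:
    """Rank chunks by Jaccard overlap with the query (stand-in for embeddings)."""
    q = _tokens(query)
    scored = []
    for chunk in chunks:
        c = _tokens(chunk)
        union = q | c
        score = len(q & c) / len(union) if union else 0.0
        scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    chunks = chunk_by_endpoint(API_DOC)
    print(len(chunks))
    print(retrieve("what is the temperature in a city", chunks)[0][1])
```

Other chunking strategies (fixed-size windows, per-parameter chunks, whole-service documents) can be swapped in for `chunk_by_endpoint` while keeping the retrieval step unchanged, which is what makes them comparable in a benchmark.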

Categories: Artificial Intelligence
Other

Gramatika

Gramatika is a synthetic grammatical error correction (GEC) dataset for Indonesian. It contains a total of 1.5 million sentences with 4,666,185 errors. Only 30,000 sentences (2%) are correct, with no errors. Each sentence contains at most 6 errors, with at most 2 errors of the same type. The dataset is split into train, dev, and test sets in an 8:1:1 ratio (1,199,705, 150,171, and 150,124 sentences, respectively).
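
The per-sentence constraints and the reported split sizes can be checked mechanically. The sketch below assumes each sentence's errors are available as a list of error-type labels (the label strings used here are hypothetical) and confirms that the three split sizes add up to 1.5 million sentences.

```python
from collections import Counter
from typing import List

MAX_ERRORS_PER_SENTENCE = 6
MAX_SAME_TYPE = 2

# Split sizes as reported in the dataset description
SPLITS = {"train": 1_199_705, "dev": 150_171, "test": 150_124}


def satisfies_constraints(error_types: List[str]) -> bool:
    """True if a sentence has <= 6 errors and <= 2 errors of any single type."""
    if len(error_types) > MAX_ERRORS_PER_SENTENCE:
        return False
    counts = Counter(error_types)
    return all(c <= MAX_SAME_TYPE for c in counts.values())


if __name__ == "__main__":
    total = sum(SPLITS.values())
    print(total)  # 1500000
    for name, size in SPLITS.items():
        print(name, f"{size / total:.3f}")
    # Hypothetical error-type labels for illustration only
    print(satisfies_constraints(["spelling", "affix", "spelling"]))
    print(satisfies_constraints(["spelling"] * 3))
```

The printed proportions (0.800, 0.100, 0.100) show that the reported splits match the stated 8:1:1 ratio to three decimal places, even though the counts are not exact multiples of 150,000.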

Categories: Artificial Intelligence

Gramatika

Categories: Artificial Intelligence

BRURIIoT: A Dataset for Network Anomaly Detection in IIoT with an Enhanced Feature Engineering Approach

This paper presents an enhanced methodology for network anomaly detection in Industrial IoT (IIoT) systems that combines advanced data aggregation with Mutual Information (MI)-based feature selection. The focus is on transforming raw network traffic into aggregated forms that capture key temporal and statistical patterns. Using MI, a refined set of 150 features, including unique IP counts, TCP acknowledgment patterns, and ICMP sequence ratios, was selected to improve detection accuracy.
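
MI-based selection ranks each feature by how much information it carries about the class label and keeps the top k. The sketch below estimates MI from empirical counts for already-discretized features in pure Python; the feature names and values are hypothetical, and the listing does not say which MI estimator the authors used (for example, for continuous features) beyond the final count of 150 selected features.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def mutual_information(x: Sequence, y: Sequence) -> float:
    """I(X;Y) in nats for two discrete sequences, from empirical counts."""
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    mi = 0.0
    for (xv, yv), c in joint.items():
        p_xy = c / n
        # p(x,y) * log(p(x,y) / (p(x) p(y))), with p(x) = px/n and p(y) = py/n
        mi += p_xy * math.log(p_xy * n * n / (px[xv] * py[yv]))
    return mi


def select_top_k(features: Dict[str, List], labels: List, k: int) -> List[str]:
    """Return the names of the k features with the highest MI with the labels."""
    scores = {name: mutual_information(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical aggregated, discretized features per time window
    labels = [0, 0, 1, 1, 0, 1]  # 1 = anomalous window
    features = {
        "unique_ip_count_bin": [1, 1, 3, 3, 1, 3],  # tracks the label exactly
        "tcp_ack_ratio_bin": [0, 1, 1, 1, 0, 0],    # partially informative
        "icmp_seq_ratio_bin": [2, 2, 2, 2, 2, 2],   # constant, carries no information
    }
    print(select_top_k(features, labels, k=2))
```

With scikit-learn available, `sklearn.feature_selection.mutual_info_classif` combined with `SelectKBest(k=150)` is a common alternative that also handles continuous features via a nearest-neighbor estimator.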

Categories: Artificial Intelligence
IoT
Machine Learning
Security
Sensors

Point Cloud Dataset

This dataset is designed for recognizing and localizing electric vehicle (EV) charging ports using point cloud data rather than traditional image-based methods. It includes raw point clouds collected with sensing technologies such as LiDAR or depth cameras, together with detailed experimental records covering sensor parameters, pose annotations, and environmental variables.
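
If the records include a depth camera's pinhole intrinsics (fx, fy, cx, cy), depth images can be back-projected into a point cloud, and a pose annotation given as a rotation R and translation t can be applied as p' = Rp + t. Both the parameter names and the pose format here are assumptions about the experimental records, not their documented schema.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def depth_to_points(depth: Sequence[Sequence[float]],
                    fx: float, fy: float, cx: float, cy: float) -> List[Point]:
    """Back-project a depth image (metres) to camera-frame 3D points (pinhole model).

    Pixels with depth <= 0 are treated as invalid and skipped.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):      # v = pixel row
        for u, z in enumerate(row):      # u = pixel column
            if z <= 0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


def apply_pose(points: List[Point], rotation: Sequence[Sequence[float]],
               translation: Sequence[float]) -> List[Point]:
    """Transform every point with p' = R p + t (R is 3x3, t is length 3)."""
    out: List[Point] = []
    for p in points:
        out.append(tuple(
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        ))
    return out


if __name__ == "__main__":
    # Hypothetical 2x2 depth image and intrinsics; 0.0 marks an invalid pixel
    depth = [[1.0, 0.0],
             [2.0, 1.0]]
    pts = depth_to_points(depth, fx=500.0, fy=500.0, cx=0.5, cy=0.5)
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    print(apply_pose(pts, identity, [0.0, 0.0, 0.1]))
```

LiDAR captures are usually already stored as 3D points, in which case only the pose transformation step applies.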

Categories: Artificial Intelligence
