Dataset
I parsed the literature site https://avidreaders.ru to collect about 10,000,000 sentences from Russian books and generated sentence vector embeddings from them using the Mistral open API.
The embeddings were reduced from 1024 to 256 dimensions using the PCA implementation in the Python scikit-learn library.
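A reduction from 1024 to 256 dimensions with scikit-learn PCA can be sketched as follows. This is a minimal illustration, not the author's actual pipeline: the random matrix is a placeholder for real embeddings, and the sample count and `random_state` are assumptions chosen to keep the example small.

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder data: 512 random vectors standing in for 1024-dimensional
# sentence embeddings (the real dataset holds about 10,000,000 of them).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(512, 1024)).astype(np.float32)

# Fit PCA and project every embedding onto the top 256 principal components.
pca = PCA(n_components=256, random_state=0)
reduced = pca.fit_transform(embeddings)

print(reduced.shape)  # (512, 256)

# Fraction of the original variance retained by the 256 components.
retained = float(pca.explained_variance_ratio_.sum())
```

For a corpus too large to fit in memory, scikit-learn also provides `IncrementalPCA`, which fits in batches; whether the author used it is not stated.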
Word embeddings are a way of representing words as vectors in a multi-dimensional space, where the distance and direction between vectors reflect the similarity and relationships among the corresponding words.
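One common way to measure how closely two embedding vectors point in the same direction is cosine similarity. The sketch below uses tiny three-dimensional vectors invented for illustration; they are not taken from the dataset, and the choice of cosine similarity is an assumption, since the description does not name a specific metric.

```python
import math

def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 3-dimensional embeddings, invented for illustration only.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
car = [0.0, 0.1, 0.9]

# Vectors for related items should score higher than unrelated ones.
sim_close = cosine_similarity(cat, kitten)
sim_far = cosine_similarity(cat, car)
```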
- Categories:
To download this dataset without purchasing an IEEE Dataport subscription, please visit: https://zenodo.org/records/13896353
Please cite the following paper when using this dataset:
- Categories:
The NIMS BENIGN DATASET 2024-2 comprises data captured from consumer IoT devices, depicting three primary real-life states (Power-up, Idle, and Active) experienced by everyday users. Our setup focuses on capturing realistic data across these states, providing a comprehensive understanding of consumer IoT devices.
The dataset covers nine popular IoT devices, namely:
Amcrest Camera
Smarter Coffeemaker
Ring Doorbell
Amazon Echodot
Google Nestcam
Google Nestmini
Kasa Powerstrip
Samsung 32 inch Smart Television (TV)
- Categories:
To download the dataset without purchasing an IEEE Dataport subscription, please visit: https://zenodo.org/records/13738598
Please cite the following paper when using this dataset:
- Categories:
Recently, combinatorial interaction strategies have been widely adopted as black-box strategies for testing software and hardware. This paper discusses a novel adoption of a combinatorial interaction strategy to generate a sparse combinatorial data table (SCDT) for machine learning. Unlike test data generation strategies, in which the t-way tuples are synthesized into a test case, the proposed SCDT requires analyzing instances against their corresponding tuples to generate a systematic learning dataset.
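To make the notion of t-way tuples concrete, the sketch below enumerates every combination of values for each group of t parameters. It illustrates only the tuple enumeration step, not the SCDT construction itself, which the abstract does not detail; the function name, parameter names, and values are hypothetical.

```python
from itertools import combinations, product

def t_way_tuples(parameters, t):
    """Yield every t-way tuple as a dict mapping parameter name to value."""
    # Choose each group of t parameters, then every value combination in it.
    for names in combinations(sorted(parameters), t):
        for values in product(*(parameters[n] for n in names)):
            yield dict(zip(names, values))

# Hypothetical parameters and values, invented for illustration.
parameters = {
    "os": ["linux", "windows"],
    "browser": ["firefox", "chrome", "safari"],
    "network": ["wifi", "ethernet"],
}

# All 2-way (pairwise) tuples: 3*2 + 3*2 + 2*2 = 16 in total.
pairs = list(t_way_tuples(parameters, 2))
```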
- Categories:
We introduce two novel datasets for cell motility and wound healing research: the Wound Healing Assay Dataset (WHAD) and the Cell Adhesion and Motility Assay Dataset (CAMAD). WHAD comprises time-lapse phase-contrast images of wound healing assays using genetically modified MCF10A and MCF7 cells, while CAMAD includes MDA-MB-231 and RAW264.7 cells cultured on various substrates. These datasets offer diverse experimental conditions, comprehensive annotations, and high-quality imaging data, addressing gaps in existing resources.
- Categories:
The burgeoning demand for collaborative robotic systems to execute complex tasks collectively has intensified the research community's focus on advancing simultaneous localization and mapping (SLAM) in a cooperative context. Despite this interest, the scalability and diversity of existing datasets for collaborative trajectories remain limited, especially in scenarios with constrained perspectives where the generalization capabilities of Collaborative SLAM (C-SLAM) are critical for the feasibility of multi-agent missions.
- Categories:
This dataset presents real-world IoT device traffic captured under a scenario termed "Active," reflecting typical usage patterns encountered by everyday users. Our methodology emphasizes the collection of authentic data, employing rigorous testing and system evaluations to ensure fidelity to real-world conditions while minimizing noise and irrelevant capture.
- Categories:
The dataset encompasses a diverse array of electrical signals representing Power Quality Disturbances (PQD), both in single and combined forms, meticulously generated in adherence to the IEEE 1159 guideline. Crucially, the dataset includes both raw data and corresponding labels, facilitating supervised learning tasks and enabling the development and evaluation of classification algorithms.
- Categories:
China has experienced rapid urbanization over the past three decades, resulting in a prominent “urban core-suburban-rural” (USR) triad structure of human settlements. The USR disparities, which are related to spatial variations in human activity intensity, have significant impacts on the spatiotemporal variations of various environmental issues, such as carbon dioxide (CO2) emissions, carbon storage, and water quality. However, national-level, long-term USR datasets are lacking compared with the large number of “Urban-Rural” dual-structure datasets.
- Categories: