Dataset

Here I parsed the literature site https://avidreaders.ru, collecting about 10,000,000 sentences from Russian books, and generated sentence vector embeddings for them using the Mistral open API.

The embeddings were reduced from 1024 to 256 dimensions using the PCA implementation in Python's scikit-learn.
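The following is a minimal sketch of the dimensionality-reduction step. It assumes the embeddings are already loaded as a NumPy array. Random vectors stand in for the real 1024-dimensional Mistral embeddings, and the sample size of 2,000 is illustrative. Only the reduction from 1024 to 256 dimensions with scikit-learn's PCA comes from the description above.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in data (assumption): random vectors in place of the real
# 1024-dimensional sentence embeddings.
rng = np.random.default_rng(seed=0)
embeddings = rng.normal(size=(2000, 1024)).astype(np.float32)

# Fit PCA and project every embedding onto the top 256 principal components.
pca = PCA(n_components=256, random_state=0)
reduced = pca.fit_transform(embeddings)

print(embeddings.shape, "->", reduced.shape)
# Fraction of the original variance kept by the 256 components.
print(f"variance retained: {pca.explained_variance_ratio_.sum():.2%}")
```

With about 10 million rows, loading everything into memory at once may be impractical. One option is fitting PCA on a random sample and then calling `transform` in batches. Another is `sklearn.decomposition.IncrementalPCA` with `partial_fit`. Which approach was actually used here is not stated.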

Word embeddings are a way of representing words as vectors in a multi-dimensional space, where the distance and direction between vectors reflect the similarity and relationships among the corresponding words.
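A common way to measure how close two embedding vectors are is cosine similarity. It compares their direction and ignores their length. The sketch below uses hypothetical 4-dimensional toy vectors chosen by hand, not vectors from this dataset. The same function applies unchanged to the 256-dimensional sentence embeddings.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between a and b: 1 = same direction, 0 = orthogonal."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy vectors: "cat" and "kitten" point in a similar direction,
# "car" points elsewhere.
vectors = {
    "cat": np.array([0.9, 0.8, 0.1, 0.0]),
    "kitten": np.array([0.85, 0.75, 0.2, 0.05]),
    "car": np.array([0.1, 0.0, 0.9, 0.8]),
}

sim_cat_kitten = cosine_similarity(vectors["cat"], vectors["kitten"])
sim_cat_car = cosine_similarity(vectors["cat"], vectors["car"])
print(f"cat/kitten: {sim_cat_kitten:.3f}, cat/car: {sim_cat_car:.3f}")
```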

Combinatorial interaction strategies have recently been widely adopted as black-box strategies for testing software and hardware. This paper discusses a novel adoption of a combinatorial interaction strategy to generate a sparse combinatorial data table (SCDT) for machine learning. In test data generation strategies, the t-way tuples are synthesized into test cases. The proposed SCDT differs: it requires analyzing instances against their corresponding tuples to generate a systematic learning dataset.
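To make "t-way tuples" concrete, the sketch below enumerates every t-way combination of parameter values. It then checks which of those tuples a set of instances actually covers. The parameters, values, and instances are hypothetical, and this is only an illustration of t-way tuple coverage. It does not reproduce the paper's SCDT construction.

```python
from itertools import combinations, product

def t_way_tuples(params: dict[str, list], t: int) -> set[tuple]:
    """All t-way tuples; each is a tuple of (parameter, value) pairs in name order."""
    tuples = set()
    for names in combinations(sorted(params), t):
        for values in product(*(params[n] for n in names)):
            tuples.add(tuple(zip(names, values)))
    return tuples

def covered_by(instance: dict, t: int) -> set[tuple]:
    """The t-way tuples contained in one instance (a full row of values)."""
    names = sorted(instance)
    return {tuple((n, instance[n]) for n in combo) for combo in combinations(names, t)}

# Hypothetical parameters and instances, for illustration only.
params = {"A": [0, 1], "B": [0, 1], "C": ["x", "y"]}
instances = [
    {"A": 0, "B": 0, "C": "x"},
    {"A": 1, "B": 1, "C": "x"},
    {"A": 0, "B": 1, "C": "y"},
]

all_pairs = t_way_tuples(params, t=2)  # pairwise (t = 2) tuples
covered = set().union(*(covered_by(row, 2) for row in instances))
missing = all_pairs - covered
print(f"{len(covered)}/{len(all_pairs)} pairs covered; missing: {sorted(missing)}")
```

Each instance contains exactly C(k, t) tuples, where k is the number of parameters. Comparing the covered set against the full set of tuples shows which value combinations the data never exercises.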

We introduce two novel datasets for cell motility and wound healing research: the Wound Healing Assay Dataset (WHAD) and the Cell Adhesion and Motility Assay Dataset (CAMAD). WHAD comprises time-lapse phase-contrast images of wound healing assays using genetically modified MCF10A and MCF7 cells, while CAMAD includes MDA-MB-231 and RAW264.7 cells cultured on various substrates. These datasets offer diverse experimental conditions, comprehensive annotations, and high-quality imaging data, addressing gaps in existing resources.

 The burgeoning demand for collaborative robotic systems to execute complex tasks collectively has intensified the research community's focus on advancing simultaneous localization and mapping (SLAM) in a cooperative context. Despite this interest, the scalability and diversity of existing datasets for collaborative trajectories remain limited, especially in scenarios with constrained perspectives where the generalization capabilities of Collaborative SLAM (C-SLAM) are critical for the feasibility of multi-agent missions.

This dataset presents real-world IoT device traffic captured under a scenario termed "Active," reflecting typical usage patterns encountered by everyday users. Our methodology emphasizes the collection of authentic data, employing rigorous testing and system evaluations to ensure fidelity to real-world conditions while minimizing noise and irrelevant capture.

The dataset encompasses a diverse array of electrical signals representing Power Quality Disturbances (PQD), both in single and combined forms, meticulously generated in adherence to the IEEE 1159 guideline.  Crucially, the dataset includes both raw data and corresponding labels, facilitating supervised learning tasks and enabling the development and evaluation of classification algorithms.

China has experienced rapid urbanization over the past three decades, resulting in a prominent “urban core-suburban-rural” (USR) triad structure of human settlements. The USR disparities are related to the spatial variations of human activity intensity. They strongly affect the spatiotemporal variations of various environmental issues such as carbon dioxide (CO2) emissions, carbon storage, and water quality. However, there are many “Urban-Rural” dual-structure datasets but no national-level, long-term USR dataset.

The increasing complexity of cellular networks has resulted in dynamic network performance optimization (NPO) playing a critical role in streamlining network operations. While the success of NPO techniques primarily depends upon the quality and quantity of telemetry data available from the underlying network, up until now, third-party access to such data has been largely limited due to the prevalence of proprietary interfaces throughout the access network. However, the upcoming open radio access network (RAN) architecture is set to change this trend.
