Multimodal Data | IEEE DataPort

Multimodal Object Detection dataset

VEDAI: The VEDAI dataset comprises 1246 high-resolution RGB and infrared images, containing 3640 objects categorized into 8 common vehicle classes. Each image, with a resolution of 1024 × 1024 pixels, spans diverse terrains and environments. Notably, vehicles occupy only a small portion of the image pixels, making small object detection particularly challenging.

Categories:

Geoscience and Remote Sensing

BSAM DATASETS

This entry contains all multimodal recommendation datasets used in the manuscript "Enhancing Robustness and Generalization Capability for Multimodal Recommender Systems via Sharpness-Aware Minimization" (BSAM). It comprises five Amazon datasets: Baby, Sports, Clothing, Pet, and Office. Each dataset includes both textual and visual modalities, in the form of item descriptions and item images. Our data preprocessing follows the approach outlined in the MMRec framework.

Categories:

Machine Learning

Chinese Industrial Parts Multimodal Dataset

This dataset comprises images of parts from real industrial scenarios and virtual reality environments. Real images are sourced from actual industrial settings, ensuring both authenticity and diversity, while virtual reality images, which make up approximately 11% of the dataset, are captured through precise 3D modeling. Approximately 30% of the part information was manually authored by industry experts, while the remaining 70% was generated by multimodal large models such as Wenxin Yiyan and GPT-4.

Categories:

Artificial Intelligence

e-FLASH

The increasing availability of multimodal data holds considerable promise for the development of millimeter-wave (mmWave) multiple-antenna systems, because it offers enhanced situational awareness. Specifically, including non-RF modalities to complement RF-only data in communications-related decisions such as beam selection may speed up decision making in situations where the standard requires an exhaustive search over all candidate options. However, accelerating research on this topic requires real-world datasets collected in a principled manner.

Categories: