Artificial Intelligence

This study explores the relationship between social media sentiment and stock market movements using a dataset of tweets related to various publicly traded companies. The dataset comprises time-stamped tweets containing company-specific information, stock ticker symbols, and company names. By leveraging natural language processing (NLP) techniques, we analyze the sentiment of tweets to determine their impact on stock price fluctuations. This research aims to develop predictive models that incorporate tweet sentiment and frequency as features to forecast stock price movements.
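The following minimal sketch shows one way tweet sentiment and frequency could become model features. It aggregates tweets into a per-ticker, per-day mean sentiment and tweet count, then labels each day by whether the next available close is higher. It assumes the sentiment scores have already been computed by some NLP model. The function names (`daily_features`, `next_day_labels`, `build_rows`) and the toy prices are hypothetical, and the study's actual features and labels may differ.

```python
from collections import defaultdict
from datetime import date, datetime

def daily_features(tweets):
    """Aggregate (iso_timestamp, ticker, sentiment) tuples per (ticker, day).

    Sentiment is assumed to be a precomputed score (e.g. in [-1, 1]) from any
    NLP model. Returns {(ticker, day): (mean_sentiment, tweet_count)}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, ticker, sentiment in tweets:
        key = (ticker, datetime.fromisoformat(ts).date())
        sums[key] += sentiment
        counts[key] += 1
    return {key: (sums[key] / counts[key], counts[key]) for key in counts}

def next_day_labels(closes):
    """Map (ticker, day) -> 1 if the next available close is higher, else 0."""
    by_ticker = defaultdict(list)
    for (ticker, day), price in closes.items():
        by_ticker[ticker].append((day, price))
    labels = {}
    for ticker, series in by_ticker.items():
        series.sort()
        # Pair each trading day with the following one; the last day gets no label.
        for (day, price), (_, next_price) in zip(series, series[1:]):
            labels[(ticker, day)] = int(next_price > price)
    return labels

def build_rows(tweets, closes):
    """Join features and labels into (ticker, day, mean_sentiment, count, label) rows."""
    feats = daily_features(tweets)
    labels = next_day_labels(closes)
    return sorted(
        (ticker, day, mean, count, labels[(ticker, day)])
        for (ticker, day), (mean, count) in feats.items()
        if (ticker, day) in labels
    )

# Toy, made-up inputs for illustration only.
tweets = [
    ("2020-01-02T09:30:00", "AAPL", 0.8),
    ("2020-01-02T12:00:00", "AAPL", 0.2),
    ("2020-01-03T10:00:00", "AAPL", -0.5),
]
closes = {
    ("AAPL", date(2020, 1, 2)): 300.0,
    ("AAPL", date(2020, 1, 3)): 297.0,
    ("AAPL", date(2020, 1, 6)): 299.0,
}
rows = build_rows(tweets, closes)
```

In this sketch, tweets on days without a following close (weekends, or the last day in the series) are silently dropped. A real pipeline would need an explicit policy for mapping them to trading days.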

The shift towards cloud-native applications has been accelerating in recent years. Modern applications are increasingly distributed, taking advantage of cloud-native features such as scalability, flexibility, and high availability. However, this evolution also introduces various security challenges. From a networking perspective, the large number of interconnected components and their intricate communication patterns make detecting and mitigating traffic anomalies a complex task.

Ensemble clustering, which integrates multiple base clusterings to improve robustness and accuracy, is commonly evaluated on more than 10 benchmark datasets. These include 4 synthetic datasets (3MC, atom, Tetra, and Flame) designed to test how algorithms handle nonlinear separability and density variations.
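One common way to integrate several base clusterings is the co-association (evidence accumulation) approach. It counts how often each pair of points lands in the same cluster across the base clusterings, then links pairs that co-occur in more than a threshold fraction of them. The minimal Python sketch below illustrates that technique. It is not necessarily the method used by any particular benchmark study, and the function names and the 0.5 threshold are assumptions.

```python
def co_association(base_labelings):
    """Return an n x n matrix: fraction of base clusterings that put i and j together."""
    n = len(base_labelings[0])
    m = len(base_labelings)
    counts = [[0] * n for _ in range(n)]
    for labels in base_labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]

def consensus_labels(base_labelings, threshold=0.5):
    """Link pairs whose co-association exceeds threshold; return connected components as labels."""
    matrix = co_association(base_labelings)
    n = len(matrix)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] > threshold:
                parent[find(i)] = find(j)

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]

# Three base clusterings of five points. Label ids are arbitrary within each
# clustering; only which points share a label matters.
base = [
    [0, 0, 1, 1, 2],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
print(consensus_labels(base))  # [0, 0, 1, 1, 1]
```

Because the co-association matrix is quadratic in the number of points, this direct form suits small synthetic benchmarks rather than large datasets.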

SNMDat2.0 is a comprehensive multimodal dataset for Twitter social bot detection, expanded from the unimodal TwiBot-20. Building on the original 229,580 Twitter users, 227,979 follow relationships, and 33,488,192 tweet texts, we add 274,587 profile images and profile background images, 86,498 tweet images, and 49,549 tweet videos.

This dataset contains 609,934 real Modbus TCP packets collected from industrial control system (ICS) environments, capturing the full byte-level structure of Modbus communication, including MBAP headers and function-specific payloads. Designed to support research in industrial cybersecurity, this dataset addresses the scarcity of diverse and realistic Modbus traffic, which often hampers the development of intrusion detection systems (IDS) and protocol-compliant synthetic data generators.
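Because each record captures the full Modbus TCP frame, a typical first step is splitting the MBAP header from the function-specific payload. The Python sketch below follows the public Modbus TCP framing: a 7-byte big-endian MBAP header (transaction identifier, protocol identifier, length, unit identifier), followed by a function code and its data. It assumes each packet is available as a raw `bytes` object, since the dataset's storage format is not described here. The `ModbusTcpFrame` and `parse_modbus_tcp` names are hypothetical.

```python
import struct
from dataclasses import dataclass

@dataclass
class ModbusTcpFrame:
    transaction_id: int
    protocol_id: int
    length: int
    unit_id: int
    function_code: int
    data: bytes

    @property
    def is_exception(self) -> bool:
        # Exception responses set the high bit of the function code.
        return bool(self.function_code & 0x80)

def parse_modbus_tcp(frame: bytes) -> ModbusTcpFrame:
    """Split one complete Modbus TCP frame into MBAP header fields and the PDU."""
    if len(frame) < 8:
        raise ValueError("frame too short for MBAP header plus function code")
    # MBAP header: 2-byte transaction id, 2-byte protocol id, 2-byte length, 1-byte unit id.
    transaction_id, protocol_id, length, unit_id = struct.unpack(">HHHB", frame[:7])
    if protocol_id != 0:
        raise ValueError(f"protocol identifier {protocol_id} is not Modbus (0)")
    # The length field counts the unit id plus the PDU (function code and data).
    if length != len(frame) - 6:
        raise ValueError(f"length field {length} does not match frame size {len(frame)}")
    return ModbusTcpFrame(transaction_id, protocol_id, length, unit_id, frame[7], frame[8:])

# A Read Holding Registers (0x03) request: start address 0, quantity 10.
request = bytes.fromhex("00 01 00 00 00 06 01 03 00 00 00 0a")
parsed = parse_modbus_tcp(request)
start, quantity = struct.unpack(">HH", parsed.data)
print(parsed.function_code, start, quantity)  # 3 0 10
```

The sketch assumes one complete frame per record. Raw TCP captures can carry several frames in one segment or split a frame across segments, which would need reassembly first.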

We have released TrafficLLM's training datasets, which contain more than 0.4M traffic data samples and 9K human instructions for adapting LLMs to different traffic analysis tasks.

The painting style dataset was constructed by searching for, selecting, and collecting publicly available paintings on the internet, using painting styles and artists' names as keywords. It contains 750 paintings in total, covering five styles: Cubism, Op Art, Color Field Painting, Post-Impressionism, and Rococo.

Amid global climate change, rising atmospheric methane (CH4) concentrations significantly influence the climate system, contributing to temperature increases and changes in atmospheric chemistry. Accurate monitoring of these concentrations is essential to support global methane emission reduction goals, such as the Global Methane Pledge's target of cutting emissions by at least 30% from 2020 levels by 2030. Satellite remote sensing, offering high precision and extensive spatial coverage, has become a critical tool for measuring large-scale atmospheric methane concentrations.

Computational experiments within metaverse service ecosystems make it possible to identify social risks and governance crises and to optimize governance strategies through counterfactual inference, dynamically guiding real-world service ecosystem operations. The advent of Large Language Models (LLMs) has enabled LLM-based agents to act as autonomous service entities capable of executing diverse service operations within metaverse ecosystems, thereby facilitating the governance of metaverse service ecosystems through computational experiments.

Laboratory experiments are fundamental to science education, yet resource constraints often limit students’ access to hands-on learning experiences. While object detection technology offers promising solutions for automated material identification and assistance, existing datasets like CABD (21 classes) and Chemical Experiment Image Dataset (7 classes) are limited in scope. We present two comprehensive datasets for laboratory material detection: a Chemistry dataset comprising 1,191 images across 60 classes and a Physics dataset containing 1,749 images across 76 classes.
