Abstract

This dataset, denoted $\mathcal{W}_{\text{data}}$, is a synthetic yet structurally authentic warehouse management dataset comprising 4,132 records and 11 well-defined attributes. It was generated using the Gretel.ai platform, following the structural standards provided by the TI Supply Chain API–Storage Locations specification. The dataset encapsulates essential operational and spatial parameters of warehouses, including unique identifiers, geospatial coordinates, storage capacities, and categorical capacity statuses. Each warehouse entry includes derived metrics such as squared coordinates and storage utilization rates, supporting comprehensive spatial and efficiency analyses. Designed to facilitate research in warehouse optimization, logistics planning, supply chain analytics, and resource allocation, this clean, well-balanced dataset reflects realistic attribute distributions and interdependencies. It is ideally suited for benchmarking algorithms in logistics optimization, classification, clustering, and spatial modeling tasks in industrial and academic applications.

šŸ“˜ Instructions for Use of Dataset and Analysis Tools

This dataset, titled Warehouse Capacity and Spatial Logistics Dataset ($\mathcal{W}_{\text{data}}$), provides a synthetic yet realistic representation of warehouse infrastructure and operational capacity across multiple spatial coordinates. It is designed to support research and development in the areas of warehouse logistics optimization, capacity utilization, supply chain management, spatial analytics, and intelligent resource allocation.

šŸ”§ Dataset Contents

The dataset comprises 4,132 warehouse records with 11 attributes, including a unique warehouse identifier, X and Y geospatial coordinates, derived squared coordinates, storage capacity, a storage utilization rate, and a categorical Capacity Status (see the abstract above for details).

šŸ“Š Suggested Tasks for Competitions and Research

Researchers and participants are encouraged to utilize the dataset for a wide variety of data-driven logistics and AI challenges, including but not limited to:

  1. Clustering & Classification
    • Segment warehouses by capacity status, geography, or utilization patterns
    • Predict Capacity Status using spatial and operational variables
  2. Outlier Detection
    • Identify anomalous facilities based on unusual capacity usage or spatial isolation
  3. Optimization Algorithms
    • Develop algorithms for optimizing storage distribution and facility deployment
  4. Geospatial Analysis
    • Explore spatial clustering or coverage using X and Y coordinates
  5. Feature Importance & Ranking
    • Compute feature relevance for warehouse classification or ranking
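As a starting point for the clustering task above, the sketch below segments warehouses spatially with k-means. It uses randomly generated stand-in data, and the column names (`X_Coordinate`, `Y_Coordinate`) are assumptions for illustration, not names confirmed by the dataset specification:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical sample standing in for the real CSV; the actual column
# names are assumptions and may differ in the downloaded file.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "X_Coordinate": rng.uniform(0, 100, 200),
    "Y_Coordinate": rng.uniform(0, 100, 200),
})

# Standardize coordinates so both axes contribute equally, then
# segment the warehouses into four spatial clusters.
coords = StandardScaler().fit_transform(df[["X_Coordinate", "Y_Coordinate"]])
df["cluster"] = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(coords)

print(df["cluster"].value_counts())
```

The same pipeline extends naturally to the classification task by swapping `KMeans` for a supervised model (e.g. `RandomForestClassifier`) with `Capacity_Status` as the target.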

šŸ› ļø Recommended Analysis Tools

  • Python Libraries:
    • pandas, numpy – Data manipulation
    • matplotlib, seaborn, plotly – Visualization
    • scikit-learn – ML modeling and clustering
    • xgboost, lightgbm – Advanced classification/regression
    • geopandas, folium – Optional geospatial mapping
  • Notebooks/Environments:
    • Jupyter Notebook
    • Google Colab
    • VSCode + Python

šŸ“„ Getting Started

  • Download the CSV file and load it using Python (e.g., pd.read_csv("warehouse_dataset.csv"))
  • Inspect the structure and preview the data: df.head(), df.info()
  • Perform preprocessing if needed (note: the dataset is already clean and balanced)
  • Select a challenge such as classification, prediction, clustering, or logistics optimization
  • Document your methodology and results for reproducibility
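The loading and inspection steps above can be sketched as follows. For the real dataset, pass the downloaded filename to pd.read_csv; here a tiny in-memory sample with assumed column names stands in so the snippet is self-contained:

```python
import pandas as pd
from io import StringIO

# Hypothetical two-row sample; for the actual dataset use
# pd.read_csv("warehouse_dataset.csv") instead. Column names are
# illustrative assumptions, not confirmed by the specification.
sample = StringIO(
    "Warehouse_ID,X_Coordinate,Y_Coordinate,Storage_Capacity,Utilization_Rate,Capacity_Status\n"
    "W001,12.5,48.2,1000,0.82,High\n"
    "W002,67.1,23.9,500,0.35,Low\n"
)
df = pd.read_csv(sample)

# Inspect structure and preview the first rows, as in the steps above
df.info()
print(df.head())

# Quick sanity check before selecting a task: the dataset is stated
# to be clean, so no missing values are expected.
assert df.isna().sum().sum() == 0
```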

āœ… Licensing and Ethics

This dataset is synthetic and does not contain personal or sensitive data. It is shared under a CC BY 4.0 license, and you are free to use it for academic, educational, and research purposes with appropriate citation.
