This pre-trained Word2Vec model provides 300-dimensional vectors for more than 0.5 million Nepali words and phrases. A separate Nepali-language text corpus was created from news content freely available in the public domain. The corpus contained more than 100 million running words.

Word2Vec model details: embedding dimension: 300; architecture: continuous bag-of-words (CBOW); training algorithm: negative sampling (15 negative samples); context (window) size: 10; minimum token count: 2; encoding: UTF-8.
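
To make the window and minimum-count settings concrete, here is a minimal sketch of how CBOW training examples could be formed from a token stream. It is an illustration only, not the code used to train this model: the function name `cbow_pairs` is hypothetical, and real Word2Vec implementations also shrink the window at random for each position and draw negative samples, both of which are omitted here.

```python
from collections import Counter


def cbow_pairs(tokens, window=10, min_count=2):
    """Yield (context_words, target_word) pairs in the CBOW style.

    Hypothetical helper for illustration; defaults mirror the listed
    settings (window size 10, minimum token count 2).
    """
    counts = Counter(tokens)
    # Tokens seen fewer than min_count times are dropped before windowing.
    kept = [t for t in tokens if counts[t] >= min_count]
    for i, target in enumerate(kept):
        left = kept[max(0, i - window):i]
        right = kept[i + 1:i + 1 + window]
        # CBOW predicts the target word from its surrounding context words.
        yield left + right, target


if __name__ == "__main__":
    # Placeholder tokens; a real corpus would hold Nepali words.
    sample = "a b a b c".split()
    for context, target in cbow_pairs(sample, window=1):
        print(context, "->", target)
```

A rare token such as `c` in the sample never appears as a target or as context, which is the effect of the minimum-count filter.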

  • Computational Intelligence
  • Last Updated On: 
    Sun, 09/15/2019 - 14:49

    Geomagnetic field variations produce geoelectric fields that can affect the operation of technological networks at the Earth’s surface, including power systems, pipelines, phone cables and railway circuits. To assess the geomagnetic hazard to this technology, it is necessary to model the geomagnetically induced currents (GIC) produced in these systems during geomagnetic disturbances. This requires use of geomagnetic data with appropriate Earth conductivity models to calculate the geoelectric fields that drive GIC.
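
As a rough illustration of that last step, the simplest Earth conductivity model is a uniform half-space, for which the plane-wave surface impedance is Z(ω) = sqrt(iωμ0/σ) and the geoelectric field amplitude at a single frequency is |E| = |Z|·|B|/μ0. The sketch below assumes this model. The function names and the example values are illustrative and are not taken from the dataset, which may rely on layered or more detailed conductivity models.

```python
import cmath
import math

MU0 = 4e-7 * math.pi  # vacuum permeability, H/m


def surface_impedance(freq_hz, conductivity_s_per_m):
    """Plane-wave surface impedance (ohms) of a uniform half-space."""
    omega = 2 * math.pi * freq_hz
    return cmath.sqrt(1j * omega * MU0 / conductivity_s_per_m)


def geoelectric_amplitude(b_tesla, freq_hz, conductivity_s_per_m):
    """Geoelectric field amplitude (V/m) driven by a magnetic variation
    of amplitude b_tesla at a single frequency (assumed uniform Earth)."""
    z = surface_impedance(freq_hz, conductivity_s_per_m)
    return abs(z) * b_tesla / MU0


if __name__ == "__main__":
    # Assumed example values: 100 nT variation, 100 s period, 0.001 S/m.
    e = geoelectric_amplitude(100e-9, 0.01, 1e-3)
    print(f"{e * 1e3:.3f} V/km")
```

In this model the field grows with the square root of frequency and falls with the square root of conductivity, so resistive ground and fast variations give the largest fields.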

  • Power and Energy
  • Last Updated On: 
    Sun, 09/01/2019 - 19:56

    Car-hailing order data are a rich source for studying human mobility patterns, which can contribute to transportation planning and policy-making. In general, an original car-hailing order record includes information such as origin, destination, pick-up time, drop-off time, and travel distance. The Beijing car-hailing order dataset stores the discretized order data at the traffic analysis zone (TAZ) scale and includes training and test sets.
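
One way to picture the discretization is to count orders per origin zone, destination zone, and time slot. The sketch below is an assumption about how such an aggregation might look. The `Order` fields, the `od_counts` helper, and the 30-minute slot length are hypothetical and do not describe the dataset's actual file layout.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """Hypothetical order record, already mapped to TAZ identifiers."""
    origin_taz: int
    destination_taz: int
    pickup_minute: int   # minutes after midnight
    dropoff_minute: int  # minutes after midnight
    distance_km: float


def od_counts(orders, slot_minutes=30):
    """Count orders per (origin TAZ, destination TAZ, pick-up time slot)."""
    counts = Counter()
    for o in orders:
        slot = o.pickup_minute // slot_minutes
        counts[(o.origin_taz, o.destination_taz, slot)] += 1
    return counts


if __name__ == "__main__":
    demo = [
        Order(1, 2, 485, 510, 6.2),
        Order(1, 2, 499, 530, 7.0),
        Order(3, 1, 1000, 1020, 4.1),
    ]
    for key, n in sorted(od_counts(demo).items()):
        print(key, n)
```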


  • Transportation
  • Last Updated On: 
    Tue, 08/27/2019 - 22:22

    In this data set, the input data is divided into three txt files, one for each range: 1000, 10000, and 100000.

    Each txt file contains 500 samples, divided evenly into 5 categories, denoted N = 5, 12, 18, 23, 30.

  • Security
  • Last Updated On: 
    Sun, 08/04/2019 - 09:32

    The dataset contains waveforms and labels. The labels are: 0: subcycle incipient fault; 1: multicycle incipient fault; 2: permanent fault; 3: transient disturbance. Subtype labels and their explanations are also given.
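
When loading the labels, the four integer codes can be mapped to names with a small enumeration. This is a sketch: the class name `FaultLabel` and the helper `describe` are hypothetical, and the subtype labels are not covered because their values are not listed here.

```python
from enum import IntEnum


class FaultLabel(IntEnum):
    """Main label codes as listed in the dataset description."""
    SUBCYCLE_INCIPIENT_FAULT = 0
    MULTICYCLE_INCIPIENT_FAULT = 1
    PERMANENT_FAULT = 2
    TRANSIENT_DISTURBANCE = 3


def describe(label: int) -> str:
    """Return a readable name for a label code; raises ValueError
    for codes outside 0..3."""
    return FaultLabel(label).name.replace("_", " ").capitalize()


if __name__ == "__main__":
    for code in range(4):
        print(code, describe(code))
```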

  • Electric Utility
  • Last Updated On: 
    Sun, 06/30/2019 - 08:41

    This data set is composed of samples of load signatures of electric devices, acquired non-intrusively. The test bench was built using four identical fluorescent lamps, four identical slots and four identical switches. Here, "identical" means having the same technical specifications (nominal voltage, power, isolation voltage, among others). The sensors are connected to the power supply in order to measure the electrical variations when appliances are turned on or off. There are 16 possible network configurations with 4 appliances, in which one, two, three or four appliances can be turned on.
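
The figure of 16 matches the number of on/off states of four switches (2^4) if the all-off state is counted. That reading is an assumption, as is the idea that each lamp is tracked individually. A short sketch that enumerates the states under this assumption, using the hypothetical helper `switch_states`:

```python
from itertools import product

N_APPLIANCES = 4


def switch_states(n=N_APPLIANCES):
    """All on/off combinations for n appliances (1 = on, 0 = off)."""
    return list(product((0, 1), repeat=n))


def group_by_active(states):
    """Group the states by how many appliances are on."""
    groups = {}
    for state in states:
        groups.setdefault(sum(state), []).append(state)
    return groups


if __name__ == "__main__":
    states = switch_states()
    print(len(states), "states in total")
    for n_on, members in sorted(group_by_active(states).items()):
        print(n_on, "on:", len(members), "configurations")
```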

  • Smart Grid
  • Last Updated On: 
    Wed, 06/19/2019 - 15:21

    This dataset contains the library call lists obtained from programs implemented using libiec61850. Each call list is labeled either as benign or with the name of the attack.

  • Smart Grid
  • Last Updated On: 
    Mon, 06/17/2019 - 14:13

    These files contain the antenna simulation and measurement data. All the simulation data were obtained using FEKO, then imported and visualized using MATLAB. The scattering parameters of the antenna were measured using a Keysight E8362B vector network analyzer, while the gain patterns were measured in an anechoic chamber.

  • Other
  • Last Updated On: 
    Sun, 06/02/2019 - 21:44