This dataset contains a sequence of network events extracted from Spectrum, a commercial network monitoring platform by CA. The events, categorized by severity, cover a wide range of situations, from a link state change up to critical CPU usage by certain devices. They span the physical, network and application layers, so the whole set gives a complete overview of the network’s general state.


The dataset consists of a single plain-text file in CSV format. This CSV contains the following variables:

• Severity: the importance of the event. It is divided into four levels: Blank, Minor, Major and Critical.

• Created On: the date and time when the event was created. The scheme is "month/day/year hour:minute:second".

• Name: (anonymized) name of the device the event happened on.

• EventType: hexadecimal code detailing the category the event pertains to.

• Event: message associated with the event.


Thus, an event is the combination of an event type on a certain device at a certain time; it is described by its severity and explained by the event message.
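The layout above can be parsed with a short script. This is a sketch only: the header names follow the documentation, but the sample row (device name, EventType code, message) is hypothetical, and a four-digit year is assumed for the date scheme.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample mirroring the documented columns; real device
# names are anonymized and the EventType code here is illustrative only.
SAMPLE = """Severity,Created On,Name,EventType,Event
Major,3/14/2015 09:26:53,device-042,0x10d35,Link state changed to down
"""

def parse_events(text):
    """Parse the event CSV, converting 'Created On' per the documented
    "month/day/year hour:minute:second" scheme (four-digit year assumed)."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["Created On"] = datetime.strptime(row["Created On"], "%m/%d/%Y %H:%M:%S")
        rows.append(row)
    return rows

events = parse_events(SAMPLE)
```

With the timestamp parsed, events can be filtered by severity or sorted chronologically per device.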


This dataset details the state-machine-based experiments of PowerWatch.


PowerWatch Experiment Summaries


This dataset summarizes the experiments done for the PowerWatch paper. The accompanying code will be

shared after the paper is published.


There are 2 files:

    * expres.csv: Each entry in this file summarizes one experiment for a unique state machine, identified by

        the field "state_machine_id".

    * runres.csv: For each state machine, a total of 45 runs are conducted; each individual run is represented by

        an entry.


The fields in "expres.csv" are explained as follows.

    * state_machine_id: A number that uniquely identifies an experiment. The ID was also used as a random seed.

        The naming here is, unfortunately, confusing.

    * bucket_size: Chosen bucket size.

    * window_size: Chosen window size.


    The next 12 fields represent the "complexity" of the machine with respect to the call lists it emits.

    In each experiment, two machines were run: benign and malicious. The difference between them is that the

    malicious machine contains one more state, which emits a unique call list.


    * cumulative_call_size_benign: Sum of the number of call lists emitted by benign states.

    * mean_call_size_benign: Mean size of the call lists emitted by benign states.

    * variance_call_size_benign: Variance of the sizes of the call lists emitted by benign states.

    * malicious_state_call_size: Number of calls emitted by the malicious state.

    * malicious_state_vocabulary_size: Number of different calls emitted by the malicious state.

    * cumulative_edit_distance_every_state: Sum of the edit distances computed between every pair of states. Represents

        how the individual computing states vary from each other.

    * mean_edit_distance_every_state: Mean of the edit distances computed between every pair of states.

    * variance_of_edit_distance_every_state: Variance of the edit distances computed between every pair of states.

    * cumulative_edit_distance_good_bad: Total edit distance computed between every benign state and the malicious state.

    * mean_edit_distance_good_bad: Mean edit distance computed between every benign state and the malicious state.

    * min_edit_distance_good_bad: Minimum of edit distances computed between every benign state and the malicious state.

    * variance_edit_distance_good_bad: Variance of edit distances computed between every benign state and the malicious state.
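A minimal sketch of how these edit-distance summaries could be computed, assuming Levenshtein distance over token-level call lists. The state contents below are hypothetical; the actual distance metric and variance convention used by PowerWatch may differ.

```python
from itertools import combinations
from statistics import mean, pvariance

def edit_distance(a, b):
    """Levenshtein distance between two call lists (token-level DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

# Hypothetical call lists: three benign states plus the malicious state.
benign = [["open", "read", "close"], ["open", "write", "close"], ["read", "read"]]
malicious = ["open", "exec", "unlink"]

# *_every_state fields: distances over every pair of states.
every_state = [edit_distance(x, y) for x, y in combinations(benign + [malicious], 2)]

# *_good_bad fields: distance from each benign state to the malicious state.
good_bad = [edit_distance(s, malicious) for s in benign]
stats = {
    "cumulative": sum(good_bad),
    "mean": mean(good_bad),
    "min": min(good_bad),
    "variance": pvariance(good_bad),  # population variance; the paper's choice may differ
}
```

A small `min_edit_distance_good_bad` would suggest the malicious state emits call lists close to some benign state, making it harder to detect.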


    * training_time: Total time required for training the machine learning model.

    * prediction_time: Total time required for the prediction stage.


    * svm_accuracy: Accuracy of an SVM model whose input is the maximum of the activity signal per run.

    * svm_margin: Unused.


    * mean_benign_train_activity_index: Mean activity index, calculated on the training set.

    * mean_benign_test_activity_index: Mean activity index, calculated on the data obtained from the benign machine, but not

        used for training.

    * mean_malicious_activity_index: Mean activity index, calculated on the data obtained from the malicious machine.



Originally, a cascade of max-pooling and convolution steps was considered, but we later decided to use a single

convolution step after the prediction stage. The fields in "runres.csv" are named with respect to the initial

algorithm and are therefore a little misleading; they are explained below where necessary:

    * state_machine_id: The ID of the associated experiment.

    * run_number: The number of the run.

    * malicious: Whether the run contained the malicious state.

    * trained_on: Whether the resulting data was used in training.


    Remember that the first convolution yields the activity signal. Individual points in the activity signal are

    activity indices. Statistics about the activity signal are given in the following fields:


    * min_of_first_convolution: Minimum value of the first convolution. This is the minimum activity index in the activity signal.

    * max_of_first_convolution: Maximum value of the first convolution. This is the maximum activity index in the activity signal.

    * mean_of_first_convolution: Mean value of the first convolution. This is the mean activity index in the activity signal.

    * variance_of_first_convolution: Variance of the first convolution. This is the variance of the activity indices in the activity signal.


    * prediction_time: Time required to predict data generated in this run.

    * reduction_time: Time required during the convolution stage.

    * sp_accuracy: Accuracy of the predictor (predicting the next call).

    * sp_misclassification: 1 - sp_accuracy.


    * activity_index: This value was calculated with respect to the initial model and is meaningless in the final model. Disregard it.
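The first-convolution statistics above can be sketched as follows, assuming the activity signal is a uniform windowed average over the per-call 0/1 misclassification stream. The kernel choice and the run data below are assumptions for illustration, not the paper's exact algorithm.

```python
from statistics import mean, pvariance

def activity_signal(misclassified, window_size):
    """Single 'convolution' step: slide a uniform window over the 0/1
    misclassification stream; each windowed average is one activity index."""
    n = len(misclassified) - window_size + 1
    return [sum(misclassified[i:i + window_size]) / window_size for i in range(n)]

# Hypothetical run: 1 marks a mispredicted call, 0 a correctly predicted one.
run = [0, 0, 1, 1, 1, 0, 0, 0, 1, 0]
signal = activity_signal(run, window_size=4)

summary = {
    "min_of_first_convolution": min(signal),
    "max_of_first_convolution": max(signal),
    "mean_of_first_convolution": mean(signal),
    "variance_of_first_convolution": pvariance(signal),
}
```

The per-run maximum of this signal is the kind of scalar feature an SVM (as in svm_accuracy) could consume.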



Measurements collected from R1 for root-cause analyses of the network service states defined from quality and service design perspectives


The data used in Figures 1–3 of the manuscript "Spatio-Temporal Correlation Analysis of Online Monitoring Data for Anomaly Detection and Location in Distribution Networks".


Accurate short-term load forecasting (STLF) plays an increasingly important role in reliable and economical power system operations. This dataset contains The University of Texas at Dallas (UTD) campus load data for 13 buildings, together with 20 weather and calendar features. The dataset spans from 01/01/2014 to 12/31/2015 with an hourly resolution and is beneficial to various research areas such as STLF.
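As a quick sanity check of the stated span (an assumption-laden sketch: it presumes a gap-free hourly record, and the actual file layout is not described here), the expected number of hourly timestamps can be computed directly:

```python
from datetime import datetime

# Gap-free hourly record from 01/01/2014 00:00 through 12/31/2015 23:00
# (neither 2014 nor 2015 is a leap year).
start = datetime(2014, 1, 1, 0)
end = datetime(2015, 12, 31, 23)
n_hours = int((end - start).total_seconds()) // 3600 + 1  # inclusive of both endpoints
print(n_hours)  # 17520
```

A loaded time series deviating from 17,520 rows per building would indicate missing or duplicated hours.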


In an aging population, the demand for nurse workers increases to care for elders. Helping nurse workers make their work more efficient will improve elders' quality of life, as nurses can focus their efforts on care activities instead of other tasks such as documentation.
Activity recognition can be used toward this goal. If we can recognize what activity a nurse is engaged in, we can, among other things, partially automate the documentation process to reduce time spent on this task, and monitor care plan compliance to ensure that all care activities have been done for each elder.

Last Updated On: 
Fri, 12/06/2019 - 03:40

To obtain the prices of parts from their manufacturing characteristics and other manufacturing processes, feature-quantity expression is innovatively applied. By identifying manufacturing features and calculating the feature quantities, the feature quantities are recorded as data in the form of assignments. To price parts intelligently, the most widely used and mature deep-learning method is adopted to realize accurate quotation of parts.


This is the dataset used in the experiments of the paper "Bus Ridesharing Scheduling Problem". It describes a real-world bus ridesharing scheduling problem in Chengdu, China, which includes 10 depots and 2,000 trips.


This is the dataset used in the experiments of the paper "Bus Pooling: A Large-Scale Bus Ridesharing Service". The dataset contains 60,822,634 trajectory records from 11,922 Shanghai taxis over one day (Apr 1, 2018). 100 groups of coordinate sets, each containing three coordinates, are used as experimental samples to compare the effectiveness and efficiency of location-allocation algorithms.


This dataset refers to the case study performed in the paper "A Real Options Market-Based Approach to Increase Penetration of Renewables", submitted to IEEE Transactions on Smart Grid. The file contains the Midcontinent ISO data used for the day-ahead prices, as well as the wind data from NREL's Wind Integration National Dataset Toolkit, which was used to estimate the renewable production in the case study.