This dataset was extracted from Twitter using keywords related to Dilma Rousseff and Aécio Neves, the candidates in the second round of the 2014 presidential election in Brazil. It contains texts in Portuguese and their sentiment classifications, produced with the techniques described in the article published in the 2018 IEEE International Conference on Data Mining Workshops - ICDMW ( 



The .zip file is divided into four .csv files with data organized in 11 columns named: date, number of retweets, number of favorites, tweet text, mentions, hashtags, id, permalink, classification score, sentiment label.


The VND MO test benchmark problems




We built an original dataset of thermal videos and images that simulate illegal movements around the border and in protected areas, designed for training machine learning and deep learning models. The videos were recorded in areas around the forest, at night, in different weather conditions (clear weather, rain, and fog), with people in different body positions (upright, hunched) and at different movement speeds (regular walking, running) and ranges from the camera.



About 20 minutes of recorded material from the clear weather scenario, 13 minutes from the fog scenario, and about 15 minutes from rainy weather were processed. The longer videos were cut into sequences and from these sequences individual frames were extracted, resulting in 11,900 images for the clear weather, 4,905 images for the fog, and 7,030 images for the rainy weather scenarios.

A total of 6,111 frames were manually annotated so that they could be used to train a supervised model for person detection. The frames were selected to cover the different weather conditions: the set contains 2,663 frames shot in clear weather, 1,135 in fog, and 2,313 in rain.

The annotations were made using the open-source Yolo BBox Annotation Tool, which can simultaneously store annotations in the three most popular machine learning annotation formats (YOLO, VOC, and MS COCO), so all three annotation formats are available. Each image annotation consists of the centroid position of the bounding box around each object of interest, the size of the bounding box in terms of width and height, and the corresponding class label (Human or Dog).
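As an illustration of the centroid-plus-size representation described above, the following minimal sketch converts one YOLO-style annotation line into pixel corner coordinates. It assumes the standard YOLO text layout (`class_id x_center y_center width height`, with coordinates normalized to the image size); the function name and the example image size are hypothetical and are not taken from the dataset.

```python
def yolo_to_corners(line, img_width, img_height):
    """Convert one YOLO-style annotation line to pixel corner coordinates.

    Assumes the line holds: class_id x_center y_center width height,
    with the four box values normalized to the range [0, 1].
    """
    class_id, xc, yc, w, h = line.split()
    # Scale the normalized centroid and size back to pixels.
    xc = float(xc) * img_width
    yc = float(yc) * img_height
    w = float(w) * img_width
    h = float(h) * img_height
    # Move from centroid/size to top-left and bottom-right corners.
    x_min, y_min = xc - w / 2, yc - h / 2
    x_max, y_max = xc + w / 2, yc + h / 2
    return int(class_id), (x_min, y_min, x_max, y_max)


# Hypothetical annotation for a 640x480 frame.
class_id, box = yolo_to_corners("0 0.5 0.5 0.2 0.4", 640, 480)
```

Which numeric class id corresponds to Human and which to Dog depends on the class list shipped with the annotations, so the sketch returns the id unchanged.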



Invasive lobular carcinoma (ILC) is the second most prevalent histologic subtype of invasive breast cancer. Here, we comprehensively profiled 817 breast tumors, including 127 ILC, 490 ductal (IDC), and 88 mixed IDC/ILC. Besides E-cadherin loss, the best known ILC genetic hallmark, we identified mutations targeting PTEN, TBX3 and FOXA1 as ILC enriched features. PTEN loss associated with increased AKT phosphorylation, which was highest in ILC among all breast cancer subtypes. Spatially clustered FOXA1 mutations correlated with increased FOXA1 expression and activity.


This dataset contains the results of the simulation runs of the experiments performed to evaluate and compare the proposed spatial model for situated multi-agent systems. The model was introduced in a paper entitled "BioMASS, a spatial model for situated multiagent systems that optimizes neighborhood search". In this paper we presented a new model to implement a spatially explicit environment that supports constant-time sensory (neighborhood search) and locomotion functions for situated multiagent systems.


The dataset includes a compressed file in zip format. It contains a directory structure as shown below. Each directory corresponds to a specific experiment with a given simulation toolkit and set of parameters. Inside each directory there are 50 CSV files, one for each simulation run. Each file has a header describing the main parameters of the corresponding experiment. We used the Repast and Mason toolkits to benchmark the proposed BioMASS spatial model. 
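A quick way to check an extracted copy of the archive against the layout described above (one directory per experiment, 50 CSV files per directory) is sketched below. The function names are hypothetical, and the sketch only counts files; it does not parse the per-file header, whose exact format is not described here.

```python
from pathlib import Path


def count_runs_per_experiment(root):
    """Map each experiment directory name to the number of CSV files in it."""
    root = Path(root)
    return {
        d.name: sum(
            1 for f in d.iterdir() if f.is_file() and f.suffix.lower() == ".csv"
        )
        for d in sorted(root.iterdir())
        if d.is_dir()
    }


def incomplete_experiments(root, expected_runs=50):
    """Return the experiments whose CSV count differs from the expected 50 runs."""
    counts = count_runs_per_experiment(root)
    return {name: n for name, n in counts.items() if n != expected_runs}
```

Calling `incomplete_experiments` on the directory where the zip file was extracted should return an empty dictionary if every experiment has all 50 runs.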
















Industry faces serious challenges and fails without competitiveness. To address this problem, this work sought to make industrial processes more efficient in order to boost productivity, raise quality, and drive change. The solution developed includes devices with non-invasive, easy-to-install sensors that count the items being transported on production lines.


The data were collected using the Prodbox™ IoT device from EnergyNow Tecnologias, a piece of equipment used to increase productivity and to point out strategic ways of adjusting the variables that affect management's view of production.

The device uses non-obstructive sensors to count the number of items that cross the detection line formed between the installed transmitter and receiver.

Notably, the collected data are sent to the cloud, where, once integrated with an analytics platform, they can be processed to present productivity-tracking indicators. An intelligent system can process the collected data and present metrics that allow a manager to identify ways to increase production, as well as the stages that are hurting productivity. In addition, custom alerts can be configured to report stoppages or inactivity detected by the device.

The data generated by the device can be used to better understand variables related to the production pace and, from them, to support production forecasts by calculating the ratio between items produced and the time required (seconds, minutes, hours, days, weeks, etc.).
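The ratio between items produced and elapsed time can be sketched as follows. This is only an illustration: it assumes the readings are available as (timestamp, items counted since the previous reading) pairs, which is a hypothetical layout and may not match the actual file structure, and the function names are invented for the example.

```python
from datetime import datetime


def production_rate(readings, unit_seconds=3600):
    """Average number of items produced per time unit (default: per hour).

    `readings` is a list of (datetime, item_count) pairs, where item_count is
    assumed to be the number of items counted since the previous reading.
    """
    if len(readings) < 2:
        raise ValueError("at least two readings are needed to measure a rate")
    readings = sorted(readings)
    elapsed = (readings[-1][0] - readings[0][0]).total_seconds()
    if elapsed <= 0:
        raise ValueError("readings must span a positive amount of time")
    # The first reading's count belongs to the interval before the window starts.
    total_items = sum(count for _, count in readings[1:])
    return total_items * unit_seconds / elapsed


def time_units_needed(target_items, rate):
    """Projected number of time units needed to produce `target_items`."""
    return target_items / rate


# Hypothetical readings taken every 30 minutes.
readings = [
    (datetime(2020, 1, 1, 8, 0), 0),
    (datetime(2020, 1, 1, 8, 30), 50),
    (datetime(2020, 1, 1, 9, 0), 70),
]
rate_per_hour = production_rate(readings)
```

Changing `unit_seconds` (for example to 60 or 86400) expresses the same rate per minute or per day.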


Some suggested approaches to consider:

  • Check whether productivity-improvement policies are being effective.

  • Better distribute employees across the different stages of a production line.

  • Correlate production stages with the variables that are affecting productivity in order to resolve internal problems.


Imagine you just moved to your brand-new home and hired your energy provider. They tell you that based on the provided information they will set up a direct debit of €50/month. However, at the end of the year, that prediction was not quite accurate, and you end up paying a settlement amount of €300, or if you are lucky, they give you back some money. Either way, you will probably be disappointed with your energy provider and might consider moving on to another one. Predicting energy consumption is currently a key challenge for the energy industry as a whole.

Last Updated On: 
Tue, 12/01/2020 - 05:01

Demonstration dataset used in one of the experiments.


Refer to:

Building Character Graphs and Dividing Communities in Chinese Novels Based on Graph Data Extraction: Community Division for Character Emotional Polarity Networks

script file below


Each voice sample is stored as a .WAV file, which is then pre-processed for acoustic analysis using the specan function from the warbleR R package. Specan measures 22 acoustic parameters on acoustic signals for which the start and end times are provided.

The output from the pre-processed WAV files was saved into a CSV file containing 3168 rows and 21 columns (20 feature columns and one label column for the classification of male or female).
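A minimal sketch of separating the feature columns from the label column when reading such a CSV file is shown below. The column name `label` and the feature names in the sample are assumptions made for illustration; the actual header of the file should be checked before use.

```python
import csv
import io


def split_features_and_label(csv_text, label_column="label"):
    """Split CSV rows into numeric feature dicts and a list of labels.

    Assumes every column except `label_column` holds a numeric feature.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    features, labels = [], []
    for row in reader:
        labels.append(row.pop(label_column))
        features.append({name: float(value) for name, value in row.items()})
    return features, labels


# Tiny hypothetical sample with two made-up feature columns.
sample = "feat_a,feat_b,label\n0.1,0.2,male\n0.3,0.4,female\n"
features, labels = split_features_and_label(sample)
```

On the full file, the same function would be expected to yield 3168 feature dictionaries with 20 entries each, given the dimensions stated above.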


Dataset associated with a paper in IEEE Transactions on Pattern Analysis and Machine Intelligence

"The perils and pitfalls of block design for EEG classification experiments"

DOI: 10.1109/TPAMI.2020.2973153

If you use this code or data, please cite the above paper.


See the paper "The perils and pitfalls of block design for EEG classification experiments" on IEEE Xplore.

Code for analyzing the dataset is included in the online supplementary materials for the paper.

The code and the appendix from the online supplementary materials are also included here.
