Computational Intelligence

This dataset includes sentiment-specific distributed word representations trained on 10M Arabic tweets that were distantly supervised using positive and negative keywords. As described in the paper [1], we follow the three neural architectures of Tang [2], which encode the sentiment of a word in addition to its semantic and syntactic representation.
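As a rough illustration of distant supervision with positive and negative keywords, the following minimal sketch labels a tweet from the keywords it contains. The keyword lists, the whitespace tokenization, and the function name `distant_label` are all assumptions made for this example; the actual keywords and preprocessing used to build the dataset are not stated here.

```python
from typing import Optional

# Hypothetical seed keywords; the actual lists used for the dataset are not given here.
POSITIVE_KEYWORDS = {"سعيد", "رائع"}  # "happy", "wonderful"
NEGATIVE_KEYWORDS = {"حزين", "سيء"}  # "sad", "bad"


def distant_label(tweet: str) -> Optional[str]:
    """Return 'positive' or 'negative' from keyword matches, or None if undecided."""
    tokens = set(tweet.split())  # naive whitespace tokenization (an assumption)
    has_pos = bool(tokens & POSITIVE_KEYWORDS)
    has_neg = bool(tokens & NEGATIVE_KEYWORDS)
    if has_pos == has_neg:
        # No keyword at all, or conflicting keywords: leave the tweet unlabeled.
        return None
    return "positive" if has_pos else "negative"


tweets = ["أنا سعيد اليوم", "يوم حزين جدا", "ذهبت إلى السوق"]
labeled = [(tweet, distant_label(tweet)) for tweet in tweets]
```

Tweets that match no keyword, or that match keywords of both polarities, are left unlabeled in this sketch, so the noisy labels come only from unambiguous matches.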


Specifications Table

Subject area

 Natural Language Processing


The dataset contains measurements taken from four air handling units (AHUs) installed in a medium-to-large academic building. The building is a 7-story, 9000 sqm facility commissioned in 2016 that hosts the PRECIS research center. It contains multiple research laboratories, multifunction spaces, meeting rooms, a large auditorium, and administrative offices. It is located at 44°26'06.0"N, 26°02'44.0"E in a temperate continental climate with hot summers and cold winters. Cooling is provided by on-site electric chillers, while heating is supplied by a district heating network.


The presented dataset was used as the basis for CAO, a system for the analysis of emoticons in Japanese online communication developed by Ptaszynski et al. (2010). Emoticons are strings of symbols widely used in text-based online communication to convey user emotions. The database contains: 1) a predetermined raw emoticon database of over ten thousand emoticon samples extracted from the Web, and 2) emoticon parts automatically divided from the raw emoticons into semantic areas representing “mouths” or “eyes”.


Our goal is to determine whether a convolutional neural network (CNN) performs better than existing blind algorithms for image denoising and, if so, whether the noise statistics affect the performance gap. We performed automatic identification of the noise distribution over a set of nine possible distributions: Gaussian, log-normal, uniform, exponential, Poisson, salt and pepper, Rayleigh, speckle, and Erlang. Next, for each of these noisy image sets, we compared the performance of FFDNet, a CNN-based denoising method, with that of Noise Clinic, a blind denoising algorithm.
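To make the nine noise models concrete, the sketch below corrupts a flattened grayscale image with each of them using only Python's standard library. All noise parameters, the helper names, and the use of PSNR as a comparison metric are assumptions for illustration; the settings used in the study are not given here, and neither FFDNet nor Noise Clinic is invoked.

```python
import math
import random
from typing import Callable, Dict, List

Image = List[float]  # flattened grayscale image with values in [0, 255]


def _clip(v: float) -> float:
    return min(255.0, max(0.0, v))


def additive(sample: Callable[[random.Random], float]) -> Callable[[Image, random.Random], Image]:
    """Build a corruption function that adds an independent noise sample to each pixel."""
    return lambda img, rng: [_clip(p + sample(rng)) for p in img]


def poisson(img: Image, rng: random.Random) -> Image:
    """Replace each pixel by a Poisson draw whose mean is the pixel value (Knuth's method)."""
    out = []
    for p in img:
        limit, k, prod = math.exp(-p), 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        out.append(_clip(float(k)))
    return out


def salt_and_pepper(img: Image, rng: random.Random, density: float = 0.05) -> Image:
    """Set a fraction `density` of pixels to 0 (pepper) or 255 (salt), half each."""
    out = []
    for p in img:
        u = rng.random()
        out.append(0.0 if u < density / 2 else 255.0 if u < density else p)
    return out


# Illustrative parameters only; they are not taken from the study.
NOISE_MODELS: Dict[str, Callable[[Image, random.Random], Image]] = {
    "gaussian": additive(lambda r: r.gauss(0.0, 15.0)),
    "log-normal": additive(lambda r: r.lognormvariate(2.0, 0.5)),
    "uniform": additive(lambda r: r.uniform(-20.0, 20.0)),
    "exponential": additive(lambda r: r.expovariate(1.0 / 10.0)),
    "poisson": poisson,
    "salt and pepper": salt_and_pepper,
    # Rayleigh sample via inverse transform with scale 10.
    "rayleigh": additive(lambda r: 10.0 * math.sqrt(-2.0 * math.log(1.0 - r.random()))),
    # Speckle is multiplicative: the perturbation scales with the pixel value.
    "speckle": lambda img, r: [_clip(p + p * r.gauss(0.0, 0.1)) for p in img],
    # Erlang is a gamma distribution with integer shape (here shape 2, scale 5).
    "erlang": additive(lambda r: r.gammavariate(2, 5.0)),
}


def psnr(clean: Image, noisy: Image) -> float:
    """Peak signal-to-noise ratio in dB for 8-bit intensity range."""
    mse = sum((a - b) ** 2 for a, b in zip(clean, noisy)) / len(clean)
    return float("inf") if mse == 0 else 10.0 * math.log10(255.0 ** 2 / mse)


rng = random.Random(0)
clean = [128.0] * 1024  # constant mid-gray test image
noisy_sets = {name: corrupt(clean, rng) for name, corrupt in NOISE_MODELS.items()}
```

Each entry of `noisy_sets` could then be passed to the two denoisers under comparison, with a metric such as `psnr` computed against `clean` for each noise type.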


We introduced the task of acoustic question answering (AQA) in

A second version of the dataset was introduced in

This dataset aims to promote research in the area of acoustic reasoning.

It comprises acoustic scenes and multiple questions and answers for each of them.


Date fruit datasets are not publicly available. Previous studies collected and used their own datasets, and almost all of them have only a few hundred images per class. Because our aim was robust date fruit classification, we did not photograph the fruit at a fixed size or angle or against a particular background. Instead, to add robustness, we built our date fruit database using the Google search engine. As a result, the images include varied backgrounds, noise, different lighting conditions, other objects, different packaging, and sometimes partial occlusion of the fruit.


Our Signing in the Wild dataset consists of videos harvested from YouTube that show people signing in various sign languages in diverse settings and environments, under complex signer and camera motion, and sometimes in groups. The dataset is intended for sign language detection.



The database was created from records of the psychosocial risk level of Colombian school teachers, assessed using physiological variables, from May 2016 to December 2017 in five municipalities of the metropolitan area of a city in Colombia. The physiological variables were measured only for people who voluntarily participated in the study. Names and personal data were retained by the researcher.


The date fruit dataset was created to address the requirements of many applications in the pre-harvesting and harvesting stages. The two most important applications are automatic harvesting and visual yield estimation. The dataset is divided into two subsets, each oriented toward one of these two applications. The first subset consists of 8079 images of more than 350 date bunches captured from 29 date palms. The date bunches belong to five date types: Naboot Saif, Khalas, Barhi, Meneifi, and Sullaj.


The dataset contains Software Development Effort Estimation (SDEE) metric values extracted from around 1800 Open Source Software (OSS) repositories on GitHub.