Machine Learning

As one of the research directions at OLIVES Lab @ Georgia Tech, we focus on the robustness of data-driven algorithms under diverse challenging conditions where trained models can possibly be depolyed.


This dataset includes all letters from Turkish Alphabet in two parts. In the first part, the dataset was categorized by letters, and the second part dataset was categorized by fonts. Both parts of dataset includes the features mentioned below.

  • 72, 20 AND 8 POINT LETTERS

The all characters in Turkish Alphabet are included (a, b, c, ç, d, e, f, g, ğ, h, ı, i, j, k, l, m, n, o, ö, p, r, s, ş, t, u, ü, v, y, z).


This dataset contains the actual sensor and calculated process variables in a winder station in a paper mill. Several Process variables change in time with the change of the rewind diameter. I provided the process data for two sets, in future I will add more data. Advanced time series forcasting techniques can be used to estimate many process variables considering the rewind diameter as the time axis.


Urban flooding is a common problem across the world. In India, it leads to casualties every year, and financial loss to the tune of tens of billions of rupees. The damage done due to flooding can be mitigated if the locations deserving attention are known. This will enable an effective emergency response, and provide enough information for the construction of appropriate storm water drains to mitigate the effect of floods. In this work, a new technique to detect flooding level is introduced, which requires no additional equipment, and consequent installation and maintenance costs.


Typically, a paper mill comprises three main stations: Paper machine, Winder station, and Wrapping station. The Paper machine produces paper with particular grammage in gsm (gram per square meter). The typical grammage classes in our paper mill are 48 gsm, 50 gsm, 58 gsm, 60 gsm, 68 gsm, 70 gsm. The Winder station takes a paper spool that is about 6 m width as it’s input and transfers is to customized paper rolls with particular diameter and width.


This folder contains two csv files and one .py file. One csv file contains NIST ground PV plant data imported from This csv file has 902 days raw data consisting PV plant POA irradiance, ambient temperature, Inverter DC current, DC voltage, AC current and AC voltage. Second csv file contains user created data. The Python file imports two csv files. The Python program executes four proposed corrupt data detection methods to detect corrupt data in NIST ground PV plant data.


Dataset contains ten days real-world DNS traffic  captured from campus network comprising of 4000 hosts in peak load hours. Dataset also contains labelled features.


A paradigm dataset is constantly required for any characterization framework. As far as we could possibly know, no paradigmdataset exists for manually written characters of Telugu Aksharaalu content in open space until now. Telugu content (Telugu: తెలుగు లిపి, romanized: Telugu lipi), an abugida from the Brahmic group of contents, is utilized to compose the Telugu language, a Dravidian language spoken in the India of Andhra Pradesh and Telangana just a few other neighboring states. The Telugu content is generally utilized for composing Sanskrit writings.


This is a new image-based handwritten historical digit dataset named ARDIS (Arkiv Digital Sweden). The images in ARDIS dataset are extracted from 15.000 Swedish church records which were written by different priests with various handwriting styles in the nineteenth and twentieth centuries. The constructed dataset consists of three single digit datasets and one digit strings dataset. The digit strings dataset includes 10.000 samples in Red-Green-Blue (RGB) color space, whereas, the other datasets contain 7.600 single digit images in different color spaces.


The date fruit dataset was created to address the requirements of many applications in the pre-harvesting and harvesting stages. The two most important applications are automatic harvesting and visual yield estimation. The dataset is divided into two subsets and each of them is oriented into one of these two applications. The first dataset consists of 8079 images of more than 350 date bunches captured from 29 date palms. The date bunches belong to five date types: Naboot Saif, Khalas, Barhi, Meneifi, and Sullaj.