A database of lip traces
Cheiloscopy is a forensic investigation technique that deals with the identification of humans based on lip traces. Lip prints are unique and permanent for each individual and, alongside fingerprinting, dental identification, and DNA analysis, can serve as one of the bases for criminal and forensic analysis.


The SUT-Lips-DB database is free for scientific and testing purposes. However, you are asked to cite the data set and our papers listed on the project home web site whenever you publish research conducted with our data set or compare your own results with ours.

The main ZIP archive contains several folders. Each folder may contain several lip traces, stored as JPG files, all from a single person. The data are anonymized. The name of each folder contains information on the gender of the person. An additional CSV file contains the year of birth of the people from whom we collected samples.


This dataset contains citation dynamics of individual papers published in several journals including ACM, Cell, IEEE, Nature, Science, NEJM, PNAS, Physical Review (PR), PRL. Each txt file contains citation dynamics (up to 2014) of papers published in a particular journal in a particular year. For example, ieee1985.txt contains citation dynamics of papers published in IEEE in 1985. Note that the citation counts of year 2014 are incomplete as this dataset was collected in summer 2014.
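The file naming convention described above can be split into its journal and year parts with a short helper. This is only a sketch: it assumes every file follows the same `<journal><year>.txt` pattern as `ieee1985.txt`, and the function name `parse_citation_filename` is hypothetical. The layout of the file contents is not described here, so the sketch does not attempt to read them.

```python
import re

# Assumed pattern: letters for the journal, four digits for the year, ".txt".
FILENAME_PATTERN = re.compile(r"([A-Za-z]+)(\d{4})\.txt")


def parse_citation_filename(name):
    """Split a name such as 'ieee1985.txt' into (journal, publication year)."""
    match = FILENAME_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return match.group(1).lower(), int(match.group(2))


if __name__ == "__main__":
    print(parse_citation_filename("ieee1985.txt"))
```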


The aim of the database is to provide researchers with a collection of real-life impulsive power quality events for testing experiments and measurement instruments. The dataset provides signal recordings from the power network of the University of Cádiz over the last five years (electrical network according to the UNE-EN-50160: 2011).

The dataset offers a variety of real impulsive events, which were acquired specifically to test power quality instruments according to the UNE-IEC 61000-4-11: 2005.


Costas arrays are permutation matrices that meet the added Costas condition: when used as a frequency-hop scheme, they allow at most one time-and-frequency-offset signal bin to overlap another. Databases to various orders have been available for many years. Here we have a database that is far more extensive than any available before it. A very powerful and easy-to-use Windows utility with a GUI accompanies the database.
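The Costas condition can be checked directly: a permutation is a Costas array when all displacement vectors between pairs of its dots are distinct. The sketch below is a minimal illustration of that check, not the accompanying utility. The function names `is_costas` and `welch` are hypothetical, and `welch` assumes the caller passes a prime `p` together with a primitive root `g` modulo `p` (the exponential Welch construction).

```python
from itertools import combinations


def is_costas(perm):
    """Return True if the permutation has pairwise distinct displacement vectors."""
    n = len(perm)
    if len(set(perm)) != n:
        return False  # repeated value, so this is not a permutation
    seen = set()
    for i, j in combinations(range(n), 2):
        vector = (j - i, perm[j] - perm[i])  # time offset, frequency offset
        if vector in seen:
            return False
        seen.add(vector)
    return True


def welch(p, g):
    """Exponential Welch construction of order p - 1 (g assumed primitive mod p)."""
    return [pow(g, i, p) for i in range(p - 1)]


if __name__ == "__main__":
    print(welch(5, 2), is_costas(welch(5, 2)))
    print([1, 2, 3], is_costas([1, 2, 3]))
```

The check is quadratic in the order, which is fine for verifying individual arrays read from the database.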


Download the file GetStarted.zip. This file contains the Instructions as a PDF file, the extraction and analysis utility in its own ZIP file, and several information files including an enumeration database in an Excel file.


Unpack this file in the folder that you want to be the location of your Costas array database. Be sure to unpack subfolders, so that you see subfolders /Searches and /Generated when you are done. Folder /Searches contains all Costas arrays to order 29, and folder /Generated contains all generated Costas arrays to order 100. The file Read_CA_Database_00.zip contains the extraction and analysis utility. It may be extracted in place or, if the database is on a network drive or other location inconvenient for DLLs, in its own folder anywhere on a local drive such as your C:\ drive. See the Instructions PDF for details.
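For readers who prefer to script the unpacking step, the following sketch extracts an archive with its subfolders and reports whether the expected /Searches and /Generated folders are present afterwards. It assumes the archive stores those folders at its top level; the function name `unpack_database` is hypothetical, and the Instructions PDF remains the authoritative procedure.

```python
import zipfile
from pathlib import Path


def unpack_database(archive, target, expected=("Searches", "Generated")):
    """Extract the archive into target and return the expected subfolders that are missing."""
    target = Path(target)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)  # keeps the folder structure stored in the archive
    return [name for name in expected if not (target / name).is_dir()]
```

An empty list as the return value means both subfolders were found in the target location.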


Then, as you need them, add these files:

CA_Database_101-200.zip        More data for /Generated folder
CA_Database_201-300.zip        More data for /Generated folder
CA_Database_301-400.zip        More data for /Generated folder
CA_Database_401-500.zip        More data for /Generated folder
CA_Database_501-600.zip        More data for /Generated folder
CA_Database_601-700.zip        More data for /Generated folder
CA_Database_701-800.zip        More data for /Generated folder
CA_Database_801-900.zip        More data for /Generated folder
CA_Database_901-950.zip        More data for /Generated folder
CA_Database_951-1000.zip       More data for /Generated folder
CA_Database_1001-1030.zip      More data for /Generated folder


This is a file that was produced by the extraction/analysis utility:

FrHop_LUB_Database.zip         Frequency hop LUB list; useful with PLL-based waveform generators


For further information, see the file Costas Arrays to Order 1030 INSTRUCTIONS.pdf


IEEE Big Data is proud to announce the next competition in our series of Data Analytics & Visualization Competitions. It will be held at IEEE COMPSAC 2017, 4-8 July 2017 in Torino, Italy, and is open to all attendees of the conference.

Last Updated On: 
Tue, 08/08/2017 - 10:51
Citation Author(s): 
United Nations Statistics Division, International Telecommunications Union, United States Patent and Trademark Office

The graphs have been extracted from the 2012 and 2014 versions of the Common Crawl web corpora. The 2012 graph covers 3.5 billion web pages and 128 billion hyperlinks between these pages. To the best of our knowledge, it is the largest hyperlink graph that is available to the public outside companies such as Google, Yahoo, and Microsoft. The 2014 graph covers 1.7 billion web pages connected by 64 billion hyperlinks.


We also provide the page graph in the format expected by the WebGraph Framework developed by Sebastiano Vigna. The graph is represented using three files: .graph, .offsets, .properties. All three are necessary to load the network into the library.

Using the WebGraph Framework, which can be downloaded from Maven Central, these files can be loaded using the following line of code: BVGraph graph = BVGraph.loadMapped(baseName, new ProgressLogger()).

The extracted data is provided according to the same terms of use, disclaimer of warranties and limitation of liabilities that apply to the Common Crawl corpus.

The Web Data Commons extraction framework can be used under the terms of the Apache Software License.


This data set is about the measurement of the statistical electromagnetic field coupling to several shielded coaxial cables. The lines are aligned in parallel to a wall of a reverberation chamber. With a vector network analyzer, the coupled voltage between the inner conductor and the cable shield is measured for different stirrer positions over a wide frequency range. For comparison, the coupled current on the cable shield is calculated based on transmission line theory. From the ratio between the inner voltage and the shield current, a coupling resistance can be calculated.


The raw data of the measurement is saved in text files.

Each subfolder is for a different measurement with a different cable or for the coupling between the antennas.

The meaning of the example filename "alpha_0.0_s11_imag.out" is as follows:
- "alpha_0.0" stands for a stirrer angle of 0°
- "s11" means the scattering parameter s_11 (there are also s_12, s_21 and s_22)
- "imag" is for the imaginary part
- "real" is for the real part

The content of each file is the scattering parameter, where each line stands for a different frequency.

The file "alphas.out" contains all the stirrer angles (in °).

The file "freqs.out" contains all the frequencies (in Hz).
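The naming scheme above suggests how the real and imaginary parts can be recombined into complex scattering parameters. The sketch below assumes each file holds exactly one plain decimal number per line, in the same frequency order as "freqs.out", and that the stirrer angle appears in the file name exactly as passed in. The function names `read_column` and `load_s_parameter` are hypothetical.

```python
from pathlib import Path


def read_column(path):
    """Read one number per non-empty line from a text file."""
    with open(path) as handle:
        return [float(line) for line in handle if line.strip()]


def load_s_parameter(folder, alpha="0.0", param="s11"):
    """Combine the *_real.out and *_imag.out files into a list of complex values."""
    folder = Path(folder)
    real = read_column(folder / f"alpha_{alpha}_{param}_real.out")
    imag = read_column(folder / f"alpha_{alpha}_{param}_imag.out")
    if len(real) != len(imag):
        raise ValueError("real and imaginary files differ in length")
    # One complex value per frequency point, in file order.
    return [complex(r, i) for r, i in zip(real, imag)]
```

Pairing the result with `read_column(folder / "freqs.out")` gives one frequency per scattering parameter value, provided the line counts match.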

If you encounter any problems or if you have any questions, please contact:

Mathias Magdowski
Chair for Electromagnetic Compatibility
Institute for Medical Engineering
Otto von Guericke University Magdeburg, Germany
mathias.magdowski@ovgu.de or mathias.magdowski@kabelmail.de
Tel. +49-391-67-52195


This dataset is a result of my research into machine learning for Android security. The data was obtained by mapping the permissions used by each analyzed application to a binary vector {1=used, 0=not used}. In addition, the malware and benign samples are distinguished by the "Type" field: 1 for malware and 0 for non-malware.
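The permission encoding can be illustrated with a small sketch. The permission vocabulary below is a hypothetical four-entry example, not the dataset's actual column list, and the function names `to_permission_vector` and `to_row` are assumptions made for illustration only.

```python
# Hypothetical vocabulary; the dataset's real columns and their order may differ.
PERMISSIONS = [
    "android.permission.INTERNET",
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "android.permission.CAMERA",
]


def to_permission_vector(requested, vocabulary=PERMISSIONS):
    """Map the permissions an app uses to a binary vector (1 = used, 0 = not used)."""
    requested = set(requested)
    return [1 if permission in requested else 0 for permission in vocabulary]


def to_row(requested, is_malware, vocabulary=PERMISSIONS):
    """Append the Type label (1 = malware, 0 = non-malware) to the permission vector."""
    return to_permission_vector(requested, vocabulary) + [1 if is_malware else 0]


if __name__ == "__main__":
    print(to_row(["android.permission.SEND_SMS", "android.permission.INTERNET"], True))
```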

When I did my research, datasets of malware and benign Android applications were not available, so I am sharing part of my research results with the community for future work.


The Million Song Dataset is a freely available collection of audio features and metadata for a million contemporary popular music tracks. Its purposes are:

  • To encourage research on algorithms that scale to commercial sizes
  • To provide a reference dataset for evaluating research
  • To provide a shortcut alternative to creating a large dataset with APIs (e.g. The Echo Nest's)
  • To help new researchers get started in the MIR field



Drosophila melanogaster, the common fruit fly, is a model organism that has been used extensively in entomological research. It is one of the most studied organisms in biological research, particularly in genetics and developmental biology.

When it is not being used for scientific research, D. melanogaster is a common pest in homes, restaurants, and anywhere else that serves food. It is not to be confused with Tephritidae flies (also known as fruit flies).