
Dataset Search


The dataset stores a random sampling distribution with cardinality of support of 4,294,967,296 (i.e., two raised to the power of thirty-two). Specifically, the source generator is fixed as a symmetric-key cryptographic function with 64-bit input and 32-bit output. A total of 17,179,869,184 (i.e., two raised to the power of thirty-four) randomly chosen inputs are used to produce the sampling distribution as the dataset. The integer-valued sampling distribution is formatted as 4,294,967,296 (i.e., two raised to the power of thirty-two) entries, and each entry occupies one byte in storage.
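The sizes quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes each one-byte entry is the count of sampled inputs that mapped to that 32-bit output, and that the file is a raw sequence of unsigned bytes with no header. Neither assumption is confirmed by the description, and `read_counts` is a hypothetical helper, not part of the dataset.

```python
# Figures taken from the description above.
SUPPORT_SIZE = 2 ** 32     # number of entries (possible 32-bit outputs)
NUM_SAMPLES = 2 ** 34      # number of randomly chosen 64-bit inputs
BYTES_PER_ENTRY = 1        # each entry occupies one byte

file_size_bytes = SUPPORT_SIZE * BYTES_PER_ENTRY   # total storage in bytes
file_size_gib = file_size_bytes / 2 ** 30          # the same size in GiB
mean_count = NUM_SAMPLES / SUPPORT_SIZE            # average samples per entry
max_storable = 2 ** (8 * BYTES_PER_ENTRY) - 1      # largest value one byte holds


def read_counts(path, start, length):
    """Read `length` entries beginning at entry index `start`.

    Assumes a headerless file of raw unsigned bytes (hypothetical layout).
    """
    with open(path, "rb") as f:
        f.seek(start * BYTES_PER_ENTRY)
        return list(f.read(length))
```

Under these assumptions the file is 4 GiB, each entry receives 4 samples on average, and a single entry can represent a value of at most 255.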

Categories:
This data consists of 1,000 studio-quality audio recordings and their transcriptions in the northern Vietnamese accent. Each utterance is 14-18 words long and is spoken by a single speaker. The corpus can be used to build a Vietnamese speech synthesis system. A tutorial is also available at https://vais.vn/vi/tai-ve/hts_for_vietnamese.
Categories:

The Annual Retail Trade Survey (ARTS) produces national estimates of total annual sales, e-commerce sales, end-of-year inventories, inventory-to-sales ratios, purchases, total operating expenses, inventories held outside the United States, gross margins, and end-of-year accounts receivable for retail businesses and annual sales and e-commerce sales for accommodation and food service firms located in the U.S.

License: U.S. Government Work

 

Categories:

HazeRD is an outdoor scene dataset for benchmarking dehazing algorithms. HazeRD contains 10 different scenes based on the architectural biometrics project. For each scene, the ground-truth RGB images, depth maps, and hazy images synthesized according to atmospheric optics are provided; the hazy images come in five different haze levels generated with real-life physical parameters. Compared with other dehazing datasets, the main features of HazeRD are that it focuses on outdoor scenes, whereas other datasets provide indoor scenes, and that the synthesis is based on real-life parameters.

Categories:

The graphs have been extracted from the 2012 and 2014 versions of the Common Crawl web corpora. The 2012 graph covers 3.5 billion web pages and 128 billion hyperlinks between these pages. To the best of our knowledge, the graph is the largest hyperlink graph that is available to the public outside companies such as Google, Yahoo, and Microsoft. The 2014 graph covers 1.7 billion web pages connected by 64 billion hyperlinks.

Categories:


A new methodology to measure coded image/video quality using the just-noticeable-difference (JND) idea was proposed in [1]. Several small JND-based image/video quality datasets were released by the Media Communications Lab at the University of Southern California in [2, 3]. In this work, we present an effort to build a large-scale JND-based coded video quality dataset. The dataset consists of 220 5-second sequences in four resolutions (i.e., 1920x1080, 1280x720, 960x540, and 640x360).

Categories:

The files found here are regularly updated, complete copies of the OpenStreetMap.org database. Files published before 12 September 2012 are distributed under a Creative Commons Attribution-ShareAlike 2.0 license; those published after that date are licensed under the Open Data Commons Open Database License 1.0.

Categories:

This data set is about the measurement of the statistical electromagnetic field coupling to several shielded coaxial cables. The lines are aligned in parallel to a wall of a reverberation chamber. With a vector network analyzer, the coupled voltage between the inner conductor and the cable shield is measured for different stirrer positions over a wide frequency range. For comparison, the coupled current on the cable shield is calculated based on transmission line theory. From the ratio between the inner voltage and the shield current, a coupling resistance can be calculated.
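The last step described above, forming a coupling resistance from the ratio of inner voltage to shield current, can be sketched as follows. This is a minimal illustration and not the authors' processing chain: it assumes one complex voltage and one complex current per frequency point, takes the ratio of magnitudes, and leaves out any averaging over stirrer positions. The function name and the sample values are hypothetical.

```python
def coupling_resistance(inner_voltage, shield_current):
    """Return |V| / |I| in ohms for each frequency point.

    inner_voltage  -- measured inner-conductor voltages in volts (complex or real)
    shield_current -- calculated shield currents in amperes (complex or real)
    Using the ratio of magnitudes is an assumption made for this sketch.
    """
    if len(inner_voltage) != len(shield_current):
        raise ValueError("voltage and current must have the same length")
    return [abs(v) / abs(i) for v, i in zip(inner_voltage, shield_current)]


# Hypothetical values for two frequency points.
example = coupling_resistance([3 + 4j, 1.0], [0.5, 0.25])
```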

Categories:


This dataset is a result of my research on machine learning in Android security. Each analyzed application is mapped to a binary vector of the permissions it uses {1 = used, 0 = not used}. In addition, the samples are labeled by "Type": 1 for malware and 0 for non-malware.

When I did my research, datasets of malware and benign Android applications were not available, so I am sharing part of my research results with the community for future work.
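The permission encoding described above can be illustrated with a short sketch. The four-permission vocabulary, the function names, and the position of the "Type" label as the last column are assumptions made for illustration; the actual dataset may use a different permission list and column order.

```python
# Hypothetical, shortened vocabulary; the real dataset likely covers many more.
PERMISSIONS = [
    "android.permission.INTERNET",
    "android.permission.READ_SMS",
    "android.permission.SEND_SMS",
    "android.permission.CAMERA",
]


def to_feature_vector(used_permissions, vocabulary=PERMISSIONS):
    """Map an app's permissions to a binary vector: 1 = used, 0 = not used."""
    used = set(used_permissions)
    return [1 if permission in used else 0 for permission in vocabulary]


def to_row(used_permissions, is_malware, vocabulary=PERMISSIONS):
    """Append the "Type" label (1 = malware, 0 = non-malware) to the vector."""
    return to_feature_vector(used_permissions, vocabulary) + [1 if is_malware else 0]


# An app that uses INTERNET and SEND_SMS and is labeled as malware.
row = to_row(
    ["android.permission.INTERNET", "android.permission.SEND_SMS"],
    is_malware=True,
)
```

Rows built this way can be stacked into a table in which every column except the last is a permission indicator.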

Categories:

As part of the Obama Administration’s efforts to make our healthcare system more transparent, affordable, and accountable, the Centers for Medicare & Medicaid Services (CMS) has prepared a public data set, the Medicare Provider Utilization and Payment Data: Physician and Other Supplier Public Use File (Physician and Other Supplier PUF), with information on services and procedures provided to Medicare beneficiaries by physicians and other healthcare professionals.  The Physician and Other Supplier PUF contains information on utilization, payment (allowed amount and Medicare payment)

Categories:

At the intersection of signal processing and information forensics, the Signal Processing Cup 2016 global competition explored a time-varying, location-dependent signature of power grids that can be intrinsically captured in media recordings. This signature is called the Electric Network Frequency (ENF) signal. Throughout the SP Cup 2016 competition, participants were provided with multiple training, practice, and testing datasets that consisted of recordings made in different grids and containing ENF traces.

Categories:

Several established parameters and metrics have been used to characterize the acoustics of a room. The most important are the Direct-To-Reverberant Ratio (DRR), the Reverberation Time (T60) and the reflection coefficient. The acoustic characteristics of a room based on such parameters can be used to predict the quality and intelligibility of speech signals in that room.

Categories:

This data comes from the Climatological Database for the World's Oceans 1750-1850.  The data includes observational records of ship location, weather data, and other associated data. 

Categories:
The Pan American Health Organization / World Health Organization is publishing weekly counts of suspected and confirmed cases, by country and territory, as reported by each country. The data portal includes a few important notes: "The suspected cases in Brazil are unofficial (media monitoring)" and "Data is shared in an effort to transparently disseminate available information reported by Member States."
Categories:
The Million Song Dataset is a freely-available collection of audio features and metadata for a million contemporary popular music tracks. Its purposes are:
  • To encourage research on algorithms that scale to commercial sizes
  • To provide a reference dataset for evaluating research
  • As a shortcut alternative to creating a large dataset with APIs (e.g. The Echo Nest's)
  • To help new researchers get started in the MIR field

 

Categories:
Every year the CDC releases the country's most detailed report on death in the United States under the National Vital Statistics System. This mortality dataset is a record of every death in the country for the year 2014 and includes detailed information about causes of death and the demographic background of the deceased. It's been said that "statistics are human beings with the tears wiped off." This is especially true of this dataset.
Categories: