Dataset Search

The graphs have been extracted from the 2012 and 2014 versions of the Common Crawl web corpus. The 2012 graph covers 3.5 billion web pages and 128 billion hyperlinks between these pages. To the best of our knowledge, it is the largest hyperlink graph available to the public outside companies such as Google, Yahoo, and Microsoft. The 2014 graph covers 1.7 billion web pages connected by 64 billion hyperlinks.

Categories:

A new methodology to measure coded image/video quality using the just-noticeable-difference (JND) idea was proposed in [1]. Several small JND-based image/video quality datasets were released by the Media Communications Lab at the University of Southern California in [2, 3]. In this work, we present an effort to build a large-scale JND-based coded video quality dataset. The dataset consists of 220 5-second sequences in four resolutions (i.e., 1920x1080, 1280x720, 960x540, and 640x360).

Categories:

The files found here are regularly updated, complete copies of the OpenStreetMap.org database. Those published before 12 September 2012 are distributed under a Creative Commons Attribution-ShareAlike 2.0 license; those published after are licensed under the Open Data Commons Open Database License 1.0.

Categories:

This data set is about the measurement of the statistical electromagnetic field coupling to several shielded coaxial cables. The lines are aligned in parallel to a wall of a reverberation chamber. With a vector network analyzer, the coupled voltage between the inner conductor and the cable shield is measured for different stirrer positions over a wide frequency range. For comparison, the coupled current on the cable shield is calculated based on transmission line theory. From the ratio between the inner voltage and the shield current, a coupling resistance can be calculated.
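The last step of that description, the ratio between inner voltage and shield current, can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the sample values, and the choice to take the magnitude of the complex ratio are not taken from the data set, whose own definition (for example, any averaging over stirrer positions) may differ.

```python
def coupling_resistance(inner_voltage, shield_current):
    """Return |V_inner / I_shield| in ohms for each frequency point.

    Both inputs are sequences of complex phasors of equal length.
    Taking the magnitude is an assumption made for this sketch.
    """
    return [abs(v / i) for v, i in zip(inner_voltage, shield_current)]


# Hypothetical phasors at two frequency points (not measured data).
v_inner = [3e-3 + 4e-3j, 1e-3 + 0j]   # volts
i_shield = [1.0 + 0j, 0.5 + 0j]       # amperes

r_coupling = coupling_resistance(v_inner, i_shield)
```

With these made-up inputs the two resulting values are about 5 mΩ and 2 mΩ.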

Categories:

This dataset is a result of my research into machine learning for Android security. The data was obtained by mapping the permissions used by each analyzed application to a binary vector {1 = used, 0 = not used}. In addition, the malware and benign samples are labeled by "Type": 1 for malware and 0 for non-malware.
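The encoding described above can be sketched as follows. The four-permission vocabulary, the column order, and the function names are assumptions made for illustration; the actual dataset may use a different and much longer permission list.

```python
# Illustrative subset of Android permission names (assumed vocabulary).
PERMISSIONS = [
    "android.permission.INTERNET",
    "android.permission.READ_SMS",
    "android.permission.SEND_SMS",
    "android.permission.CAMERA",
]


def encode_permissions(used, vocabulary=PERMISSIONS):
    """Map the permissions an app uses to a binary vector (1 = used, 0 = not used)."""
    used_set = set(used)
    return [1 if name in used_set else 0 for name in vocabulary]


def make_row(used, is_malware, vocabulary=PERMISSIONS):
    """Append the "Type" label (1 = malware, 0 = non-malware) to the vector."""
    return encode_permissions(used, vocabulary) + [1 if is_malware else 0]


# Hypothetical malware sample that uses INTERNET and SEND_SMS.
row = make_row(
    ["android.permission.INTERNET", "android.permission.SEND_SMS"],
    is_malware=True,
)
```

Each application becomes one fixed-length row, so a collection of rows can be passed directly to most classifiers as a feature matrix with the last column as the target.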

When I did my research, datasets of malware and benign Android applications were not available, so I am sharing part of my research results with the community for future work.

Categories:

As part of the Obama Administration’s efforts to make our healthcare system more transparent, affordable, and accountable, the Centers for Medicare & Medicaid Services (CMS) has prepared a public data set, the Medicare Provider Utilization and Payment Data: Physician and Other Supplier Public Use File (Physician and Other Supplier PUF), with information on services and procedures provided to Medicare beneficiaries by physicians and other healthcare professionals.  The Physician and Other Supplier PUF contains information on utilization, payment (allowed amount and Medicare payment)

Categories:

At the intersection of signal processing and information forensics, the Signal Processing Cup 2016 global competition explored a time-varying, location-dependent signature of power grids that can be intrinsically captured in media recordings. This signature is called the Electric Network Frequency (ENF) signal. Throughout the SP Cup 2016 competition, participants were provided with multiple training, practice, and testing datasets that consisted of recordings made in different grids and containing ENF traces.

Categories:

Several established parameters and metrics have been used to characterize the acoustics of a room. The most important are the Direct-To-Reverberant Ratio (DRR), the Reverberation Time (T60) and the reflection coefficient. The acoustic characteristics of a room based on such parameters can be used to predict the quality and intelligibility of speech signals in that room.

Categories:

This data comes from the Climatological Database for the World's Oceans 1750-1850.  The data includes observational records of ship location, weather data, and other associated data. 

Categories:
The Pan American Health Organization / World Health Organization is publishing weekly counts of suspected and confirmed cases, by country and territory, as reported by each country. The data portal includes a few important notes: "The suspected cases in Brazil are unofficial (media monitoring)" and "Data is shared in an effort to transparently disseminate available information reported by Member States."
Categories:
The Million Song Dataset is a freely available collection of audio features and metadata for a million contemporary popular music tracks. Its purposes are:
  • To encourage research on algorithms that scale to commercial sizes
  • To provide a reference dataset for evaluating research
  • As a shortcut alternative to creating a large dataset with APIs (e.g. The Echo Nest's)
  • To help new researchers get started in the MIR field

Categories:
Every year the CDC releases the country's most detailed report on death in the United States under the National Vital Statistics System. This mortality dataset is a record of every death in the country for the year 2014, and it includes detailed information about causes of death and the demographic background of the deceased. It's been said that "statistics are human beings with the tears wiped off." This is especially true of this dataset.
Categories:

Drosophila melanogaster, the common fruit fly, is a model organism that has been used extensively in entomological research. It is one of the most studied organisms in biological research, particularly in genetics and developmental biology.

When it's not being used for scientific research, D. melanogaster is a common pest in homes, restaurants, and anywhere else that serves food. It is not to be confused with Tephritidae flies (also known as fruit flies).

https://en.wikipedia.org/wiki/Drosophila_melanogaster

Categories:

Early data was collected by technicians using mercury thermometers, where any variation in the visit time impacted measurements. In the 1940s, the construction of airports caused many weather stations to be moved. In the 1980s, there was a move to electronic thermometers that are said to have a cooling bias.

Given this complexity, a range of organizations collate climate trend data. The three most cited land and ocean temperature data sets are NOAA's MLOST, NASA's GISTEMP and the UK's HadCRUT.

Categories:

On February 11th, 2016, the LIGO-Virgo collaboration announced the discovery of gravitational waves, just 100 years after Einstein's paper predicting them. The LIGO Scientific Collaboration (LSC) and the Virgo Collaboration prepared a web page to inform the broader community about a confirmed astrophysical event observed by the gravitational-wave detectors, and to make the data around that time available for others to analyze.

Categories:

The TMC maintains a map of traffic speed detectors throughout the City. The speed detectors themselves belong to various city and state agencies. The Traffic Speeds Map is available on the DOT's website. This data feed contains 'real-time' traffic information from locations where DOT picks up sensor feeds within the five boroughs, mostly on major arterials and highways. DOT uses this information for emergency response and management.

The metadata defines the fields available in this data feed and explains more about the data.

Categories: