
Datasets & Analysis

Recent US Census Data the American Community Survey,


The Annual Retail Trade Survey (ARTS) produces national estimates of total annual sales, e-commerce sales, end-of-year inventories, inventory-to-sales ratios, purchases, total operating expenses, inventories held outside the United States, gross margins, and end-of-year accounts receivable for retail businesses and annual sales and e-commerce sales for accommodation and food service firms located in the U.S.

License: U.S. Government Work



The 2014 Annual Retail Trade Report was released on March 7, 2016. A Summary of Changes provides comparability with previous surveys.

  • Annual Retail Trade Survey—2014:
  • Sales (1992-2014): Excel [66KB]
  • Sales Taxes (2004-2014): Excel [46KB]
  • Inventories (1992-2014): Excel [44KB]
  • Purchases (1992-2014): Excel [46KB]
  • Total Operating Expenses (2006-2014): Excel [47KB]
  • Gross Margin (1993-2014): Excel [46KB]
  • Gross Margin as a Percentage of Sales (1993-2014): Excel [47KB]
  • Accounts Receivable (2004-2014): Excel [43KB]
  • Per Capita Sales (2000-2014): Excel [40KB]
  • Inventories Held Inside and Outside the U.S. (2004-2014): Excel [41KB]
  • U.S. Retail Trade Sales - Total and E-commerce (1998-2014): Excel [43KB]
  • U.S. Electronic Shopping and Mail-Order Houses (NAICS 4541) - Total and E-commerce Sales by Merchandise Line (1999-2014): Excel [43KB]
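
Several of the tables listed above are simple ratios of other published series. As a rough illustration, the sketch below computes an inventory-to-sales ratio, gross margin as a percentage of sales, and per capita sales. The function names and every input figure are made up for the example; none of them come from the ARTS tables.

```python
def inventory_to_sales_ratio(end_of_year_inventories, annual_sales):
    """End-of-year inventories divided by annual sales (same units)."""
    return end_of_year_inventories / annual_sales


def gross_margin_pct_of_sales(gross_margin, annual_sales):
    """Gross margin expressed as a percentage of annual sales."""
    return 100.0 * gross_margin / annual_sales


def per_capita_sales(annual_sales, population):
    """Annual sales divided by the resident population."""
    return annual_sales / population


# Hypothetical figures in dollars; not real survey estimates.
sales = 500_000_000.0
inventories = 60_000_000.0
margin = 140_000_000.0
population = 250_000

print(inventory_to_sales_ratio(inventories, sales))
print(gross_margin_pct_of_sales(margin, sales))
print(per_capita_sales(sales, population))
```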

The graphs have been extracted from the 2012 and 2014 versions of the Common Crawl web corpora. The 2012 graph covers 3.5 billion web pages and 128 billion hyperlinks between these pages. To the best of our knowledge, the graph is the largest hyperlink graph that is available to the public outside companies such as Google, Yahoo, and Microsoft. The 2014 graph covers 1.7 billion web pages connected by 64 billion hyperlinks.
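
To put the two graphs on a common scale, the counts above can be turned into an average number of hyperlinks per page. The short sketch below does only that arithmetic; the dictionary layout and function name are illustrative choices, and the inputs are the rounded figures quoted in the text.

```python
# Page and hyperlink counts as stated in the text (rounded figures).
GRAPHS = {
    "2012": {"pages": 3.5e9, "links": 128e9},
    "2014": {"pages": 1.7e9, "links": 64e9},
}


def average_links_per_page(pages, links):
    """Mean number of hyperlinks per page (average out-degree)."""
    return links / pages


for year, g in GRAPHS.items():
    avg = average_links_per_page(g["pages"], g["links"])
    print(f"{year}: about {avg:.1f} hyperlinks per page")
```

Both graphs come out at roughly 37 hyperlinks per page, so the 2014 graph is smaller but similarly dense.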


We also provide the page graph in the format expected by the WebGraph Framework developed by Sebastiano Vigna. The graph is represented using three files: .graph, .offsets, .properties. All three are necessary to load the network into the library.

Using the WebGraph Framework, which can be downloaded from Maven Central, these files can be loaded using the following line of code: BVGraph graph = BVGraph.loadMapped(baseName, new ProgressLogger()).
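
Before loading a graph with the library, it can be useful to check the .properties file, which is a plain-text key=value file. The following is a minimal sketch of a reader for that file, not part of the WebGraph Framework. It assumes the file carries `nodes` and `arcs` entries, and the sample contents are hypothetical.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def read_properties(path):
    """Read a .properties file from disk and parse it."""
    with open(path, encoding="latin-1") as fh:
        return parse_properties(fh.read())


# Hypothetical file contents, for illustration only.
sample = """\
#BVGraph properties
nodes=4
arcs=7
graphclass=it.unimi.dsi.webgraph.BVGraph
"""

props = parse_properties(sample)
print(int(props["nodes"]), int(props["arcs"]))
```

This handles only the simple one-line entries; it does not implement the escape sequences or line continuations of the full Java properties format.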

The extracted data is provided according to the same terms of use, disclaimer of warranties and limitation of liabilities that apply to the Common Crawl corpus.

The Web Data Commons extraction framework can be used under the terms of the Apache Software License.


The files found here are regularly updated, complete copies of the database. Those published before 12 September 2012 are distributed under a Creative Commons Attribution-ShareAlike 2.0 license; those published after are licensed under the Open Data Commons Open Database License 1.0.
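
When processing files from several dates, the license rule above can be applied per file. The sketch below is one way to do that; the function name and license labels are illustrative, and treating a file published exactly on 12 September 2012 as falling under the newer license is an assumption, since the text only covers "before" and "after".

```python
from datetime import date

# Cut-over date stated in the text.
LICENSE_CHANGE = date(2012, 9, 12)


def license_for(published):
    """Return the license label for a file published on the given date."""
    if published < LICENSE_CHANGE:
        return "CC BY-SA 2.0"
    # Assumption: the cut-over date itself falls under the newer license.
    return "ODbL 1.0"


print(license_for(date(2012, 9, 11)))
print(license_for(date(2013, 1, 1)))
```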


This task evaluates the performance of sound event detection systems in multisource conditions similar to our everyday life, where the sound sources are rarely heard in isolation. Contrary to task 2, there is no control over the number of overlapping sound events at each time, neither in the training nor in the testing audio data.
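
To make "number of overlapping sound events" concrete, the sketch below counts the largest number of events active at the same moment in a set of annotations. The (onset, offset, label) tuple layout and the example events are hypothetical and are not the dataset's annotation format.

```python
def max_polyphony(events):
    """Largest number of simultaneously active events.

    `events` is an iterable of (onset_seconds, offset_seconds, label).
    """
    boundaries = []
    for onset, offset, _label in events:
        boundaries.append((onset, 1))    # an event starts
        boundaries.append((offset, -1))  # an event ends
    # At equal times, ends (-1) sort before starts (+1), so events
    # that merely touch are not counted as overlapping.
    boundaries.sort()

    active = 0
    peak = 0
    for _time, delta in boundaries:
        active += delta
        peak = max(peak, active)
    return peak


# Hypothetical annotations for illustration.
annotations = [
    (0.0, 2.0, "car"),
    (1.0, 3.0, "people talking"),
    (1.5, 1.8, "bird singing"),
    (4.0, 5.0, "car"),
]
print(max_polyphony(annotations))
```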

Last Updated On: Tue, 01/10/2017 - 15:56
Citation Author(s): Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen

As part of the Obama Administration’s efforts to make our healthcare system more transparent, affordable, and accountable, the Centers for Medicare & Medicaid Services (CMS) has prepared a public data set, the Medicare Provider Utilization and Payment Data: Physician and Other Supplier Public Use File (Physician and Other Supplier PUF), with information on services and procedures provided to Medicare beneficiaries by physicians and other healthcare professionals.  The Physician and Other Supplier PUF contains information on utilization, payment (allowed amount and Medicare payment), and


The TMC maintains a map of traffic speed detectors throughout the City. The speed detectors themselves belong to various city and state agencies. The Traffic Speeds Map is available on the DOT's website. This data feed contains 'real-time' traffic information from locations where DOT picks up sensor feeds within the five boroughs, mostly on major arterials and highways. DOT uses this information for emergency response and management.

The metadata defines the fields available in this data feed and explains more about the data.