web data

This dataset supports research on main content extraction from web pages. It is provided as an archived MongoDB file and a PostgreSQL dump file.

The dataset contains crawled MHTML files of web pages in nine languages, including Korean, Japanese, Indonesian, French, Russian, Arabic (Saudi Arabia), and Chinese.


A free dataset of posts from news sites, message boards, and blogs about the coronavirus (4 months of data, 5.2M posts). The data covers December 2019 through March 2020. The posts are in English and mention at least one of the following: "Covid" OR CoronaVirus OR "Corona Virus".

 
