Geunseong Jung

First Name

Geunseong

Last Name

Jung

Dataset Entries from this Author

Multilingual datasets for Main content extraction from web pages

This dataset supports research on main content extraction from web pages. It is distributed as an archived MongoDB file and a PostgreSQL dump file.

The dataset contains crawled MHTML files of web pages in nine languages, including Korean, Japanese, Indonesian, French, Russian, Arabic (Saudi Arabia), and Chinese.
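
Because MHTML is a MIME multipart container, the main HTML document of a crawled page can usually be recovered with a standard MIME parser. The following is a minimal sketch using Python's standard `email` package; the function name `extract_html_from_mhtml` is hypothetical, and the sketch assumes that the first `text/html` part is the page itself, which may not hold for every file in the dataset.

```python
import email
from email import policy


def extract_html_from_mhtml(raw: bytes) -> str:
    """Return the first text/html part of an MHTML document, or "" if none.

    Assumption: the first text/html part is the main page; later HTML
    parts (for example, embedded frames) are ignored.
    """
    # policy.default yields EmailMessage objects, whose get_content()
    # decodes the transfer encoding and charset for text parts.
    msg = email.message_from_bytes(raw, policy=policy.default)
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            return part.get_content()
    return ""
```

A file read in binary mode (for example, a hypothetical `page.mhtml`) can be passed directly as `raw`. How the MHTML content is stored inside the MongoDB archive and PostgreSQL dump is not described here, so loading it from those stores is left out of the sketch.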

Related Resources:

Categories:

Other