110K Sensitive Video Dataset

Citation Author(s):
Pedro Vinicius Almeida de Freitas (PUC-Rio)
Gabriel Noronha Pereira dos Santos (PUC-Rio)
Antonio José Grandson Busson (PUC-Rio)
Alan Livio Vasconcelos Guedes (PUC-Rio)
Sérgio Colcher (PUC-Rio)
Submitted by:
Pedro Almeida d...
Last updated:
Wed, 02/02/2022 - 22:00
DOI:
10.21227/sx01-1p81

Abstract 

ATTENTION: THIS DATASET DOES NOT HOST ANY SOURCE VIDEOS. WE PROVIDE ONLY HIDDEN FEATURES GENERATED BY PRE-TRAINED DEEP MODELS AS DATA.

Massive amounts of video are uploaded to video-hosting platforms every minute. This volume of data makes it challenging to control the type of content uploaded to these services, and the platforms are responsible for any sensitive media uploaded by their users. In this context, we propose the 110K Sensitive Video Dataset for binary video classification (whether or not a video contains sensitive content), containing more than 110 thousand tagged videos. Additionally, we set aside an exclusive subset of 11 thousand videos for testing on Kaggle.

To compose the sensitive video subset, we collected videos containing sex, violence, and gore from various internet sources. To compose the safe video subset, we collected videos of everyday life, online courses, tutorials, sports, etc. We also took care to include challenging examples in each class. For the sensitive class, we collected sex videos with people wearing full-body clothes (e.g., latex and cosplay). For the safe class, we collected videos that could be misclassified as sensitive, such as MMA, breastfeeding, pool party, beach, and other videos with a higher amount of skin exposure.

This dataset comprises 53,683 safe videos and 53,683 videos with sensitive content. The sensitive videos consist of 51,563 pornographic videos and 2,120 gore videos. Additionally, each video class contains a list of related tags.
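
The class composition stated above can be checked with a few lines of arithmetic. The following is only a minimal sketch: the constant names and the summarize helper are hypothetical and are not part of the dataset or its tooling. The counts are copied from this abstract.

```python
# Counts as stated in the abstract; the names below are illustrative only.
SAFE = 53_683
PORNOGRAPHIC = 51_563
GORE = 2_120


def summarize(safe: int, pornographic: int, gore: int) -> dict:
    """Return totals and class fractions for the binary safe/sensitive split."""
    sensitive = pornographic + gore
    total = safe + sensitive
    return {
        "sensitive": sensitive,
        "total": total,
        "sensitive_fraction": sensitive / total,
        "gore_fraction_of_sensitive": gore / sensitive,
    }


summary = summarize(SAFE, PORNOGRAPHIC, GORE)
for key, value in summary.items():
    print(f"{key}: {value}")
```

The two top-level classes are balanced, while gore videos make up only a small share of the sensitive class, which may be worth keeping in mind when evaluating a classifier per sub-category.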
