Most existing video text spotting benchmarks evaluate a single language and a single scenario, with limited data. In this work, we introduce a large-scale, Bilingual, Open World Video text benchmark dataset (BOVText V2). BOVText V2 has four main features. First, it provides 2,000+ videos with more than 1,750,000 frames, 25 times larger than the largest existing dataset of incidental text in videos. Second, it covers 30+ open scenarios, including many virtual scenarios, e.g., Life Vlog, Driving, Movie, and Game. Third, it provides abundant text type annotations (i.e., title, caption, or scene text) that capture the different representational meanings of text in video. Fourth, it provides bilingual text annotations to promote communication across cultures.

Datasets

BOVText-Benchmark