We propose MM-Vet v2, an evaluation benchmark that examines large multimodal models (LMMs) on complicated multimodal tasks. Recent LMMs have shown various intriguing abilities, such as solving math problems written on a blackboard, reasoning about events and celebrities in news images, and explaining visual jokes. Such rapid model advancement poses challenges for the development of evaluation benchmarks.

[1] Weihao Yu, Zhengyuan Yang, Lingfeng Ren, Linjie Li, Jianfeng Wang, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Lijuan Wang, Xinchao Wang, "MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities", IEEE Dataport, 2024. [Online]. Available: http://dx.doi.org/10.21227/pvmd-s489. Accessed: Mar. 16, 2025.