Vision-language models

CODA_desc and nuScenes_desc

Large vision-language models (LVLMs) have demonstrated remarkable capabilities in multimodal understanding and generation tasks. However, these models occasionally generate hallucinatory texts, resulting in descriptions that seem reasonable but do not correspond to the image. This phenomenon can lead to wrong driving decisions of the autonomous driving system.

Categories:

Artificial Intelligence

Subscribe to Vision-language models