Vision-language models
Large vision-language models (LVLMs) have demonstrated remarkable capabilities in multimodal understanding and generation tasks. However, these models sometimes hallucinate: they generate descriptions that seem plausible but do not correspond to the content of the image. In autonomous driving, such hallucinations can lead the system to make incorrect driving decisions.