New research shows that RL-finetuned vision-language models (VLMs) remain prone to hallucination and lack robustness. Controlled textual perturbations significantly degrade both performance and confidence, particularly when chain-of-thought consistency is compromised. The findings point to a critical area for improving VLM reliability in complex reasoning tasks.