
On Robustness and Chain-of-Thought Consistency of RL-Finetuned VLMs

RL-finetuned vision-language models (VLMs) still struggle with robustness and hallucination. Controlled textual perturbations significantly degrade both their performance and their confidence, particularly when chain-of-thought consistency is compromised. Improving reliability on complex reasoning tasks therefore remains a critical open area for VLMs.

Apple ML Research·Jul 2, 2026
