Faithful GRPO Cuts AI Hallucinations by 93%
Based on research by Sai Srinivas Kancheti, Aditya Kanade, Rohit Sinha, Vineeth N Balasubramanian, Tanuja Ganu
Multimodal AI models are getting smarter at solving visual puzzles, but they often cheat to get there. While these systems produce correct final answers, their internal step-by-step explanations frequently contradict the images they analyze or misdescribe the objects in them. This gap between reasoning and reality undermines trust in advanced artificial intelligence.
Researchers investigated this issue across seven challenging real-world spatial reasoning benchmarks. They found that standard training with reinforcement learning from verifiable rewards, typically via Group Relative Policy Optimization (GRPO), often prioritizes raw answer accuracy over logical consistency and visual grounding. To fix this, the team developed Faithful GRPO (FGRPO), which treats explanation quality as a strict requirement rather than an optional feature: faithfulness constraints are built directly into the training objective, so every reasoning step must align with both logic and visual evidence.
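To make this concrete, here is a minimal Python sketch of the idea under stated assumptions: the `Rollout` fields, the `grounding_min` threshold, and the multiplicative gate are illustrative stand-ins, not the paper's exact objective. In this sketch, an answer earns its accuracy reward only if hypothetical faithfulness checks pass, and the gated rewards then flow through GRPO's usual group-relative advantage normalization.

```python
# Sketch of a faithfulness-gated GRPO reward. The checks, threshold, and
# gating rule are illustrative assumptions, not the paper's exact method.
from dataclasses import dataclass
from typing import List

@dataclass
class Rollout:
    answer_correct: bool      # verifiable-reward check on the final answer
    steps_consistent: bool    # hypothetical logic-consistency check
    grounding_score: float    # hypothetical visual-grounding score in [0, 1]

def faithful_reward(r: Rollout, grounding_min: float = 0.5) -> float:
    """Treat faithfulness as a hard constraint: an answer earns reward
    only if every reasoning step is consistent and visually grounded."""
    faithful = r.steps_consistent and r.grounding_score >= grounding_min
    return float(r.answer_correct) if faithful else 0.0

def grpo_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Standard GRPO step: normalize rewards within a group of rollouts
    sampled for the same prompt (mean-centered, std-scaled)."""
    mean = sum(rewards) / len(rewards)
    std = (sum((x - mean) ** 2 for x in rewards) / len(rewards)) ** 0.5
    return [(x - mean) / (std + eps) for x in rewards]

# Usage: a group of 4 rollouts sampled for one image-question pair.
group = [
    Rollout(answer_correct=True,  steps_consistent=True,  grounding_score=0.9),
    Rollout(answer_correct=True,  steps_consistent=False, grounding_score=0.8),  # right answer, unfaithful
    Rollout(answer_correct=False, steps_consistent=True,  grounding_score=0.7),
    Rollout(answer_correct=True,  steps_consistent=True,  grounding_score=0.3),  # poorly grounded
]
rewards = [faithful_reward(r) for r in group]
print(rewards)                    # [1.0, 0.0, 0.0, 0.0]
print(grpo_advantages(rewards))   # only the faithful, correct rollout is reinforced
```

Because the gate zeroes out unfaithful rollouts, a correct answer reached through inconsistent or ungrounded reasoning ends up below its group's average reward and receives a negative advantage, so training pushes the model away from shortcut explanations even when the final answer happens to be right.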
The results show a dramatic shift in how these models reason. The new technique slashed the rate of inconsistent explanations from 24.5% down to just 1.7%, a relative reduction of roughly 93%, the figure in the headline. Visual grounding scores improved by 13%, meaning the model describes what it sees far more precisely. Surprisingly, forcing the model to be faithful also boosted its final answer accuracy compared to standard GRPO.
These findings suggest that demanding better reasoning does not hurt performance; it can actually improve it. By ensuring explanations match visual facts and logical rules, developers can build AI systems that are not only accurate but also transparent and reliable for complex real-world tasks.