Why AI Ignores Your Orders
Based on research by Xingwei Tan, Marco Valentino, Mahmud Elahi Akhter, Yuxiang Zhou, Maria Liakata
We tell AI models exactly how to think, but they often ignore us. New research reveals that large language models have a stubborn preference for sensibility over strict compliance, choosing their own internal logic even when explicitly ordered to follow conflicting instructions. This isn't just a quirk; it is a fundamental tension between what the model knows and what we ask it to do.
The study investigates whether logical patterns like deduction or induction can be separated from specific problems. Researchers created scenarios where they forced models to use reasoning styles that contradicted the natural solution to a task. The result was surprising: when faced with this conflict, the models consistently prioritized sensibility. They ignored the instruction to use a specific logical schema and instead relied on patterns that made sense for the task at hand, effectively overriding explicit commands.
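As a concrete illustration of such a conflict, the snippet below pairs a task whose natural solution is deductive with an explicit instruction to reason inductively. The wording and task here are hypothetical placeholders, not the prompts used in the paper:

```python
# Hypothetical conflict prompt: the task is naturally solved by deduction,
# but the instruction demands an inductive schema.
task = (
    "All metals conduct electricity. Copper is a metal. "
    "Does copper conduct electricity?"
)
instruction = (
    "Answer by inductive reasoning only: generalize from specific observed "
    "examples, and do not apply any general rule directly."
)
prompt = f"{instruction}\n\nTask: {task}\nAnswer:"
print(prompt)
```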
However, this rebellion does not mean failure. The models maintained high accuracy by leaning on internalized parametric memory, an effect that grows stronger as models scale up. Interestingly, the models registered the conflict: their confidence scores dropped significantly during these episodes, signaling internal detection of the logical clash. The study also finds that reasoning types are linearly encoded in the network's middle-to-late layers, which offers a precise point for intervention.
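The usual way to test whether a property is "linearly encoded" is to fit a linear probe on hidden activations, layer by layer. The sketch below shows that generic approach with scikit-learn; the data loading, labels, and model are assumptions for illustration, not the authors' exact setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical inputs: hidden_states is an (n_examples, d_model) array of
# activations from one layer; labels[i] is the reasoning type used on
# example i (e.g. "deductive" or "inductive").
def probe_layer(hidden_states: np.ndarray, labels: np.ndarray) -> float:
    """Fit a linear probe on one layer and return held-out accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        hidden_states, labels, test_size=0.2, random_state=0
    )
    probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return probe.score(X_test, y_test)

# If reasoning type is linearly encoded, probe accuracy should climb sharply
# in middle-to-late layers and stay high thereafter:
# accuracies = {layer: probe_layer(acts, labels)
#               for layer, acts in hidden_states_by_layer.items()}
```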
The key takeaway is that while LLMs naturally anchor reasoning to concrete instances, we can still steer them. By leveraging these mechanistic insights, the researchers successfully forced compliance, increasing instruction following by up to 29 percent. This demonstrates that active interventions can decouple logical schemas from data, paving the way for more controllable and faithful AI systems that follow our rules without losing their intelligence.
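One standard intervention of this kind is activation steering: adding a direction associated with the instructed reasoning type to the hidden states at the layers where it is encoded. The sketch below, using PyTorch forward hooks on a Hugging Face model, is a minimal illustration under those assumptions; the model, layer index, strength, and steering vector are placeholders, not the authors' exact method:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model; the paper's models may differ
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# steering_vector: a direction in activation space tied to the instructed
# reasoning type, e.g. the difference of mean activations between
# "deductive" and "inductive" prompts (assumed to be precomputed).
steering_vector = torch.randn(model.config.hidden_size)  # stand-in values
layer_idx, alpha = 8, 4.0  # middle layer and strength are illustrative

def add_steering(module, inputs, output):
    # Transformer blocks return a tuple; the hidden states come first.
    hidden = output[0] + alpha * steering_vector.to(output[0].dtype)
    return (hidden,) + output[1:]

handle = model.transformer.h[layer_idx].register_forward_hook(add_steering)
prompt = "Solve the task using inductive reasoning: ..."
out = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=50)
print(tok.decode(out[0], skip_special_tokens=True))
handle.remove()  # restore the unmodified model
```

The hook nudges the residual stream toward the instructed schema at generation time, which is the general mechanism the reported compliance gains rely on, without retraining the model.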