Foundation Models Crumble on Real-World RNA Test: Deep Learning Hits Hard Limit
Based on research by Zhiyuan Chen, Zhenfeng Deng, Pan Deng, Yue Liao, Xiu Su
New research suggests that the hype surrounding artificial intelligence in biology may be overstated. While AI models currently dominate benchmarks, their accuracy collapses when they face data that differs from their training sets, potentially leaving medical and scientific applications vulnerable to unexpected errors.
Scientists have developed a rigorous new framework called CHANRG to test how well RNA secondary-structure prediction tools handle unseen data. The system analyzed over 10 million sequences to build a dataset of 170,083 unique structures, with splits designed to keep models from exploiting overlap with their training data or relying on shortcut patterns. The results were stark: deep learning and foundation-model approaches scored highest on standard tests, but lost nearly all of their advantage when evaluated out of distribution. In contrast, simpler structured decoders proved far more reliable under the same strict conditions. The implication is that current evaluations may have inflated our trust in AI capabilities by failing to test generalization properly. The study also finds that without structural awareness and careful split design, advanced neural networks struggle to predict higher-order connections correctly. The researchers argue that stricter evaluation standards, including symmetry-aware checks, should be adopted before AI tools are deployed for critical RNA therapeutic design.
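The article does not detail how CHANRG constructs its "fair splits," but the general idea behind such split designs can be sketched: group similar sequences into clusters and assign each whole cluster to either train or test, so near-duplicates never straddle the boundary. The minimal sketch below uses greedy clustering on k-mer Jaccard similarity; the function names, the k-mer length, and the similarity threshold are illustrative assumptions, not the paper's actual pipeline.

```python
def kmers(seq, k=4):
    """Set of overlapping k-mers in an RNA sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (0.0 if both empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_sequences(seqs, threshold=0.5, k=4):
    """Greedily cluster sequences: join a cluster if any member is similar enough."""
    profiles = [kmers(s, k) for s in seqs]
    clusters = []  # each cluster is a list of sequence indices
    for i, p in enumerate(profiles):
        for c in clusters:
            if any(jaccard(p, profiles[j]) >= threshold for j in c):
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def cluster_split(seqs, test_frac=0.2, threshold=0.5):
    """Assign whole clusters to train or test, so similar sequences share a split."""
    clusters = cluster_sequences(seqs, threshold)
    train, test = [], []
    target = test_frac * len(seqs)
    for c in sorted(clusters, key=len):  # fill the test split with small clusters first
        (test if len(test) < target else train).extend(c)
    return train, test
```

Under a split like this, a model can no longer score well on the test set merely by memorizing close relatives of test sequences seen during training, which is the failure mode that standard random splits tend to hide.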
Title: Fair splits flip the leaderboard: CHANRG reveals limited generalization in RNA secondary-structure prediction
Authors: Zhiyuan Chen, Zhenfeng Deng, Pan Deng, Yue Liao, Xiu Su
Link: https://arxiv.org/abs/2603.22330