Forget Billions: Simple Data Builds Elite Search AI
Based on research by Yuwen Du, Rui Ye, Shuo Tang, Keduan Huang, Xinyu Zhu
Frontier AI agents are usually the exclusive domain of tech giants with bottomless budgets. But a new open-source model proves that you don't need billions in compute to build world-class search capabilities. You just need the right data.
Researchers have developed OpenSeeker-v2, a search agent that challenges the industry standard. Building such powerful tools typically requires a massive, resource-heavy pipeline of pre-training, continual pre-training, supervised fine-tuning, and reinforcement learning. This new approach skips most of that complexity: by focusing on high-difficulty training trajectories and strict data filtering, the team achieved state-of-the-art results with supervised fine-tuning alone, on a modest dataset of just 10,600 examples.
The results are striking. OpenSeeker-v2 outperforms Tongyi DeepResearch, a model trained with far more resources, across four major benchmarks. It scored 46.0% on BrowseComp, 58.1% on BrowseComp-ZH, 34.6% on Humanity's Last Exam, and 78.0% on xbench. The key was not more compute, but smarter data synthesis. The team scaled knowledge graphs for richer exploration, expanded tool sets for broader functionality, and applied strict low-step filtering to ensure efficiency.
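The paper's filtering idea can be illustrated with a minimal sketch: keep only trajectories that are hard enough to be worth learning from, yet were solved in few tool-call steps. Everything here (the `Trajectory` fields, the thresholds, the function name) is an illustrative assumption, not the authors' actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    question: str
    steps: int          # number of tool calls the agent made
    solved: bool        # whether the final answer verified as correct
    difficulty: float   # e.g. fraction of baseline models that fail the question

def filter_trajectories(trajectories, min_difficulty=0.7, max_steps=12):
    """Keep solved, high-difficulty, low-step trajectories for SFT.

    Hypothetical thresholds: a real pipeline would tune these per tool set.
    """
    return [
        t for t in trajectories
        if t.solved and t.difficulty >= min_difficulty and t.steps <= max_steps
    ]

data = [
    Trajectory("easy lookup", steps=2, solved=True, difficulty=0.2),
    Trajectory("multi-hop search", steps=8, solved=True, difficulty=0.9),
    Trajectory("rambling run", steps=30, solved=True, difficulty=0.9),
    Trajectory("unsolved question", steps=5, solved=False, difficulty=0.8),
]
kept = filter_trajectories(data)
print([t.question for t in kept])  # only the hard, efficient, solved run survives
```

The point of the low-step constraint is efficiency: a trajectory that wanders through thirty tool calls teaches the model to wander, so it is dropped even if the final answer is correct.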
This breakthrough democratizes access to frontier search technology. For the first time, an academic team has built a top-tier search agent at this model scale, and within this training paradigm, using only supervised fine-tuning. By open-sourcing the model weights and findings, the researchers are showing that simple, well-curated data can rival industrial might, making advanced AI research accessible to everyone.