The limitations of current web-based artificial intelligence agents stem from their inability to process extensive information during complex searches, hindering performance on tasks requiring multiple steps and diverse knowledge. To address this, Xixi Wu, Kuan Li, and Yida Zhao, along with colleagues at their institutions, introduce ReSum, a new approach that allows agents to overcome these constraints through periodic summarisation of search histories. This innovative paradigm converts lengthy interaction records into concise reasoning states, effectively maintaining awareness of previous findings without being limited by context window sizes. The team demonstrates that ReSum consistently improves performance over existing methods, achieving significant gains on challenging benchmarks and, with further training using their ReSum-GRPO technique, surpassing the capabilities of current open-source web agents even with limited training data.
AI Demonstrates Complex Reasoning and Search
The system effectively understands complex questions, strategically searches for information, synthesizes findings from multiple sources, and logically connects different pieces of evidence. It consistently verifies information and provides clear, concise summaries of its reasoning. In tests involving film scripts and actresses, the AI successfully identified connections between films, scripts, and individuals. It accurately traced biographical details and professional roles, for example by linking a film to its script, identifying teaching positions in Chongqing, and correctly determining an actress’s birthplace. A key strength lies in the AI’s ability to refine search terms based on previous results, ensuring a focused and efficient search process. It handles questions requiring multiple reasoning steps and maintains contextual understanding throughout the conversation. The system performs well and is a valuable tool for research and problem-solving, although it could be improved by reducing redundancy in its search queries and by expressing a confidence level in its answers.
ReSum Framework Compresses Long Interaction Histories
Scientists have addressed limitations in long-context language models for web-based tasks, specifically maintaining relevant information during extended interactions. They developed ReSum, a framework that periodically compresses lengthy interaction histories into concise reasoning states, enabling indefinite exploration within a fixed context window. The process involves an agent beginning with a user query, alternating between reasoning and tool use, searching the web, and building a history of interactions. When a compression trigger activates, a dedicated summary tool distills this history into a goal-oriented summary, highlighting verified evidence and identifying remaining information gaps.
This summary is combined with the original query to create a new, compressed state, resetting the working history while preserving essential knowledge. The team implemented this using Jina for web browsing and Qwen2.5-72B-Instruct for accurate information retrieval. Rigorous testing revealed that failures often resulted from trajectories being prematurely truncated due to context limits. Further refinement involved the development of ReSumTool-30B, a specialized summary tool built upon a powerful open-source model and fine-tuned using the SailorFog-QA benchmark, a challenging dataset requiring agents to actively utilize summary tools.
This tool was trained using data collected from ReSum rollouts, distilling specialized summarization capabilities into Qwen3-30B-A3B-Thinking. To further enhance performance, the team introduced ReSum-GRPO, a training framework that familiarizes agents with this summary-conditioned reasoning process, enabling more effective context management and improved problem-solving capabilities. This innovative approach allows agents to maintain awareness of prior discoveries while bypassing context constraints, ultimately achieving sustained exploration and improved performance on complex web-based tasks.
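The inference loop described in this section can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the names `resum_loop`, `AgentStep`, `policy`, `tool`, and `summarize` are hypothetical, the whitespace-based token count stands in for a real tokenizer, and the token-budget trigger is an assumed stand-in for whatever compression trigger the system actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class AgentStep:
    kind: str     # "tool_call" or "answer" (hypothetical labels)
    content: str  # a search query, or the final answer text


def approx_tokens(text: str) -> int:
    # Crude whitespace count; a real system would use the model's tokenizer.
    return len(text.split())


def resum_loop(
    query: str,
    policy: Callable[[List[str]], AgentStep],
    tool: Callable[[str], str],
    summarize: Callable[[str, List[str]], str],
    token_budget: int = 200,
    max_steps: int = 50,
) -> Optional[str]:
    """Alternate reasoning and tool use, compressing the history when it grows."""
    history = [f"Question: {query}"]
    for _ in range(max_steps):
        step = policy(history)
        if step.kind == "answer":
            return step.content
        observation = tool(step.content)
        history.append(f"Action: {step.content}")
        history.append(f"Observation: {observation}")
        # Assumed trigger: compress once the history exceeds the token budget.
        if sum(approx_tokens(h) for h in history) > token_budget:
            summary = summarize(query, history)
            # Reset the working history to the original query plus the summary.
            history = [f"Question: {query}", f"Summary: {summary}"]
    return None  # step limit reached without an answer


def demo():
    # Toy stand-ins for the agent, the search tool, and the summary tool.
    calls = {"policy": 0, "summaries": 0}

    def policy(history: List[str]) -> AgentStep:
        calls["policy"] += 1
        if calls["policy"] > 5:
            return AgentStep("answer", "done")
        return AgentStep("tool_call", f"query {calls['policy']}")

    def tool(search_query: str) -> str:
        return "word " * 60  # a long, noisy observation

    def summarize(query: str, history: List[str]) -> str:
        calls["summaries"] += 1
        return f"evidence so far after {calls['summaries']} compressions"

    answer = resum_loop("who?", policy, tool, summarize, token_budget=100)
    return answer, calls["summaries"]
```

In the toy `demo`, each observation is long enough that the history crosses the budget every second tool call, so the agent reaches its answer only because the history is reset to the query plus a summary along the way.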
ReSum Enables Indefinite Web Navigation for AI
Scientists have developed ReSum, a new approach to web navigation for artificial intelligence agents that overcomes limitations imposed by context windows in existing systems. Current agents struggle with complex online tasks requiring extensive searching because they quickly exhaust their ability to process information within a limited context. The ReSum paradigm addresses this by periodically summarizing the agent’s interaction history, converting it into a compact reasoning state that allows for indefinite exploration without hitting context limits. Experiments demonstrate that ReSum delivers an average absolute improvement of 4.5% over the ReAct framework.
The research team discovered that failed attempts at solving these tasks typically exceed the token limit and require more than 10 tool calls, while successful attempts usually complete within 10 calls. To keep long trajectories within the context window, they developed a specialized summary tool, ReSumTool-30B, built on a Qwen3-30B-A3B-Thinking model, which distills key evidence and proposes actionable next steps. This tool was trained using data from the SailorFog-QA benchmark, enabling it to effectively summarize lengthy and noisy interaction histories. The ReSum-GRPO training method leverages reinforcement learning to adapt agents to this new summary-conditioned reasoning paradigm, allowing them to retain existing skills while mastering the new approach.
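The text describes ReSum-GRPO only at a high level. As a rough illustration of the reinforcement learning signal involved, the sketch below computes the group-relative advantage used in standard GRPO, where each rollout's reward is normalised against the other rollouts sampled for the same question. The second helper, which copies a rollout's advantage to every summary-delimited segment of that rollout, is an assumption about how such a signal could be applied to summary-conditioned trajectories and is not confirmed by the text above. All function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Standard GRPO-style advantage: (reward - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def broadcast_to_segments(
    advantages: List[float], segments_per_rollout: List[int]
) -> List[float]:
    """Assumed step: give every segment of a rollout that rollout's advantage."""
    per_segment: List[float] = []
    for adv, n_segments in zip(advantages, segments_per_rollout):
        per_segment.extend([adv] * n_segments)
    return per_segment


# Four rollouts for one question: two reached a correct answer (reward 1.0).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Rollouts were split into 2, 1, 3 and 1 segments by their summaries.
segment_advantages = broadcast_to_segments(advantages, [2, 1, 3, 1])
```

Under this assumption, a rollout that needed several summaries contributes several training segments, all sharing the credit or blame for the final answer.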
Unbounded Web Agent Exploration via Summarisation
This work addresses the limitations of context windows in large language model-based web agents, which hinder their ability to perform complex, multi-step searches. Researchers developed ReSum, a new inference paradigm that periodically compresses lengthy interaction histories into concise summaries, allowing agents to keep exploring on knowledge-intensive tasks without being constrained by context length. The team further refined this paradigm with ReSum-GRPO, a training algorithm that adapts agents to summary-conditioned reasoning. Extensive experiments across multiple benchmarks demonstrate that ReSum consistently improves performance over existing methods, with an average absolute improvement of 4.5% over the ReAct framework. Notably, a ReSum-GRPO-trained agent, WebResummer-30B, achieved strong results on challenging web browsing tasks, surpassing other open-source agents even with limited training data of its own. The authors acknowledge that the current system relies on rule-based invocation of summaries; future work will focus on enabling agents to decide for themselves when to create a summary, removing this reliance. Overall, the work represents a significant step towards building more capable and efficient web agents that can tackle complex information-seeking tasks.