Async context compactor: Token compaction, the why and how

6/14/2026
Architecture

The Problem

When we started building long-running research agents, we expected context growth to come from conversations. That wasn't what happened. Most of the context was generated by the agent itself.

A typical research workflow looked something like this:

On the surface this looks simple, but each stage produces a large amount of intermediate information. Downloading data creates metadata and dataset references. Analysis produces temporary calculations. Tool executions generate raw outputs. Search creates pages of results that may only be useful for a few minutes.

By the time a workflow completed, the context window contained far more machine-generated information than human conversation. In many cases the agent was carrying around tens of thousands of tokens that were no longer relevant to the task at hand.


Context Is A Terrible Storage Layer

Our first instinct was the same one most agent systems arrive at:

  • Keep more context.
  • Store more memory.
  • Increase retrieval.

Unfortunately, this only delays the problem. Context is optimized for reasoning, not storage. A model should spend its attention budget thinking about the current problem, not re-reading API responses, datasets, execution traces, and search results it generated three hours ago. The longer an agent runs, the more its context starts to resemble a filesystem.


The Insight

We realized that most generated information falls into one of two categories.

The first category contains information that is actively useful for reasoning:

  • Current goals
  • User conversation
  • Recent decisions
  • Active workflow state

The second category contains information that should exist somewhere, but not necessarily inside the context window:

  • Search results
  • Tool outputs
  • Code files
  • Datasets
  • Metadata
  • Reports
  • Execution traces

The mistake was treating both categories the same.
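
As a rough illustration, the two categories above could be expressed as a simple partition over tagged context entries. This is a minimal sketch, not the production logic: the names `Kind`, `ContextItem`, `REASONING_KINDS`, and `partition` are hypothetical, and a real system would likely need more than a static tag to decide where an entry belongs.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Kind(Enum):
    # Category one: actively useful for reasoning.
    GOAL = "goal"
    USER_MESSAGE = "user_message"
    DECISION = "decision"
    WORKFLOW_STATE = "workflow_state"
    # Category two: should exist somewhere, but not necessarily in context.
    SEARCH_RESULT = "search_result"
    TOOL_OUTPUT = "tool_output"
    CODE_FILE = "code_file"
    DATASET = "dataset"
    METADATA = "metadata"
    REPORT = "report"
    EXECUTION_TRACE = "execution_trace"


# Hypothetical rule: only these kinds stay in the context window.
REASONING_KINDS = {Kind.GOAL, Kind.USER_MESSAGE, Kind.DECISION, Kind.WORKFLOW_STATE}


@dataclass
class ContextItem:
    kind: Kind
    text: str


def partition(items: List[ContextItem]) -> Tuple[List[ContextItem], List[ContextItem]]:
    """Split entries into (keep in context, move to external storage)."""
    keep = [i for i in items if i.kind in REASONING_KINDS]
    offload = [i for i in items if i.kind not in REASONING_KINDS]
    return keep, offload
```

The point of the sketch is only that the decision is made per entry, by category, rather than by age or position in the history.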


Async Context Compactor

The solution eventually became the Async Context Compactor.

Instead of waiting until the model is forced to summarize its entire history, we continuously monitor context growth. Once a configurable threshold is reached, a background compaction task is triggered. The important detail is that the agent never stops working. Compaction happens asynchronously.
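
The threshold-plus-background-task pattern can be sketched with Python's `asyncio`. This is an illustrative sketch under stated assumptions, not the actual compactor: the class `AsyncCompactor`, the four-characters-per-token estimate, and the `tool:` prefix rule are all hypothetical stand-ins, and the `offloaded` list stands in for real external storage.

```python
import asyncio
from typing import List, Optional


def estimate_tokens(text: str) -> int:
    # Assumption: roughly four characters per token; a real system would use a tokenizer.
    return max(1, len(text) // 4)


def is_offloadable(entry: str) -> bool:
    # Hypothetical rule: entries tagged "tool:" are generated payloads.
    return entry.startswith("tool:")


class AsyncCompactor:
    def __init__(self, threshold_tokens: int) -> None:
        self.threshold_tokens = threshold_tokens
        self.context: List[str] = []
        self.offloaded: List[str] = []  # stand-in for external storage
        self._task: Optional[asyncio.Task] = None

    def append(self, entry: str) -> None:
        """Add an entry and schedule compaction if the threshold is crossed."""
        self.context.append(entry)
        over = sum(estimate_tokens(e) for e in self.context) >= self.threshold_tokens
        idle = self._task is None or self._task.done()
        if over and idle:
            # Schedule and return immediately; the caller keeps working.
            self._task = asyncio.create_task(self._compact())

    async def _compact(self) -> None:
        n = len(self.context)   # entries visible when compaction starts
        await asyncio.sleep(0)  # stand-in for slow storage I/O
        head, tail = self.context[:n], self.context[n:]
        self.offloaded.extend(e for e in head if is_offloadable(e))
        # Entries appended while compaction was running (tail) are left untouched.
        self.context = [e for e in head if not is_offloadable(e)] + tail

    async def wait_idle(self) -> None:
        if self._task is not None:
            await self._task


async def demo() -> AsyncCompactor:
    agent = AsyncCompactor(threshold_tokens=50)
    agent.append("user: compare the two datasets")
    agent.append("tool: " + "x" * 400)       # crosses the threshold
    await asyncio.sleep(0)                    # let the background task start
    agent.append("decision: use dataset A")   # the agent keeps working meanwhile
    await agent.wait_idle()
    return agent
```

Calling `asyncio.run(demo())` leaves the user message and the decision in `context` and moves the tool output to `offloaded`. The design choice worth noting is that `append` never awaits: it only schedules the task, which is what keeps compaction off the agent's critical path.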

The goal is not summarization. The goal is separation. We want to identify information that no longer belongs in context and move it into systems designed to store it.


Where Everything Goes

Large payloads are moved into Artifacts. Artifacts contain the actual files generated during execution, including datasets, code, reports, search results, and tool outputs.

Structured findings are stored inside FactSets. These represent conclusions rather than raw observations. Instead of storing every page of research, we store the findings that emerged from it.

Reusable workflows become ActionPads. If an agent successfully discovers a sequence of steps for solving a problem, future agents should be able to reuse that execution rather than paying the planning cost again.

Over time, execution becomes experience.
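
To make the three destinations concrete, here is one way their shapes might look as plain data classes. Only the names Artifact, FactSet, and ActionPad come from the description above; every field (`artifact_id`, `kind`, `payload`, `findings`, `source_artifact_ids`, `steps`) is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Artifact:
    """A large generated payload: dataset, code, report, search results, tool output."""
    artifact_id: str
    kind: str
    payload: bytes


@dataclass
class FactSet:
    """Conclusions drawn from raw material, not the raw material itself."""
    topic: str
    findings: List[str] = field(default_factory=list)
    # Assumed link back to the raw observations each finding came from.
    source_artifact_ids: List[str] = field(default_factory=list)


@dataclass
class ActionPad:
    """A sequence of steps that worked once and can be reused later."""
    name: str
    steps: List[str] = field(default_factory=list)


# Hypothetical example: one search produces a raw artifact and a distilled finding.
raw = Artifact("a1", "search_results", b"<ten pages of results>")
facts = FactSet("pricing", ["Vendor B is cheapest"], [raw.artifact_id])
pad = ActionPad("vendor comparison", ["search vendors", "extract prices", "rank"])
```

Keeping a reference from each FactSet back to its source Artifacts is one way to keep conclusions traceable without holding the raw pages in context.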


What Actually Remains

After compaction, context becomes dramatically smaller. More importantly, it becomes focused. The context window primarily contains:

  • User conversation
  • Current objectives
  • Recent decisions
  • Active workflow state

Everything else remains available through retrieval. The information is not deleted. It is simply moved to a location better suited for storing it.

This distinction is subtle but important. The objective is not to compress information. The objective is to stop treating context as the place where all information must live.
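
One simple way to move a payload out of context while keeping it retrievable is to store it under an identifier and leave a short reference behind. The sketch below assumes an in-memory store and a content-hash identifier; `ArtifactStore`, `offload`, and the stub format are hypothetical and only illustrate the idea.

```python
import hashlib
from typing import Dict, Tuple


class ArtifactStore:
    """In-memory stand-in for whatever storage system actually holds artifacts."""

    def __init__(self) -> None:
        self._blobs: Dict[str, str] = {}

    def put(self, payload: str) -> str:
        # Assumption: a truncated content hash is good enough as an identifier here.
        artifact_id = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
        self._blobs[artifact_id] = payload
        return artifact_id

    def get(self, artifact_id: str) -> str:
        return self._blobs[artifact_id]


def offload(store: ArtifactStore, payload: str, label: str) -> Tuple[str, str]:
    """Store the payload and return (short stub for the context, artifact id)."""
    artifact_id = store.put(payload)
    stub = f"[artifact {artifact_id}: {label}, {len(payload)} chars]"
    return stub, artifact_id


store = ArtifactStore()
search_dump = "result line\n" * 500
stub, artifact_id = offload(store, search_dump, "search results")
```

The context now carries only `stub`, while `store.get(artifact_id)` returns the original payload unchanged. Nothing is summarized or lost; it is simply no longer in the window.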


Results

In practice, most long-running workflows saw their context shrink by 55% to 75%. The biggest gains came from removing large generated payloads such as search results, datasets, code files, and tool outputs.

100K+ Tokens
      ↓
25K-45K Tokens
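
The figures above follow directly from the reduction range. A quick check, assuming a 100,000-token starting point (the helper `remaining_tokens` is only for illustration):

```python
def remaining_tokens(start: int, reduction_pct: int) -> int:
    """Tokens left in context after removing reduction_pct percent of them."""
    return start * (100 - reduction_pct) // 100


best_case = remaining_tokens(100_000, 75)   # 25_000 tokens remain
worst_case = remaining_tokens(100_000, 55)  # 45_000 tokens remain
```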

The quality improvement was often more important than the token savings. The model spent less time navigating irrelevant information and more time reasoning about the task in front of it.