
the 5 axioms of context

I spent the past month at Juma improving how we handle context to fix hallucinations and output quality issues reported by our users.

There’s a lot of context engineering material out there, but most of it is very technical and skips the “why are LLMs like that” part. So I went back to first principles and distilled context engineering into a few axioms we used when implementing the fixes.

Below is a collection of these axioms and some practical tips that can help you 1) have better conversations with AI and 2) build effective agents.

1. Context is limited

There is an effective maximum context beyond which additional information degrades output quality.

The model's attention has a fixed budget of focus. Every piece of information you add competes for a share of that budget. When you double the information, you also force the model to spread its attention across everything.

This creates diminishing returns: early context additions are valuable, but each additional piece contributes less while diluting focus on what came before. Past a certain threshold, more context actually produces worse output because the signal gets lost in the middle.

[Chart: output quality vs. context window]

Even though model providers advertise their maximum context windows - Opus 4.5 at 200k tokens and GPT-5.2 at 400k - performance starts degrading well before you hit those limits.

The exact threshold varies by model and task. As context length increases, performance degrades by anywhere from 14% to 85%, even when models can perfectly retrieve all the relevant information.

My rule of thumb is to work on one task per chat. If the chat gets long, I ask the AI to summarize it, then start over in a new chat. In Claude Code you can use /compact when you approach around 60% of the maximum context window.

When you don't like an output, go back and edit your original message instead of sending a new message with more instructions. This saves a full turn of output, reasoning, and metadata tokens, and keeps the context small.

If you're building agents, be critical of everything you put in context, especially the tool instructions, calls, and results. Aim for the minimum number of high-signal tokens needed for the task at hand.
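Here's a rough sketch of that budgeting idea for an agent loop. The 60% threshold comes from the /compact habit above; everything else (the token heuristic, the function names, how many turns to keep) is an illustrative assumption, not any specific framework's API.

```python
# Rough sketch: keep an agent's context under a budget by compacting older turns.
# Token counts use a crude ~4 chars/token heuristic; swap in a real tokenizer.

MAX_CONTEXT_TOKENS = 200_000      # the advertised window
COMPACT_AT = 0.6                  # compact well before the hard limit

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def total_tokens(messages: list[dict]) -> int:
    return sum(estimate_tokens(m["content"]) for m in messages)

def maybe_compact(messages: list[dict], summarize) -> list[dict]:
    """Replace older turns with a summary once the budget threshold is crossed.

    `summarize` is any callable that turns a list of messages into a short string,
    e.g. a separate LLM call.
    """
    if total_tokens(messages) < COMPACT_AT * MAX_CONTEXT_TOKENS:
        return messages
    system, *history = messages
    older, recent = history[:-6], history[-6:]   # keep the last few turns verbatim
    summary_msg = {"role": "user",
                   "content": "Summary of earlier conversation: " + summarize(older)}
    return [system, summary_msg, *recent]
```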

The practical takeaway: keep your chats short, stick to one task per chat, don't "dump everything in", and remember that optimal context is reached well before the maximum context window.

2. Every token affects the output

All information in context participates in output generation.

Everything you include in the conversation actively shapes what the AI generates next.

Models work through attention mechanisms where every token gets weighted against every other token when generating output. Think of it like a giant probability calculation where each piece of information in your context gets a vote on what comes next. Everything participates, including your typos.
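A toy illustration of the "everything gets a vote" point - a plain softmax over made-up relevance scores, not any real model's internals:

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up relevance scores of context tokens toward the next prediction.
# Even the typo ("teh") gets a nonzero weight - nothing in context is ignored.
tokens = ["fix", "the", "login", "bug", "teh"]
scores = [2.0, 0.5, 3.0, 2.5, 0.8]
for token, weight in zip(tokens, softmax(scores)):
    print(f"{token:>6}: {weight:.3f}")
```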

This creates some non-obvious problems. When you give an AI multiple tasks with conflicting instructions, they all actively compete for attention with each other. The end result is an inferior output on every task because the AI spreads itself thin across them.

In agentic systems, when you give an AI access to twenty different tools, it doesn't just "know they exist" - those tool definitions are actively competing with each other. We've seen models call completely irrelevant tools just because they were listed, even when the correct answer was "use no tools at all." Here, the presence of options creates interference.
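One mitigation is to filter the tool list per task before the model ever sees it. Here's a sketch; the registry, the tagging scheme, and the tool names are made up for illustration.

```python
# Sketch: expose only the tools relevant to the current task instead of the full registry.

TOOL_REGISTRY = {
    "search_orders": {"description": "Look up an order by id", "tags": {"orders"}},
    "refund_order":  {"description": "Issue a refund",         "tags": {"orders", "billing"}},
    "send_email":    {"description": "Send an email",          "tags": {"comms"}},
    # ...plus seventeen more the model does not need for this task
}

def tools_for(task_tags: set[str]) -> list[dict]:
    return [
        {"name": name, "description": spec["description"]}
        for name, spec in TOOL_REGISTRY.items()
        if spec["tags"] & task_tags
    ]

# A refund request only sees the order-related tools, so irrelevant
# definitions never compete for attention in the first place.
relevant_tools = tools_for({"orders"})
```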

Also, contrary to popular belief, telling the model to "act as a developer" has no statistically significant effect on performance. When there is an effect, it's often negative. The issue is that these role-playing tokens don't unlock "hidden capabilities"; they just shift the model's probability distribution in unpredictable ways. The same goes for verbose prompts copied from the web, with instructions like "be creative," "think step by step," or "be concise but thorough."

My rule is to treat every word like it costs something. Before adding anything to context, I ask "is this directly useful for the specific task at hand?".

When we removed redundant tool definitions and streamlined our system prompts, we saw fewer hallucinations and more focused responses.

If you want something to have zero influence on the output, don't put it in the context at all. This means being pedantic in scrutinizing what you include.

3. Nothing persists between calls

AI retains nothing between turns. Everything must be explicitly encoded in context.

The biggest gotcha for me was learning that LLMs are completely stateless. Between chat turns or API calls, nothing persists - not conversation history, not learned preferences, not accumulated understanding.

Every message you send to an AI model is processed as if the model is seeing the conversation for the first time. What the model actually receives is something like: "based on this conversation history - how would you respond?"

To combat this statelessness, ChatGPT's memory system uses explicit memory layers - session metadata, stored user facts, conversation summaries, and the current messages - all of which are injected into every request. The user is left with the feeling that "the model remembers", but in reality it's the developer who remembers to re-inject memory context into the chat.

Great context is an engineering challenge, not a model capability. If you want the AI to "remember" something, explicitly persist it and re-inject it into context at the right time. This is where most of the competition for good chat UX exists nowadays.
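A minimal sketch of that re-injection pattern. The facts and summary here would come from your own storage layer; the data, names, and message format are illustrative assumptions.

```python
# Sketch: "memory" is just re-injection. Every request is rebuilt from what you
# persisted yourself; the model itself starts from zero each time.

def build_messages(facts: list[str], summary: str,
                   recent_turns: list[dict], new_message: str) -> list[dict]:
    system = (
        "You are a helpful assistant.\n"
        f"Known facts about the user: {'; '.join(facts)}\n"
        f"Summary of earlier conversations: {summary}"
    )
    return [
        {"role": "system", "content": system},
        *recent_turns,                        # only the last few turns, verbatim
        {"role": "user", "content": new_message},
    ]

# Rebuilt from scratch on every single request:
messages = build_messages(
    facts=["prefers concise answers", "time zone: CET"],
    summary="User is migrating a billing service from REST to gRPC.",
    recent_turns=[{"role": "user", "content": "Where did we leave off?"},
                  {"role": "assistant", "content": "You were defining the proto schema."}],
    new_message="Ok, what's the next step?",
)
```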

4. Position affects attention

Information's influence varies by position within context.

Information placed in the middle of long contexts is less likely to be retrieved than information at the beginning or end. This is the "lost in the middle" phenomenon.

Manus exploits this deliberately. Their agent constantly rewrites a todo.md file, pushing task objectives to the end of context where attention is strongest. They call it "manipulating attention through recitation."

Position accuracy tests have confirmed this by showing that unique words placed near the beginning of sequences are identified correctly more often than those in the middle.

In practice, be mindful of where you paste large chunks of text. I structure my prompts by placing core instructions either at the start or the end, with additional context in between.

Apart from position, what also matters when building agentic systems is the role of the prompt. Models treat instructions with role: system as higher priority than those with role: user. Always make sure your system prompt has the proper role set and is placed at the beginning of the conversation sequence.
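Putting both ideas together, a request might be assembled like this. It's a sketch: the trailing restatement mirrors the recitation trick, and all names are illustrative.

```python
# Sketch: system prompt first, bulky reference material in the middle, and the task
# restated at the end where attention is strongest.

def build_request(system_prompt: str, reference_docs: str, task: str) -> list[dict]:
    return [
        {"role": "system", "content": system_prompt},       # highest-priority instructions
        {"role": "user",
         "content": f"Reference material:\n{reference_docs}"},  # the risky "middle"
        {"role": "user",
         "content": f"Task (restated): {task}"},            # the last thing the model reads
    ]
```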

5. Context is not self-correcting

All information in context is treated as an equally valid premise unless externally verified.

One of the worst assumptions you can make is that the model somehow "knows" which parts of the context are correct.

From the model’s perspective, the context is a set of premises. Everything inside it is treated as equally valid input for reasoning, regardless of whether it came from a trusted document, a user typo, or a hallucination three turns ago.

This is why context poisoning is so dangerous. Once an incorrect assumption enters the context, the model does not treat it as suspicious or provisional. It will produce logically consistent, well-structured answers built entirely on false premises.

Summarization and tool outputs make this worse. When you summarize a conversation and inject it back into context, any errors in that summary become the new ground truth. The model can't "remember" that this information was lossy or uncertain—from its perspective, the summary is just as trustworthy as the original, often more so because it's newer. The same applies to agentic systems that persist tool outputs, failed attempts, or scratchpad notes. If a tool returns misleading data and you persist it in context, the agent will happily build future decisions on top of it. Manus explicitly calls this out in their agent design: context must be curated, not trusted by default.

When chatting, if the AI makes a mistake, correct it explicitly and don't assume it will self-correct. The model treats its own previous outputs as valid premises unless you tell it otherwise. If you notice the conversation going off track, either correct it directly: "you're wrong — here's what I meant" or start a new chat. Don't let errors accumulate.

If you're building agents, try to actually persist errors and failures in context rather than cleaning them up. When an agent sees its own failed actions and stack traces, it implicitly updates its internal beliefs and becomes less likely to repeat the same mistake again. However, there's a critical distinction: persist the evidence of failure (the action that didn't work, the error message, the incorrect output) but don't persist incorrect information stated as fact. The key is distinguishing between "I tried this and it failed" (valuable learning signal) versus "this incorrect thing is true" (context poisoning).
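A sketch of that distinction inside an agent loop. The run_tool callable and the message wording are assumptions for illustration, not a specific framework's API.

```python
# Sketch: persist the evidence of a failure, not the failed output as if it were true.

def record_tool_call(context: list[dict], run_tool, tool_name: str, args: dict) -> None:
    try:
        result = run_tool(tool_name, args)
        context.append({
            "role": "user",
            "content": f"Tool {tool_name}({args}) returned: {result}",
        })
    except Exception as err:
        # Keep the failed attempt as a learning signal, clearly framed as a failure,
        # so the agent avoids repeating it without adopting any bad output as fact.
        context.append({
            "role": "user",
            "content": f"Tool {tool_name}({args}) FAILED: {err}. "
                       "Do not treat any partial output from this call as correct.",
        })
```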