Context Engineering: The Most Underrated AI Development Skill in 2026
Master context engineering to build better AI systems. Learn the five dimensions, four failure modes, and practical frameworks that separate hobbyist prompting from production-grade AI development.
Context Engineering · AI Development · Prompt Engineering · Claude Code
3392 Words
2026-03-10 02:00 +0000

Everyone talks about prompt engineering. Courses sell for hundreds of dollars teaching you to write better prompts. But in 2026, the developers building the most impressive AI-powered systems are not spending their time crafting clever prompts — they are engineering context.
This is the skill that separates someone who occasionally gets good results from ChatGPT from someone who builds reliable, production-grade AI workflows that work consistently across thousands of interactions.
Context engineering is not a buzzword. It is a fundamental shift in how we think about working with large language models. And if you are not investing in it, you are leaving massive productivity gains on the table.
Why Context Engineering Matters More Than Prompt Engineering
Here is the uncomfortable truth: prompt engineering has a ceiling. You can spend hours perfecting a single prompt, and yes, you will get better results for that one interaction. But what happens when you need consistent results across hundreds of different tasks? What happens when your team of five developers all need the AI to behave the same way?
This is where context engineering enters the picture.
Prompt engineering is additive — each good prompt improves one interaction. Context engineering is multiplicative — a well-designed context system improves every interaction across your entire workflow.
Think about it this way:
| Approach | Scope | Impact | Effort |
|---|---|---|---|
| Prompt Engineering | Single interaction | Linear improvement | Per-task |
| Context Engineering | Entire workflow | Multiplicative improvement | One-time setup, ongoing maintenance |
The shift from prompt engineering to context engineering mirrors a familiar pattern in software development. Writing a quick script to solve one problem is prompt engineering. Building a reusable library with tests, documentation, and CI/CD is context engineering.
Anthropic recognized this shift early, publishing their official context engineering guide that frames context management as a core engineering discipline. LangChain followed with their own framework identifying four core strategies: write, select, compress, and isolate.
The industry consensus in 2026 is clear: casual prompting is something anyone can do; production context engineering is a genuine engineering skill.
The Five Dimensions of Context Engineering
Context engineering is not a single technique. It operates across five distinct dimensions, each requiring different strategies and tools.
1. Information Selection
The most critical dimension. Not all information is equally valuable, and more context is not always better.
Research consistently shows that LLM reasoning quality degrades significantly around 3,000 tokens of context. The sweet spot for instructions and specifications is typically 150-300 words. This means your job is not to include everything — it is to include exactly what matters.
Bad context selection:
Here is our entire codebase documentation (15,000 words),
all our API specs, the complete git history, and every
Slack conversation about this feature. Now fix the bug.
Good context selection:
Bug: User authentication fails after password reset.
Relevant code: auth/password_reset.py (lines 45-62)
Expected behavior: Token refresh after password change.
Current behavior: Old token persists, causing 401 errors.
Related test: test_password_reset_flow (currently failing).
The second example is shorter but contains higher-density relevant information. The model can reason about it effectively without drowning in noise.
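One way to make this selection discipline concrete is to treat it as a budget problem: rank candidate snippets by relevance density (relevance per token) and include them greedily until a token budget is spent. The sketch below illustrates the idea; the relevance scores, token counts, and budget are hypothetical — in practice they would come from a retrieval system or manual triage.

```python
# Greedy context selection under a token budget, ranked by relevance
# density. Scores and budgets here are illustrative, not real measurements.

def select_context(snippets, budget_tokens):
    """Pick snippets with the highest relevance-per-token until the budget is spent."""
    ranked = sorted(snippets, key=lambda s: s["relevance"] / s["tokens"], reverse=True)
    chosen, used = [], 0
    for snippet in ranked:
        if used + snippet["tokens"] <= budget_tokens:
            chosen.append(snippet)
            used += snippet["tokens"]
    return chosen

candidates = [
    {"name": "auth/password_reset.py:45-62", "tokens": 180, "relevance": 0.9},
    {"name": "full codebase docs", "tokens": 15000, "relevance": 0.3},
    {"name": "failing test output", "tokens": 120, "relevance": 0.8},
]

picked = select_context(candidates, budget_tokens=500)
print([s["name"] for s in picked])
```

The 15,000-token documentation dump never makes the cut: even though its absolute relevance is nonzero, its relevance density is two orders of magnitude below the targeted snippets.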
2. Information Organization
How you structure context matters as much as what you include. Research has produced a surprising finding: structured context can sometimes perform worse than shuffled context when the structure creates false hierarchies or buries critical information in expected but unread positions.
The key principles:
- Lead with the most important information. Models pay more attention to the beginning and end of context windows.
- Use clear section headers. Not for aesthetics, but for retrieval — models use headers as semantic anchors.
- Separate instructions from data. Mixing “what to do” with “what to work with” creates confusion.
- Group related information. Keep API specs near API-related instructions, keep test expectations near test code.
3. Information Quality
This is where most people fail. They focus on quantity (“give the model more information”) when they should focus on quality (“give the model better information”).
Quality dimensions include:
- Accuracy: Is the information current and correct? Outdated API docs are worse than no docs.
- Specificity: “Write good code” is low quality. “Follow the existing error handling pattern in src/errors.py using custom exception classes” is high quality.
- Consistency: Do your context sources agree with each other? Contradictions are a primary failure mode (more on this below).
4. Timing
When information enters the context matters. There are three timing strategies:
Static context is loaded at the beginning of every interaction. This includes project specs, coding standards, and tool configurations. In Claude Code, your CLAUDE.md file serves this role.
Dynamic context is retrieved during the interaction based on what the model needs. MCP (Model Context Protocol) servers are the primary mechanism for this — they let the model pull in database schemas, API responses, or documentation on demand.
Conversational context accumulates during the interaction itself. This is the most fragile dimension because it grows uncontrollably and can quickly exceed useful limits.
5. Tool Configuration
Tools are context. Every tool you give an AI model adds to its context window and shapes its behavior. This is why Anthropic’s tool design principles (covered in detail below) emphasize curation over inclusion.
Giving a model 50 tools when it only needs 3 is not just wasteful — it actively degrades performance by forcing the model to spend reasoning capacity on tool selection rather than problem-solving.
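A simple way to enforce this is a per-task allowlist: the full tool catalogue exists, but each task type exposes only its relevant subset. The tool names and task categories below are hypothetical, chosen to mirror the examples later in this article.

```python
# Curating the tool list per task instead of exposing the whole catalogue.
# Tool names and task categories are illustrative placeholders.

TOOL_ALLOWLISTS = {
    "bugfix": ["file_read", "file_write", "run_tests"],
    "deploy": ["docker_build", "docker_run", "k8s_deploy"],
}

ALL_TOOLS = ["file_read", "file_write", "git_commit", "git_push",
             "docker_build", "docker_run", "k8s_deploy", "run_tests"]

def tools_for(task_type):
    """Return only the tools relevant to the task, preserving catalogue order."""
    allowed = set(TOOL_ALLOWLISTS.get(task_type, []))
    return [t for t in ALL_TOOLS if t in allowed]

print(tools_for("bugfix"))
```

An unknown task type yields an empty list rather than the full catalogue — failing closed is the safer default for context budgets.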
The Four Context Failure Modes
Understanding how context fails is just as important as understanding how to build it. There are four primary failure modes, each with different symptoms and solutions.
1. Context Poisoning
What it is: Incorrect, outdated, or misleading information in the context that corrupts the model’s outputs.
Example: Your CLAUDE.md file says “We use PostgreSQL 14” but you migrated to PostgreSQL 16 three months ago. The model generates migration scripts targeting PG14 features and syntax.
Symptoms: Outputs that are confidently wrong. The model produces plausible-looking code or answers that contain subtle errors traceable to bad context.
Solution: Regular context audits. Treat your context sources like code — they need reviews, updates, and version control. Schedule weekly reviews of your CLAUDE.md and other static context files.
2. Context Distraction
What it is: Too much irrelevant information that dilutes the model’s attention and reasoning capacity.
Example: You paste an entire 500-line file when the model only needs to understand a 20-line function. The model spends its reasoning capacity parsing irrelevant code and may even modify things it should not touch.
Symptoms: Outputs that are correct but unfocused. The model addresses tangential issues, over-engineers solutions, or includes unnecessary changes.
Solution: Aggressive curation. Follow the principle: include the minimum context needed for the task, not the maximum context available. Use targeted file references, line ranges, and summaries instead of full documents.
3. Context Confusion
What it is: Ambiguous or unclear information that the model interprets differently than you intended.
Example: Your spec says “the system should handle errors gracefully.” Does that mean retry? Log and continue? Throw to the user? Return a default value? The model will guess, and it might guess wrong.
Symptoms: Outputs that are reasonable but not what you wanted. The model makes plausible interpretations of ambiguous instructions that diverge from your actual intent.
Solution: Be ruthlessly specific. Replace vague instructions with concrete examples. Instead of “handle errors gracefully,” say “catch DatabaseError exceptions, log them with structured logging at ERROR level, and return a 503 status with the message ‘Service temporarily unavailable.’”
4. Context Conflict
What it is: Contradictory instructions or information from different context sources.
Example: Your CLAUDE.md says “always use functional programming patterns” but the conversation includes code snippets using class-based patterns. The MCP server returns API docs showing an OOP interface. The model receives three conflicting signals about the correct approach.
Symptoms: Inconsistent outputs. The model alternates between different approaches, or produces hybrid solutions that satisfy none of the conflicting requirements fully.
Solution: Establish clear priority hierarchies. In Claude Code, the priority order is: direct conversation instructions > CLAUDE.md project config > tool-provided context > model defaults. Document this hierarchy and resolve conflicts before they reach the model.
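The priority hierarchy above maps naturally onto a layered lookup: each source is a layer, and higher-priority layers shadow lower ones. Python's `collections.ChainMap` expresses this directly. The keys and values below are hypothetical stand-ins for conflicting context signals.

```python
# Sketch of the priority hierarchy: conversation instructions override
# CLAUDE.md, which overrides tool-provided context, which overrides model
# defaults. ChainMap resolves lookups left-to-right, so the
# highest-priority source goes first.

from collections import ChainMap

model_defaults = {"style": "model default", "language": "python"}
tool_context   = {"style": "OOP (from API docs)"}
claude_md      = {"style": "functional", "test_framework": "pytest"}
conversation   = {"style": "class-based (this file only)"}

effective = ChainMap(conversation, claude_md, tool_context, model_defaults)
print(effective["style"])           # conversation wins the conflict
print(effective["test_framework"])  # falls through to CLAUDE.md
```

The point is not the data structure but the discipline: when every source is assigned a rank, a conflict has exactly one winner instead of three competing signals.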
An interesting behavioral note from research: Claude tends to refuse or ask for clarification when facing uncertain or conflicting context, while GPT-series models tend to confabulate — producing confident-sounding answers that paper over the contradiction. Neither behavior is inherently better, but knowing your model’s failure mode helps you design better context.
Building Your Context System: A Practical Framework
Theory is useful, but you need a concrete system. Here is a four-layer context architecture that works for real AI development workflows.
Layer 1: Project-Level Static Context (CLAUDE.md)
Your CLAUDE.md file is the foundation. It is loaded at the beginning of every Claude Code session and shapes all subsequent interactions.
What belongs in CLAUDE.md:
# Project identity
- Project name, purpose, and current status
- Tech stack with specific versions
- Architecture overview (2-3 sentences, not a novel)
# Development standards
- Coding conventions (with examples, not just rules)
- Error handling patterns
- Testing requirements
# Current priorities
- Active sprint/milestone goals
- Known issues to avoid
- Recently completed work (to prevent re-doing)
What does NOT belong in CLAUDE.md:
- Complete API documentation (use MCP for dynamic retrieval)
- Historical decisions and their rationale (move to a separate ADR file)
- Information that changes daily (use conversation context instead)
The critical principle: “Spec is the new source code.” Your CLAUDE.md is not a README — it is the primary artifact that drives AI behavior. Treat it with the same rigor you treat production code.
For a deep dive into CLAUDE.md best practices, see our complete CLAUDE.md guide.
Layer 2: Conversation-Level Dynamic Context
Each conversation session is a context container with a limited lifespan. Manage it deliberately.
Starting a conversation right:
Instead of diving straight into a task, front-load the essential context:
I'm working on the payment processing module.
The relevant files are:
- src/payments/processor.py (main logic)
- src/payments/validators.py (input validation)
- tests/test_payments.py (current test suite)
Current state: All tests pass except test_refund_partial.
Goal: Fix the partial refund calculation bug.
Constraint: Must maintain backward compatibility with v2 API.
This 8-line context setup prevents dozens of back-and-forth clarification messages.
Knowing when to start fresh:
Context degrades over time. As a conversation grows, earlier messages lose influence, contradictions accumulate, and the model’s reasoning quality drops.
Best practice: Start a new session after 30 conversation turns or 50K tokens, whichever comes first. This is not a failure — it is hygiene. Summarize the current state, start fresh, and front-load the summary as context for the new session.
If you are using Claude Code, you can see token usage in the status bar. When it climbs past 50K, it is time for a fresh session.
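The rotation rule is simple enough to automate. The sketch below checks both thresholds using a rough characters-per-token estimate; a real implementation would use the provider's tokenizer, and the four-characters-per-token heuristic is an approximation, not a guarantee.

```python
# The 30-turn / 50K-token rotation rule as a simple check. Token counts
# are rough estimates (~4 characters per token); use the provider's
# tokenizer for accuracy.

MAX_TURNS = 30
MAX_TOKENS = 50_000

def estimate_tokens(messages):
    return sum(len(m) // 4 for m in messages)

def should_rotate(messages):
    """True when either threshold is crossed, whichever comes first."""
    return len(messages) >= MAX_TURNS or estimate_tokens(messages) >= MAX_TOKENS

history = ["short turn"] * 12
print(should_rotate(history))                    # 12 turns, few tokens
print(should_rotate(history + ["x" * 400_000]))  # token budget blown
```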
Layer 3: Tool-Provided Context (MCP)
MCP (Model Context Protocol) servers provide dynamic context that the model can pull in as needed. This is where context engineering gets powerful — instead of pre-loading everything, you give the model access to information sources and let it retrieve what it needs.
Common MCP context sources:
| MCP Server | Context Provided |
|---|---|
| Database MCP | Live schema, sample data, query results |
| Git MCP | Commit history, diffs, branch state |
| Documentation MCP | API docs, internal wikis, runbooks |
| Monitoring MCP | Error logs, performance metrics, alerts |
| Context7 | Library documentation on demand |
The key insight: MCP servers transform context from a push model (you decide what to include upfront) to a pull model (the AI decides what it needs during execution). This dramatically reduces context distraction while maintaining access to comprehensive information.
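The pull model can be sketched without any MCP machinery: register named context providers as lazy functions and resolve them only when requested. The provider names and payloads below are hypothetical stand-ins for what real MCP servers would return — nothing is fetched or loaded until the model asks for it.

```python
# Minimal sketch of the push-to-pull shift: named context providers are
# registered up front but evaluated only on demand. Provider names and
# return values are illustrative placeholders for MCP server responses.

PROVIDERS = {}

def provider(name):
    """Decorator registering a lazily-evaluated context source."""
    def register(fn):
        PROVIDERS[name] = fn
        return fn
    return register

@provider("db_schema")
def db_schema():
    return "users(id, email, password_hash)"  # stand-in for a live schema query

@provider("git_diff")
def git_diff():
    return "diff --git a/auth.py b/auth.py"   # stand-in for a real diff

def pull(name):
    """Fetch context on demand; nothing enters the window until requested."""
    return PROVIDERS[name]()

print(pull("db_schema"))
```

The registration cost is a handful of tokens per provider name; the payload cost is paid only for providers the model actually pulls.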
For setting up MCP with Claude Code, see our MCP protocol guide.
Layer 4: Dynamic Retrieval and RAG
For large codebases and documentation sets, retrieval-augmented generation (RAG) provides a scalable context strategy. Instead of putting everything in the context window, you index your content and retrieve relevant chunks on demand.
This layer is particularly important for:
- Codebases with 100K+ lines where you cannot include everything
- Documentation sets that exceed context window limits
- Historical data (past conversations, resolved issues) that is occasionally relevant
LangChain’s framework captures this well with their four strategies:
- Write: Create high-quality context documents (specs, CLAUDE.md)
- Select: Choose which context sources to activate for each task
- Compress: Reduce context size while preserving information density
- Isolate: Keep different context concerns separated to prevent conflicts
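The compress strategy, in its simplest form, shrinks section bodies while preserving the headers the model uses as semantic anchors. The sketch below truncates mechanically for illustration; real pipelines typically summarize each section with a smaller model instead, and the truncation length here is arbitrary.

```python
# Sketch of the "compress" strategy: keep every '#' header (the model's
# semantic anchors), truncate each section body. A mechanical stand-in
# for LLM-based summarization.

def compress(document, max_chars_per_section=80):
    """Keep headers, truncate each section body to a fixed size."""
    sections = []
    for block in document.split("\n# "):
        header, _, body = block.partition("\n")
        body = body.strip()
        if len(body) > max_chars_per_section:
            body = body[:max_chars_per_section].rstrip() + " ..."
        sections.append(header if not body else f"{header}\n{body}")
    return "\n# ".join(sections)

doc = "# Stack\nPython 3.12, FastAPI, PostgreSQL 16\n# History\n" + "x" * 500
print(compress(doc))
```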
Anthropic’s Five Tool Design Principles
Anthropic published five principles for designing tools that serve as context for AI models. These principles apply broadly to any context you provide, not just MCP tools.
1. Curate Over Include-All
Do not give the model every tool you have. Select the 3-5 tools most relevant to the current task. Each additional tool adds tokens to the context and forces the model to reason about tool selection.
Bad:
// 47 tools available, most irrelevant to the current task
tools: [file_read, file_write, git_commit, git_push, git_pull,
git_branch, git_merge, docker_build, docker_run,
docker_stop, k8s_deploy, k8s_scale, ...]
Good:
// Only the tools needed for this specific workflow
tools: [file_read, file_write, run_tests]
2. Use Consistent Namespaces
Group related tools under consistent naming conventions. This helps the model understand tool relationships and choose appropriately.
// Good: clear namespace hierarchy
database_query
database_schema
database_migrate
// Bad: inconsistent naming
run_sql
getDBSchema
db-migrate
3. Provide Semantic Data
Tool responses should include semantic context, not just raw data. Instead of returning a bare JSON object, include type information, relationships, and human-readable descriptions.
4. Optimize Token Efficiency
Every token in tool descriptions, parameters, and responses counts against the context window. Be concise. Use abbreviations where unambiguous. Avoid redundant fields.
A tool description that says “This tool allows you to read the contents of a file from the filesystem by providing a file path” can be shortened to “Read file contents at the given path” without losing any information.
5. Treat Tool Descriptions as Performance Leverage
This is the most underrated principle. The quality of your tool descriptions directly impacts how well the model uses those tools. A vague description leads to misuse. A precise description with examples leads to correct usage.
# Weak description
name: search_code
description: "Search code in the project"
# Strong description
name: search_code
description: "Search project source files using regex patterns.
Searches .py, .js, .ts files by default. Use glob parameter
to filter file types. Returns max 50 results sorted by relevance.
Example: search_code('def process_', glob='*.py')"
The strong description tells the model exactly what to expect, reducing trial-and-error and improving first-attempt accuracy.
Context Hygiene: Maintenance Best Practices
Context systems degrade without maintenance. Here are the practices that keep your context healthy.
Weekly CLAUDE.md Review
Schedule a 15-minute weekly review of your CLAUDE.md. Check for:
- Outdated information: Has the tech stack changed? Are version numbers current?
- Completed work: Remove references to tasks that are done. They waste tokens and can confuse the model.
- Missing patterns: Have new coding conventions emerged that are not documented?
- Contradictions: Does anything in the file conflict with current practices?
Think of it like code review for your AI configuration.
Session Lifecycle Management
- New session per distinct task: Do not use one session for “fix the bug AND refactor the auth module AND write tests.” Each task gets a fresh context.
- 30-turn / 50K-token limit: Start fresh after hitting either threshold.
- Session handoff summaries: When ending a session, ask the model to summarize current state, decisions made, and remaining work. Use this summary to bootstrap the next session.
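A handoff summary works best with a fixed shape, so each new session starts from the same structure. The sketch below renders one as the opening context block for the next session; the field names are illustrative, not a fixed schema.

```python
# Sketch: a session-handoff summary rendered as the opening context for
# the next session. Field names are illustrative.

from dataclasses import dataclass, field

@dataclass
class Handoff:
    state: str
    decisions: list = field(default_factory=list)
    remaining: list = field(default_factory=list)

    def to_context(self):
        lines = [f"Current state: {self.state}", "Decisions made:"]
        lines += [f"- {d}" for d in self.decisions]
        lines.append("Remaining work:")
        lines += [f"- {r}" for r in self.remaining]
        return "\n".join(lines)

h = Handoff(
    state="partial refund bug fixed; tests green",
    decisions=["kept v2 API response shape"],
    remaining=["add regression test for zero-amount refunds"],
)
print(h.to_context())
```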
Context Testing with Promptfoo
Promptfoo brings CI/CD discipline to your context configurations. You can:
- Define expected outputs for given context + prompt combinations
- Run automated tests when your CLAUDE.md or tool configurations change
- Track performance regressions across context updates
- A/B test different context strategies with real metrics
# promptfoo config example
prompts:
  - "Fix the failing test in {{file}}"
providers:
  - anthropic:claude-sonnet-4-20250514
tests:
  - vars:
      file: "test_payments.py"
    assert:
      - type: contains
        value: "assert_equal"
      - type: not-contains
        value: "skip"
This is how you move context engineering from “it feels like it works” to “we have data showing it works.”
Using Hooks for Context Automation
Claude Code hooks let you automate context management. For example:
- Pre-session hooks that automatically load relevant context based on the current Git branch
- Post-tool hooks that validate tool outputs before they enter the context
- Notification hooks that alert you when context is growing too large
Hooks transform context management from a manual practice into an automated system.
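As a rough illustration, a post-tool validation hook might look like the configuration sketch below. This is an approximation of the Claude Code hooks settings format, and the script path is hypothetical — check the current Claude Code hooks documentation for the exact schema before using it.

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "./scripts/validate_output.sh" }
        ]
      }
    ]
  }
}
```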
The Context Engineering Workflow in Practice
Let me walk through a real workflow that applies these principles end-to-end.
Scenario: You are building a new API endpoint for user profile updates.
Step 1: Prepare static context (CLAUDE.md)
Ensure your CLAUDE.md includes the current API conventions, authentication patterns, and database schema conventions. This takes 5 minutes but saves hours of corrections.
Step 2: Start a focused session
Task: Add PUT /api/v2/users/{id}/profile endpoint.
Requirements:
- Accept JSON body with name, email, avatar_url
- Validate email format and uniqueness
- Return 200 with updated profile or appropriate error codes
- Follow existing patterns in src/api/users.py
Relevant files: src/api/users.py, src/models/user.py,
tests/api/test_users.py
Step 3: Let MCP provide dynamic context
As the model works, it pulls database schemas via the Database MCP, checks existing patterns via file search, and validates against your test suite.
Step 4: Monitor context health
After 20 turns, check: Is the conversation staying focused? Has scope crept? Are we accumulating contradictions? If the session has drifted, summarize and start fresh.
Step 5: Capture learnings
After completing the task, update your CLAUDE.md if any new patterns emerged. Did the model struggle with something that better static context would have prevented? Add it.
This workflow integrates naturally with vibe coding practices — the key difference is that context engineering makes vibe coding reliable and reproducible rather than dependent on luck.
Context Engineering vs. Prompt Engineering: A Direct Comparison
To crystallize the distinction, here is a side-by-side comparison:
| Dimension | Prompt Engineering | Context Engineering |
|---|---|---|
| Focus | Individual message quality | System-level information architecture |
| Scope | One interaction | Entire workflow |
| Skill type | Writing craft | Engineering discipline |
| Maintenance | Per-use | Ongoing system maintenance |
| Scalability | Linear (each prompt is custom) | Multiplicative (context system serves all prompts) |
| Failure mode | Bad output from bad prompt | Systematic failures from bad context |
| Testing | Manual review | Automated CI/CD with Promptfoo |
| Team impact | Individual productivity | Team-wide consistency |
This does not mean prompt engineering is irrelevant. Good prompts still matter. But context engineering is the foundation that makes good prompts effective at scale.
The analogy: prompt engineering is writing good function calls. Context engineering is designing the API those functions call against.
Advanced Pattern: The Spec-Driven Workflow
One of the most powerful context engineering patterns is the spec-driven workflow, built on the principle that “spec is the new source code.”
Instead of writing code and then documenting it, you write the spec first and let the AI implement against it:
- Write a detailed spec (150-300 words, covering behavior, constraints, edge cases)
- Place the spec in CLAUDE.md or reference it as a conversation context
- Let the AI implement against the spec, using MCP tools for dynamic context
- Test against the spec — the spec becomes both the implementation guide and the acceptance criteria
This pattern works because it gives the model dense, high-quality, unambiguous context — exactly the kind of context that produces the best results.
For teams, specs serve double duty: they are both the context for AI implementation and the documentation for human review. This eliminates the perennial problem of code and documentation drifting apart.
Related Tools and Resources
If you are building a serious context engineering practice, these tools integrate well:
- Claude Code: The primary environment for context engineering with CLAUDE.md, MCP, and hooks
- CLAUDE.md: Deep dive into project-level context configuration
- MCP Protocol: Dynamic context through tool integration
- Claude Code Hooks: Automated context management
- Codex CLI: Alternative tool with its own context patterns (instructions.md)
- Promptfoo: CI/CD for context testing
- LangChain: Framework with context strategies (write, select, compress, isolate)
Conclusion
Context engineering is not a future skill — it is a right now skill. The developers who master it in 2026 will have a compounding advantage as AI tools become more capable and context-sensitive.
The core principles are straightforward:
- Curate aggressively — less context, higher quality
- Build systems, not messages — invest in CLAUDE.md, MCP, and automated testing
- Maintain relentlessly — weekly reviews, session hygiene, context audits
- Test empirically — use Promptfoo and A/B testing, not gut feeling
- Think in layers — static project context, dynamic tools, conversation management, and retrieval
Start with one change: create or improve your CLAUDE.md file. Make it specific, current, and concise. Then observe how every subsequent AI interaction improves.
That is the multiplicative power of context engineering.