Is your AI hallucinating? The problem isn't your prompt; it's your context. Discover why Context Engineering is the future of reliable AI, and how structured data and precise retrieval counter the context overload that can degrade model performance by up to 85%.
Let’s strip away the hype for a moment. To understand where AI development is going, we have to be clear about what a Large Language Model (LLM) actually is.
At its core, an LLM isn't a digital brain. It is a prediction engine. Think of it as super-charged autocomplete. It looks at the text you give it and calculates, based on billions of human-written pages, what word is statistically most likely to come next. It doesn't "know" the answer in the way a human does; it simply knows what a correct answer usually looks like.
You’ve almost certainly heard lots about Prompt Engineering over the last few years. If the model is just predicting the next word, maybe we can steer those predictions with the right "magic words." We tell models to "act as a senior developer," "think step-by-step," or even "take a deep breath." We try to trick the AI into seeming smart.
To me, Prompt Engineering feels less like engineering and more like casting spells. You’d change a verb, and the output would fix itself. You’d run it again tomorrow, and it would break. It can be fragile and frustrating.
As we move from chatting with bots to building reliable software, we don't need better prompts. We need better context.
This is Context Engineering.
If prompt engineering is deciding how to ask, context engineering is deciding what the model knows before you even speak. It is the art of curating the information environment—the documents, the history, the rules—so that the "statistically likely" next word is also the factually correct one.
It’s not about whispering the right command; it’s about building a room where the model can’t help but give the right answer.
The marketing departments of AI labs love to talk about context windows. 32k, 128k, 1 million tokens. The implication is obvious: Just dump your entire knowledge base into the chat. The model will figure it out.
If only it were that simple.
The data suggests that treating an LLM like a bottomless bucket is a recipe for failure. A study on "Context Rot" found that performance doesn't just plateau as you add more data—it can actively degrade. Even when models perfectly retrieve information, the sheer volume of text can hurt reasoning capabilities by 13% to 85% depending on the task.

Why? Because LLMs have what we might call an "attention budget."
There is a well-documented phenomenon known as the "Lost in the Middle" effect. When you provide a massive amount of context, models are great at recalling information from the very beginning and the very end. But the data buried in the middle? It often disappears into a black hole.
If you are building a RAG (Retrieval-Augmented Generation) system and you simply stuff the top 20 search results into the prompt, you aren't giving the model more information. You are giving it noise.
Most of us have seen this happen when chatting with an LLM over days or weeks: all of a sudden, it starts to forget key details from the conversation.
Context Engineering moves us away from wordsmithing single queries and toward designing the information environment the model inhabits.
In practice, this means routing each query to the right data source, retrieving only the records that actually matter, and transforming that raw data into a compact, readable format before the model ever sees it.

To show what this actually looks like in production, let’s imagine we are building a customer support bot for a payment platform. A user asks: "Why did my transaction tx_999 fail?"
The Prompt Engineer spends hours refining the instructions.
System: "You are a world-class payment analyst. Think step by step. Use your knowledge to explain the failure."
User: "Why did tx_999 fail?"
The Result: Hallucination. The model has never seen tx_999. It either politely refuses or, worse, invents a plausible reason like "Insufficient funds" because that is statistically the most common reason for payment failures in its training data.
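To make the gap concrete, here is a minimal sketch of what this prompt-only attempt looks like in code, using the same Node.js Gemini SDK as the later examples (the model name is illustrative). Notice that no data about tx_999 is attached anywhere:

const { GoogleGenerativeAI } = require("@google/generative-ai");

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);

async function promptOnlyAttempt() {
  const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash-lite" });

  // The "magic words" are all we send. The model has no record of tx_999,
  // so the statistically likely answer is a guess dressed up as analysis.
  const prompt = `
You are a world-class payment analyst. Think step by step.
Use your knowledge to explain the failure.

Why did tx_999 fail?
`;

  const result = await model.generateContent(prompt);
  console.log(result.response.text());
}

promptOnlyAttempt();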
The developer realizes the model needs data, so they implement basic RAG (Retrieval Augmented Generation). They fetch the user's entire transaction history (JSON) and the platform's entire API documentation (PDFs), converting them to text and dumping them into the context window.
Context: 50 KB of raw JSON history + 20 pages of API docs
User: "Why did tx_999 fail?"
The Result: The "Lost in the Middle" effect. The model is overwhelmed by 500 other successful transactions. The specific error code for tx_999 is buried in a nested JSON object on line 4,000. The model misses the specific error code do_not_honor and gives generic advice about checking bank balances. The latency is 5 seconds, and the API cost is high.
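For completeness, a hedged sketch of that "dump everything" approach, again with the Node.js Gemini SDK. The two data helpers are hypothetical stand-ins for real data access; the point is that nothing is filtered or transformed before it hits the context window:

const { GoogleGenerativeAI } = require("@google/generative-ai");

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);

// Hypothetical stand-in: the user's entire transaction history, untouched.
async function fetchTransactionHistory(userId) {
  return Array.from({ length: 500 }, (_, i) => ({
    id: `tx_${i}`,
    status: "succeeded",
    amount: 10 + i,
    internal_trace_id: `trace_${userId}_${i}`,
    updated_at_utc: "2023-10-27T00:00:00Z",
  }));
}

// Hypothetical stand-in: the full API documentation as one blob of text.
async function loadApiDocs() {
  return "…twenty pages of API documentation…";
}

async function naiveRagAttempt(userQuestion) {
  const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash-lite" });

  const rawHistory = await fetchTransactionHistory("user_123");
  const apiDocs = await loadApiDocs();

  // Everything goes into one giant prompt. The error code for tx_999 is
  // buried somewhere in the middle, exactly where recall is weakest.
  const prompt = `
Transaction history:
${JSON.stringify(rawHistory)}

API documentation:
${apiDocs}

Question: ${userQuestion}
`;

  const result = await model.generateContent(prompt);
  return result.response.text();
}

naiveRagAttempt("Why did tx_999 fail?").then(console.log);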

Here, we don't just dump data; we engineer the input.
Step A: Precise Retrieval (The "Router")
Context Engineering recognizes that valid context starts with understanding intent, not just matching keywords.
If we simply searched a vector database for "Why did tx_999 fail?", we'd get generic articles about failure. Instead, we insert an engineering step called a Router or Classifier before retrieval.
The router reads the query, classifies it, and extracts transaction_id="tx_999". That single field lets us bypass fuzzy search entirely and run a deterministic lookup: SELECT * FROM transactions WHERE id = 'tx_999'. Here is how we translate that concept into code using JavaScript and the Google Gemini API. You can run this code locally and experiment with it to grasp the concept of first classifying the intent and then routing the request based on that intent.
const { GoogleGenerativeAI, SchemaType } = require("@google/generative-ai");
const apiKey = process.env.GEMINI_API_KEY;
if (!apiKey) {
console.error("Error: GEMINI_API_KEY environment variable is required.");
process.exit(1);
}
const genAI = new GoogleGenerativeAI(apiKey);
// 1. Define the Schema (The Context Contract)
// Gemini takes a JSON schema object directly in the generationConfig.
const userIntentSchema = {
type: SchemaType.OBJECT,
properties: {
category: {
type: SchemaType.STRING,
enum: ["transaction_debug", "policy_question", "greeting"],
},
transaction_id: {
type: SchemaType.STRING,
description: "The ID of the transaction if specifically mentioned",
nullable: true,
},
topic: {
type: SchemaType.STRING,
description: "The general topic of the policy question",
nullable: true,
},
},
required: ["category"],
};
// 2. The Router Logic
async function routeQuery(userText) {
const model = genAI.getGenerativeModel({
model: "gemini-2.5-flash-lite",
generationConfig: {
responseMimeType: "application/json", // Forces JSON output
responseSchema: userIntentSchema, // Enforces the specific structure
},
});
const prompt = `
You are a router for a customer support system.
Analyze the user's query and extract the intent according to the schema.
User Query: "${userText}"
`;
try {
const result = await model.generateContent(prompt);
const response = result.response;
if (!response) {
throw new Error("No response from API");
}
// Gemini returns the JSON string directly in the text
const text = response.text();
const intent = JSON.parse(text);
console.log("------------------------------------------------");
console.log(`INPUT: "${userText}"`);
console.log("EXTRACTED INTENT:", JSON.stringify(intent, null, 2));
// 3. Deterministic Execution
if (intent.category === "transaction_debug" && intent.transaction_id) {
console.log(
`🔧 ROUTING: Executing SQL: SELECT * FROM ledger WHERE id = '${intent.transaction_id}'`
);
return await mockFetchTransactionSql(intent.transaction_id);
} else if (intent.category === "policy_question") {
// Use the 'topic' for the vector search, or fall back to the whole text if null
const searchTerm = intent.topic || userText;
console.log(`📚 ROUTING: Executing Vector Search for '${searchTerm}'`);
return await mockVectorSearch(searchTerm);
} else {
console.log("👋 ROUTING: Handling Greeting/General");
return "Hello! How can I help you with your transactions or policies today?";
}
} catch (error) {
console.error("Error parsing intent or communicating with API:", error);
throw error;
}
}
// --- Mock Functions for Demonstration ---
async function mockFetchTransactionSql(id) {
return { id: id, status: "completed", amount: 42.5, date: "2023-10-27" };
}
async function mockVectorSearch(topic) {
return [
{
score: 0.92,
content: `Policy regarding ${topic}: refunds are processed within 5 days.`,
},
{ score: 0.85, content: "General terms of service..." },
];
}
// --- Run Examples ---
async function main() {
await routeQuery("Why was transaction TX-998877 declined?");
console.log("");
await routeQuery("What is your policy on international refunds?");
console.log("");
await routeQuery("Hello there");
}
main();
This distinction—vector search vs. deterministic fetch—is often the difference between a toy and a tool.
Step B: Data Transformation
LLMs struggle with raw, nested JSON, especially when it contains 50 fields of database noise (e.g., internal_trace_id, updated_at_utc). We write a transformer function to strip the noise and format the signal into Markdown, which LLMs parse exceptionally well.
function engineerPaymentContext(transaction, errorDefinitions) {
// 1. Strip Noise: Remove internal fields the LLM doesn't need
const irrelevantKeys = ['internal_id', 'trace_hash', 'shard_key'];
// Filter the transaction object
const cleanTx = Object.fromEntries(
Object.entries(transaction).filter(([key]) => !irrelevantKeys.includes(key))
);
// 2. Format: Convert JSON to a clear Markdown Table using template literals
let context = "## Transaction Analysis\n";
context += "| Field | Value |\n|---|---|\n";
context += `| **Status** | ${cleanTx.status} |\n`;
context += `| **Amount** | $${cleanTx.amount} |\n`;
context += `| **Error Code** | \`${cleanTx.error_code}\` |\n`;
// 3. Inject Specific Knowledge: Only the relevant doc snippet
if (cleanTx.error_code && errorDefinitions[cleanTx.error_code]) {
const definition = errorDefinitions[cleanTx.error_code];
context += `\n### Error Documentation\n> **${definition.title}**: ${definition.human_explanation}\n`;
}
return context;
}
The Result: Reliable Intelligence. The model receives a concise, 20-line context block. It sees the status, the amount, and the exact definition of the error. It doesn't have to guess. It doesn't have to search through noise. It simply reads the context we engineered and summarizes it for the user.
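And here is the final step of the pipeline, sketched end to end: the router's SQL fetch produces the transaction, engineerPaymentContext (from the previous snippet, assumed to be in scope) turns it into the Markdown block, and only then do we ask the model to explain it. The transaction object and error definitions below are illustrative; in production they would come from your ledger and your error-code catalog:

const { GoogleGenerativeAI } = require("@google/generative-ai");

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);

async function answerWithEngineeredContext() {
  const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash-lite" });

  // Illustrative data -- in production this is the row returned by the
  // router's deterministic SQL fetch.
  const transaction = {
    id: "tx_999",
    status: "failed",
    amount: 42.5,
    error_code: "do_not_honor",
    internal_id: "int_abc",
    trace_hash: "0xdeadbeef",
    shard_key: 7,
  };

  // Illustrative error-code catalog entry.
  const errorDefinitions = {
    do_not_honor: {
      title: "Do Not Honor",
      human_explanation:
        "The issuing bank declined the charge without giving a specific reason. The customer should contact their bank or try a different card.",
    },
  };

  // Roughly 20 lines of engineered Markdown instead of 50 KB of raw JSON.
  const context = engineerPaymentContext(transaction, errorDefinitions);

  const prompt = `
Using ONLY the context below, explain to the customer why their transaction
failed and what they can do next.

${context}
`;

  const result = await model.generateContent(prompt);
  console.log(result.response.text());
}

answerWithEngineeredContext();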
We are seeing empirical evidence that "less is more" when engineered correctly.
Prompt engineering hasn't gone anywhere. You still need to give clear instructions to get clear results. But we need to stop expecting the prompt to do the heavy lifting for data retrieval and reasoning structure.
The reality is that reliability doesn't come from finding the perfect adjective. It comes from the architecture surrounding the model. It comes from the unglamorous work of cleaning data, defining strict schemas, and routing queries deterministically.
Context Engineering is simply the recognition that an LLM is only as good as the information you feed it. If you want to move beyond fragile demos and build software that people actually trust, you have to stop treating context as an afterthought.
Engineer the environment, not just the question.
If you found value here, I’d love to hear your thoughts in the comments. You can also subscribe below for more de-noised, hard-won lessons.