Keyann Al-Kheder

Priming the Context Window:
A Cognitive Approach to Prompting

05/22/25 8 mins

I want to introduce a cognitive science-inspired approach to prompting that helps AI engineers guide models toward the outputs they actually want, rather than the generic, predictable responses we so often struggle to improve.

As large language models (LLMs) proliferate across software engineering, prompt engineering has quickly become a core practice. But prompting is still in its infancy, and most techniques rely on explicit instructions or examples, which rarely generate truly precise results.

But what if you could steer these models at a deeper level, shaping not just what they do, but how they do it?

This article attempts to answer that very question: first by understanding the problem with current prompting techniques, then the underlying architecture, and then priming itself, before explaining how to apply it.

The Problem: How Prompting Falls Short

Most prompt engineering techniques rely on explicit instructions or examples. They give context and instruct the model what to do.

The core issue is that LLMs don't always respond to prompts the way users expect. Current prompting methods don't address how to elicit truly precise results, and most tutorials don't teach us how to think about getting them.

Instruction-based prompting:

  • "Act as an expert."
  • "Write a LinkedIn style post."
  • "Be creative."

It's worth noting that models, by design, don't inherently follow instructions. These approaches work as well as they do because instruction-following is reinforced during post-training [2][3].

I imagine researchers didn’t initially know how users would interact with the LLMs. But after seeing users mostly give instructions and expect responses, they reinforced models to respond to that pattern.

Ask it to write, and you'll get the bland hypophora hook and pseudo-reasoning. The model isn't actually reasoning; it's simulating the kind of response it learned to produce for those instructions [4][5]. That's not a defect: it learned those patterns from the raw data, and the behaviour was reinforced whenever it followed instructions.

On the other hand, users tend to write ambiguous prompts. Subjective, unspecific adjectives like "good" give the model too little to go on; it has no way of knowing what you mean.

Other prompting approaches are less about direct instruction and more about leading by example.

Few-shot prompting:

You give the model several input-output examples and then ask it to continue the pattern:

  • "Classify the sentiment of the following text as positive, negative, or neutral. Text: The product is terrible. Sentiment: Negative Text: Super helpful, worth it. Sentiment: Positive Text: It doesn't work! Sentiment:"

Few-shot prompting has limitations, particularly on reasoning tasks, where bare input-output examples show the answer but not how to reach it:

  • "Jim has 3 apples and ate 2. How many are left? Answer: 1."
  • "Jim has 3 apples and received 2 more. How many does he have? Answer: 5."

Chain-of-thought prompting:

You show the model how to break down the reasoning step by step:

  • "Jim has 3 apples and ate 2. How many are left? If Jim had 3 and ate 2, then we subtract 2 from 3, leaving us with 1. Answer: 1."

The deeper issue is that LLMs don't just "follow instructions" the way users expect. They simulate, predicting tokens based on probability distributions shaped by their training data. So if most LinkedIn posts in the data are generic, so are your model's outputs. No amount of "act as an expert" is going to overcome that baseline without a better way of influencing the model's internal context.

Because models can learn in-context from your prompt, providing examples leads them to mirror the patterns you've seeded. But as we've seen, examples alone don't guarantee the model attends to the patterns we care about; for that, we need to steer it. This is why chain-of-thought prompting improves results on reasoning tasks. What if the same idea could be extended to other tasks?

Priming in Cognitive Science

Cognitive psychology offers a relevant concept: priming. Priming describes how exposure to stimuli can shape responses, often unconsciously. Classic studies by Meyer & Schvaneveldt [10] and Neely [11] showed how semantic priming activates related concepts, biasing our reactions and decisions. Even subtle cues, like the colour of a room, can nudge people's choices by activating related associations.

It's simple enough to demonstrate:

  • "Penny, nickel, and ___" → Most people say dime.
  • "Three, two, ___" → You probably thought one.

These completions feel obvious because your brain's network of associations is primed by the initial context. Concepts like "penny" and "nickel" light up nearby nodes in your mental network, so "dime" surfaces as a thought.

LLMs behave similarly. Each next token is predicted based on proximity in a high-dimensional vector space. The point: both humans and LLMs infer and predict contextually, and both are susceptible to priming.

LLMs Are Susceptible to Priming—Proof by Example

If LLMs are susceptible to priming, we can test it with a simple hypothesis: priming the prompt with the word "cheeto" should bias the color choice toward orange.

Prompt 1

User:
give me a colour

GPT-4o:
#5E81AC
A deep, modern blue with a touch of gray.

Prompt 2

User:
cheeto. Give me a colour

GPT-4o:
Cheeto: a punchy, radioactive orange—think #FF7300.

Even with a small, unrigorous sample size, the difference is obvious: the initial word "cheeto" set the tone, context, and aesthetic for everything that followed. This isn't just "output"—it's priming in action. The first word shaped the model's predictions, steering the "spreading activation" in the embedding space toward orange.
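
If you want something slightly less anecdotal than two transcripts, the same hypothesis can be checked over repeated trials. A rough sketch, assuming the openai Python SDK and gpt-4o; the keyword check for orange-ish answers is a crude stand-in for a proper evaluation:

    # Crude priming experiment: does prepending "cheeto" bias colour choices toward orange?
    from openai import OpenAI

    client = OpenAI()

    def count_orange(prompt: str, trials: int = 10) -> int:
        """Run the prompt repeatedly and count responses that look orange."""
        hits = 0
        for _ in range(trials):
            response = client.chat.completions.create(
                model="gpt-4o",   # assumed model name
                temperature=1.0,  # keep sampling variation between trials
                messages=[{"role": "user", "content": prompt}],
            )
            text = response.choices[0].message.content.lower()
            if "orange" in text or "#ff7" in text or "#ff8" in text:
                hits += 1
        return hits

    print("unprimed:", count_orange("give me a colour"))
    print("primed:  ", count_orange("cheeto. Give me a colour"))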

Why Does This Work? Embeddings, Attention, and Spreading Activation

When you prompt an LLM, the text is mapped to vector embeddings—a way to numerically encode meaning and relationships in high-dimensional space. As the transformer processes input, each embedding passes through multiple layers and attention heads, with learned weights and biases transforming it at every step. By the end, you get a "contextualized" embedding—an evolved vector projected into vocabulary space, which the model uses to sample the most likely next tokens.

Crucially, the real black box isn't the architecture, but the weights and biases. Researchers can "lift the hood" to inspect these matrices, but the raw numbers reveal little about what they actually encode or why. The system is structured like any neural network, but interpreting what a specific set of weights means—or how it relates to semantic concepts—remains an open challenge.

What's increasingly plausible, given how well these models generalize and relate concepts, is that at least some of these weights and biases function as learned semantic networks. The embedding space—the high-dimensional landscape where each token lives—organizes relationships much like semantic networks. In other words, parts of the network may mirror how human brains organize concepts and associations, letting the model "spread activation" between related ideas when you prime the context window.
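
You can get a rough feel for this proximity with an off-the-shelf embedding model. The sketch below uses the openai embeddings endpoint with text-embedding-3-small, which is an assumption about tooling; the similarity it measures is a proxy for semantic closeness, not a window into GPT-4o's internal weights. If "cheeto" sits closer to "orange" than to "blue", that is the neighbourhood the earlier priming example exploited:

    # Cosine similarity between phrase embeddings as a proxy for semantic proximity.
    from math import sqrt

    from openai import OpenAI

    client = OpenAI()

    words = ["cheeto", "orange", "blue"]
    result = client.embeddings.create(model="text-embedding-3-small", input=words)
    vectors = {word: item.embedding for word, item in zip(words, result.data)}

    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

    print("cheeto vs orange:", cosine(vectors["cheeto"], vectors["orange"]))
    print("cheeto vs blue:  ", cosine(vectors["cheeto"], vectors["blue"]))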

Key point: Every token in the context window affects how the transformer pays attention—how it weighs other tokens and the transformations it applies. Just as priming in human cognition can bias attention and decision-making, token placement in the context window can bias model predictions by activating relevant regions of embedding space.

Priming the Context Window

Priming is about biasing how the model interprets instructions. You don't just tell the model what to do; you steer how it's thinking when following instructions.

Example priming approach:

  • "Analyze Plato's writing style and tone."
  • "[Model outputs analysis]"
  • "Write a 300-word article about teamwork."

This activates the model's internal representation of style, cadence, tone, and rhythm—qualities that are difficult to define, but easy for the model to reproduce.
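
A minimal sketch of that two-step flow, again assuming the openai Python SDK and gpt-4o. The analysis turn is kept in the conversation, so its tokens are already in the context window when the writing instruction arrives:

    # Priming: put a style analysis into the context window before the actual task.
    from openai import OpenAI

    client = OpenAI()

    messages = [{"role": "user", "content": "Analyze Plato's writing style and tone."}]
    analysis = client.chat.completions.create(model="gpt-4o", messages=messages)  # assumed model name
    messages.append({"role": "assistant", "content": analysis.choices[0].message.content})

    # The task now arrives with the analysis already activating the relevant context.
    messages.append({"role": "user", "content": "Write a 300-word article about teamwork."})
    article = client.chat.completions.create(model="gpt-4o", messages=messages)
    print(article.choices[0].message.content)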

My working definition: Priming is whenever you introduce stimuli—text, cues, context—meant to bias the model prior to task completion. From this perspective, chain-of-thought is itself a form of priming for reasoning.

Why Priming Works

Priming works for a few key reasons:

  1. It's not always obvious which tokens will trigger others, but you can "reverse engineer" associations by giving the model examples of your desired output and then prompting it to analyze those examples. The analysis can leak the features it associates with the style, which you can then reuse in your next prompt (see the sketch after this list).
  2. Asking it to "describe" a prompt surfaces descriptive properties—"describe" tends to focus on salient, visual, and explicit features, while "analyze" tends to surface underlying structures, patterns, and qualities, like directness, intent, and sentiment.
  3. Few-shot alone doesn't always elicit the desired reasoning; you often need to prefill or prime the reasoning process. Similarly, examples alone may not surface a specific style—but asking the model to analyze a text before writing in that style can be a powerful primer. 'Write an article based on this style: [example]. Now write an article on X.'
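
As a concrete version of point 1, you can run the analysis once, then paste it into a fresh prompt as the primer. A rough sketch under the same SDK assumptions; EXAMPLE_TEXT is a hypothetical placeholder for a sample of the output you want:

    # Reverse-engineering a style: analyze an example, then reuse the analysis as a primer.
    from openai import OpenAI

    client = OpenAI()

    EXAMPLE_TEXT = "..."  # hypothetical placeholder: paste a sample of the style you want

    analysis = client.chat.completions.create(
        model="gpt-4o",  # assumed model name
        messages=[{
            "role": "user",
            "content": f"Analyze the style, structure, and tone of this text:\n\n{EXAMPLE_TEXT}",
        }],
    ).choices[0].message.content

    primed_prompt = (
        f"Style notes:\n{analysis}\n\n"
        "Using that style, write a 300-word article about teamwork."
    )
    result = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": primed_prompt}],
    )
    print(result.choices[0].message.content)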

Practical Takeaway

While most prompt engineering focuses on instruction and example, priming is about steering the embedding space within which models predict and select tokens.

  • You're not just providing instructions; you're guiding and biasing how the model interprets them.
  • The placement and choice of priming tokens directly influence which embeddings are activated, and thus which predictions are likely.

If you want above-average outputs—prime the context window with carefully selected and placed words.

This shift—from solely explicit instruction-based prompting to incorporating semantic priming—can unlock richer, more specific, and original LLM outputs.

For a deeper dive, see [references]. If you want hands-on techniques for priming, or examples for your workflow, reach out or check the next post.

References

  1. Vaswani, A., et al. (2017). "Attention Is All You Need." arxiv.org/abs/1706.03762
  2. Christiano, P., et al. (2017). "Deep Reinforcement Learning from Human Preferences." arxiv.org/abs/1706.03741
  3. Ouyang, L., et al. (2022). "Training language models to follow instructions with human feedback." arxiv.org/abs/2203.02155
  4. Brown, T.B., et al. (2020). "Language Models are Few-Shot Learners." arxiv.org/abs/2005.14165
  5. Wei, J., et al. (2022). "Emergent Abilities of Large Language Models." arxiv.org/abs/2206.07682
  6. Anthropic (2024). "Tracing Thoughts: How Language Models Plan Ahead." anthropic.com/research/tracing-thoughts-language-model
  7. Elhage, N., et al. (2025). "Attribution graphs: Tracing language model computations beyond attention." Biology perspective. transformer-circuits.pub/2025/attribution-graphs/biology.html
  8. Elhage, N., et al. (2025). "Attribution graphs: Tracing language model computations beyond attention." Methods perspective. transformer-circuits.pub/2025/attribution-graphs/methods.html
  9. Liu, N.F., et al. (2023). "Lost in the Middle: How Language Models Use Long Contexts." arxiv.org/abs/2307.03172
  10. Meyer, D.E., & Schvaneveldt, R.W. (1971). "Facilitation in recognizing pairs of words: Evidence of a dependence between retrieval operations." Journal of Experimental Psychology, 90(2), 227–234. psycnet.apa.org/record/1971-00309-001
  11. Neely, J.H. (1977). "Semantic priming and retrieval from lexical memory: Roles of inhibitionless spreading activation and limited-capacity attention." Journal of Experimental Psychology: General, 106(3), 226–254. psycnet.apa.org/record/1978-20309-001
  12. Collins, A.M. & Loftus, E.F. (1975). "A spreading-activation theory of semantic processing." Psychological Review, 82(6), 407–428. psycnet.apa.org/record/1975-00407-001