LLMs are incredibly capable, but they’re also incredibly forgetful.

Stateless by design, powerful pattern-matchers, but limited by their context window.
And while most teams focus on prompts, tools, or retrieval strategies (RAG)…
the real challenge is managing memory.
Because if your application spans more than a single turn (meaning it’s multi-step, user-specific, or agent-based), then context alone isn’t enough.
You need memory.
And most teams don’t design for it at all.
Context windows are not memory
LLMs don’t remember anything from one call to the next.
All they have is the context window: a limited amount of text you pass in with each prompt. Think of it as short-term attention, not long-term knowledge.
Yes, context windows are getting bigger. Some are over 1M tokens.
But here’s the problem:
More context doesn’t mean better results.
Why? Because:
LLMs don’t “search” the prompt; they process all of it
The more you stuff in, the more diluted and distractible the model becomes
Critical information gets lost in the noise, especially when topics shift mid-session
And worst of all?
We’re seeing growing evidence that response accuracy plateaus, or even drops, past a certain context length, unless you actively manage what gets included.
Real-world impact: When your app starts to drift
Here’s what happens when context isn’t managed properly:
An agent forgets which dashboard it was analyzing
A data assistant references the wrong version of a metric
A pipeline builder gets confused between two tables it mentioned 15 messages ago
The model starts hallucinating just to fill in the blanks
The user thinks: “Wait! Weren’t we just talking about something else?”
That’s not just a UX bug. That’s a memory failure.
So what does memory actually mean?
Memory isn’t one thing, it’s a layered system for managing what the model should remember, retrieve, and reuse across turns.
Here are some layers that I think are helpful:
1. Turn Buffer – Short-term memory (last N messages)
2. Rolling Summary – A dynamic recap of what matters so far
3. Entity Memory – Normalized facts about users, datasets, models, policies
4. Scratchpads – Temporary working memory scoped to a task or artifact
5. Long-Term Memory – Durable facts, preferences, saved work
6. Knowledge Memory – RAG over docs, schemas, logs, and past outputs
You don’t need all of them in every app.
But if your system doesn’t remember anything between turns then it won’t scale beyond a demo.
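The layers above can be sketched as a single session object. This is a minimal, illustrative sketch (the class and field names are my own, not a standard API); each attribute corresponds to one layer, and the knowledge layer is omitted since it lives in a retrieval index rather than in the session.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class SessionMemory:
    """Illustrative container for the memory layers above (names are assumptions)."""
    turn_buffer: deque = field(default_factory=lambda: deque(maxlen=8))  # 1. last N messages
    rolling_summary: str = ""                         # 2. recap of what matters so far
    entities: dict = field(default_factory=dict)      # 3. normalized facts, keyed by name
    scratchpad: dict = field(default_factory=dict)    # 4. task-scoped working memory
    long_term: dict = field(default_factory=dict)     # 5. durable facts and preferences

    def add_turn(self, role: str, text: str) -> None:
        # Oldest turns fall off automatically once maxlen is reached
        self.turn_buffer.append({"role": role, "text": text})

mem = SessionMemory()
mem.add_turn("user", "Compare revenue across the two dashboards")
mem.entities["revenue"] = {"type": "metric", "version": "v2"}
```

The `deque(maxlen=8)` is the whole trick for the turn buffer: appending past the limit silently evicts the oldest turn, which is exactly the bounded short-term behavior you want.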
Why bigger isn’t better
There’s a myth that “context solves memory.”

It doesn’t.
And worse, overloading context can actually reduce performance.
Stanford’s “Lost in the Middle” study shows this: accuracy drops when the answer is buried amid distractors, even in models with extended context windows.
Just because a model can read 10, 100 or 1000 pages…
doesn’t mean it will pay attention to the one that matters.
Which is why you need retrieval policies, entity filters, task-specific scratchpads, and context allocators that decide what’s actually relevant for the current turn.
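A context allocator can be as simple as a greedy budget. The sketch below is illustrative: sections arrive in priority order, and anything that doesn’t fit the token budget is skipped (word count stands in for a real tokenizer, which is an assumption to swap out).

```python
def allocate_context(sections: list[tuple[str, str]], budget: int,
                     count_tokens=lambda s: len(s.split())) -> str:
    """Greedy context allocator: include sections in priority order until
    the budget is spent. `sections` is [(name, text)] ordered by importance;
    `count_tokens` is a crude word-count stand-in for a real tokenizer."""
    chosen, used = [], 0
    for name, text in sections:
        cost = count_tokens(text)
        if used + cost > budget:
            continue  # skip what doesn't fit; a cheaper lower-priority item may still fit
        chosen.append(text)
        used += cost
    return "\n\n".join(chosen)
```

The point isn’t this exact policy; it’s that *something* decides what makes the cut each turn, instead of concatenating everything and hoping the model finds it.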
Memory Needs Structure
Great memory design isn’t about clever prompting, it’s about a solid architecture behind the scenes.
Here’s a minimal data model that I've found useful:
sessions(id, user_id, created_at, summary_text, summary_updated_at)
messages(id, session_id, role, text, created_at, embedding)
entities(id, type, name, attributes JSONB, last_seen_at)
artifacts(id, session_id, type, uri, metadata JSONB) // e.g., SQL, DAG YAML
long_term_memory(user_id, key, value, expires_at)
chunks(id, source_uri, text, embedding, metadata) // for RAG retrieval
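To make the data model concrete, here it is as runnable SQLite DDL. This is a sketch under simplifying assumptions: Postgres-flavored types from the schema above (JSONB, vector embeddings) are mapped to TEXT and BLOB so the example is self-contained.

```python
import sqlite3

# SQLite rendering of the schema above; JSONB -> TEXT, embeddings -> BLOB
DDL = """
CREATE TABLE sessions (id INTEGER PRIMARY KEY, user_id TEXT, created_at TEXT,
                       summary_text TEXT, summary_updated_at TEXT);
CREATE TABLE messages (id INTEGER PRIMARY KEY, session_id INTEGER, role TEXT,
                       text TEXT, created_at TEXT, embedding BLOB);
CREATE TABLE entities (id INTEGER PRIMARY KEY, type TEXT, name TEXT,
                       attributes TEXT, last_seen_at TEXT);
CREATE TABLE artifacts (id INTEGER PRIMARY KEY, session_id INTEGER, type TEXT,
                        uri TEXT, metadata TEXT);
CREATE TABLE long_term_memory (user_id TEXT, key TEXT, value TEXT, expires_at TEXT,
                               PRIMARY KEY (user_id, key));
CREATE TABLE chunks (id INTEGER PRIMARY KEY, source_uri TEXT, text TEXT,
                     embedding BLOB, metadata TEXT);
"""
conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
```

The composite primary key on `long_term_memory(user_id, key)` makes upserts natural: one durable value per user per key, refreshed in place.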
Each maps directly to a memory layer:
Memory Layer | Backed By
Turn buffer | messages
Rolling summary | sessions.summary_text
Entity memory | entities
Scratchpad / tools | artifacts
Long-term memory | long_term_memory
Knowledge (RAG) | chunks
Retrieval and Update Strategies That Work
Designing memory isn’t just about what you store, it’s about how you retrieve and refresh.
Here’s what works:
Retrieval policy:
Last 8 turns of conversation (messages)
Rolling session summary (summary_text)
Top-k semantic hits from messages and chunks
Entities explicitly mentioned in the user message
Active scratchpad for the current task (artifacts)
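The retrieval policy above can be sketched over a plain-dict store. Everything here is illustrative: semantic search is faked with keyword overlap (swap in real embeddings), and the key names mirror the schema rather than any particular library.

```python
def build_context(mem: dict, user_msg: str, top_k: int = 3) -> str:
    """Assemble context per the retrieval policy: summary, last 8 turns,
    top-k semantic hits, mentioned entities, active scratchpad."""
    def score(text: str) -> int:
        # Crude relevance stand-in: shared lowercase words with the user message
        return len(set(user_msg.lower().split()) & set(text.lower().split()))

    hits = sorted(mem.get("messages", []) + mem.get("chunks", []),
                  key=score, reverse=True)[:top_k]
    mentioned = [f"{name}: {attrs}" for name, attrs in mem.get("entities", {}).items()
                 if name.lower() in user_msg.lower()]
    parts = ([mem.get("summary", "")]
             + mem.get("messages", [])[-8:]   # turn buffer
             + hits                           # semantic hits from messages + chunks
             + mentioned                      # explicitly referenced entities
             + mem.get("scratchpad", []))     # active task artifacts
    seen, ordered = set(), []
    for p in parts:                           # dedupe (turns can also be semantic hits)
        if p and p not in seen:
            seen.add(p)
            ordered.append(p)
    return "\n".join(ordered)
```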
Update policy:
Append to messages after each turn
Refresh summary_text when topic shifts
Extract and upsert structured entities
Store any generated artifacts and index for RAG
This is the glue that makes memory useful, not just available.
The agentic pattern: memory as architecture
If you’re building agentic apps (copilots, planners, assistants, explainers), memory isn’t a nice-to-have.
It’s the foundation that enables:
• Persistent task state
• Multi-turn collaboration
• Personalization over time
• Referencing prior decisions
• Reducing hallucination and drift
Put the right information in front of the model at the right time.
Nothing more, nothing less.
Final thought
Building LLM apps without memory is like trying to collaborate with someone who forgets everything you said two minutes ago.
Yes, prompts matter.
Yes, tools are powerful.
But if you want your AI to behave more like a teammate than a chatbot?
Memory isn’t just a nice-to-have feature.
It’s a priority from day one.
At Fuse, we believe a great data strategy only matters if it leads to action.
If you’re ready to move from planning to execution — and build solutions your team will actually use — let’s talk.