
LLMs are incredibly capable, but they’re also incredibly forgetful.


[Cartoon: a man in a suit interviews a robot holding a microphone. The robot says, “I’m sorry... what were we talking about?”]

Stateless by design, powerful pattern-matchers, but limited by their context window.


And while most teams focus on prompts, tools, or retrieval strategies (RAG)…

the real challenge is managing memory.


Because if your application spans more than a single turn (meaning if it’s multi-step, user-specific, or agent-based) then context alone isn’t enough.


You need memory.


And most teams don’t design for it at all.



Context windows are not memory


LLMs don’t remember anything from one call to the next.


All they have is the context window: a limited amount of text you pass in with each prompt. Think of it as short-term attention, not long-term knowledge.
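This statelessness is easy to see in code. In the sketch below, `call_llm` is a hypothetical placeholder for any chat-completion API, not a real client: on each call it sees only the messages you pass in, so the application has to re-send the whole history every turn.

```python
# "Memory" lives in your application, not the model. `call_llm` is a
# hypothetical stand-in for any chat-completion API.

def call_llm(messages: list[dict]) -> str:
    # Placeholder: a real implementation would call an LLM provider here.
    return f"(reply based on {len(messages)} messages)"

history: list[dict] = []  # the app, not the model, owns conversation state

def send(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    reply = call_llm(history)  # the FULL history is re-sent on every turn
    history.append({"role": "assistant", "content": reply})
    return reply

send("What's in the sales dashboard?")
send("Break that down by region.")  # "that" resolves only because turn 1 was re-sent
```

Drop the re-send and the second question becomes unanswerable: the model has no idea what “that” refers to.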


Yes, context windows are getting bigger. Some are over 1M tokens.


But here’s the problem:

More context doesn’t mean better results.

Why? Because:

  • LLMs don’t “search” the prompt; they process all of it sequentially

  • The more you stuff in, the more diluted and distractible the model becomes

  • Critical information gets lost in the noise, especially when topics shift mid-session


And worst of all?


We’re seeing growing evidence that response accuracy plateaus, or even drops, past a certain context length, unless you actively manage what gets included.



Real-world impact: When your app starts to drift


Here’s what happens when context isn’t managed properly:

  • An agent forgets which dashboard it was analyzing

  • A data assistant references the wrong version of a metric

  • A pipeline builder gets confused between two tables it mentioned 15 messages ago

  • The model starts hallucinating just to fill in the blanks


The user thinks: “Wait! Weren’t we just talking about something else?”


That’s not just a UX bug. That’s a memory failure.



So what does memory actually mean?


Memory isn’t one thing, it’s a layered system for managing what the model should remember, retrieve, and reuse across turns.


Here are some layers that I think are helpful:


1. Turn Buffer – Short-term memory (last N messages)


2. Rolling Summary – A dynamic recap of what matters so far


3. Entity Memory – Normalized facts about users, datasets, models, policies


4. Scratchpads – Temporary working memory scoped to a task or artifact


5. Long-Term Memory – Durable facts, preferences, saved work


6. Knowledge Memory – RAG over docs, schemas, logs, and past outputs


You don’t need all of them in every app.

But if your system doesn’t remember anything between turns, it won’t scale beyond a demo.
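One way to make the six layers concrete is a single container object, one field per layer. This is an illustrative sketch with assumed names and types, not a standard API:

```python
from dataclasses import dataclass, field

# Hypothetical container for the six memory layers described above.
# Field names are illustrative, not a standard API.

@dataclass
class Memory:
    turn_buffer: list[str] = field(default_factory=list)      # last N messages
    rolling_summary: str = ""                                 # recap of what matters so far
    entities: dict[str, dict] = field(default_factory=dict)   # normalized facts
    scratchpads: dict[str, str] = field(default_factory=dict) # per-task working notes
    long_term: dict[str, str] = field(default_factory=dict)   # durable facts & preferences
    knowledge: list[str] = field(default_factory=list)        # RAG chunks

    def remember_turn(self, msg: str, max_turns: int = 8) -> None:
        """Append a message and keep only the most recent max_turns."""
        self.turn_buffer.append(msg)
        self.turn_buffer = self.turn_buffer[-max_turns:]
```

Even in a small app, separating these fields forces you to decide, per turn, which layers actually feed the prompt.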



Why bigger isn’t better


There’s a myth that “context solves memory.”

[Figure: accuracy vs. number of retrieved documents; accuracy declines as more documents are added, across several models. Source: “Lost in the Middle: How Language Models Use Long Contexts,” F. Liu et al., 2023.]

It doesn’t.

And worse, overloading context can actually reduce performance.


Stanford’s “Lost in the Middle” study shows this: accuracy drops when the answer is buried amid distractors, even in models with extended context windows.


Just because a model can read 10, 100 or 1000 pages…

doesn’t mean it will pay attention to the one that matters.


Which is why you need retrieval policies, entity filters, task-specific scratchpads, and context allocators that decide what’s actually relevant for the current turn.
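As a rough sketch of such a context allocator: give each memory source a fixed share of the token budget and trim to fit, rather than concatenating everything. Token counting here is naive whitespace splitting, purely for illustration; a real system would use the model’s tokenizer, and the share values are assumptions you would tune.

```python
# Naive token count: whitespace words stand in for real tokens.
def n_tokens(text: str) -> int:
    return len(text.split())

def allocate_context(sources: dict[str, str], budget: int,
                     shares: dict[str, float]) -> str:
    """Trim each source to its share of the token budget, then concatenate."""
    parts = []
    for name, text in sources.items():
        limit = int(budget * shares.get(name, 0))  # this source's slice
        words = text.split()[:limit]               # crude truncation
        if words:
            parts.append(f"## {name}\n" + " ".join(words))
    return "\n\n".join(parts)

prompt = allocate_context(
    {"summary": "long recap " * 200, "recent_turns": "user said hi " * 50},
    budget=300,
    shares={"summary": 0.4, "recent_turns": 0.6},
)
```

The point isn’t the truncation strategy (summarization beats chopping); it’s that every source gets an explicit, bounded slice of the window.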



Memory Needs Structure


Great memory design isn’t about clever prompting, it’s about a solid architecture behind the scenes.


Here’s a minimal data model that I've found useful:


sessions(id, user_id, created_at, summary_text, summary_updated_at)
messages(id, session_id, role, text, created_at, embedding)
entities(id, type, name, attributes JSONB, last_seen_at)
artifacts(id, session_id, type, uri, metadata JSONB)  // e.g., SQL, DAG YAML
long_term_memory(user_id, key, value, expires_at)
chunks(id, source_uri, text, embedding, metadata)     // for RAG retrieval
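For a quick local prototype, a simplified version of this schema can be stood up in SQLite from Python (JSONB becomes TEXT, embeddings become BLOBs; a production system would more likely use Postgres with pgvector for the embedding columns):

```python
import sqlite3

# Illustrative only: a pared-down subset of the schema above in SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sessions (
  id INTEGER PRIMARY KEY, user_id TEXT, created_at TEXT,
  summary_text TEXT, summary_updated_at TEXT
);
CREATE TABLE messages (
  id INTEGER PRIMARY KEY, session_id INTEGER REFERENCES sessions(id),
  role TEXT, text TEXT, created_at TEXT, embedding BLOB
);
CREATE TABLE entities (
  id INTEGER PRIMARY KEY, type TEXT, name TEXT,
  attributes TEXT,  -- JSON stored as TEXT in SQLite
  last_seen_at TEXT
);
CREATE TABLE long_term_memory (
  user_id TEXT, key TEXT, value TEXT, expires_at TEXT
);
""")
```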

Each maps directly to a memory layer:

Memory Layer          Backed By
--------------------  ---------------------
Turn buffer           messages
Rolling summary       sessions.summary_text
Entity memory         entities
Scratchpad / tools    artifacts
Long-term memory      long_term_memory
Knowledge (RAG)       chunks


Retrieval and Update Strategies That Work


Designing memory isn’t just about what you store, it’s about how you retrieve and refresh.


Here’s what works:


Retrieval policy:

  • Last 8 turns of conversation (messages)

  • Rolling session summary (summary_text)

  • Top-k semantic hits from messages and chunks

  • Entities explicitly mentioned in the user message

  • Active scratchpad for the current task (artifacts)


Update policy:

  • Append to messages after each turn

  • Refresh summary_text when topic shifts

  • Extract and upsert structured entities

  • Store any generated artifacts and index for RAG
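A minimal sketch of these two policies, against hypothetical in-memory stores standing in for the tables above. The top-k semantic hits and the summary/entity refresh are left as comments, since they depend on your embedding and summarization setup:

```python
# Hypothetical in-memory stores (a real system would back these with the
# messages/sessions/entities tables and a vector index).
messages: list[dict] = []
summary_text: str = ""
entities: dict[str, dict] = {}

def retrieve(user_msg: str, last_n: int = 8) -> dict:
    """Retrieval policy: assemble just-enough context for this turn."""
    return {
        "recent_turns": messages[-last_n:],              # last N turns
        "summary": summary_text,                         # rolling recap
        "entities": {k: v for k, v in entities.items()
                     if k.lower() in user_msg.lower()},  # only mentioned entities
        # top-k semantic hits over messages and chunks would be merged in here
    }

def update(user_msg: str, reply: str) -> None:
    """Update policy: append the turn; heavier refreshes run as needed."""
    messages.append({"role": "user", "text": user_msg})
    messages.append({"role": "assistant", "text": reply})
    # on topic shift: refresh summary_text; extract and upsert entities;
    # store generated artifacts and index them for RAG
```

Usage is one retrieve before the model call and one update after it, which keeps the policy logic out of your prompt templates.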


This is the glue that makes memory useful, not just available.



The agentic pattern: memory as architecture


If you’re building agentic apps (copilots, planners, assistants, explainers), memory isn’t a nice-to-have.


It’s the foundation that enables:

• Persistent task state

• Multi-turn collaboration

• Personalization over time

• Referencing prior decisions

• Reducing hallucination and drift


Put the right information in front of the model at the right time.


Nothing more, nothing less.



Final thought


Building LLM apps without memory is like trying to collaborate with someone who forgets everything you said two minutes ago.


Yes, prompts matter.

Yes, tools are powerful.

But if you want your AI to behave more like a teammate than a chatbot?


Memory isn’t just a nice-to-have feature.

It’s a priority from day one.



At Fuse, we believe a great data strategy only matters if it leads to action.


If you’re ready to move from planning to execution — and build solutions your team will actually use — let’s talk.


A recent MIT study found that 95% of enterprise GenAI pilots are failing to deliver measurable impact.

[Cartoon: two people stand puzzled by a tangled line labeled “AI Pilot” on an easel. One says, “Maybe we should define the project first.”]

Not because the models are broken, but because the delivery process is.


Despite the buzz, the billions in investment, and the proliferation of copilots and pilots, most AI projects are going nowhere.


The MIT study, “The GenAI Divide,” uncovered why:


“Only 5% of generative AI pilots have achieved measurable value to P&L.”


The rest? Underperforming or stuck in endless experimentation.


So what’s going wrong?


Let’s break down the key findings and show how our Define-to-Deliver model directly addresses them.



MIT Finding 1: Projects lack clear use cases and measurable business goals.


Translation: No one defined the problem.


This is where most AI projects fail before they start. A team spins up a pilot because “we should try GenAI”, but no one has articulated:

• What real-world problem it’s solving

• Why it matters to the business

• How we’ll know if it worked


✅ Define → Deliver Response:


In the Define phase, we start with:

• Business context

• User pain

• “What does success look like?”

• One-sentence problem framing everyone can agree on


If your project can’t pass that bar, it’s not ready for AI or any investment.



MIT Finding 2: Most companies plug GenAI into workflows without rethinking how the work actually happens.


Translation: They’re optimizing broken processes.


Even good AI tools won’t deliver value if you simply bolt them onto existing reporting pipelines or legacy ticketing systems.


✅ Define → Deliver Response:


The Design phase fixes this.


We don’t build on top of old process debt, we co-create new workflows that are:

• Focused on the real need (not just features)

• Designed with the business, not just for them

• Prioritized using MoSCoW to keep V1 realistic and relevant


When you design with the people who’ll use it, adoption stops being a problem.



MIT Finding 3: Executives over-index on customer-facing use cases instead of internal operations, where ROI is often higher.


Translation: Misaligned priorities.


Chasing demos and dashboards for customers might look good, but the real impact often comes from back-office improvements — process automation, internal workflows, internal decision support.


✅ Define → Deliver Response:


Our Align phase brings the right people together, not just to score ideas, but to prioritize based on:

• Business value

• Readiness

• Capacity to co-own the solution


We don’t just ask, “What’s exciting?”

We ask, “What’s executable?”

That’s what keeps initiatives moving and valuable.



MIT Finding 4: Pilots are run in isolation, without business ownership or delivery accountability.


Translation: No one’s actually responsible.


AI becomes someone’s side project. There’s no ongoing feedback loop, no path to scale, no learning built into the process.


✅ Define → Deliver Response:


In our Deliver phase, we solve for:

• Shared ownership

• Weekly feedback loops

• Embedded business champions

• Outcome measurement (not just feature delivery)


We don’t deliver “to” the business. We deliver "with" the business.



MIT Finding 5: Tools improve fast. Organizations don’t.


Translation: AI is evolving faster than the org chart.


Many teams can now generate the work (SQL, content, visuals) instantly. But it still takes weeks or months to ship anything because workflows, governance, and trust haven’t caught up.


✅ Define → Deliver Response:


Our entire methodology is built to match the speed and flexibility that modern tooling enables.


Working in 3-month delivery cycles, we define success up front, measure real usage, and adapt quickly based on feedback. This builds momentum and trust and avoids the “pilot purgatory” most GenAI efforts fall into.



Final Thought


The problem isn’t GenAI.


It’s the same problem data teams have faced for years:

• No clear definition

• No business alignment

• No thoughtful design

• No accountable delivery


Define → Deliver isn’t a framework for AI.

It’s a framework for value.


And it’s never been more relevant than right now.





The data team of 2030 won’t be a team at all.

[Cartoon: people at desks labeled Data Operations, Marketing, Operations, and Product surround an empty Data Team desk covered in spiderwebs. Caption: “The Data Team Isn’t Gone—It’s Everywhere.”]

It will be a network. Distributed. Embedded. Everywhere... and nowhere.


For years, companies have treated data like a function to be scaled:


  1. Build a centralized team.

  2. Hire engineers.

  3. Add analysts.

  4. Create a center of excellence.


But the more they scale, the more things slow down.


The more people they hire, the more tickets, projects and roadmaps pile up.

And the more advanced the tooling gets, the harder it becomes to answer a simple question.


Why? Because the structure is broken.


When you centralize data, you separate it from the context it needs to be useful.


You end up with:

• A data team solving problems it doesn’t fully understand.

• Business users submitting tickets for insights they needed yesterday.

• Decisions delayed because the system was designed around service, not speed.


It’s not a resourcing problem.

It’s an operating model problem.



The problem.


Most business functions have central teams, like finance, HR, legal, and IT.


But in well-run organizations, they don’t operate as distant silos. They function through embedded partnerships.


Take finance.


Yes, there’s a central finance org. But large business units have embedded finance partners who understand the goals, pressures, and trade-offs within a specific team. They don’t just process budgets. They help shape them.


HR works the same way.


You don’t ask a centralized HR team to solve every people issue. You embed HR business partners into teams to handle hiring, performance, and team dynamics, in context.


Even legal — one of the most centralized functions — embeds specialists for commercial, product, or privacy work when it matters most.


And of course, product and engineering teams left centralization behind years ago.

They embed product managers, designers, and engineers into cross-functional squads aligned to domains and customer outcomes (onboarding, growth, retention, etc.).


So why is data still so centralized and so far from the decisions it’s meant to inform?



The future is embedded, not centralized.


The best data-driven organizations are moving in a new direction.


They’re shifting from a centralized service model to a distributed capability model.


Instead of one big data team serving the company, they embed:

• Analysts in marketing, sales, and operations.

• Data engineers aligned to product teams and domains.

• Governance stewards inside the business, not hovering above it.


The central team of the past is repurposed to:

• Build and maintain shared infrastructure.

• Define lightweight standards and best practices.

• Create tooling, templates, and training.

• Build trust, not bottlenecks.


Think of it like an operating system.


Not doing the work itself, but enabling others to do it faster, better, and in context.



The impact of getting this right.


The result?

• Faster, more relevant decisions made closer to the front lines.

• Stronger ownership of data quality and outcomes.

• Higher leverage from central teams, who become enablers — not gatekeepers.


The centralized data team isn’t evolving.

It’s dissolving into something better.


Not a team.

A capability.

Woven into the fabric of the business.


This isn’t the end of data.

It’s the beginning of doing it right.


