AI/ML Innovation Hub

System of Context vs Runtime Stitching: A Practical Comparison for AI Agents on Enterprise Data

By Snehil Kamal posted 2 days ago

  

If you are building an AI agent on enterprise data, one question decides almost everything else. Where does the agent get its data from?

Most teams skip past this question. They focus on the model, the prompt, and the orchestration. The data layer becomes an afterthought. Then the agent ships, and the trouble starts. Inconsistent answers across runs. Slow response times. Identity confusion that breaks downstream systems. Cost per task that does not scale.

The choice you make about the data layer shapes accuracy, latency, cost, governance, and the extent to which the agent can scale. It applies whether your agent is fully autonomous or has a human in the loop. It applies whether you are using OpenAI, Claude, Gemini, or anything else.

Two architectures keep showing up in the customer engagements I run. I will walk through both with one concrete task, then give you a verdict.

The Task

The task is a common one. Many of the Reltio customers I work with run something like it:

Given customer Acme Industries, summarize the relationship across all systems and recommend the next best action for the account team.

To answer this, the agent needs data from at least four places. CRM (Salesforce). Billing (NetSuite). Support (Zendesk). Marketing engagement (Marketo). Same customer. Different systems. Different identifiers. Different fields.

That is the simplified version. From the customer engagements I run, customer data tends to live across dozens of SaaS apps, often a hundred or more in the larger enterprises. The four-system case is the cleanest version of runtime stitching. The real case is much worse, and that matters for everything that follows.

Same prompt. Same model. Two architectures for getting the data.

Architecture A: Runtime Stitching

In this architecture, the agent queries every source system live, one at a time, and stitches the answer together at runtime.

The agent flow looks like this:

  1. Search Salesforce for "Acme Industries". Get back two matches.

  2. Search NetSuite. Get back one match, different spelling.

  3. Search Zendesk. Get back three tickets, two with slightly different account names.

  4. Search Marketo. Get back five contacts under "Acme Inc".

  5. Try to figure out which of these is the same Acme. Decide what to merge, what to drop.

  6. Build a merged view.

  7. Generate the recommendation.

The agent is doing identity resolution itself, mid-prompt, on the hot path of the user request.
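
To make the flow above concrete, here is a minimal sketch of the gathering steps (1 through 4). Everything in it is an assumption for illustration: the `search` function, the record IDs, and the name variants are made up, and the canned results simply mirror the counts in the steps above. No real Salesforce, NetSuite, Zendesk, or Marketo API is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    system: str
    record_id: str
    name: str


# Made-up canned results standing in for live API calls (steps 1-4).
CANNED_RESULTS = {
    "salesforce": [("001A", "Acme Industries"), ("001B", "Acme Industries Ltd")],
    "netsuite": [("NS-77", "Acme Industries, Inc.")],
    "zendesk": [
        ("T-1", "Acme Industries"),
        ("T-2", "Acme Indust."),
        ("T-3", "ACME Industries USA"),
    ],
    "marketo": [(f"C-{i}", "Acme Inc") for i in range(1, 6)],
}


def search(system: str, query: str) -> list[Candidate]:
    """Hypothetical search: ignores the query and returns the canned rows."""
    return [Candidate(system, rid, name) for rid, name in CANNED_RESULTS[system]]


def gather_candidates(query: str) -> list[Candidate]:
    """One call per system, in sequence, on the hot path of the request."""
    candidates: list[Candidate] = []
    for system in CANNED_RESULTS:
        candidates.extend(search(system, query))
    return candidates


candidates = gather_candidates("Acme Industries")
distinct_names = sorted({c.name for c in candidates})
print(f"{len(candidates)} records, {len(distinct_names)} distinct name strings")
for name in distinct_names:
    print(" -", name)
```

With this toy data the agent ends up holding 11 records under 6 different name strings, and nothing in the payload says which of them are the same company. Steps 5 and 6 are where the LLM has to guess.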

What works:

  • No upfront investment. Wire the agent to the systems and go.

  • Always live data. No staleness.

What breaks:

  • Accuracy degrades fast. Identity resolution is a hard problem. The LLM is not a match engine. It will confidently merge two different Acmes that happen to share an address, or split one real Acme into two because the suffix is different. There is no audit trail showing how it made the decision.

  • Latency is brutal. Four to seven sequential calls before the agent can even start reasoning. A 30-second response is normal.

  • Cost compounds. Every run pays the full stitching cost again. Identity resolution happens fresh for every prompt.

  • Inconsistency between runs. Ask the same question twice, get two different summaries, because the agent stitched the data differently the second time.

  • Governance is invisible. No record of what data the agent saw or how it combined it. When a regulator asks why the agent recommended a course of action, the trail is not reproducible.

Variant: RAG over indexed source data

Teams that hit the latency and cost walls above often retreat to a middle ground. They index extracts from each source system into a vector store, retrieve at query time, and let the agent stitch from chunks instead of live API calls. This is RAG over source data.

It helps with latency and cost. It does not help with identity. Embeddings can tell you that "Acme Inc" and "Acme Industries Ltd" are similar strings. They cannot tell you they are the same legal entity, or that they should be, given how your business defines "same". You still need a match engine. You just hid it behind a vector index.

Three problems specifically:

  • Embeddings drift from truth. Index updates lag the source. The agent confidently returns last month's address because the chunk is stale.

  • Identity is not similarity. A vector store will happily return "Acme Plumbing" when the user asks for "Acme Industries", because the embeddings are close.

  • Lineage gets lost. You retrieve chunks, not records. There is no clean trail back to a system of record for an audit.
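
The "identity is not similarity" problem can be shown with a toy example. The sketch below uses token-set Jaccard similarity as a deliberately simple stand-in for embedding similarity. It is not an embedding model, and real embeddings will produce different scores, but the failure mode is the same: a similarity score carries no notion of legal entity.

```python
def tokens(name: str) -> set[str]:
    """Lowercase a name and split it into a set of words."""
    return set(name.lower().split())


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity: shared words / all words."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


query = "Acme Industries"
for candidate in ["Acme Industries Ltd", "Acme Inc", "Acme Plumbing"]:
    print(f"{candidate}: {jaccard(query, candidate):.2f}")
```

"Acme Inc" (the same company, in this article's example) and "Acme Plumbing" (a different one) get the same score against the query. No threshold on that score can separate them. That separation is the job of a match engine with business rules, not a similarity function.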

RAG over documents works well. RAG over master data is the wrong tool for the job.

This architecture, whether the live or the RAG variant, works for prototypes. It rarely survives the first production review.

Architecture B: System of Context

In this architecture, the agent does not stitch. It asks one place that already knows.

A System of Context, in this case Reltio, has done the identity resolution upfront. It maintains:

  • A single resolved profile for Acme Industries, with every variant (Acme Inc, Acme Industries Ltd, Acme USA) linked under one ID.

  • Source attribution on every attribute, with origin system, last-updated timestamp, and crosswalk lineage.

  • Match scores from the platform's resolution engine (in Reltio's case, FERN), with a record of how the entity was assembled.

  • Live updates as source systems change, through streaming.
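
As a rough illustration of what such a resolved profile might look like in code, here is a minimal sketch. The class names, field names, and sample values are all hypothetical. This is not Reltio's data model or API, and the "most recent value wins" rule in `latest` is one assumed survivorship policy among many.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class SourcedValue:
    value: str
    source_system: str       # origin system
    source_record_id: str    # crosswalk back to the source record
    updated_at: datetime     # last-updated timestamp


@dataclass
class ResolvedProfile:
    entity_id: str
    canonical_name: str
    aliases: list[str]
    match_score: float
    attributes: dict[str, list[SourcedValue]] = field(default_factory=dict)

    def latest(self, attribute: str) -> SourcedValue:
        """Return the most recently updated value for an attribute."""
        return max(self.attributes[attribute], key=lambda v: v.updated_at)


acme = ResolvedProfile(
    entity_id="ent-001",
    canonical_name="Acme Industries",
    aliases=["Acme Inc", "Acme Industries Ltd", "Acme USA"],
    match_score=0.97,
    attributes={
        "billing_city": [
            SourcedValue("Austin", "salesforce", "001A",
                         datetime(2024, 1, 5, tzinfo=timezone.utc)),
            SourcedValue("Dallas", "netsuite", "NS-77",
                         datetime(2024, 3, 2, tzinfo=timezone.utc)),
        ]
    },
)

current = acme.latest("billing_city")
print(current.value, "from", current.source_system, current.source_record_id)
```

The point of the shape is that every value the agent reads arrives with its origin and timestamp attached, so the agent never has to reconstruct them.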

The agent flow becomes:

  1. Query Reltio for the resolved profile of Acme Industries.

  2. Get back one entity, with all attributes, all sources, all timestamps, and match confidence.

  3. Generate the recommendation.
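
The three steps above can be sketched as follows. The in-memory `ALIAS_INDEX` and `PROFILES` dictionaries are hypothetical stand-ins for a System of Context, and the attribute names and values are invented. A real deployment would replace `get_resolved_profile` with one call to the platform.

```python
# Hypothetical in-memory stand-in for a System of Context.
ALIAS_INDEX = {
    "acme industries": "ent-001",
    "acme inc": "ent-001",
    "acme industries ltd": "ent-001",
    "acme usa": "ent-001",
}

PROFILES = {
    "ent-001": {
        "entity_id": "ent-001",
        "name": "Acme Industries",
        "match_confidence": 0.97,
        "attributes": {
            "open_tickets": {"value": 3, "source": "zendesk", "updated": "2024-03-02"},
            "overdue_invoices": {"value": 1, "source": "netsuite", "updated": "2024-03-01"},
        },
    }
}


def get_resolved_profile(name: str) -> dict:
    """Step 1: a single lookup; every name variant maps to one entity ID."""
    entity_id = ALIAS_INDEX[name.strip().lower()]
    return PROFILES[entity_id]


def build_prompt(profile: dict) -> str:
    """Steps 2-3: hand the model one entity with its sources attached."""
    lines = [
        f"Customer: {profile['name']} (id {profile['entity_id']}, "
        f"match confidence {profile['match_confidence']})"
    ]
    for attr, info in profile["attributes"].items():
        lines.append(
            f"- {attr}: {info['value']} "
            f"[source: {info['source']}, updated {info['updated']}]"
        )
    lines.append("Summarize the relationship and recommend the next best action.")
    return "\n".join(lines)


print(build_prompt(get_resolved_profile("Acme Inc")))
```

Asking for "Acme Inc" or "Acme Industries Ltd" returns the same entity, so the prompt the model sees is the same regardless of which variant the user typed.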

What works:

  • Accuracy is consistent. Identity is resolved by a platform designed for it, not by the LLM. The agent never has to guess which Acme is which.

  • Latency drops. One call instead of four to seven. Sub-second profile retrieval.

  • Cost is predictable. Identity resolution cost is paid once, in the MDM platform, not on every prompt.

  • Consistency across runs. Same question, same answer. The merged view is stable.

  • Auditable by design. Every recommendation can be traced back to specific source values and match decisions. Regulators get a clean trail.

What you pay for:

  • An MDM investment upfront. Not trivial.

  • A dependency on how fresh the System of Context is. If a source system updated five minutes ago and the MDM has not ingested the change yet, the agent sees the previous value. Streaming addresses this, but it is a thing to design for.
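
One way to design for freshness is to have the agent check attribute timestamps against a staleness budget before it relies on them. The sketch below is an assumption on my part, not a platform feature: the function name, the attribute names, and the 24-hour budget are all illustrative. It also only flags old timestamps. It cannot tell whether the source has actually changed since.

```python
from datetime import datetime, timedelta, timezone


def stale_attributes(
    last_updated: dict[str, datetime], now: datetime, max_age: timedelta
) -> list[str]:
    """Return the names of attributes whose last update is older than max_age."""
    return sorted(name for name, ts in last_updated.items() if now - ts > max_age)


now = datetime(2024, 3, 2, 12, 0, tzinfo=timezone.utc)
last_updated = {
    "billing_address": now - timedelta(minutes=5),
    "account_owner": now - timedelta(hours=30),
    "support_tier": now - timedelta(days=7),
}

print(stale_attributes(last_updated, now, max_age=timedelta(hours=24)))
```

The agent can then caveat those attributes in its answer, or trigger a refresh, instead of presenting them as current.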

Where does Reltio AgentFlow fit?

This is not theoretical. Reltio AgentFlow is built on exactly this architecture. AgentFlow is Reltio's conversational interface to your tenant, built on the Model Context Protocol. It ships with prebuilt agents (Data Explorer, Resolver, Address Enricher, Profiler, Unmerger, Work Assigner, Product Recommender, Segmenter) that operate on the same resolved profile, lineage, and match scores described above. The agents reach the data through the Reltio AgentFlow MCP Server, which enforces role-based access, data masking, and audit policies on every call. Agent Builder lets you author your own agents in the same governed environment, and Bring Your Own LLM lets you keep the model under your AI strategy.

The point is not that AgentFlow is the only way to do this. The point is that the System of Context pattern is buildable today, with a working reference that handles the governance and lineage you would otherwise have to engineer yourself.

This is the architecture that scales beyond a pilot.

Cost in Real Numbers

Token cost is the dimension most teams underestimate. The per-token price feels small. The math stays invisible until the bill arrives.

Here is a rough estimate for our Acme example, using approximate frontier-model pricing ($3 per million input tokens, $15 per million output tokens). Totals are rounded. Treat the numbers as illustrative, not precise.

Architecture A: Runtime Stitching

  • System prompt + agent instructions: ~1.5K tokens
  • Raw JSON from four source systems: ~25K tokens
  • Identity resolution reasoning and retries: ~10K tokens
  • Total input per request: ~36K tokens
  • Output: ~1.5K tokens
  • Approx cost per request: $0.13

Architecture B: System of Context

  • System prompt + agent instructions: ~1.5K tokens
  • Resolved profile, sources, and lineage from Reltio: ~5K tokens
  • Total input per request: ~7K tokens
  • Output: ~1.5K tokens
  • Approx cost per request: $0.04
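
The per-request figures above follow from simple arithmetic. This sketch reproduces them from the illustrative prices and token counts in this section; the function and constant names are my own.

```python
INPUT_PRICE_PER_MILLION = 3.00    # USD, illustrative pricing from this section
OUTPUT_PRICE_PER_MILLION = 15.00  # USD, illustrative pricing from this section


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the prices above."""
    return (
        input_tokens * INPUT_PRICE_PER_MILLION
        + output_tokens * OUTPUT_PRICE_PER_MILLION
    ) / 1_000_000


cost_a = request_cost(36_000, 1_500)  # Architecture A: runtime stitching
cost_b = request_cost(7_000, 1_500)   # Architecture B: System of Context
print(f"A: ${cost_a:.4f}  B: ${cost_b:.4f}  ratio: {cost_a / cost_b:.1f}x")
```

Unrounded, the two requests come to $0.1305 and $0.0435, a 3x difference. Output cost is identical in both cases, so the whole gap comes from input tokens.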

The per-request gap looks small. At scale, it stops being small.

| Workload | Architecture A | Architecture B |
| --- | --- | --- |
| 1,000 requests/day | ~$3,900/month | ~$1,200/month |
| 10,000 requests/day | ~$39,000/month | ~$12,000/month |

Your actual numbers will vary with model choice, prompt design, prompt caching, and how chatty your source systems are. The shape of the curve does not change. Runtime stitching scales linearly with the number of systems and the data they return. A System of Context does not.
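
To see the shape of that curve, here is a back-of-the-envelope sketch of input tokens as the number of source systems grows. It rests on assumptions that are mine, not measurements: the 25K tokens of raw JSON are spread evenly across the four systems, the identity resolution overhead is held constant (in practice it probably grows too), and the resolved profile in Architecture B stays at about 7K tokens.

```python
PROMPT_TOKENS = 1_500
RESOLUTION_TOKENS = 10_000       # held constant here; a simplifying assumption
TOKENS_PER_SYSTEM = 25_000 // 4  # assumption: 25K of raw JSON spread over 4 systems
CONTEXT_INPUT_TOKENS = 7_000     # Architecture B input, assumed flat


def stitching_input_tokens(n_systems: int) -> int:
    """Input tokens per request when the agent pulls raw data from every system."""
    return PROMPT_TOKENS + RESOLUTION_TOKENS + n_systems * TOKENS_PER_SYSTEM


for n in (4, 8, 16, 32):
    stitched = stitching_input_tokens(n)
    print(
        f"{n:>2} systems: stitching ~{stitched:,} input tokens, "
        f"System of Context ~{CONTEXT_INPUT_TOKENS:,}"
    )
```

Each additional system adds the same block of input tokens to every stitching request, while the resolved profile stays the same size under these assumptions.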

Side-by-Side

| Dimension | Runtime Stitching | System of Context |
| --- | --- | --- |
| Identity resolution | LLM guesses, no audit | Platform-resolved, scored |
| Latency per request | 10 to 30+ seconds (multiple calls) | Sub-second (one call) |
| Cost per request | Pays the stitching cost every time | Pays it once, in the MDM |
| Consistency across runs | Inconsistent | Stable |
| Auditability | None | Full lineage |
| Governance | Invisible | Built in |
| Time to first prototype | Fast | Slower |
| Production scale | Breaks | Scales |
| Live data freshness | Always live | Depends on ingestion (streaming closes the gap) |

System of Context wins on every dimension that matters once you move past the prototype.

The Verdict

Runtime stitching is what you build when you do not have a System of Context. Not because it is better. Because it is what you have.

If your agent is a demo, a prototype, or a one-off internal tool, runtime stitching might be fine. You will rebuild it later.

If your agent is going to production, especially in a regulated industry or with any kind of write-back, runtime stitching will not survive. The accuracy will erode trust. The latency will erode adoption. The lack of an audit trail will sink the security review.

The real trade-off is not "runtime stitching vs. System of Context". It is "build the System of Context now, or build it later under pressure, while customers are already complaining about the agent."

The same logic applies whether the agent is fully autonomous or has a human in the loop. A fully autonomous agent has nowhere to hide. Every decision needs to be traceable, every action reversible, every fact tied to a source. None of that is possible with runtime stitching at scale. A human-in-the-loop agent buys you time on the audit trail but pays the same accuracy, latency, and cost penalties.

If you are evaluating whether to invest in master data for your AI agents, the question is what it costs to delay, not whether to do it.

Takeaways

The model gets the headlines. The data layer decides whether the agent actually works.

A few things to take away:

  • Runtime stitching is fine for prototypes. It does not scale to production.

  • A System of Context, whether Reltio or anything else built for the same purpose, gives the agent three things that make its output trustworthy: resolved identity, source lineage, and match confidence.

  • The trade-off is not accuracy vs. freshness. Streaming closes the freshness gap. You can have both.

  • The choice is not blocked by autonomy. The data layer matters whether you have a human in the loop or not. Autonomy raises the stakes. It does not change the architecture.

If you are starting on an agent today, sketch both architectures on a whiteboard before you write any code. Be honest about which dimensions you are willing to compromise on in production. Most teams find out the hard way that "we will fix the data layer later" is the most expensive sentence in the project.

Then build for the architecture that survives the production review, not the demo.
