Beyond Retrieval

February 7, 2026

Your knowledge system treats a note from three years ago the same as something you confirmed yesterday. Same confidence, same weight, same retrieval priority. It doesn't care when you last verified it. It just sits there, equally sure about everything, forever.

Retrieval itself is good. Search, ranking, entity resolution, RAG, compaction: all of that is useful. Staleness is still under-modeled. Most systems can tell you what looks relevant. They are much worse at expressing something like: "I used to believe this strongly, but I'm less sure now."

So we compensate. We gather dates and surrounding context, hand them to a model, and ask it to infer whether something is stale, contradicted, or newly important. Sometimes that's good enough. But it's still a workaround. You're constrained by context windows, you're hoping you fetched the right material, and the model is doing reasoning that the system should often be able to do itself. Whether a belief has gone stale, whether an expectation failed, whether two beliefs are in tension: I'd rather those be first-class system mechanics than prompt-time guesses.

The semantics matter here, because these words get collapsed too easily.

When I say memory, I mean stored traces: notes, messages, events, observations, documents. When I say knowledge, I mean the system's current working model of the world. Beliefs are the explicit claims inside that model, each with some degree of confidence and some basis in evidence. Expectations are what those beliefs imply should happen next. Contradictions are the places where the model no longer fits together cleanly.

If we call all of this "memory," we end up blurring storage, inference, confidence, and prediction into one bucket. A lot of the confusion around these systems starts there.

None of this is new on its own. What's still missing is a runtime system that keeps these ideas explicit enough to query, update, and compose.

There's a lot of good work in this area already. Recent memory systems are getting better at layered memory, compaction, retrieval, and temporality. That's useful, and some of it gets close to what I want. But the center of gravity is still usually recall, personalization, or context management. I'm after something slightly different: a system that can tell when one of its own beliefs is getting weak, stale, or directly challenged.

Confidence should weaken over time without reinforcement. A belief can still be likely while becoming less trustworthy. Probability and confidence are not the same thing. Something can remain 90% likely while having low precision because it has gone unverified for too long. A lot of systems flatten those into one score and lose the distinction between "I'm confident this is true" and "this used to seem true and I haven't checked recently."
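One way to keep that distinction explicit is to store the probability estimate and the confidence separately, and let only confidence decay with time since verification. Exponential decay with a half-life is just one plausible choice, used here for illustration:

```python
import math
from datetime import datetime, timedelta

def decayed_confidence(base: float, last_verified: datetime,
                       now: datetime, half_life_days: float = 180.0) -> float:
    """Confidence halves every `half_life_days` without reinforcement.
    The belief's probability estimate is stored elsewhere and untouched."""
    age_days = (now - last_verified).total_seconds() / 86400
    return base * math.pow(0.5, age_days / half_life_days)

# A claim can stay 90% likely while trust in that estimate erodes:
probability = 0.9
now = datetime(2026, 2, 7)
fresh = decayed_confidence(1.0, now - timedelta(days=30), now)
stale = decayed_confidence(1.0, now - timedelta(days=540), now)  # three half-lives
```

Here `probability` never moves; only `fresh` versus `stale` captures "I'm confident this is true" versus "this used to seem true and I haven't checked recently."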

A knowledge system doesn't just store facts. It also holds expectations about what should happen next: a client should respond within a certain window, a commitment should get followed up on, a flow meter should stay in range, a weekly pattern should continue until something changes. When the expected thing does not happen, that's a different kind of signal from passive staleness. Now the system has prediction failure, not just age.
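The difference between those two signals, passive age versus a failed prediction, can be made mechanical. A minimal sketch (the status names are hypothetical):

```python
from datetime import datetime

def expectation_status(due_by: datetime, fulfilled: bool, now: datetime) -> str:
    """An unmet expectation past its window is a prediction failure,
    a stronger signal than a belief merely getting old."""
    if fulfilled:
        return "met"
    if now > due_by:
        return "prediction_failure"  # the expected thing did not happen
    return "pending"
```

A belief with no expectations attached can only go stale; a belief with expectations can be actively wrong.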

These mechanics also don't care much where the signals come from. A client changing communication patterns, a calendar commitment slipping, a project going quiet, a metric drifting out of its usual range: all of these generate beliefs and expectations in roughly the same way.

Then there's the harder part: reasoning about absence. You notice when someone stops texting. You notice when a number flatlines. You notice when the thing that usually happens doesn't. Most systems don't represent that very well because the space of things that did not happen is effectively infinite. So the system needs scope. It needs a hot set: live expectations derived from current beliefs, active commitments, and patterns that matter right now. As beliefs decay and commitments resolve, expectations expire naturally. The system only watches what it currently has reason to watch.
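The hot set can be maintained cheaply: keep only expectations whose source beliefs still carry enough confidence and whose windows haven't closed. A sketch, assuming expectations are plain records with a confidence threshold I've made up:

```python
from datetime import datetime

def refresh_hot_set(expectations: list[dict], now: datetime,
                    min_confidence: float = 0.2) -> list[dict]:
    """Keep only expectations worth watching: unresolved, backed by a belief
    that still has enough confidence, and whose window hasn't expired."""
    return [
        e for e in expectations
        if not e["resolved"]
        and e["belief_confidence"] >= min_confidence
        and e["expires_at"] > now
    ]
```

As beliefs decay below the threshold or commitments resolve, their expectations fall out of the hot set on the next refresh, so the space of "things that didn't happen" stays bounded.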

This is also a search problem. You don't want to traverse everything every time something changes. Expectations, hot sets, and structural tension give the system shortcuts. Instead of brute-forcing the whole belief set, it can focus attention on the parts that are active, unstable, or no longer fitting together cleanly.

Once you have that, the violations themselves carry information. Something overdue is different from something contradicted, which is different again from something simply going missing. A task that didn't get done by Friday suggests one kind of response. A belief about a client that is directly contradicted by new evidence suggests another. A long-running pattern that just stopped is different again. The point is not just to get less certain. The point is to know why certainty changed.
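That taxonomy could be as simple as an enum, with each kind of violation mapped to a different default response. The names and responses here are illustrative, not prescriptive:

```python
from enum import Enum, auto

class Violation(Enum):
    OVERDUE = auto()        # expected by a deadline, didn't arrive
    CONTRADICTED = auto()   # new evidence directly opposes the belief
    PATTERN_BREAK = auto()  # a long-running regularity just stopped

def suggested_response(v: Violation) -> str:
    """Why certainty changed determines what to do about it."""
    return {
        Violation.OVERDUE: "follow up or extend the window",
        Violation.CONTRADICTED: "revise or split the belief",
        Violation.PATTERN_BREAK: "investigate what changed",
    }[v]
```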

If the system believes a client prefers async communication, that belief might sit for a year and slowly lose precision. Maybe it moves into the hot set because it is worth re-verifying. Meanwhile the client starts scheduling video calls every week. That is not just passive decay anymore. It is a contradiction. The system shouldn't merely drift toward uncertainty. It should be able to say: this belief is not just old, it is being actively challenged.
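That escalation from stale to contradicted can be a mechanical check rather than a prompt-time judgment. A toy version, assuming counter-evidence has already been counted upstream:

```python
def belief_status(age_days: int, counter_evidence_count: int,
                  stale_after_days: int = 365) -> str:
    """Distinguish a belief that is merely old from one that is
    being actively challenged by new, opposing observations."""
    if counter_evidence_count > 0:
        return "contradicted"  # e.g. weekly video calls vs. "prefers async"
    if age_days > stale_after_days:
        return "stale"
    return "current"
```

The hard part this sketch hides is deciding that an observation counts as counter-evidence at all; the point is only that once it does, the system can report "actively challenged" instead of drifting toward generic uncertainty.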

The other failure mode is structural tension. Add enough beliefs and they start pulling against each other. The system thinks you prefer focused solo work, but also that you love pairing on hard problems. Both are grounded in recent evidence. Both may be partly true. But together they suggest a missing variable, a hidden condition, or a belief that is too coarse to survive contact with new evidence.
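Tension is detectable once conflicts are represented explicitly. A sketch where conflicting claim-pairs are given as data (an assumption; in practice they might come from an entailment model or hand-written rules):

```python
def tensions(beliefs: list[dict], conflicts: set[frozenset],
             min_conf: float = 0.6) -> list[frozenset]:
    """Return conflicting claim-pairs where BOTH sides are well supported.
    A weak belief losing to a strong one is normal revision; two strong
    beliefs in conflict suggest a missing variable or hidden condition."""
    strong = {b["claim"] for b in beliefs if b["conf"] >= min_conf}
    return [pair for pair in conflicts if all(c in strong for c in pair)]
```

A flagged pair is an invitation to split a belief into something conditional ("prefers solo work *when heads-down*, pairing *when stuck*") rather than to pick a winner.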

You can represent beliefs and evidence in a graph, and graphs are useful. But a graph by itself is not enough for the behavior I want. It doesn't automatically model staleness. It doesn't maintain active expectations. It doesn't tell you which parts of the system are over-constrained and starting to fight.

I'm not talking about training a model on a user and calling it done. Models can absolutely absorb preferences, patterns, and negative evidence. That's not the issue. The issue is that once those things disappear into weights, they're harder to inspect, harder to update explicitly, and harder to reason about at the level of specific beliefs, expectations, and contradictions.

What I want is a knowledge layer that stays legible. A system that can represent: this used to seem true, this prediction failed, these two beliefs are in tension, this is worth re-checking now. Not just because that's easier to debug, but because it makes the system more queryable and more composable. You can ask why something changed. You can trace what evidence supported it. You can decide whether to revise a belief, split it into something more specific, or drop it entirely.

What makes sense to me is pulling these primitives together into small domain-level systems. You draw a boundary around a set of beliefs, evidence, expectations, and contradictions, and now you have a working substrate for some part of the world: a person, a machine, a team, a process.

Then those substrates can compose. A small system can stand on its own or become part of a larger one. A system that tracks one project can feed into a broader view of team health and planning. A system that models one person's patterns can become one component inside a larger personal or organizational knowledge system. It's the same primitives holding together at different scales.
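Composition falls out naturally if small and large systems share one interface. A minimal sketch (class names are mine, purely illustrative):

```python
class Substrate:
    """A bounded set of beliefs and expectations for one slice of the world:
    a person, a machine, a project, a process."""
    def __init__(self, name: str):
        self.name = name
        self.open_violations: list[str] = []

    def signals(self) -> list[tuple[str, str]]:
        """Surface this substrate's current violations, tagged by origin."""
        return [(self.name, v) for v in self.open_violations]

class Composite(Substrate):
    """The same interface one level up: e.g. a team view built from projects.
    Its signals are its own plus those of its parts."""
    def __init__(self, name: str, parts: list[Substrate]):
        super().__init__(name)
        self.parts = parts

    def signals(self) -> list[tuple[str, str]]:
        out = list(super().signals())
        for p in self.parts:
            out.extend(p.signals())
        return out
```

Because `Composite` is itself a `Substrate`, the same primitives hold together at every scale: a project view can sit inside a team view, which can sit inside an organizational one.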

Most systems stop at retrieval. They help you find relevant things. What I want is a system that can also tell you when one of its beliefs is getting stale, when one of its expectations failed, and when two parts of its worldview no longer fit together cleanly. That's a different layer of capability.