Long-Term Memory
TL;DR: Long-term memory is the last bottleneck to human-level AI
Reproducing functional long-term memory is the last bottleneck to human-level AI. To see why, look at where current models fall short. Models already handle many short tasks as well as or better than humans; they fail mainly on long-horizon and large-context tasks. This is why models still can’t replace anyone’s job.
We can look at the remaining gap between models and humans to figure out how to enable long-horizon tasks:
Models can’t learn as they interact with the world, the way humans do (online learning)
Models can only hold a limited amount of context over long periods of time (memory)
Long-term memory (LTM) solves both problems. At its core, long-term memory is just read-write access to a persistent database. Models can already do this, so what’s the remaining difference?
Models can’t learn from their memories, because database access is non-differentiable. The entire problem of LTM is how to let models learn to use this database optimally. The model must learn what to store so that retrieval becomes easy (which is why current methods fail: they store everything).
The entire problem of memory is:
Knowing what to store
Being able to access the relevant information.
Both of these are downstream of one thing: a complex, dynamic, context-dependent notion of “what matters”. So the real bottleneck to LTM is creating a differentiable database read/write operation. This would let the model learn, as humans can, to constantly update, synthesize, and process the contents of that database based on an improving notion of what is important.
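For contrast, here is a deliberately naive sketch of the kind of bolt-on memory models can already use today (the names NaiveMemory and embed are made up for illustration): it stores everything and retrieves by hard top-k similarity, so there is no gradient through either decision and no way for the model to learn what actually matters.

```python
import numpy as np

class NaiveMemory:
    """A bolt-on retrieval memory: store everything, fetch by similarity.

    embed() is a placeholder for whatever encoder the system uses; the point
    is that store/retrieve are discrete lookups the model cannot
    differentiate through, so it never learns what is worth storing.
    """

    def __init__(self, embed):
        self.embed = embed          # text -> np.ndarray
        self.keys = []              # one embedding per stored item
        self.values = []            # the raw stored text

    def store(self, text):
        # No notion of "what matters": every item is kept forever.
        self.keys.append(self.embed(text))
        self.values.append(text)

    def retrieve(self, query, k=3):
        # Hard top-k selection: a non-differentiable, argmax-style choice.
        q = self.embed(query)
        sims = [float(q @ key / (np.linalg.norm(q) * np.linalg.norm(key)))
                for key in self.keys]
        top = np.argsort(sims)[::-1][:k]
        return [self.values[i] for i in top]
```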
In humans, LTM is exactly online learning. In-context learning (a model conditioning on earlier information in its context) is already online learning; with LTM, the context becomes the information of one’s whole life. RL is useful but not ideal here: it is informationally inefficient and low-signal.
How do we solve long-term memory?
We give the model a differentiable database access operation (read and write) and let it learn the rest itself, so that the notion of which information matters, and what to access when, emerges from training. One way to make database access differentiable is a persistent extra set of weights, a “memory module”, with which the “main” model weights can interact in a differentiable way.
For example, the main model could produce a vector that is fed into the memory network, and the memory network could output a vector that is fed back into the main model. In the limit of infinite compute, we would simply make the entire network, including the main model, persistent, but that is inefficient in storage cost, hence the split.
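A minimal sketch of this split, assuming PyTorch and made-up sizes (the MemoryModule name, slot count, and read/write rules are all illustrative, not a prescribed design): the memory is a persistent state tensor, and the learned parameters are the small maps that decide how to read from and write to it. Because every operation is a softmax or matrix product, gradients reach both the addressing and the content.

```python
import torch
import torch.nn as nn

class MemoryModule(nn.Module):
    """A differentiable read/write memory, sketched with soft attention."""

    def __init__(self, slots=256, dim=512):
        super().__init__()
        self.read_query = nn.Linear(dim, dim)    # main-model vector -> read key
        self.write_key = nn.Linear(dim, dim)     # main-model vector -> write address
        self.write_value = nn.Linear(dim, dim)   # main-model vector -> stored content
        self.register_buffer("init_memory", torch.zeros(slots, dim))

    def read(self, memory, h):
        # Soft attention over slots: differentiable, unlike a database lookup.
        attn = torch.softmax(self.read_query(h) @ memory.T, dim=-1)   # (batch, slots)
        return attn @ memory                                           # (batch, dim)

    def write(self, memory, h):
        # Blend new content into the addressed slots; gradients flow through
        # both where to write and what to write, so storage itself is learned.
        attn = torch.softmax(self.write_key(h) @ memory.T, dim=-1)     # (batch, slots)
        content = self.write_value(h)                                   # (batch, dim)
        return memory + attn.T @ content                                # (slots, dim)
```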
In this setup, the problems to solve become:
Designing an inductive bias for the memory module’s architecture that makes information storage efficient
Choosing a method for the main model and memory module to interact that makes gradient descent efficient
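To make the interaction concrete, here is a hedged sketch of a training step, continuing the hypothetical MemoryModule above, with a GRU cell standing in for the main model and arbitrary sizes throughout: the memory state is threaded through every step of a long sequence, and a single backward pass updates both the main weights and the memory module, which is what lets a storage policy be learned by gradient descent rather than hand-designed.

```python
import torch
import torch.nn as nn

# Continues the MemoryModule sketch above; nn.GRUCell is only a stand-in
# for the "main" model, and every size here is an illustrative assumption.
main_model = nn.GRUCell(input_size=512, hidden_size=512)
memory = MemoryModule(slots=256, dim=512)
optimizer = torch.optim.Adam(
    list(main_model.parameters()) + list(memory.parameters()), lr=1e-4)

def training_step(inputs, targets, loss_fn):
    """inputs, targets: (steps, batch, 512) tensors from a long-horizon task."""
    batch = inputs.shape[1]
    h = torch.zeros(batch, 512)
    mem = memory.init_memory.clone()      # persistent memory state for the episode
    loss = torch.tensor(0.0)
    for x, y in zip(inputs, targets):
        recalled = memory.read(mem, h)    # differentiable retrieval
        h = main_model(x + recalled, h)   # main model conditions on what it recalled
        mem = memory.write(mem, h)        # differentiable storage
        loss = loss + loss_fn(h, y)
    optimizer.zero_grad()
    loss.backward()                       # one backward pass shapes both read and write
    optimizer.step()
    return loss.item()
```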
Once these criteria are met, models should be able to learn online and operate over long contexts. They will be able to perform real jobs. Admittedly, interaction with the real world is yet another difference, but that will come soon after. The last reason we don’t fully think of AI as a “being”, as something real, is that it has no constant stream of awareness of, and interaction with, the real world. Once a persistent stream is achieved, such systems will be indistinguishable from biological intelligent beings.
