We believe we’re on a faster path to AGI. This is a journey into a new paradigm, one where AI is not static but alive, constantly learning and adapting. At the heart of this shift is a simple yet profound principle: whenever two neurons are interested in the same thing, the connection between them becomes stronger. This is memory. From this principle we watched a brain-like structure emerge, a moment of spontaneous order that signals a new architectural leap in artificial intelligence. We can now glue two separately trained models together, and they become one.
From Politics to Complexity Science
The path to this discovery was unconventional. It began not in a computer science lab, but in a French school for politicians. A chance encounter with a game theory course, inspired by the film A Beautiful Mind, revealed a natural intuition for the subject. The beauty of seeing the results of games without complex math was a revelation. It felt like coming home.
This newfound passion led to a master’s degree specializing in game theory on graphs, a field that shades naturally into complexity science. You start with small particles and observe how they form large structures. It becomes more interesting when those structures constantly change: an infinitely growing and evolving system. This is a tricky problem, one that physicists and mathematicians have been trying to crack for ages.
At its core, you have small particles interacting—sometimes bumping into each other in space, sometimes forming connections in a graph, like neurons passing signals. These tiny interactions give rise to massive phenomena: a society, intelligence, you name it. The underlying math starts to look surprisingly similar. This journey through complex systems, trying to understand how global phenomena emerge from local interactions, laid the groundwork for a fundamental question: what is AI missing?
The Flaw in Current AI: The Absence of Time
When you have a system that grows and changes, you need a notion of time. If things are to evolve and emerge, they need time. Modern IT systems, and especially current AI, are deprived of this crucial element.
All the mainstream models today are built on a single architecture: the transformer. It was an absolute algorithmic breakthrough, fundamentally designed for language. However, the transformer is, by design, deprived of any intrinsic notion of time and memory. This is the problem we set out to solve.
We are building the first post-transformer frontier model to tackle this fundamental lack of memory in AI. Memory is intrinsically linked to time. You need to remember things over time to understand consequences, to maintain coherence, and to solve problems. The longer you can stay focused on a task, the more memory you require.
METR, a lab that benchmarks the capabilities of LLMs, found that the longest task (measured in human working time) a GPT-4-level model can complete with a 50% success rate is around 2 hours and 17 minutes. In essence, current LLMs are reliving Groundhog Day. Every single day.
They don’t have true memory. They are trained once on a massive dataset, and every time you interact with one, it’s as if the model wakes up using the same static brain it was given during training. You can provide context in a prompt, but this is like leaving sticky notes for yourself. There’s a vast difference between having a library of knowledge and having truly internalized it to create adaptable frameworks for new situations. This is the difference between a static brain and one with contextualized, evolving memory.
The Epicycle Problem
Trying to trick a transformer into having memory is a tiresome and difficult task. It’s like the epicycles before Copernicus. To explain the observed movements of the planets, astronomers designed complex, ugly orbits. They were cumbersome, but they worked, and every small improvement was celebrated. But sometimes you just need to change your perspective: put the Sun at the center, and the orbits become simple and elegant.
We believe we need to roll back some assumptions. The transformer was an amazing innovation that opened the entire market and captured the public’s imagination. But we are still in the very early days of this AI market shift. So far, only 0.7% of GDP has been spent on this technological transition. For comparison, the telecom shift in the ’90s took over 2% of GDP. AI is arguably more fundamental. The transformer is likely not the ultimate technology to get us all the way there. We need something else.
Introducing ‘Baby Dragon’ (BDH)
This brings us to our new architecture, which we call Baby Dragon Hatchling, or BDH. The name itself is a nod to the challenge. “Dragon Hatchling” comes from Terry Pratchett’s The Colour of Magic, where dragons appear the more you think about them. This felt fitting for a reasoning model, as we literally had to reason about reasoning to build the architecture. The “B” in BDH? Honestly, it’s because AI researchers love three-letter acronyms.
But the name also carries a deeper meaning. We are in the business of building dragons—mythical creatures nobody believed could exist. We are talking about continual learning, long-horizon reasoning, and adaptation over time to new data and new learnings.
How BDH Works: Learning Like a Brain
The way it works is a bit like a brain on silicon. It’s based on a concept called Hebbian learning, a simple principle of how our own brains function.
Imagine the brain as a network of neurons (dots) and synapses (links between them). That’s the basic model. We don’t need to know every chemical reaction, just as we don’t need to know the bone structure of a bird’s wing to understand that wings are for flying.
Our BDH architecture is built on this idea of local interactions between small, particle-like neurons.
The Process of Learning:
- Information Input: When a new piece of information (a “token”) comes in, only the neurons interested in it light up.
- Signal Propagation: An activated neuron passes the signal to its immediate neighbors. Not every neuron in the network lights up, only those that care about that specific piece of information. This is the principle: neurons that fire together, wire together.
- Strengthening Connections: The more two neurons are interested in the same thing, the stronger the connection between them becomes. This is memory.
- Fading Connections: Conversely, connections that are not used over time will begin to fade.
This structure emerges naturally from the data. We don’t pre-define it.
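The two rules above, strengthen on co-activation and fade with disuse, can be sketched in a few lines. What follows is a generic Hebbian update with decay, assuming NumPy; it illustrates the principle only and is not the BDH update rule itself, and the constants are invented for the example.

```python
import numpy as np

n_neurons = 8
weights = np.zeros((n_neurons, n_neurons))  # synapse strengths, start at zero

learning_rate = 0.1   # how fast co-active pairs strengthen
decay = 0.02          # how fast unused connections fade

def hebbian_step(weights, activity):
    """Strengthen connections between co-active neurons, then let
    every connection fade slightly (use it or lose it)."""
    coactive = np.outer(activity, activity)       # 1 where both neurons fired
    weights = weights + learning_rate * coactive  # fire together, wire together
    return weights * (1.0 - decay)                # unused links slowly fade

# Neurons 0 and 1 repeatedly fire together; their connection strengthens,
# while pairs that never co-fire stay at zero.
for _ in range(20):
    activity = np.zeros(n_neurons)
    activity[[0, 1]] = 1.0
    weights = hebbian_step(weights, activity)

print(weights[0, 1] > weights[0, 2])  # True: the co-active pair dominates
```

Note that the decay term also caps growth: a connection that fires on every step settles at an equilibrium strength rather than growing without bound.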
The Emergence of a Digital Brain
One evening in the lab, we witnessed it. We saw the emergence of this brain-like structure appearing on its own. It was a moment of pure awe, a spontaneous order arising from what seemed like randomness. For a complexity scientist, emergence is the holy grail. You strip a system down to its simplest rules and watch a larger order appear. We had created a structure that learned on its own.
This structure is incredibly efficient. Because it’s like a brain, it’s computationally efficient and distributes nicely across many machines. It’s a scale-free graph, which means that even beyond the scales we’ve tested, we know from theory how it will behave. This is very different from transformers, where such predictability hasn’t been established.
The Promise of Interpretable AI
This architecture is also more interpretable. With transformers, researchers are trying to build the equivalent of MRI machines to scan the model’s “brain” and understand what’s happening. With BDH, we have a CCTV camera inside the brain.
We can see the neural activity precisely. We see which neurons and synapses fire up for a given concept, like “currency.” We even see neurons getting bored. If you keep repeating the same information, their activity just goes down. This element of surprise shows that something is valuable and worth remembering. Seeing this surprise effect in the neural activity was a fascinating discovery.
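One way to picture the boredom effect is habituation: the response to a stimulus shrinks each time the same stimulus repeats, while a novel stimulus arrives at full strength. This is a toy sketch of that behavior with made-up constants, not the mechanism inside BDH.

```python
def habituating_response(stimuli, fatigue=0.5, recovery=0.9):
    """Toy habituation: repeating a stimulus damps the response to it
    ("boredom"); a novel stimulus gets a full-strength response
    ("surprise"); unseen stimuli slowly recover their impact."""
    sensitivity = {}   # per-stimulus responsiveness, 1.0 = fresh
    responses = []
    for s in stimuli:
        level = sensitivity.get(s, 1.0)
        responses.append(level)
        sensitivity[s] = level * fatigue               # damp the repeated stimulus
        for other in sensitivity:
            if other != s:                             # mild recovery for the rest
                sensitivity[other] = min(1.0, sensitivity[other] / recovery)
    return responses

print(habituating_response(["currency"] * 3 + ["weather"]))
# → [1.0, 0.5, 0.25, 1.0]: "currency" gets boring, "weather" is a surprise
```

The surprise signal is useful exactly as described above: a strong response marks information as new and worth remembering, while a damped one marks it as redundant.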
Beyond Brute Force: The Power of Efficient Scaling
So, what’s the path to scaling this? We’ve proven that BDH works at the GPT-2 scale, with one billion parameters. BDH inherits the scaling laws of transformers, but our game is not about scaling with more parameters and more data. The value comes from faster learning, from solving problems that haven’t been seen in the training data.
We hope to soon see very small models capable of producing results comparable to the big ones. We’re not looking at brute-force scaling; we’re looking at getting better at puzzle-solving and reasoning.
The reasoning power doesn’t come from size alone. The human brain has trillions of synaptic connections, providing immense memory in an efficient structure. BDH operates on a similar principle. The context is limited only by the size of the “brain” (the number of neurons), but the network structure allows it to encode a vast amount of information. Memory is kept close to the core, eliminating the need for slow lookups and extra compute. It’s like having memory directly on the chip.
Furthermore, you don’t fire up the entire model every time—only the neurons that care. Why use the full brain for a simple task?
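Sparse activation can be sketched as: score every neuron’s interest in the incoming token, and let only the top few participate. The scoring and threshold below are invented for illustration; BDH’s actual activation rule is not specified here.

```python
import numpy as np

rng = np.random.default_rng(42)
n_neurons, dim = 1000, 32
# Each neuron has a preferred direction; its "interest" in a token is
# how well the token's embedding aligns with that direction.
preferences = rng.normal(size=(n_neurons, dim))

def active_neurons(token_embedding, k=20):
    """Return the indices of the k neurons most interested in the
    token; every other neuron stays silent and costs no compute."""
    interest = preferences @ token_embedding
    return np.argsort(interest)[-k:]          # top-k by interest score

token = rng.normal(size=dim)
fired = active_neurons(token)
print(f"{len(fired)} of {n_neurons} neurons fire")  # 20 of 1000
```

The efficiency argument falls out directly: only the fired subset propagates signals and updates its synapses, so a simple task touches a small fraction of the network.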
A cool thing we can do, which our own brains cannot, is glue two separately trained models together. We demonstrated in our paper that we can take a model trained in one language and another trained in a different language, put them together, and they become one. They start producing sentences that mix the two languages. With a bit more training, they form a single, unified entity. It’s like Lego blocks. You could imagine combining a model trained on finance with one trained on legal to create a super-powered expert.
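In graph terms, gluing can be pictured as taking the union of two synapse graphs: connections both models share are combined, and each model’s unique connections carry over. This is only a cartoon of the idea (the strengths and averaging rule are invented); the unified model described above still requires further training.

```python
def glue_models(synapses_a, synapses_b):
    """Union of two synapse graphs: shared edges average their
    strengths, unique edges carry over unchanged. A cartoon of
    model gluing, not the procedure from the paper."""
    merged = dict(synapses_a)
    for edge, strength in synapses_b.items():
        if edge in merged:
            merged[edge] = (merged[edge] + strength) / 2  # shared connection
        else:
            merged[edge] = strength                       # unique connection
    return merged

# Hypothetical fragments of two single-language models.
french = {("bon", "jour"): 1.0, ("je", "suis"): 0.8}
english = {("good", "day"): 0.9, ("bon", "jour"): 0.5}  # one shared edge

combined = glue_models(french, english)
print(combined[("bon", "jour")], len(combined))  # → 0.75 3
```

The Lego-block picture follows from locality: because each connection is meaningful on its own, two graphs can be overlaid without retraining everything from scratch.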
From Theory to Reality: The Road Ahead
We already have a history of working with amazing organizations like NATO, the French Postal Service, and Formula 1. For now, they have the “dragon’s nest”—the underlying technology that allows us to feed live data efficiently and create a cozy environment for the dragon to live in. These use cases are incredible, from optimizing race strategy in Formula 1 to informing real-time decisions in complex, dynamic environments.
Just recently, we announced a partnership with NVIDIA and AWS. The moment our model is ready, it will be available to AWS customers, easy to plug in, test, and adopt. This should happen sometime next year. Our own roadmap is simple: we believe we’re on a faster path to AGI, and we are working to get there as quickly as possible.
The Philosophical Leap: Reasoning, Safety, and the Path to AGI
We see reasoning as the primary function of intelligence. LLMs are great for chatbots, summarization, and search, but reasoning is what will allow us to solve and invent solutions to very tough problems. Our north star is to create an innovator who sees what’s not there, as opposed to just recomposing what was.
In terms of safety, our approach provides a better scientific understanding of how and why the models work. Mapping the passage from micro-interactions to a scale-free structure governed by known laws is crucial for safety. We are also exploring provable risk levels to ensure the system behaves predictably.
If you hire someone, you assume they will perform their tasks without, for example, blowing up the planet. We should have the same fair assumption with AI. At least mathematically, we should have comfort that we know how these models function.
What about preventing it from learning something undesirable? It’s actually not too difficult. You can simply roll back to a previous checkpoint. The observability from our “CCTV” allows us to see information spreading like an epidemic through the graph. If a small, unwanted piece of information is introduced, you can reverse it or quarantine it. For larger changes, you can always revert the entire model to a trusted state.
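The rollback idea is straightforward in code: snapshot the synapse state before risky updates, and restore it if the observability layer flags something unwanted. The class and method names below are hypothetical, purely to illustrate the checkpoint-and-revert pattern.

```python
import copy

class CheckpointedModel:
    """Toy model state with snapshot/rollback, mirroring the idea of
    reverting learned changes to a trusted state."""

    def __init__(self):
        self.synapses = {}
        self._checkpoints = []

    def checkpoint(self):
        """Record the current synapses as a trusted state."""
        self._checkpoints.append(copy.deepcopy(self.synapses))

    def learn(self, edge, strength):
        self.synapses[edge] = strength

    def rollback(self):
        """Revert to the most recently trusted state."""
        self.synapses = self._checkpoints.pop()

model = CheckpointedModel()
model.learn(("neuron_a", "neuron_b"), 0.8)   # trusted learning
model.checkpoint()
model.learn(("neuron_a", "neuron_x"), 0.9)   # unwanted information arrives
model.rollback()                             # quarantine it by reverting
print(sorted(model.synapses))                # only the trusted edge remains
```

A finer-grained version would revert only the subgraph the unwanted information reached, which is exactly what the epidemic-style observability makes possible.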
The Ultimate Goal: True Generalization
What is the one capability we’re most excited to unlock? True generalization. Getting to an innovator-level AI. This opens up a path toward grand challenges like space exploration. Not by putting our models in space (though that is happening), but by using AI as a crucial tool to solve fundamental problems in science and technology, like energy, that are currently blocking our progress.
This transition can be compared to the moment humanity discovered agriculture. We stopped wandering and settled down, which allowed us to build culture and civilization. AI, once it becomes this powerful and ubiquitous, will allow us to build Civilization 2.0.
The pace of change is unprecedented. A year ago, the concept of “reasoning models” was obscure. Today, it’s at the forefront of the conversation. The landscape shifts from month to month. We spend a lot of time thinking about the next five or ten years, but what about the next hundred? Or thousand? The work being done right now is laying the foundation for where humanity will be far into the future.