Beyond Prediction: The Dawn of AI That Truly Thinks
10xTeam · December 04, 2025 · 8 min read

There’s a test that makes every AI look stupid. ChatGPT gets 0%. Claude gets 0%. Even Google’s best AI gets a mere 8%. But here’s what nobody’s telling you: a new type of AI just appeared that doesn’t predict words anymore. It thinks. The scary part? It’s already better than most humans at things we thought were impossible.

For the past few years, we’ve been living in the era of Large Language Models (LLMs). ChatGPT, Claude, Gemini—these things have been everywhere. They are super-smart autocomplete on steroids. You type something, and they predict what should come next based on patterns they’ve seen in mountains of training data. And honestly, they’re pretty incredible.

The Dirty Secret of Modern AI

But these models have a dirty little secret. They don’t think. They pattern match. They’re like that student who memorized the textbook but can’t solve a single problem they haven’t seen before. And while we’ve been distracted by their impressive mimicry, researchers have been quietly working on something that could change everything.

Let me paint you a picture. Imagine you’re reading a novel.

  • On page 50, there was a box.
  • On page 100, a cat sits on the box.
  • On page 150, the box breaks.

Now, if I ask you what happened to the box, you can track that story. You understand state changes over time. Traditional LLMs struggle with this badly.

The way current AI works is through something called an attention mechanism. When you’re reading a sentence, you need to know which words are important and how they relate. “The cat sat on the box” means something totally different from “the box was on the cat,” right? But the current gold standard, a method called rotary position encoding, treats word positions like fixed distances. It’s like saying words that are four positions apart always get the same treatment, regardless of what those words are. It’s aware of position, but it’s not aware of content.
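To make that concrete, here is a toy two-dimensional sketch of the rotary idea (a simplification for illustration, not a production implementation): each vector is rotated by an angle proportional to its position, so the attention score between two words depends only on how far apart they are, never on which words they are.

```python
import math

def rotate(vec, pos, theta=0.5):
    """Rotate a 2-D vector by an angle proportional to its position
    (a toy version of rotary position encoding)."""
    angle = theta * pos
    x, y = vec
    return (x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle))

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

q, k = (1.0, 0.0), (0.6, 0.8)

# The score between positions 3 and 7 (four apart)...
s1 = dot(rotate(q, 3), rotate(k, 7))
# ...is identical to the score between positions 10 and 14 (also four apart).
s2 = dot(rotate(q, 10), rotate(k, 14))
assert abs(s1 - s2) < 1e-9
```

That position-only dependence is exactly the limitation described above: the encoding registers how far apart two words are, but it cannot notice what happened along the way.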

This creates real problems. According to MIT research, LLMs can’t follow variables changing in code. They mess up conditional instructions. They’re flying blind when things get complex.

Note: For example, an LLM might fail to track the value of a variable through a simple loop.

# A simple state-tracking problem for an AI
def track_value(limit):
    value = 10
    for i in range(limit):
        if i % 2 == 0:
            value += 5  # Increase on even numbers
        else:
            value -= 2  # Decrease on odd numbers
    # What is the final 'value'?
    return value

# LLMs often struggle to compute the correct final state (25 for limit=10).

Enter the New Kids on the Block: Large Reasoning Models

These are Large Reasoning Models, or LRMs. Trust me, these things are different.

Instead of just spitting out an answer instantly, these models take time to think. I’m not being poetic here. They literally use what’s called “test-time compute.” They spend more processing power during the actual problem-solving phase, not just during training.

Think of it this way. Traditional LLMs are like speed chess players. They’ve seen millions of games, recognize patterns, and move fast. But LRMs are like chess grandmasters who calculate 20 moves ahead, consider different strategies, and adjust their thinking as they go.

The secret sauce is something called chain-of-thought reasoning. These models generate long streams of internal logic, sometimes thousands of words, before giving you an answer. They can backtrack when they realize they made a mistake. They can try multiple approaches and vote on the best one. They can literally catch themselves in loops and break out.
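One of those tricks, sampling several independent reasoning chains and letting them vote, is often called self-consistency. Here is a minimal sketch in which a deliberately noisy stand-in function plays the role of the model; the 70% accuracy and the answer 42 are invented purely for illustration.

```python
import random
from collections import Counter

def sample_answer(seed):
    """Stand-in for one sampled chain of thought ending in an answer.
    A real model would generate reasoning text; here we just simulate
    a noisy solver that is right about 70% of the time."""
    rng = random.Random(seed)
    return 42 if rng.random() < 0.7 else rng.randint(0, 100)

def self_consistency(n_samples=25):
    """Sample several chains and return the majority answer."""
    votes = Counter(sample_answer(seed) for seed in range(n_samples))
    answer, _ = votes.most_common(1)[0]
    return answer

print(self_consistency())
```

Even though any single chain is wrong almost a third of the time, the wrong answers scatter while the right one piles up, so the majority vote is far more reliable than any one sample.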

  • OpenAI’s o3 model can now score at the top of elite science benchmarks.
  • Google’s Gemini 2.5 can process a million tokens at once. That’s like reading several novels simultaneously and keeping track of every single plot thread.
  • A Chinese open-source model called DeepSeek-R1 scored 99.2% on elite math tests that would stump most humans.

But there’s more. MIT researchers just developed something called PaTH attention that makes positional information adaptive. Instead of treating word positions as fixed distances, it treats the path between words like a journey where each step influences the next. It’s the difference between measuring distance on a map versus walking the route and experiencing the terrain. We’re moving from AI that mimics intelligence to AI that might possess it.
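The contrast can be caricatured in a dozen lines. This is purely illustrative and not the actual MIT mechanism; the word-length “step sizes” are invented just to show how a path-dependent distance can change with content while a fixed positional distance cannot.

```python
def fixed_distance(i, j):
    """Standard positional view: the gap between tokens i and j
    is always |i - j|, whatever the tokens say."""
    return abs(i - j)

def path_distance(tokens, i, j):
    """Path-style view: walk from i to j, and let each token along
    the way contribute a content-dependent step (here, word length)."""
    steps = [len(t) / 4.0 for t in tokens]  # invented step sizes
    lo, hi = sorted((i, j))
    return sum(steps[lo:hi])

tokens = "the cat sat on the box".split()
print(fixed_distance(1, 4))         # always 3, for any sentence
print(path_distance(tokens, 1, 4))  # depends on the words in between
```

Swap in a different sentence and `fixed_distance` never moves, while `path_distance` does — that sensitivity to content is the point.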

The High Cost of Thinking

Let’s talk about the money. Because let’s be honest, that’s what really drives change. These new AI reasoning models are expensive. And I don’t mean a little expensive. We’re talking anywhere from a few cents to several dollars per single task.

When they tested one of OpenAI’s top models on a benchmark for general intelligence, it cost about $200 per task. That’s not a typo. $200 to get one answer.

But here’s why people are even considering it: the accuracy goes through the roof. You can hire someone for $10 who gets it wrong half the time, or you can pay $50 for someone who nails it 95% of the time. When the stakes are high, like a medical diagnosis or a complex legal case, that accuracy is everything. And things are getting more efficient fast. Some models are hitting a sweet spot, giving you decent accuracy for just a few cents. Plus, big companies could save tens of billions a year just by switching to smarter open-source models instead of always using the priciest option.
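The trade-off becomes obvious once you account for the price of being wrong. The sketch below reuses the hypothetical $10-at-50% and $50-at-95% options from above, plus an invented $1,000 cost for a mistake:

```python
def expected_cost(cost_per_task, accuracy, cost_of_error):
    """Expected total cost per task when a wrong answer
    carries its own downstream price."""
    return cost_per_task + (1 - accuracy) * cost_of_error

# If a mistake costs $1,000 (say, rework after a bad diagnosis):
cheap = expected_cost(10, 0.50, 1000)     # 10 + 0.50 * 1000 = 510
accurate = expected_cost(50, 0.95, 1000)  # 50 + 0.05 * 1000 = 100
print(round(cheap, 2), round(accurate, 2))
```

On raw per-task price the cheap model wins five to one; once errors cost something, it ends up five times more expensive.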

A Dose of Reality

Okay, so before you run off thinking we’ve cracked the code on AI, let me hit you with a reality check.

  1. They still fail, and they fail hard. On a new benchmark that tests for genuine reasoning ability, the best models are scoring around 8% accuracy. Eight percent. Humans get it right almost every time. As one researcher put it, “we still need new ideas for AGI.”
  2. They are slow. So slow that some tasks just time out. You can’t just plug one of these into your app and call it a day. You have to redesign the entire product around its pace. It’s gotten so confusing for users that OpenAI is planning to merge these specialized systems back into their main models.
  3. We’ve hit a wall. For years, the strategy was simple: just make the AI bigger. More data, more power. But at the big NeurIPS conference, the experts are all saying the same thing. That’s not working anymore. The hype is cooling down. We need a fundamentally new approach.

The Future is a Hybrid

So where does that leave us? This is where it gets exciting. The most promising work is happening where two fields are colliding: deep learning and program synthesis.

Think of it like this. Deep learning is the gut instinct, the pattern recognition. Program synthesis is the cold, hard logic. You combine the two, and you might get something that actually thinks.

We’re also seeing some truly wild ideas, like hybrid quantum-classical models. Researchers just built an AI that uses actual quantum circuits, swapping out 10% of its brain for 10 quantum bits with zero loss in quality. IBM is already seeing a 34% jump in accuracy on some tasks with their quantum processors.

And then there’s the hardware race. China built a server the size of a mini-fridge that uses 90% less power while processing half a million tokens a second. New chips are coming out that are one and a half times faster than the best on the market, using a quarter of the power.

From Prediction to Understanding

Here’s what’s really happening underneath all this. We are moving from an era of AI that predicts to an era of AI that understands. Your typical AI today is just predicting the next word in a sentence. These new systems are trying to predict the next concept, the next logical step. That is a completely different game.

So, are these new AIs better than what we have now? Yes and no. For the really tough stuff—math, science, deep analysis—they are already in a different league. But for quick, cheap, everyday questions, the old models are still king.

We’re not replacing one with the other. We’re adding a powerful, specialized new tool to the toolbox. The real question isn’t which AI is better. It’s what becomes possible now that we have both. And I think we are just about to find out.


Join the 10xdev Community

Subscribe and get 8+ free PDFs that contain detailed roadmaps with recommended learning periods for each programming language or field, along with links to free resources such as books, YouTube tutorials, and courses with certificates.
