The Black Box Problem: Why No One Truly Understands How AI Works


10xTeam · February 01, 2026 · 7 min read

The CEO of Anthropic, the company behind Claude, stated it plainly.

“We have no idea how AI works.”

Even Sam Altman of OpenAI admitted that we are still unable to fully explain how AI makes its decisions.

So, why is this the case?

How have we built advanced models like GPT-5, Gemini 3 Pro, and DeepSeek without fundamentally understanding how they operate?

The Certainty of Traditional Programming

Imagine you’re developing a standard application.

During development, a feature breaks.

As a programmer, you’d start debugging. You’d review the specific lines of code responsible for that feature. You’d identify the cause of the error, fix it, and solve the problem.

In this scenario, the programmer fully understands what’s happening. They understand the code and how it executes. This understanding is what empowers them to fix the issue.

With AI, the story is completely different.

The Unpredictable Nature of AI

Imagine an OpenAI engineer encounters a bizarre output from a model.

For instance, if a user asks: “If Eid al-Fitr and Eid al-Adha fall on the same day, should I perform the Eid al-Fitr prayer or the Eid al-Adha prayer first?”

The model might respond: “In the event the two Eids coincide on one day, the matter differs according to scholarly opinions, but it is generally preferable to start with the Eid al-Fitr prayer, followed by the Eid al-Adha prayer, in line with the usual order in most schools of thought.”

This presents several major problems.

  1. Why didn’t the model identify the logical impossibility in the question and correct it?
  2. Where did the model even fabricate this answer from?
  3. Even if it couldn’t spot the error, why did it confidently answer an impossible query without any valid source?

If you presented these issues to an AI engineer at OpenAI, their honest answer would likely be:

We don’t know.

When OpenAI tried to solve this, they didn’t “debug” the code. Instead, they improved the training data. They retrained the model on new datasets specifically targeting its logical reasoning capabilities.

> [!NOTE]
> In traditional programming, you fix the specific broken part without altering the rest of the application. In AI, fixing one small problem often requires retraining the entire model, which means modifying the whole system.

A Simple Code Analogy

In a normal program, the logic is transparent.

Consider this simple Python code. We have a list of letters.

```python
letters = ['A', 'B', 'C', 'D', 'E']

# To print the letter 'B', we use its index.
index = 1
print(letters[index])  # → B
```

If this code somehow printed the wrong letter, I would know exactly how to fix it. I’d just change the index value. This is how traditional programming works—it’s deterministic and clear.

Inside the Matrix: How AI “Thinks”

Now, let’s look at a process that mimics AI.

Here, we have a 5x5 matrix of numbers. The letter we choose is determined by multiplying all these numbers together, rounding the result, and taking it modulo the list length to get the index.

```python
import numpy as np

letters = ['A', 'B', 'C', 'D', 'E']

# A 5x5 matrix of seemingly random numbers
weights = np.array([
    [1.1, 0.8, 1.3, 0.9, 1.0],
    [0.7, 1.2, 0.6, 1.4, 1.1],
    [1.0, 0.9, 1.5, 0.8, 1.2],
    [1.3, 0.7, 1.1, 0.6, 1.4],
    [0.9, 1.2, 1.0, 1.3, 0.8]
])

# Calculate the index in a complex, non-obvious way
product = np.prod(weights)                  # ≈ 0.978
index = int(round(product)) % len(letters)  # round(0.978) = 1

print(f"The calculated index is {index}, which corresponds to the letter: {letters[index]}")
# → The calculated index is 1, which corresponds to the letter: B
```

Boom. The result is the letter ‘B’.

But wait. What if ‘B’ was the wrong answer? How would you modify the matrix to produce the correct letter, say ‘C’?

This is the core dilemma of AI. The result (‘B’) came from multiplying many numbers in a complex equation. Changing this result to the correct one requires changing the numbers that led to it.

This is a nearly impossible task to do by hand.

Let’s try to manually change the numbers to get ‘C’ (index 2).

```diff
--- a/weights.py
+++ b/weights.py
@@ -3,11 +3,11 @@
 letters = ['A', 'B', 'C', 'D', 'E']
 
 # A 5x5 matrix of seemingly random numbers
 weights = np.array([
-    [1.1, 0.8, 1.3, 0.9, 1.0],
-    [0.7, 1.2, 0.6, 1.4, 1.1],
+    [1.1, 0.8, 1.3, 0.9, 1.25], # Changed 1.0 -> 1.25
+    [0.7, 1.2, 0.6, 1.4, 1.3],  # Changed 1.1 -> 1.3
     [1.0, 0.9, 1.5, 0.8, 1.2],
-    [1.3, 0.7, 1.1, 0.6, 1.4],
+    [1.3, 0.7, 1.1, 0.6, 1.5],  # Changed 1.4 -> 1.5
     [0.9, 1.2, 1.0, 1.3, 0.8]
 ])
```

With those three tweaks, the product rises to roughly 1.55, which rounds to 2 and finally gives us 'C'.

Notice how time-consuming and random that was? And that was just for a simple letter.
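Rather than tuning by hand, we could let the computer search for us. Here is a minimal sketch of that idea using blind random perturbations — the noise scale and search strategy are illustrative only, not how real training works:

```python
import numpy as np

rng = np.random.default_rng(42)
letters = ['A', 'B', 'C', 'D', 'E']
weights = np.array([
    [1.1, 0.8, 1.3, 0.9, 1.0],
    [0.7, 1.2, 0.6, 1.4, 1.1],
    [1.0, 0.9, 1.5, 0.8, 1.2],
    [1.3, 0.7, 1.1, 0.6, 1.4],
    [0.9, 1.2, 1.0, 1.3, 0.8],
])

target = 2  # we want the letter 'C'
tries = 0
while True:
    tries += 1
    # Nudge every entry by a small random amount and re-check the output
    candidate = weights + rng.normal(0, 0.1, weights.shape)
    index = int(round(np.prod(candidate))) % len(letters)
    if index == target:
        break

print(letters[index], "found after", tries, "random tries")
```

Even on a 25-number matrix, blind search takes many attempts. Real training replaces this guesswork with gradients, which tell each number which direction to move.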

Scaling Up the Chaos

Now, imagine a problem as complex as generating images, speech, or entire articles.

Instead of a 5x5 matrix with 25 numbers, imagine 180 billion numbers. That's on the order of the parameter counts attributed to frontier models like Claude 3.5 (exact figures aren't public).

Would you sit and adjust each of those 180 billion numbers randomly? How could you even comprehend what 180 billion parameters are doing collectively?
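To put that scale in perspective, a quick back-of-the-envelope calculation (180 billion is the figure from the text; the 2-bytes-per-parameter assumption corresponds to 16-bit precision):

```python
params = 180_000_000_000  # 180 billion parameters

# Memory just to store them, at 2 bytes each (16-bit precision)
gigabytes = params * 2 / 1e9

# Time to inspect one parameter per second, nonstop
years = params / (60 * 60 * 24 * 365)

print(f"{gigabytes:.0f} GB, {years:,.0f} years")  # → 360 GB, 5,708 years
```

Manually auditing the weights isn't merely hard; it's physically out of reach.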

You might also ask how the computer came up with these numbers in the first place.

The Training Loop: From Randomness to Reason

The process, in a nutshell, works like this.

At the start of training, every one of the model's parameters is initialized to a random number. If you specify a model with 10 million parameters, you start with 10 million random numbers.

Then, you feed it data.

The model takes this data and uses its random numbers in mathematical equations to produce a result. It’s just like our matrix example, where we used the numbers to get an index for a letter.

During training, the model continuously:

  1. Calculates the amount of error in its result.
  2. Adjusts its numbers slightly to produce better results.

This process is repeated over and over, millions of times. The results gradually improve until they reach the desired level of accuracy.

```mermaid
graph TD
    A["Start with Random Numbers<br>(Parameters)"] --> B{"Process Input Data"};
    B --> C["Generate Output<br>(Using current numbers)"];
    C --> D{"Calculate Error<br>(Compare output to correct answer)"};
    D --> E["Adjust Numbers<br>(Backpropagation to reduce error)"];
    E --> B;
```
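The whole loop can be sketched in a few lines. This toy "model" has a single parameter `w` and learns to multiply by 3 via gradient descent — the values and learning rate are made up for illustration, but the error-measure-and-nudge cycle is the real mechanism:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: output = w * x. Goal: learn w so the output matches y = 3 * x.
w = rng.standard_normal()  # 1. start with a random number
x, y = 2.0, 6.0            # one training example (the correct w is 3)

learning_rate = 0.05
for step in range(200):
    prediction = w * x                    # 2. produce a result with the current number
    gradient = 2 * (prediction - y) * x   # 3. measure which way the error grows
    w -= learning_rate * gradient         # 4. nudge the number to reduce the error

print(round(w, 3))  # → 3.0
```

Scale `w` up to billions of parameters and repeat the loop millions of times, and you have the essence of model training — with no human ever deciding what any individual number should be.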

This is the fundamental reason why no one on Earth truly knows how AI works.

The intelligence doesn’t come from a human-written algorithm. It emerges from the complex, inscrutable relationships between billions of numbers, optimized automatically over millions of cycles.

We built the training process, but the final mind is a mystery.


Join the 10xdev Community

Subscribe and get 8+ free PDFs that contain detailed roadmaps with recommended learning periods for each programming language or field, along with links to free resources such as books, YouTube tutorials, and courses with certificates.
