Today, we’re going to cover some low-level concepts that you probably never have to think about unless you’re working at the systems level. One of the most frequent questions developers have is why some projects seem to involve multiple programming languages in their development. Explaining this can be either extremely easy or extremely difficult, depending on the type of project.
Take a full-stack framework like Django, for example. Python handles the backend, which runs on the server, while HTML, CSS, and JavaScript build the user interface displayed on the client side. This is a multi-language project. But in this case, it's easy to understand how everything works in production, because we're essentially developing two separate processes that communicate remotely at runtime, typically over HTTP.
But what about other types of projects? There are many where components written in different programming languages are meant to run together as a single process. So how are these kinds of projects even possible?
The Compiler: More Than a Black Box
For simplicity, let’s start by considering only programming languages that compile down to machine code. Generally, each programming language has its own dedicated compiler. You can’t just take a Rust file and compile it using the Go compiler. This is where things start to get interesting and a bit confusing. If most programming languages have separate compilers, runtimes, and memory models, how can they possibly live inside the same binary?
What often confuses people is the common oversimplification that compilers are just tools that turn source code directly into executable files. Now, don’t get me wrong, compilers do produce executable files, but that’s only the final result of a much more complex, multi-step process that we usually don’t see.
To illustrate this, let’s look at a simple C program. This program simply prints a message, but the message it prints changes depending on the operating system you’re running it on.
```c
#include <stdio.h>

int main() {
#if defined(__linux__)
    printf("Hello from Linux!\n");
#elif defined(_WIN32)
    printf("Hello from Windows!\n");
#else
    printf("Hello from an unknown OS!\n");
#endif
    return 0;
}
```
On most GNU/Linux systems, the go-to compiler for C is GCC. To compile and run our C program, we usually just call GCC and pass it the file we want to compile. Then an executable is generated. From our perspective, it’s just two simple steps: one to compile the program and another one to run it. But under the hood, the compiler is doing a whole lot more.
The Four Phases of Compilation
Internally, GCC goes through four main phases to turn a C file into a working executable.
1. **Pre-processing:** This step prepares the source code. It removes comments, expands macros, and, crucially, resolves `#include` directives. The C pre-processor replaces an include line with the entire contents of the specified header file, effectively inserting that code into our file before compilation begins. The output is still C code, just pre-processed for the next step.
2. **Compilation:** Next comes the actual compilation, but not directly into machine code. Instead, the pre-processed code is translated into assembly language. This is our first myth busted: a compiler doesn't always convert source code directly into machine code. In fact, many compilers convert source code into an intermediate representation like assembly, or even into another high-level programming language.
3. **Assembly:** The third step involves the assembler. This tool, which is technically another compiler, takes the human-readable assembly code and translates it into machine code, the ones and zeros your CPU understands. The result is called an object file. But this object file isn't runnable yet.
4. **Linking:** This brings us to the final step. At this stage, we may have multiple object files, some from our code, others from external libraries. The linker's job is to combine all these object files into a single, self-contained executable.
Static vs. Dynamic Linking
The linker has two primary ways to combine object files.
**Static linking** is the easiest approach. The linker takes the machine code of each required function from a library and copies it directly into the final executable. All the library functions our program needs are embedded directly in the output file, making it completely self-contained and ready to run.
**Dynamic linking** offers a more efficient alternative. Think about how many programs on your system use the `printf` function from the standard C library. If every one of those programs statically included its own copy, you'd have thousands of identical copies stored on your disk. With dynamic linking, libraries are pre-compiled into a special type of file called a dynamic shared library (`.so` files on Unix-like systems, `.dll` on Windows).
When our program is compiled with dynamic linking, the linker doesn’t copy the functions. Instead, it inserts a reference to the shared library. At runtime, when the program needs a function from that dynamic library, the operating system loads it into the program’s address space. This saves both disk space and memory, as the system only needs to store the library once. It’s also more flexible, since you can update a library without having to recompile every program that uses it.
A Pluggable Pipeline
You might be wondering: why break the process into so many phases? The reason we don't normally see these intermediate steps is that compilers like GCC run all of them in a single invocation and delete the intermediate files by default. But with the right flags, we can expose them.
For example, using gcc -save-temps will output all the intermediate files alongside the final executable. We can even stop the process at a specific stage. The -S flag, for instance, makes the process stop after generating the assembly file. This is incredibly useful for seeing how high-level C code translates to low-level assembly.
Even more interesting, we can start from any phase in the pipeline. We can pass GCC an assembly file and tell it to simply assemble and link it. This is huge. It means we can write part of our code in assembly, pass it to the compiler, and let the linker mix it with our C code into a single executable.
This already starts to answer our original question. Suppose we need to write a program that calculates prime numbers and we want it to be as fast as possible. We could write the heavy calculation function directly in assembly for maximum performance and just call it from C. Then, we pass both the C and assembly files to GCC. The compiler will handle compiling the C code, assembling the assembly code, and linking both object files together. Voila, we’ve just compiled a multi-language project. This technique is used by real-world systems like the Linux kernel, FFmpeg, and OpenSSL.
This reveals a key insight: what we casually call the “C compiler” is actually a toolchain—a pipeline of pluggable tools. This is why GCC doesn’t just support C; it also supports C++, Objective-C, Fortran, and even Go. The name GCC originally stood for GNU C Compiler, but it was later redefined to GNU Compiler Collection to reflect its evolution into a multi-language compiler suite.
Mixing High-Level Languages
So what about mixing high-level languages? For example, what if we implement part of our project in Fortran and part in C? This is entirely possible. In this case, we usually need multiple steps: one to compile the Fortran file, another to compile the C file, and a third to link both object files into a single executable.
The answer to how different languages can live inside a single executable comes down to the linker. Object files share a common format (such as ELF on Linux), so the linker doesn't care which compiler produced them; it simply resolves symbol references and combines the machine code.
The languages don’t even need to come from the same compiler suite. Take Rust, for example. It has a completely different toolchain from C. But when it comes time to produce the final binary, Rust also relies on a linker. So, if we want to call a Rust function from C, the process is straightforward:
- Implement the function in Rust.
- Compile the Rust code into a static or dynamic library.
- Declare and use the function in the C code.
- Compile the C code and link it with the Rust-compiled library.
This works the other way, too. It’s common to call C code from Rust, especially since many mature libraries and system APIs (for graphics, cryptography, etc.) are written in C.
The Application Binary Interface (ABI)
However, there’s one more crucial point. Just because two languages have a final linking phase doesn’t automatically mean they can be correctly linked. They must agree on a set of low-level rules defined by the Application Binary Interface (ABI).
An ABI defines the low-level contract between compiled components: how function calls pass parameters (in which registers or stack slots), how return values come back, and how data types are laid out in memory. Let's consider two scenarios where an ABI mismatch can cause problems.
**Scenario 1: Different Calling Conventions**

Imagine Language A calls a function written in Language B. The compiler for Language A might pass function parameters in registers 0 and 1. However, the compiler for Language B might expect those same parameters in registers 1 and 2. Even if both produce valid machine code, their calling conventions differ. At runtime, Language A will place arguments in the wrong place, and Language B will perform operations on incorrect data, leading to undefined behavior.
**Scenario 2: Pass-by-Reference vs. Pass-by-Value**

In another example, Language X and Language Y both use the same registers. But Language X uses pass-by-reference, meaning it puts the memory addresses of variables into the registers. Meanwhile, Language Y uses pass-by-value, so it expects the actual values in the registers. This mismatch would cause Language Y to interpret memory addresses as values, leading to completely wrong calculations or even a crash.
For two languages to interoperate, at least one of them must conform to the other's ABI expectations. In practice, the C ABI usually serves as this common ground.
The good news is that modern languages provide tools to manage this. Keywords and compiler flags tell the compiler that a function will interact with code from another language, ensuring the generated assembly follows the expected ABI.
- In C, you might use `extern`.
- In Rust, you'd use the `extern` keyword and the `#[no_mangle]` attribute.
- In Fortran, you can use the `bind(C)` attribute.
- In Go, a special `import "C"` statement allows you to include C header files and even write inline C code.
Every language has its own way of doing this, but the goal is the same: to ensure seamless communication between different languages at the binary level.