Ask a developer why Python is slow and there's a good chance they'll say "the GIL." Ask them what the GIL actually is, and the answer usually gets vague. It's one of those concepts that everyone has heard of, most have strong opinions about, and surprisingly few can explain precisely.

That's worth fixing — because once you understand what the GIL really does, you'll know when it matters (rarely), when it doesn't (most of the time), and what Python 3.13 actually changed about it.

Start here: CPython is not Python

The GIL is not a feature of the Python language. It's a feature of CPython — the reference implementation, the one you get when you run python3 on most machines. PyPy, Jython, and other implementations have their own approaches. This distinction matters because the GIL is not inevitable. It's an engineering decision made in a specific context, with specific tradeoffs.

CPython manages memory using reference counting. Every Python object carries a count of how many references point to it. When that count hits zero, the memory is freed. Simple, fast, and — crucially — not thread-safe.
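
You can watch the count move from Python itself. The sketch below uses sys.getrefcount, which reports an object's current count (the value is one higher than you might expect, because passing the object to the function temporarily adds a reference of its own):

import sys

data = []                      # one reference: the name "data"
print(sys.getrefcount(data))   # typically 2: "data" plus the temporary argument

alias = data                   # a second name pointing at the same list
print(sys.getrefcount(data))   # one higher than before

del alias                      # drop that reference again
print(sys.getrefcount(data))   # back where it started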

What the GIL actually does

Reference counting creates a problem with threads. If two threads simultaneously modify the reference count of the same object, you get a race condition. The count gets corrupted. Memory that's still in use gets freed. Your program crashes or, worse, silently produces wrong results.

The GIL is the solution: a single mutex lock around the interpreter itself. Only one thread can hold the GIL at a time, which means only one thread can execute Python bytecode at a time. Race conditions on reference counts become impossible because the counts are never modified concurrently.

The GIL doesn't make Python thread-safe. It makes CPython's memory management thread-safe. There's a meaningful difference.

Your Python code can still have race conditions — the GIL doesn't protect your data structures, only the interpreter's internal state. An in-place += on a shared integer is still not atomic. The GIL just ensures the interpreter doesn't corrupt itself underneath you.

The switch interval and cooperative scheduling

If only one thread runs at a time, how does Python handle multiple threads at all? Through the switch interval. In early CPython the interpreter considered switching every N bytecode instructions; since Python 3.2 it uses a time-based switch interval, 5 milliseconds by default. When that interval elapses and another thread is waiting, the running thread is prompted to release the GIL so the others get a chance to acquire it and run.
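
The interval is visible, and adjustable, from Python. Tuning it is rarely worth doing, but a quick sketch makes the mechanism concrete:

import sys

print(sys.getswitchinterval())   # 0.005, i.e. five milliseconds (the default)

# Smaller values mean more frequent switches (and more overhead);
# larger values mean threads wait longer for their turn.
sys.setswitchinterval(0.001)
print(sys.getswitchinterval())   # 0.001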

It's cooperative scheduling, automated. You can observe this directly:

import threading

counter = 0

def increment():
    global counter
    for _ in range(1_000_000):
        counter += 1

t1 = threading.Thread(target=increment)
t2 = threading.Thread(target=increment)

t1.start(); t2.start()
t1.join();  t2.join()

print(counter)  # Very likely not 2,000,000; the exact result varies by run and CPython version

The counter is not protected by any lock. The GIL doesn't help here — counter += 1 compiles to multiple bytecode instructions (load, add, store), and a thread switch can happen between any two of them. This is a real race condition, and you need an explicit lock such as threading.Lock to prevent it.
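
A minimal fix, if you keep the threaded design, is to wrap the increment in a threading.Lock so the read-modify-write happens as a single unit:

import threading

counter = 0
lock = threading.Lock()

def increment():
    global counter
    for _ in range(1_000_000):
        with lock:        # only one thread at a time performs the read-modify-write
            counter += 1

threads = [threading.Thread(target=increment) for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()

print(counter)  # 2,000,000, every time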

Why CPU-bound threads don't scale

Here's where the GIL causes genuine pain. Suppose you have a CPU-bound task — number crunching, parsing, hashing — and you want to parallelize it across four threads on a four-core machine. You'd expect roughly 4× speedup. What you actually get is roughly the same speed as a single thread, or sometimes slower.

import threading
import time

def crunch(n):
    # Pure CPU work: count down from n
    while n > 0:
        n -= 1

N = 50_000_000

# Sequential
start = time.perf_counter()
crunch(N); crunch(N)
print(f"Sequential: {time.perf_counter() - start:.2f}s")

# Threaded (two threads, same total work)
start = time.perf_counter()
t1 = threading.Thread(target=crunch, args=(N,))
t2 = threading.Thread(target=crunch, args=(N,))
t1.start(); t2.start()
t1.join();  t2.join()
print(f"Threaded:   {time.perf_counter() - start:.2f}s")

On a typical machine, the threaded version will be slower. The two threads spend time fighting over the GIL — acquiring it, releasing it, context-switching — while producing no additional parallelism. You've added overhead without adding throughput.

The exception: I/O-bound work

The GIL is only held while executing Python bytecode. When a thread makes a system call — reading a file, waiting on a socket, sleeping — it releases the GIL voluntarily. Other threads run while it waits.
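
You can see the release with nothing fancier than time.sleep, which blocks outside the interpreter: two threads that each sleep for one second finish in roughly one second total, not two.

import threading
import time

def wait():
    time.sleep(1)       # blocking call: the thread releases the GIL while it sleeps

start = time.perf_counter()
t1 = threading.Thread(target=wait)
t2 = threading.Thread(target=wait)
t1.start(); t2.start()
t1.join();  t2.join()
print(f"Elapsed: {time.perf_counter() - start:.2f}s")  # ~1s, not ~2s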

This is why threads are genuinely useful for I/O-bound work. A web scraper fetching 100 URLs concurrently with threads works well: while one thread waits for a response, others are running. The GIL is barely a factor.

import threading
import urllib.request

urls = [
    "https://example.com",
    "https://example.org",
    "https://python.org",
]

def fetch(url):
    with urllib.request.urlopen(url) as r:
        print(f"{url}: {r.status}")

threads = [threading.Thread(target=fetch, args=(u,)) for u in urls]
for t in threads: t.start()
for t in threads: t.join()

For CPU-bound parallelism, the right tool is multiprocessing. Each process gets its own interpreter and its own GIL — no contention. Or use a C extension like NumPy, which releases the GIL during its heavy computation, letting Python threads genuinely run in parallel around it.
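
Here's a sketch of the earlier countdown using the standard-library multiprocessing.Pool. Starting processes has real overhead, so this only pays off when the work per task is substantial:

import time
from multiprocessing import Pool

def crunch(n):
    # Same pure CPU work as before
    while n > 0:
        n -= 1

if __name__ == "__main__":   # required on platforms that start processes by spawning
    N = 50_000_000
    start = time.perf_counter()
    with Pool(processes=2) as pool:
        pool.map(crunch, [N, N])   # each task runs in its own process, with its own GIL
    print(f"Processes: {time.perf_counter() - start:.2f}s")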

What Python 3.13 changed

Python 3.13 shipped an experimental build mode: --disable-gil. This is the result of PEP 703, the "Making the Global Interpreter Lock Optional in CPython" proposal by Sam Gross, which spent years under evaluation before being accepted.
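
If you want to check what you're actually running, a rough probe looks like this. Note that sys._is_gil_enabled() is an underscore-prefixed, provisional API, so treat this as a sketch rather than a stable recipe:

import sys
import sysconfig

# Was this interpreter compiled with --disable-gil?
print("Free-threaded build:", bool(sysconfig.get_config_var("Py_GIL_DISABLED")))

# Is the GIL actually off right now? (Python 3.13+, provisional API.)
if hasattr(sys, "_is_gil_enabled"):
    print("GIL enabled at runtime:", sys._is_gil_enabled())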

The free-threaded build replaces reference counting's reliance on the GIL with a combination of techniques: biased reference counting (each object is "owned" by a thread, reducing contention), immortal objects for common values like small integers and None, and deferred reference counting for certain cases.

The catch — and it's a real one — is that the free-threaded build is still experimental. Many C extensions assume the GIL exists and use it implicitly for their own thread safety. Running them in a free-threaded interpreter can cause crashes. The ecosystem needs time to adapt.

Removing the GIL doesn't make Python thread-safe. It removes one layer of implicit protection and requires the rest of the ecosystem to fill the gap.

The practical summary

The GIL matters if you are writing CPU-bound code that you want to parallelize using threads. In that narrow case, it's a genuine constraint, and multiprocessing or a free-threaded Python 3.13+ build is the right path.

In most application code — web services, data pipelines, scripts, CLI tools — you are I/O-bound, and the GIL is simply not the bottleneck. The slowness you're experiencing is almost certainly somewhere else: a database query, a network call, an inefficient algorithm.

Profile before you blame the GIL. It's a convenient scapegoat, and rarely the correct one.

