The Python GIL

Python has one peculiarity that makes concurrent programming harder. It’s called the GIL, short for Global Interpreter Lock. In CPython, the GIL makes sure that, at any time, only one thread is executing Python bytecode. Because only one thread can run Python code at a time, threads cannot spread CPU-bound work across multiple processors. But don’t worry, there’s a way around this: the multiprocessing library, which runs separate processes instead of threads, each with its own interpreter and its own GIL.
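To make the multiprocessing workaround concrete, here is a minimal sketch. The function names count_primes and count_primes_in_parallel, the worker count, and the input sizes are made up for illustration; only multiprocessing.Pool and its map method come from the standard library.

```python
import multiprocessing


def count_primes(limit):
    """CPU-bound work: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d != 0 for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_in_parallel(limits, workers=2):
    """Run count_primes once per limit in a pool of worker processes."""
    with multiprocessing.Pool(processes=workers) as pool:
        # Each call runs in a separate process, so the calls do not
        # compete for a single GIL.
        return pool.map(count_primes, limits)


if __name__ == "__main__":
    # The guard matters: on platforms that start workers by re-importing
    # this module, it prevents the workers from spawning pools themselves.
    print(count_primes_in_parallel([10_000, 20_000, 30_000]))
```

Whether this is actually faster than a single process depends on how heavy the work is, since starting processes and passing data between them has its own cost.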

The GIL was introduced because CPython’s memory management is not thread-safe. With only one thread running at a time, CPython can rest assured that its own internals will never suffer from race conditions.

Thread-safety

As mentioned already, threads share the same memory. With multiple threads running simultaneously, we don’t know the order in which the threads access shared data. Therefore, the result of accessing shared data is dependent on the scheduling algorithm. This algorithm decides which thread runs when. Threads are “racing” to access/change the data.

Thread safety
Thread-safe code manipulates shared data only in ways that do not interfere with other threads.
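One common way to make code thread-safe is to guard the shared data with a lock. The sketch below is only an illustration: SafeCounter and run_workers are hypothetical helpers, and the thread and increment counts are arbitrary. threading.Lock and threading.Thread are from the standard library.

```python
import threading


class SafeCounter:
    """A counter whose increment is protected by a lock (hypothetical helper)."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time can perform the read-modify-write below.
        with self._lock:
            self.value += 1


def run_workers(counter, n_threads=4, n_increments=10_000):
    """Start n_threads threads that each increment the counter n_increments times."""
    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every thread has finished
    return counter.value


print(run_workers(SafeCounter()))  # 40000
```

Because every increment happens while holding the lock, no update can be lost, and the total is the same on every run.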

A demonstration of a race condition

As an example, let’s create a shared variable a, with a value of 2:

a = 2

Now suppose we have two threads, thread_one and thread_two. They perform the following operations:

  • thread_one: a = a + 2
  • thread_two: a = a * 3

If thread_one is able to access a first and thread_two second, the result will be:

  • a = 2 + 2, a is now 4.
  • a = 4 * 3, a is now 12.

However, if it so happens that thread_two runs first, and then thread_one, we get a different output:

  • a = 2 * 3, a is now 6
  • a = 6 + 2, a is now 8

So the order of execution clearly matters for the output. There’s an even worse possible outcome, though! What if both threads read variable a at the same time, do their thing, and then assign the new value? They will both see that a = 2, so thread_one computes 4 and thread_two computes 6. Whichever thread writes last overwrites the other’s result, so a will eventually be 4 or 6. Not what we expected! This is what we call a race condition.
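The four outcomes can be reproduced without real threads by replaying each interleaving by hand. This is a deterministic simulation, not actual concurrent code, and the helper names (add_two, times_three, run_in_sequence, run_with_lost_update) are invented for this sketch.

```python
def add_two(value):
    return value + 2  # thread_one's operation


def times_three(value):
    return value * 3  # thread_two's operation


def run_in_sequence(first, second, start=2):
    """The second operation reads a only after the first has written it."""
    a = start
    a = first(a)
    a = second(a)
    return a


def run_with_lost_update(first_writer, last_writer, start=2):
    """Both operations read a before either one writes; the last write wins."""
    a = start
    read_by_first = a
    read_by_last = a
    a = first_writer(read_by_first)
    a = last_writer(read_by_last)  # overwrites the earlier result
    return a


print(run_in_sequence(add_two, times_three))       # 12
print(run_in_sequence(times_three, add_two))       # 8
print(run_with_lost_update(add_two, times_three))  # 6
print(run_with_lost_update(times_three, add_two))  # 4
```

With real threads, which of these interleavings happens is up to the scheduler, which is why the result can differ from run to run.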

Race condition
The condition of a system where the system’s behavior is dependent on the sequence or timing of other, uncontrollable events. 

Race conditions are difficult to spot, especially for software engineers who are unfamiliar with these issues. They also tend to occur randomly, causing erratic and unpredictable behavior, which makes them notoriously difficult to find and debug. It’s exactly why Python has a GIL: it rules out this class of bug inside the interpreter itself, which makes life easier for the majority of Python users. Code that shares data between threads still needs its own synchronization, such as locks.

Getting rid of the GIL

If the GIL holds us back in terms of concurrency, shouldn’t we get rid of it or be able to turn it off? It’s not that easy. Other features, libraries, and packages have come to rely on the GIL, so something must replace it, or else the entire ecosystem will break. This turns out to be a difficult problem to solve. If it interests you, you can read more about this on the Python wiki.
