Python has one peculiarity that makes concurrent programming harder. It’s called the Python GIL, short for Global Interpreter Lock. The GIL makes sure that, at any time, only one thread is executing Python code. Because only one thread can run at a time, threads cannot spread Python code across multiple processors. But don’t worry, there’s a way around this: the multiprocessing library.
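As a rough sketch of that workaround, the snippet below hands a CPU-bound function to a pool of worker processes using multiprocessing.Pool. The function count_primes, the input sizes, and the pool size of 4 are made up for this illustration. Each worker is a separate process with its own interpreter, so the tasks are not serialized by a single GIL.

```python
from multiprocessing import Pool


def count_primes(limit):
    """CPU-bound work: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


if __name__ == "__main__":
    limits = [20_000, 30_000, 40_000, 50_000]  # arbitrary example inputs
    # Each task runs in a separate process, so tasks can use separate cores.
    with Pool(processes=4) as pool:
        results = pool.map(count_primes, limits)
    print(dict(zip(limits, results)))
```

The main-module guard matters here: on platforms that start workers by launching a fresh interpreter, the module is imported again in each worker, and the guard keeps the pool from being created recursively.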
Thread-safety
As mentioned already, Python threads share the same memory. With multiple threads running simultaneously, we don’t know the order in which the threads access shared data. Therefore, the result of accessing shared data is dependent on the scheduling algorithm. This algorithm decides which thread runs when. Threads are “racing” to access/change the data.
- Thread safety
- Thread-safe code only manipulates shared data in a way that does not interfere with other threads.
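One common way to meet that definition is to guard shared data with a lock. The following is a minimal sketch, not the only approach: the class name SafeCounter, the helper run_workers, and the thread and iteration counts are invented for this example, while threading.Lock comes from the standard library.

```python
import threading


class SafeCounter:
    """A counter whose updates are serialized by a lock (illustrative name)."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time can hold the lock, so this
        # read-modify-write cannot interleave with another thread's.
        with self._lock:
            self.value += 1


def run_workers(counter, num_threads=4, increments=10_000):
    """Start `num_threads` threads that each increment `counter` repeatedly."""
    def work():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(run_workers(SafeCounter()))  # 4 threads x 10,000 increments each
```

Because every increment happens while the lock is held, the final value equals the total number of increments regardless of how the threads are scheduled.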
The GIL was invented because CPython’s memory management is not thread-safe. With only one thread running at a time, CPython can rest assured that its own internals will never be subject to race conditions.
A demonstration of a race condition
As an example, let’s create a shared Python variable called a, with a value of 2:
a = 2
Now suppose we have two threads, thread_one and thread_two. They perform the following operations:
- thread_one:
a = a + 2
- thread_two:
a = a * 3
If thread_one is able to access a first and thread_two second, the result will be:
- a = 2 + 2, a is now 4.
- a = 4 * 3, a is now 12.
However, if it so happens that thread_two runs first, and then thread_one, we get a different output:
- a = 2 * 3, a is now 6.
- a = 6 + 2, a is now 8.
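The two orderings above can be reproduced with real threads by forcing the order. In this sketch, each thread is joined before the next one starts, so the schedule is fully determined. The helper run_in_order and the state dictionary are hypothetical names used only for this illustration.

```python
import threading


def run_in_order(order, start=2):
    """Run the two operations in the given order; return the final value of a."""
    state = {"a": start}  # shared data, standing in for the variable a

    def add_two():
        state["a"] = state["a"] + 2

    def times_three():
        state["a"] = state["a"] * 3

    operations = {"thread_one": add_two, "thread_two": times_three}
    for name in order:
        thread = threading.Thread(target=operations[name], name=name)
        thread.start()
        thread.join()  # wait for this thread before starting the next one
    return state["a"]


if __name__ == "__main__":
    print(run_in_order(["thread_one", "thread_two"]))  # 12
    print(run_in_order(["thread_two", "thread_one"]))  # 8
```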
So the order of execution obviously matters for the output. There’s an even worse possible outcome, though! What if both threads read variable a at the same time, do their thing, and then assign the new value? They will both see that a = 2. Depending on which thread writes its result last, a will end up as 4 or 6. Not what we expected! This is what we call a race condition.
- Race condition
- The condition of a system where the system’s behavior is dependent on the sequence or timing of other, uncontrollable events.
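The lost update described above is hard to trigger on demand, so this sketch forces it. A threading.Barrier holds both threads until each has read a, and only then lets them write. The function name lost_update and the state dictionary are assumptions made for the illustration; in real code the interleaving would happen by chance rather than by design.

```python
import threading


def lost_update(start=2):
    """Make both threads read a before either writes; return the final a."""
    state = {"a": start}
    both_have_read = threading.Barrier(2)  # releases once both threads arrive

    def worker(operation):
        seen = state["a"]        # both threads read the starting value
        both_have_read.wait()    # neither thread writes until both have read
        state["a"] = operation(seen)  # the later write overwrites the earlier

    thread_one = threading.Thread(target=worker, args=(lambda a: a + 2,))
    thread_two = threading.Thread(target=worker, args=(lambda a: a * 3,))
    thread_one.start()
    thread_two.start()
    thread_one.join()
    thread_two.join()
    return state["a"]


if __name__ == "__main__":
    print(lost_update())  # 4 or 6, depending on which thread writes last
```

Neither 12 nor 8 can come out of this run, because each thread computes its result from the original value and one of the two updates is simply lost.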
Race conditions are difficult to spot, especially for software engineers who are unfamiliar with these issues. They also tend to occur randomly, causing erratic and unpredictable behavior, which makes these bugs notoriously difficult to find and debug. It’s exactly why Python has a GIL: to make life easier for the majority of Python users.
Can we get rid of the Python GIL?
If the GIL holds us back in terms of concurrency, shouldn’t we get rid of it or be able to turn it off? It’s not that easy. Other features, libraries, and packages have come to rely on the GIL, so something must replace it, or else the entire ecosystem will break. This turns out to be a difficult problem to solve. If it interests you, you can read more about this on the Python wiki.
Latest Python GIL developments in 2021
Recently, someone revived the discussion by offering a promising proof-of-concept CPython version with the GIL removed. The source code for this proof-of-concept can be found on GitHub. The author has included a comprehensive document explaining the details and peculiarities of this operation. It’s an interesting read for those wanting to learn more about this subject.