With the Python multiprocessing library, we can write truly parallel software. When we were using Python threads, we weren’t utilizing multiple CPUs or CPU cores. However, with this library, we will. To start, we need to import the multiprocessing
module instead of threading
.
Multiprocessing based example
Here’s the multiprocessing version of the code we used in the baseline version and the threaded version:
import time import multiprocessing # A CPU heavy calculation, just # as an example. This can be # anything you like def heavy(n, myid): for x in range(1, n): for y in range(1, n): x**y print(myid, "is done") def multiproc(n): processes = [] for i in range(n): p = multiprocessing.Process(target=heavy, args=(500,i,)) processes.append(p) p.start() for p in processes: p.join() if __name__ == "__main__": start = time.time() multiproc(80) end = time.time() print("Took: ", end - start)
This code takes about 23 seconds to run, which is half of the threaded version!
As you can see, this looks almost the same as the threaded version, code-wise. The threading and multiprocessing modules are intentionally made very equivalent. But the 80 invocations of heavy finish roughly twice as fast this time.
My test system (a small desktop computer) has only two CPU cores, so that explains why it’s a factor two. If I run this code on my brand new laptop, with 4 faster CPU cores, it’s more than four times faster. This perfectly demonstrates the linear speed increase multiprocessing offers us in the case of CPU-bound code.
Python multiprocessing pool
We can make the multiprocessing version a little more elegant and slightly faster by using multiprocessing.Pool(p)
. This Python multiprocessing helper creates a pool of size p
processes. If you don’t supply a value for p
, it will default to the number of CPU cores in your system, which is actually a sensible choice most of the time.
By using the Pool.map()
method, we can submit work to the pool. This work comes in the form of a simple function call:
import time import multiprocessing # A CPU heavy calculation, just # as an example. This can be # anything you like def heavy(n, myid): for x in range(1, n): for y in range(1, n): x**y print(myid, "is done") def doit(n): heavy(500, n) def pooled(n): # By default, our pool will have # numproc slots with multiprocessing.Pool() as pool: pool.map(doit, range(n)) if __name__ == "__main__": start = time.time() pooled(80) end = time.time() print("Took: ", end - start)
This version’s runtime is roughly the same as the non-pooled version, but it has to create fewer processes, so it is more efficient. After all, instead of creating 80 processes, we created four and reused those each time.
Keep learning
If you want to learn more about Python multiprocessing and working with multiprocessing pools, you can head over to the official documentation. If you work on the command line a lot, you might be interested in my article on Bash multiprocessing as well!