Concurrency is working on multiple things at the same time. In Python, this can be done in several ways, which we will explore in this chapter.
Make sure concurrency is what you want
Before you consider concurrency, which can be quite tricky, always take a good look at your code and algorithms first. Many speed and performance issues can be resolved by implementing a better algorithm or adding caching. Entire books have been written about this subject, but some general guidelines to follow are:
- Measure, don’t guess. Measure which parts of your code take the most time to run. Focus on those parts first.
- Implement caching. This can be a big optimization if you perform many repeated lookups from disk, the network, and databases.
- Reuse objects instead of creating a new one on each iteration. Python has to clean up every object you create to free its memory. This is called garbage collection. Garbage collecting many unused objects can slow down your software considerably.
- Reduce the number of iterations in your code if possible, and reduce the number of operations inside iterations.
- Avoid (deep) recursion. It requires a lot of memory and housekeeping for the Python interpreter. Use things like generators and iteration instead.
- Reduce memory usage. For example, parse a huge file line by line instead of loading it into memory first.
- Don’t do it. Do you really need to perform that operation? Can it be done later? Or can it be done once, and can the result of it be stored instead of calculated over and over again?
- Use PyPy or Cython. You can also consider an alternative Python implementation. There are speedy Python variants out there. See below for more info on this.
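The first two guidelines above (measure, then cache) can be sketched as follows. This is a minimal illustration, not a benchmark: `slow_lookup` and its 10 ms delay are hypothetical stand-ins for a real disk, network, or database lookup, and `run` is a made-up helper that repeats a handful of keys.

```python
import time
import timeit
from functools import lru_cache


def slow_lookup(key: int) -> int:
    """Hypothetical stand-in for an expensive disk, network, or database lookup."""
    time.sleep(0.01)  # simulate 10 ms of latency (assumed value)
    return key * 2


@lru_cache(maxsize=None)
def cached_lookup(key: int) -> int:
    """Same lookup, but repeated keys are served from an in-memory cache."""
    return slow_lookup(key)


def run(lookup) -> list:
    """Perform 50 lookups that use only 5 distinct keys."""
    return [lookup(i % 5) for i in range(50)]


if __name__ == "__main__":
    # Measure, don't guess: time both variants once.
    uncached = timeit.timeit(lambda: run(slow_lookup), number=1)
    cached = timeit.timeit(lambda: run(cached_lookup), number=1)
    print(f"uncached: {uncached:.3f}s, cached: {cached:.3f}s")
```

The cached variant only pays the simulated latency once per distinct key, so the difference grows with the number of repeated lookups. Caching like this is only safe when the underlying data does not change between calls.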
You are probably using the reference implementation of Python, CPython. Most people do. It’s called CPython because it’s written in C. If you are sure your code is CPU bound, meaning it’s doing lots of calculations, you should look into PyPy, an alternative to CPython. It’s potentially a quick fix that doesn’t require you to change a single line of code.
PyPy claims that, on average, it is 4.4 times faster than CPython. It does so by using a technique called just-in-time compilation (JIT). Java and the .NET framework are other notable examples of JIT compilation. In contrast, CPython uses interpretation to execute your code. Although this offers a lot of flexibility, it is also considerably slower than running compiled code.
With JIT, your code is compiled while the program is running. It combines the speed advantage of ahead-of-time compilation (used by languages like C and C++) with the flexibility of interpretation. Another advantage is that the JIT compiler can keep optimizing your code while it is running. The longer your code runs, the more optimized it will become.
PyPy has come a long way over the last few years and can generally be used as a drop-in replacement for both Python 2 and 3. It works flawlessly with tools like Pipenv as well. Give it a try!
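To see whether PyPy helps in your case, one option is to time the same CPU-bound script under both interpreters. The sketch below is a hypothetical workload (`count_primes` is not from any library), written in pure Python so that neither interpreter needs changes to run it.

```python
import platform
import time


def count_primes(limit: int) -> int:
    """Count primes below `limit` with naive trial division (CPU bound)."""
    count = 0
    for n in range(2, limit):
        is_prime = True
        divisor = 2
        while divisor * divisor <= n:
            if n % divisor == 0:
                is_prime = False
                break
            divisor += 1
        if is_prime:
            count += 1
    return count


if __name__ == "__main__":
    start = time.perf_counter()
    result = count_primes(200_000)
    elapsed = time.perf_counter() - start
    # python_implementation() reports the running interpreter, e.g. "CPython" or "PyPy"
    print(f"{platform.python_implementation()}: {result} primes in {elapsed:.2f}s")
```

Assuming the file is saved under a name such as `bench.py` and PyPy is installed as `pypy3`, running it with `python bench.py` and then `pypy3 bench.py` gives a rough comparison. Treat the numbers as indicative only: short runs may not give the JIT compiler enough time to warm up.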
Cython offers C-like performance with code that is written mostly in Python. Cython makes it possible to compile parts of your Python code to C code. This way, you can convert crucial parts of an algorithm to C, which will generally offer a tremendous performance boost.
Cython is a superset of the Python language, meaning it adds extras to the Python syntax. It’s not a drop-in replacement like PyPy. It requires adaptations to your code and knowledge of the extras Cython adds to the language.
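A tight numeric loop is a typical candidate for this treatment. The function below is a hypothetical example written in plain Python; it is only the starting point, since the larger gains usually come from adding Cython's static type declarations, which are not shown here and should be taken from the Cython documentation.

```python
def integrate_square(a: float, b: float, steps: int) -> float:
    """Approximate the integral of x**2 over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    total = 0.0
    # Hot loop: this is the part one would typically move to Cython.
    for i in range(steps):
        x = a + (i + 0.5) * width
        total += x * x
    return total * width


if __name__ == "__main__":
    print(integrate_square(0.0, 1.0, 1_000_000))
```

Assuming Cython is installed and the file is saved as, say, `integrate.py`, a command along the lines of `cythonize -i integrate.py` should build it as an extension module that is imported like any other Python module.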
With Cython, it is also possible to take advantage of the C++ language because part of the C++ standard library is directly importable from Cython code. Cython is particularly popular among scientific users of Python. A few notable examples:
- The SageMath computer algebra system depends on Cython, both for performance and to interface with other libraries
- Significant parts of the libraries SciPy, pandas, and scikit-learn are written in Cython
- The XML toolkit, lxml, is written mostly in Cython