Despite the many libraries on PyPI, sometimes you need to run an external command from your Python code. The built-in Python subprocess module makes this relatively easy. In this article, you’ll learn some basics about processes and sub-processes.
We’ll use the Python subprocess module to safely execute external commands, capture their output, and optionally feed them input via standard input. If you’re familiar with the theory of processes and sub-processes, you can safely skip the first section.
Processes and sub-processes
A program that is executed on a computer is also called a process. But what is a process, exactly? Let’s define it more formally:
- Process: A process is the instance of a computer program that is being executed by one or more threads.
A process can have multiple threads; this is called multi-threading. In turn, a computer can run multiple processes at once. These processes can be different programs, but they can also be multiple instances of the same program. Our article on concurrency with Python explains this in great detail. The following images come from that article, too:
If you want to run an external command, it means you need to create a new process from your Python process. Such a process is often called a child process or a sub-process. Visually, this is what happens when one process spawns two sub-processes:
On Unix-like systems, what traditionally happens internally (inside the OS kernel) is what’s called a fork. The process forks itself, meaning a new copy of the process is created and started. This can be useful if you want to parallelize your code and utilize multiple CPUs on your machine. That’s what we call multiprocessing.
However, we can utilize this same technique to start another program. First, the process forks itself, creating a copy. That copy, in turn, replaces itself with another program: the one you were looking to execute.
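To make the fork-then-replace sequence concrete, here is a minimal sketch for Unix-like systems only. The helper name spawn is made up for this illustration; os.fork, os.execvp, and os.waitpid are the standard-library calls that correspond to the steps described above. This is not how you would normally start a program from Python, and it is not necessarily what the subprocess module does internally.

```python
import os
import sys


def spawn(argv):
    """Hypothetical helper: fork, run argv in the child, return its exit code."""
    pid = os.fork()  # from here on, two copies of this process are running
    if pid == 0:
        # Child: replace this copy with the requested program.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            os._exit(127)  # the program could not be started
    # Parent: wait for the child to finish and decode its exit status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)  # the child was killed by a signal


if __name__ == "__main__":
    exit_code = spawn([sys.executable, "--version"])
    print("child exited with", exit_code)
```

Everything this sketch does by hand (forking, replacing the child, waiting, decoding the status) is what a single call to subprocess.run takes off your hands.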
The subprocess.run function
We can go the low-level way and do much of this ourselves using the Python subprocess module, but luckily, the module also offers a wrapper that takes care of all the nitty-gritty details and does so safely, too. Thanks to the wrapper, running an external command comes down to calling a function. This wrapper is the run() function from the subprocess module, and that’s what we’ll use in this article.
I thought it would be nice for you to know what’s going on internally, but if you feel confused, rest assured that you don’t need this knowledge to do what you want: running an external command with the Python subprocess module.
Create a Python subprocess with subprocess.run
Enough with the theory; it’s time to get our hands dirty and write some code to execute external commands.
First of all, you need to import the subprocess module. Since it is part of the Python 3 standard library, you don’t need to install it separately. From this module, we’ll work with the run function, which was added in Python 3.5. Some of the options used later in this article, such as capture_output, require Python 3.7 or later, so preferably you should be running the latest version. Check our detailed Python installation instructions if you need help with that.
Let’s start with a simple call to ls, to list the files and directories in the current directory:
>>> import subprocess
>>> subprocess.run(['ls', '-al'])
(a list of your directories will be printed)
In fact, we can call Python, the binary, from our Python code. Let’s request the version of the default python3 installation on our system next:
>>> import subprocess
>>> result = subprocess.run(['python3', '--version'])
Python 3.8.5
>>> result
CompletedProcess(args=['python3', '--version'], returncode=0)
A line-by-line explanation:
- We import the subprocess library
- Run a subprocess, in this case the python3 binary, with one argument: --version
- Inspect the result variable, which is of the type CompletedProcess
The process returned code 0, meaning it was executed successfully. By convention, any other return code means there was some kind of error. What each non-zero return code means depends on the program you called.
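If you would rather have Python raise an exception than inspect returncode yourself, subprocess.run accepts a check=True option. The sketch below is only an illustration: the helper name run_or_report is made up, and the child processes are started through sys.executable (the interpreter that is currently running) so the example does not depend on a particular binary being installed.

```python
import subprocess
import sys


def run_or_report(args):
    """Hypothetical helper: return 0 on success, else the failing return code."""
    try:
        # check=True raises CalledProcessError for any non-zero return code.
        subprocess.run(args, check=True)
    except subprocess.CalledProcessError as exc:
        print(f"{exc.cmd} failed with return code {exc.returncode}")
        return exc.returncode
    return 0


if __name__ == "__main__":
    run_or_report([sys.executable, "-c", "raise SystemExit(2)"])
```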
As you can see in the output, the Python binary printed its version number on standard out, which is usually your terminal. Your result may vary because your Python version will likely be different. Perhaps you’ll even get an error looking like this: FileNotFoundError: [Errno 2] No such file or directory: 'python3'. In this case, make sure the Python binary is called python3 on your system too, and that it’s in the PATH.
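One way to check up front whether a binary can be found on the PATH is shutil.which from the standard library, which returns the full path or None. The following is a small sketch rather than a recommendation: find_python is a made-up helper name, and falling back to sys.executable (the interpreter running your script) is just one possible choice.

```python
import shutil
import subprocess
import sys


def find_python():
    """Hypothetical helper: python3 from the PATH, else the current interpreter."""
    return shutil.which("python3") or sys.executable


if __name__ == "__main__":
    python_path = find_python()
    print("Using:", python_path)
    subprocess.run([python_path, "--version"])
```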
Capture output of a Python subprocess
If you run an external command, you’ll likely want to capture the output of that command. We can achieve this with the capture_output=True option:
>>> import subprocess
>>> result = subprocess.run(['python3', '--version'], capture_output=True, encoding='UTF-8')
>>> result
CompletedProcess(args=['python3', '--version'], returncode=0, stdout='Python 3.8.5\n', stderr='')
As you can see, Python didn’t print its version to our terminal this time. The subprocess.run function redirected the standard out and standard error streams so it could capture them and store the result for us. After inspecting the result variable, we see that the Python version was captured from standard out. Since there were no errors, stderr is empty.
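To see both captured streams filled, you can run a child process that writes to each of them. This is a minimal sketch: CHILD_CODE is a throwaway one-liner invented for the illustration, and it is executed with sys.executable so that it does not depend on a python3 binary being on the PATH.

```python
import subprocess
import sys

# A tiny child program that writes one line to each output stream.
CHILD_CODE = "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"

result = subprocess.run(
    [sys.executable, "-c", CHILD_CODE],
    capture_output=True,
    encoding="UTF-8",
)

# The two streams are captured separately on the result object.
print("stdout:", repr(result.stdout))
print("stderr:", repr(result.stderr))
```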
I also added the option encoding='UTF-8'. If you don’t, subprocess.run assumes the output is a stream of bytes because it doesn’t have this information. Try it, if you want. As a result, stdout and stderr will be bytes objects. Hence, if you know the output will be ASCII or UTF-8 text, you’re better off specifying it so the run function decodes the captured output accordingly.
Alternatively, you can use the option text=True without specifying the encoding. Python will then capture the output as text, decoding it with your system’s default encoding. I’d recommend specifying the encoding explicitly if you know it.
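The difference between captured bytes and captured text is easy to check for yourself. In this sketch the child command is an arbitrary one-liner chosen for the illustration, run through sys.executable:

```python
import subprocess
import sys

cmd = [sys.executable, "-c", "print('hello')"]

# Without an encoding, the captured output is a bytes object.
raw = subprocess.run(cmd, capture_output=True)
print(type(raw.stdout))

# With an encoding, subprocess.run decodes the output to str for us.
decoded = subprocess.run(cmd, capture_output=True, encoding="UTF-8")
print(type(decoded.stdout))

# Decoding the bytes by hand yields the same text.
print(raw.stdout.decode("UTF-8").strip() == decoded.stdout.strip())
```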
Feeding data from standard input
If the external command expects data on standard input, we can supply it easily with the input option of Python’s subprocess.run function. Please note that I’m not going into streaming data here. We’ll build on the previous examples:
>>> import subprocess
>>> code = """
... for i in range(1, 3):
...     print(f"Hello world {i}")
... """
>>> result = subprocess.run(['python3'], input=code, capture_output=True, encoding='UTF-8')
>>> print(result.stdout)
Hello world 1
Hello world 2
We just used Python to execute some Python code with the python3 binary. It’s completely useless but (hopefully) very instructive!
The code variable is a multi-line Python string, and we pass it to the subprocess.run function using the input option.
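The same input option works for any program that reads from standard input, not just an interpreter. As a small sketch, the made-up helper shout below sends text to a child process that upper-cases whatever it reads; the child is again a Python one-liner run via sys.executable, chosen only to keep the example self-contained.

```python
import subprocess
import sys


def shout(text):
    """Hypothetical helper: pipe text through a child that upper-cases its stdin."""
    child_code = "import sys; sys.stdout.write(sys.stdin.read().upper())"
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        input=text,            # sent to the child's standard input
        capture_output=True,
        encoding="UTF-8",      # so input and output are str, not bytes
    )
    return result.stdout


if __name__ == "__main__":
    print(shout("hello from the parent process"))
```

Note that input has to match the mode you are in: a str when you pass encoding or text=True, and bytes otherwise.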
Running shell commands
If you are looking to execute shell commands on Unix-like systems, by which I mean anything you would normally type into a Bash-like shell, you need to realize that these are often not external binaries that are executed. For example, constructs like for and while loops, or pipes and other operators, are interpreted by the shell itself.
Python often has alternatives in the form of built-in libraries, which you should prefer. But if you need to execute a shell command, for whatever reason, subprocess.run will happily do so when you use the shell=True option. It allows you to enter commands just as if you were entering them in a Bash-compatible shell:
>>> import subprocess
>>> result = subprocess.run('ls -al | head -n 1', shell=True)
total 396
>>> result
CompletedProcess(args='ls -al | head -n 1', returncode=0)
But a warning is in order: this method is prone to command injection attacks (see the caveats below).
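Often the shell is only needed for a small piece of post-processing, such as the pipe to head in the example above, and that piece can be done in Python instead. The following sketch assumes you only want the first line of a command’s output; first_line is a made-up helper name, and the command itself is passed as a list, so no shell is involved.

```python
import subprocess
import sys


def first_line(args):
    """Hypothetical helper: run args without a shell, return the first output line."""
    result = subprocess.run(args, capture_output=True, encoding="UTF-8")
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    # Roughly equivalent to: ls -al | head -n 1
    print(first_line(["ls", "-al"]))
```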
Caveats to look out for
Running external commands is not without risks. Please read this section very carefully.
os.system vs subprocess.run
You might see code examples where os.system() is used to execute a command. The subprocess module is more powerful, though, and the official Python docs recommend using it over os.system(). Another issue with os.system() is that it is more prone to command injection.
Command injection
A common attack, or exploit, is to inject extra commands to gain control over a computer system. For example, if you ask your user for input and use that input in a call to os.system() or a call to subprocess.run(..., shell=True), you’re at risk of a command injection attack.
To demonstrate, the following code allows us to run any shell command:
import subprocess
thedir = input()
result = subprocess.run(f'ls -al {thedir}', shell=True)
Because we directly used the user input, the user can run any command simply by appending it after a semicolon. For example, the following input will list the / directory and then echo some text. Try it for yourself:
/; echo "command injection worked!";
The solution is not to try and clean the user input. You might be tempted to start looking for semicolons and rejecting the input if you find one. Don’t; hackers can think of at least 5 other ways to append a command in this situation. It’s an uphill battle.
The better solution is to not use shell=True, and to feed the command in a list as in the earlier examples. Input like this will fail in such cases because the subprocess module will ensure the input is an argument to the program you’re executing instead of a new command.
With the same input but with shell=False, you will get this:
>>> import subprocess
>>> thedir = input()
>>> result = subprocess.run([f'ls -al {thedir}'], shell=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/subprocess.py", line 489, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.8/subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.8/subprocess.py", line 1702, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'ls -al /; echo "command injection worked!";'
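Without a shell, nothing interprets the semicolon. The entire string, injected command included, is treated as the name of a single program to run, and Python tells us that it can’t find that file or directory. If you pass the command as separate list items, such as ['ls', '-al', thedir], the input becomes a plain argument to ls instead, and ls itself reports that no such file or directory exists.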
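Here is a sketch of the list form with every argument as its own item, so the user input can only ever be a single argument to ls. The helper name list_directory is made up for this example, and the extra '--' item is an addition of mine, not part of the original example: by common Unix convention it marks the end of options, so input starting with a dash is not read as a flag.

```python
import subprocess


def list_directory(thedir):
    """Hypothetical helper: list thedir with ls, without involving a shell."""
    return subprocess.run(
        ["ls", "-al", "--", thedir],  # each item is passed to ls as-is
        capture_output=True,
        encoding="UTF-8",
    )


if __name__ == "__main__":
    malicious = '/; echo "command injection worked!";'
    result = list_directory(malicious)
    # ls fails because no directory with that literal name exists;
    # the echo command is never executed.
    print(result.returncode)
    print(result.stderr)
```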
User input is always dangerous
In fact, using user input is always dangerous, not just because of command injection. For example, suppose you allow a user to input a file name, after which you read the file and show it to the user. Although this might seem harmless, a user could enter something like this: ../../../../configuration/settings.yaml, where settings.yaml might contain your database password… oops! You always need to sanitize and check user input properly. However, how to do that properly is beyond the scope of this article.
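Just to give an impression of what such a check can look like, here is one narrow sketch for the file name example: resolve the requested path and refuse anything that ends up outside a directory you chose. The helper name resolve_inside and the idea of a single allowed base directory are assumptions made for this illustration; it is by no means a complete treatment of input validation.

```python
from pathlib import Path


def resolve_inside(base_dir, user_path):
    """Hypothetical helper: return the resolved path, or raise if it leaves base_dir."""
    base = Path(base_dir).resolve()
    # resolve() collapses '..' segments and follows symlinks.
    candidate = (base / user_path).resolve()
    if candidate != base and base not in candidate.parents:
        raise ValueError(f"{user_path!r} points outside {base}")
    return candidate


if __name__ == "__main__":
    print(resolve_inside(".", "notes.txt"))
    try:
        resolve_inside(".", "../../../../configuration/settings.yaml")
    except ValueError as exc:
        print("rejected:", exc)
```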
Keep learning
The following related resources will help you dive even deeper into this subject:
- The official documentation has all the details about the subprocess library
- Our article on Python concurrency explains more about processes and threads
- Our section on using the Unix shell might come in handy
- Learn some basic Unix commands