For those who didn’t get that ninja feeling by now, this is for you. We’ll combine all our newly learned superpowers and perform multiprocessing, also called parallel computing, all with a single command in Bash!
Xargs
The xargs command reads items from standard input (meaning you can pipe data to it) and executes the specified command with those items as arguments.
The basic syntax for xargs is:
xargs [options] [command [initial-arguments]]
At first sight, you might not see the benefits of this. Why not create a while loop and run each command? The benefit of xargs is that it can batch arguments and call your command once on many files, instead of individually for each file.
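To make the batching visible, here is a minimal sketch that uses echo as a stand-in for a real command. The variable names batched and one_by_one are made up for this illustration; the -n option, which caps the number of arguments per invocation, is a standard xargs option.

```shell
# Three items arrive on standard input, one per line.
# By default xargs puts as many items as fit on one command line,
# so echo runs once with all three arguments.
batched=$(printf '%s\n' one two three | xargs echo)
echo "$batched"

# With -n 1, xargs passes at most one argument per invocation,
# so echo runs three times and prints three separate lines.
one_by_one=$(printf '%s\n' one two three | xargs -n 1 echo)
echo "$one_by_one"
```

The first echo prints a single line, the second prints one line per item. For a command with noticeable startup cost, the batched form saves a lot of process launches.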
But find can do so too!
Yup, you’re right. Still, there are more advantages. Xargs will work without needing find. So that’s one. But xargs has a special trick up its sleeve: it can run commands in parallel with the -P option.
This option takes a number that sets the maximum number of processes xargs runs at the same time. You read that right: in parallel!
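A quick way to see the effect is to time a few sleep commands, first one at a time and then four at a time. This is only a sketch: it assumes an xargs that supports -P (the GNU and BSD versions do), and the variable names are invented for the example.

```shell
# Four one-second jobs, at most one running at a time.
start=$(date +%s)
printf '%s\n' 1 1 1 1 | xargs -n 1 -P 1 sleep
serial_seconds=$(( $(date +%s) - start ))

# The same four jobs, with up to four running at the same time.
start=$(date +%s)
printf '%s\n' 1 1 1 1 | xargs -n 1 -P 4 sleep
parallel_seconds=$(( $(date +%s) - start ))

echo "serial: ${serial_seconds}s, parallel: ${parallel_seconds}s"
```

The serial run should take about four seconds and the parallel run about one, because all four sleeps overlap.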
Example
One real-world use for Bash multiprocessing with xargs is converting lots of video files. Let’s dissect the following command together:
$ find . -name "*.mpeg" | xargs -P 4 -I {} ffmpeg -i {} {}.mp4
First, we find all mpeg files. We feed these files to xargs. Next we tell xargs, with -P 4, to use four processes concurrently. With the -I option, we also tell xargs to substitute the file name in all places where it encounters {}. So xargs gets the first video file and starts ffmpeg. Instead of waiting for ffmpeg to finish, xargs starts another instance of ffmpeg to process the second file in parallel. This goes on until it reaches four processes. If all four slots are taken, xargs waits for one to finish before starting the next process.
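If you want to check what xargs would run before converting anything, you can put echo in front of the command as a dry run. The sketch below also uses find -print0 together with xargs -0, which separate file names with NUL bytes so that names containing spaces or quotes survive intact. Both options are available in GNU and BSD tools; the temporary directory and the two file names are made up for the demonstration.

```shell
# Work in a throwaway directory with two empty stand-in files.
workdir=$(mktemp -d)
touch "$workdir/holiday.mpeg" "$workdir/my birthday.mpeg"

# -print0 and -0 delimit names with NUL bytes instead of whitespace.
# echo prints each ffmpeg command line instead of running it (a dry run).
# sort only makes the order stable, since parallel jobs finish in any order.
planned=$(find "$workdir" -name "*.mpeg" -print0 |
  xargs -0 -P 4 -I {} echo ffmpeg -i {} {}.mp4 | sort)
echo "$planned"

# Clean up the throwaway directory.
rm -r "$workdir"
```

Each printed line shows the file name substituted in both places where {} appeared. Note that the output file ends in .mpeg.mp4, because {} is replaced by the full input name. Once the output looks right, remove echo to run the real conversion.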
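Video conversion is mostly CPU-bound. If your computer has four CPU cores, this conversion can run up to about four times as fast as a regular find or a while loop that handles one file at a time. The actual gain depends on how many threads each ffmpeg process already uses and on how fast your disk is. Isn’t that awesome?!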
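Instead of hard-coding four, you could let the machine tell you how many cores it has. This is a sketch under assumptions: getconf _NPROCESSORS_ONLN works on common Linux and macOS systems but is not guaranteed everywhere, so the snippet falls back to 1. The cores variable and the "convert" label are placeholders, and echo again stands in for the real command.

```shell
# Ask the system how many CPU cores are online; fall back to 1 if unknown.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
echo "Using $cores parallel processes"

# Dry run: echo stands in for the real conversion command.
printf '%s\n' a.mpeg b.mpeg c.mpeg | xargs -P "$cores" -I {} echo convert {}
```

Matching the process count to the core count is a reasonable starting point for CPU-bound work. For jobs that mostly wait on disk or network, a different number may work better.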
More info
For more info, read the man page on the command-line or on the web. Wikipedia has some examples of xargs usage as well that you might find useful.
If you prefer to work in Python, our guide contains a comprehensive section on multiprocessing, or concurrency, with Python.