August 1, 2015

Perl style backticks in Python

Perl’s backticks, copied from the shell and built right into the base language, let you capture the output of any shell command. Python has nothing quite so simple built in, but the standard library’s subprocess module comes to the rescue: not as terse as backticks, but with a few extra features too.

In Perl

A great feature in Perl is backticks. Copied from the shell, and built right into the base language, they let you capture the output of any shell command. For example…

:::perl
$fileContents = `cat filename`;
$dateString   = `date`;
$result       = `wget http://website.com/data.csv`;

Or with the alternative qx quoting, so you can put backticks inside your shell command…

:::perl
qx#sudo chown `id -u` /somedir# or die "chown failed";

In Python

But something that seems so simple in Perl has always given me problems in Python. It’s not built in, and using the full “open” semantics takes far too much time and effort.

To the rescue, from the Python standard library, comes the subprocess module. Not quite as simple as backticks, but it has a few extra features too. Most notably, because it’s a module, the Python programs that never shell out don’t have to load it; a small performance edge over Perl, where backtick support is baked into every program.

Simple case

For example, in the simplest case, replacing backticks looks like this…

:::python
from subprocess import check_output
output = check_output(["mycmd", "myarg"])

Here we pass a list to the check_output function; the first element is the binary to be executed. Note that this can’t be a shell builtin, such as “history” or “export”, as we’re not using a shell to run the command. This first element must be the file name only (possibly with full path), with no arguments. Arguments go in the second and subsequent elements of the list.
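
So a command with arguments splits into one list element per argument, and nothing is shell-quoted. A quick sketch, using echo (which resolves to the /bin/echo binary, not the shell builtin) as a stand-in command:

```python
from subprocess import check_output

# each argument is its own list element; no shell is involved,
# so no quoting or escaping is needed
output = check_output(["echo", "hello", "world"])
print(output)  # b'hello world\n' on Python 3
```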

With shell interpretation

If you need shell interpretation, add the shell=True keyword argument to the call, like this…

:::python
from subprocess import check_output
output = check_output("mycmd myarg", shell=True)

Note that here we do not pass a list, we pass a string, because the shell will be interpreting the line. This also allows us to use normal shell redirection and pipes…

:::python
from subprocess import check_output
output = check_output("dmesg | grep hda", shell=True)

Shell interpretation of strings you get from user input can be dangerous. Take this example: the user enters a name, and you want to print the matching user id…

:::python
from subprocess import check_output
username = input('Enter a user name: ')
print(check_output("id -u " + username, shell=True))

Now imagine our user enters a name of “Robert; rm -rf /” … that would give the shell a string with two commands separated by a semicolon, and a good test of your restore procedure!

:::shell
id -u Robert; rm -rf /

Shout out to little Bobby Tables
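
The safe fix is to skip the shell entirely and pass the untrusted value as its own list element, so it can never be parsed as a second command. A sketch with the hostile name hard-coded in place of the input prompt:

```python
from subprocess import CalledProcessError, check_output

username = "Robert; rm -rf /"  # hostile input, now harmless
try:
    # the whole string travels to id as a single argument;
    # the semicolon never reaches a shell
    print(check_output(["id", "-u", username]))
except CalledProcessError as err:
    # id just reports "no such user" and exits non-zero
    print(err.returncode)
```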

Errors

What happens if our process returns a non-zero return code? check_output raises a CalledProcessError, and your Python program painfully exits. Unless you’re 100% sure the command will run perfectly every time, you should catch this, just like you would check for any other Python error.

:::python
from subprocess import CalledProcessError, check_output
try:
    output = check_output(["ls", "non existent"])
except CalledProcessError as err:
    print(err.returncode)

What happens if our process writes to stderr? To prevent error messages from commands run through check_output() from being written to the console, set the stderr parameter to the constant STDOUT…

:::python
import subprocess
output = subprocess.check_output(
    'echo to stderr 1>&2',
    shell=True,
    stderr=subprocess.STDOUT,
    )
print(output)       # contains stderr as well

Just the return code

If you only want the return code, and will throw away the output, then the more general “call” is better suited than check_output. Notably, “call” never raises CalledProcessError the way check_output does.

:::python
import subprocess
subprocess.call(["ls", "-l"])           # Returns 0
subprocess.call("exit 1", shell=True)       # Returns 1
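
If you want to throw the output away too, rather than letting it reach the terminal, “call” accepts file handles for stdout and stderr, and subprocess.DEVNULL (Python 3.3+) exists for exactly this. A small sketch:

```python
import subprocess

# return code only; stdout and stderr are discarded
rc = subprocess.call(
    "echo noise; exit 3",
    shell=True,
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)
print(rc)  # 3
```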

Both stdout and stderr

You can capture stdout and stderr, without mixing them up, using Popen. The .communicate() method waits for the process to finish, then returns stdout and stderr as a tuple.

:::python
import subprocess
proc = subprocess.Popen('SomeProcess',
    shell=True,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    )
stdout_data, stderr_data = proc.communicate()
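
On Python 3 both values come back as bytes, and the exit status is available on the Popen object once communicate() has returned. A sketch with a concrete command standing in for SomeProcess:

```python
import subprocess

proc = subprocess.Popen(
    "echo out; echo err 1>&2",   # one line to stdout, one to stderr
    shell=True,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
stdout_data, stderr_data = proc.communicate()
print(stdout_data)      # b'out\n'
print(stderr_data)      # b'err\n'
print(proc.returncode)  # 0
```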

Line by Line

I wanted to include reading from a long-running process, line by line. Perhaps you want to stop it after you find what you’re interested in, with no need to process the rest of the data. But it turns out this is incredibly hard: there’s buffering in Python, buffering in your shell, and probably buffering in your command too, and there are possible deadlocks if you’re writing to and reading from a process at the same time.
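
For the restricted case where we only read and never write (so the read/write deadlock can’t happen), iterating over the process’s stdout pipe does work. A minimal sketch, using seq as a stand-in for the long-running command, and glossing over the buffering problems just described:

```python
import subprocess

# read-only, so no deadlock; but a block-buffered command
# could still delay its lines arbitrarily
proc = subprocess.Popen(["seq", "1", "1000000"], stdout=subprocess.PIPE)
for line in proc.stdout:
    if line.strip() == b"5":
        break           # found what we wanted, stop early
proc.kill()             # tidy up the abandoned process
proc.wait()
```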

I’ll try to write about it another time…