Globbing and Python’s “subprocess” module

July 20, 2017 . By Reuven

Python’s “subprocess” module makes it really easy to invoke an external program and grab its output. For example, you can say

import subprocess
print(subprocess.check_output('ls'))

and the output is then

$ ./blog.py
b'blog.py\nblog.py~\ndictslice.py\ndictslice.py~\nhexnums.txt\nnums.txt\npeanut-butter.jpg\nregexp\nshowfile.py\nsieve.py\ntest.py\ntestintern.py\n'

subprocess.check_output returns a bytestring with the filenames on my desktop. To deal with them in a more serious way, and to have the ASCII 10 characters actually function as newlines, I need to invoke the “decode” method, which results in a string:

output = subprocess.check_output('ls').decode('utf-8')
print(output)

This is great, until I want to pass one or more arguments to my “ls” command.  My first attempt might look like this:

output = subprocess.check_output('ls -l').decode('utf-8')
print(output)

But I get the following output:

$ ./blog.py
Traceback (most recent call last):
 File "./blog.py", line 5, in <module>
 output = subprocess.check_output('ls -l').decode('utf-8')
 File "/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 336, in check_output
 **kwargs).stdout
 File "/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 403, in run
 with Popen(*popenargs, **kwargs) as process:
 File "/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 707, in __init__
 restore_signals, start_new_session)
 File "/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 1333, in _execute_child
 raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'ls -l'

The most important part of this error message is the final line, in which the system complains that I cannot find the program “ls -l”. That’s right — it thought that the command + option was a single program name, and failed to find that program.

Now, before you go and complain that this doesn’t make any sense, remember that filenames may contain space characters. And that there’s no difference between a “command” and any other file, except for the way that it’s interpreted by the operating system. It might be a bit weird to have a command whose name contains a space, but that’s a matter of convention, not technology.

Remember, though, that when a Python program is invoked, we can look at sys.argv, a list of the user’s arguments. Always, sys.argv[0] is the program’s name itself. We can thus see an analog here, in that when we invoke another program, we also need to pass that program’s name as the first element of a list, and the arguments as subsequent list elements.

In other words, we can do this:

output = subprocess.check_output(['ls', '-l']).decode('utf-8')
print(output)

and indeed, we get the following:

$ ./blog.py
total 88
-rwxr-xr-x 1 reuven 501 126 Jul 20 21:43 blog.py
-rwxr-xr-x 1 reuven 501 24 Jul 20 21:31 blog.py~
-rwxr-xr-x 1 reuven 501 401 Jul 17 13:43 dictslice.py
-rwxr-xr-x 1 reuven 501 397 Jun 8 14:47 dictslice.py~
-rw-r--r-- 1 reuven 501 54 Jul 16 11:11 hexnums.txt
-rw-r--r-- 1 reuven 501 20 Jun 25 22:24 nums.txt
-rw-rw-rw- 1 reuven 501 51011 Jul 3 13:51 peanut-butter.jpg
drwxr-xr-x 6 reuven 501 204 Oct 31 2016 regexp
-rwxr-xr-x 1 reuven 501 1669 May 28 03:03 showfile.py
-rwxr-xr-x 1 reuven 501 143 May 19 02:37 sieve.py
-rw-r--r-- 1 reuven 501 0 May 28 09:15 test.py
-rwxr-xr-x 1 reuven 501 72 May 18 22:18 testintern.py

So far, so good.  Notice that check_output can thus get either a string or a list as its first argument.  If we pass a list, we can pass additional arguments, as well:

output = subprocess.check_output(['ls', '-l', '-F']).decode('utf-8')
print(output)

As a result of adding the “-F’ flag, we now get a file-type indicator at the end of every filename:

$ ls -l -F
total 80
-rwxr-xr-x 1 reuven 501 137 Jul 20 21:44 blog.py*
-rwxr-xr-x 1 reuven 501 401 Jul 17 13:43 dictslice.py*
-rw-r--r-- 1 reuven 501 54 Jul 16 11:11 hexnums.txt
-rw-r--r-- 1 reuven 501 20 Jun 25 22:24 nums.txt
-rw-rw-rw- 1 reuven 501 51011 Jul 3 13:51 peanut-butter.jpg
drwxr-xr-x 6 reuven 501 204 Oct 31 2016 regexp/
-rwxr-xr-x 1 reuven 501 1669 May 28 03:03 showfile.py*
-rwxr-xr-x 1 reuven 501 143 May 19 02:37 sieve.py*
-rw-r--r-- 1 reuven 501 0 May 28 09:15 test.py
-rwxr-xr-x 1 reuven 501 72 May 18 22:18 testintern.py*

It’s at this point that we might naturally ask: What if I want to get a file listing of one of my Python programs? I can pass a filename as an argument, right?  Of course:

output = subprocess.check_output(['ls', '-l', '-F', 'sieve.py']).decode('utf-8')
print(output)

And the output is:

-rwxr-xr-x 1 reuven 501 143 May 19 02:37 sieve.py*

Perfect!

Now, what if I want to list all of the Python programs in this directory?  Given that this is a natural and everyday thing we do on the command line, I give it a shot:

output = subprocess.check_output(['ls', '-l', '-F', '*.py']).decode('utf-8')
print(output)

And the output is:

$ ./blog.py
ls: cannot access '*.py': No such file or directory
Traceback (most recent call last):
 File "./blog.py", line 5, in <module>
 output = subprocess.check_output(['ls', '-l', '-F', '*.py']).decode('utf-8')
 File "/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 336, in check_output
 **kwargs).stdout
 File "/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 418, in run
 output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ls', '-l', '-F', '*.py']' returned non-zero exit status 2.

Oh, no!  Python thought that I was trying to find the literal file named “*.py”, which clearly doesn’t exist.

It’s here that we discover that when Python connects to external programs, it does so on its own, without making use of the Unix shell’s expansion capabilities. Such expansion, which is often known as “globbing,” is available via the Python “glob” module in the standard library.  We could use that to get a list of files, but it seems weird that when I invoke a command-line program, I can’t rely on it to expand the argument.

But wait: Maybe there is a way to do this!  Many functions in the “subprocess” module, including check_output, have a “shell” parameter whose default value is “False”. But if I set it to “True”, then a Unix shell is invoked between Python and the command we’re running. The shell will surely expand our star, and let us list all of the Python programs in the current directory, right?

Let’s see:

output = subprocess.check_output(['ls', '-l', '-F', '*.py'], shell=True).decode('utf-8')
print(output)

And the results:

$ ./blog.py
blog.py
blog.py~
dictslice.py
dictslice.py~
hexnums.txt
nums.txt
peanut-butter.jpg
regexp
showfile.py
sieve.py
test.py
testintern.py

Hmm. We didn’t get an error.  But we also didn’t get what we wanted.  This is mighty strange.

The solution, it turns out, is to pass everything — command and arguments, including the *.py — as a single string, and not as a list. When you’re invoking commands with shell=True, you’re basically telling Python that the shell should break apart your arguments and expand them.  If you pass a list to the shell, then the parsing is done the wrong number of times, and in the wrong places, and you get the sort of mess I showed above.  And indeed, with shell=True and a string as the first argument, subprocess.check_output does the right thing:

output = subprocess.check_output('ls -l -F *.py', shell=True).decode('utf-8')
print(output)

And the output from our program is:

$ ./blog.py
-rwxr-xr-x 1 reuven 501 141 Jul 20 22:03 blog.py*
-rwxr-xr-x 1 reuven 501 401 Jul 17 13:43 dictslice.py*
-rwxr-xr-x 1 reuven 501 1669 May 28 03:03 showfile.py*
-rwxr-xr-x 1 reuven 501 143 May 19 02:37 sieve.py*
-rw-r--r-- 1 reuven 501 0 May 28 09:15 test.py
-rwxr-xr-x 1 reuven 501 72 May 18 22:18 testintern.py*

The bottom line is that you can get globbing to work when invoking commands via subprocess.check_output. But you need to know what’s going on behind the scenes, and what shell=True does (and doesn’t) do, to make it work.

Related Posts

Prepare yourself for a better career, with my new Python learning memberships

Prepare yourself for a better career, with my new Python learning memberships

I’m banned for life from advertising on Meta. Because I teach Python.

I’m banned for life from advertising on Meta. Because I teach Python.

Sharpen your Pandas skills with “Bamboo Weekly”

Sharpen your Pandas skills with “Bamboo Weekly”
  • Parsing the output of “ls”, especially including a shell, is inefficient at best and non-portable and error-prone at worst. While knowing how to use the subprocess module effectively is a valuable skill, this is a bad example. The project should have been done with tools like os.listdir() and os.walk() instead.

    • My example wasn’t meant to show how it’s best to get a directory listing, but rather how people can learn to use the “subprocess” module. Perhaps I could/should have used a different command’s output, but I wanted to spend as little time as possible focusing on the command, and as much as possible focusing on the output from the command. I’m generally a big fan of “glob.glob”, as my students know, not just because it’s useful, but also because of the great fun it is to say “glob glob” out loud.

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}
    >