Category Archives for "Python"

Globbing and Python’s “subprocess” module

Python’s “subprocess” module makes it really easy to invoke an external program and grab its output. For example, you can say

import subprocess

and the output is then

$ ./

subprocess.check_output returns a bytestring with the filenames on my desktop. To deal with them in a more serious way, and to have the ASCII 10 characters actually function as newlines, I need to invoke the “decode” method, which results in a string:

output = subprocess.check_output('ls').decode('utf-8')

This is great, until I want to pass one or more arguments to my “ls” command.  My first attempt might look like this:

output = subprocess.check_output('ls -l').decode('utf-8')

But I get the following output:

$ ./
Traceback (most recent call last):
 File "./", line 5, in <module>
 output = subprocess.check_output('ls -l').decode('utf-8')
 File "/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/", line 336, in check_output
 File "/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/", line 403, in run
 with Popen(*popenargs, **kwargs) as process:
 File "/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/", line 707, in __init__
 restore_signals, start_new_session)
 File "/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/", line 1333, in _execute_child
 raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'ls -l'

The most important part of this error message is the final line, in which the system complains that I cannot find the program “ls -l”. That’s right — it thought that the command + option was a single program name, and failed to find that program.

Now, before you go and complain that this doesn’t make any sense, remember that filenames may contain space characters. And that there’s no difference between a “command” and any other file, except for the way that it’s interpreted by the operating system. It might be a bit weird to have a command whose name contains a space, but that’s a matter of convention, not technology.

Remember, though, that when a Python program is invoked, we can look at sys.argv, a list of the user’s arguments. Always, sys.argv[0] is the program’s name itself. We can thus see an analog here, in that when we invoke another program, we also need to pass that program’s name as the first element of a list, and the arguments as subsequent list elements.

In other words, we can do this:

output = subprocess.check_output(['ls', '-l']).decode('utf-8')

and indeed, we get the following:

$ ./
total 88
-rwxr-xr-x 1 reuven 501 126 Jul 20 21:43
-rwxr-xr-x 1 reuven 501 24 Jul 20 21:31
-rwxr-xr-x 1 reuven 501 401 Jul 17 13:43
-rwxr-xr-x 1 reuven 501 397 Jun 8 14:47
-rw-r--r-- 1 reuven 501 54 Jul 16 11:11 hexnums.txt
-rw-r--r-- 1 reuven 501 20 Jun 25 22:24 nums.txt
-rw-rw-rw- 1 reuven 501 51011 Jul 3 13:51 peanut-butter.jpg
drwxr-xr-x 6 reuven 501 204 Oct 31 2016 regexp
-rwxr-xr-x 1 reuven 501 1669 May 28 03:03
-rwxr-xr-x 1 reuven 501 143 May 19 02:37
-rw-r--r-- 1 reuven 501 0 May 28 09:15
-rwxr-xr-x 1 reuven 501 72 May 18 22:18

So far, so good.  Notice that check_output can thus get either a string or a list as its first argument.  If we pass a list, we can pass additional arguments, as well:

output = subprocess.check_output(['ls', '-l', '-F']).decode('utf-8')

As a result of adding the “-F’ flag, we now get a file-type indicator at the end of every filename:

$ ls -l -F
total 80
-rwxr-xr-x 1 reuven 501 137 Jul 20 21:44*
-rwxr-xr-x 1 reuven 501 401 Jul 17 13:43*
-rw-r--r-- 1 reuven 501 54 Jul 16 11:11 hexnums.txt
-rw-r--r-- 1 reuven 501 20 Jun 25 22:24 nums.txt
-rw-rw-rw- 1 reuven 501 51011 Jul 3 13:51 peanut-butter.jpg
drwxr-xr-x 6 reuven 501 204 Oct 31 2016 regexp/
-rwxr-xr-x 1 reuven 501 1669 May 28 03:03*
-rwxr-xr-x 1 reuven 501 143 May 19 02:37*
-rw-r--r-- 1 reuven 501 0 May 28 09:15
-rwxr-xr-x 1 reuven 501 72 May 18 22:18*

It’s at this point that we might naturally ask: What if I want to get a file listing of one of my Python programs? I can pass a filename as an argument, right?  Of course:

output = subprocess.check_output(['ls', '-l', '-F', '']).decode('utf-8')

And the output is:

-rwxr-xr-x 1 reuven 501 143 May 19 02:37*


Now, what if I want to list all of the Python programs in this directory?  Given that this is a natural and everyday thing we do on the command line, I give it a shot:

output = subprocess.check_output(['ls', '-l', '-F', '*.py']).decode('utf-8')

And the output is:

$ ./
ls: cannot access '*.py': No such file or directory
Traceback (most recent call last):
 File "./", line 5, in <module>
 output = subprocess.check_output(['ls', '-l', '-F', '*.py']).decode('utf-8')
 File "/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/", line 336, in check_output
 File "/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/", line 418, in run
 output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ls', '-l', '-F', '*.py']' returned non-zero exit status 2.

Oh, no!  Python thought that I was trying to find the literal file named “*.py”, which clearly doesn’t exist.

It’s here that we discover that when Python connects to external programs, it does so on its own, without making use of the Unix shell’s expansion capabilities. Such expansion, which is often known as “globbing,” is available via the Python “glob” module in the standard library.  We could use that to get a list of files, but it seems weird that when I invoke a command-line program, I can’t rely on it to expand the argument.

But wait: Maybe there is a way to do this!  Many functions in the “subprocess” module, including check_output, have a “shell” parameter whose default value is “False”. But if I set it to “True”, then a Unix shell is invoked between Python and the command we’re running. The shell will surely expand our star, and let us list all of the Python programs in the current directory, right?

Let’s see:

output = subprocess.check_output(['ls', '-l', '-F', '*.py'], shell=True).decode('utf-8')

And the results:

$ ./

Hmm. We didn’t get an error.  But we also didn’t get what we wanted.  This is mighty strange.

The solution, it turns out, is to pass everything — command and arguments, including the *.py — as a single string, and not as a list. When you’re invoking commands with shell=True, you’re basically telling Python that the shell should break apart your arguments and expand them.  If you pass a list to the shell, then the parsing is done the wrong number of times, and in the wrong places, and you get the sort of mess I showed above.  And indeed, with shell=True and a string as the first argument, subprocess.check_output does the right thing:

output = subprocess.check_output('ls -l -F *.py', shell=True).decode('utf-8')

And the output from our program is:

$ ./
-rwxr-xr-x 1 reuven 501 141 Jul 20 22:03*
-rwxr-xr-x 1 reuven 501 401 Jul 17 13:43*
-rwxr-xr-x 1 reuven 501 1669 May 28 03:03*
-rwxr-xr-x 1 reuven 501 143 May 19 02:37*
-rw-r--r-- 1 reuven 501 0 May 28 09:15
-rwxr-xr-x 1 reuven 501 72 May 18 22:18*

The bottom line is that you can get globbing to work when invoking commands via subprocess.check_output. But you need to know what’s going on behind the scenes, and what shell=True does (and doesn’t) do, to make it work.


Five Python function parameters you should know and use

One of Python’s mantras is “batteries included.” which means that even with a bare-bones installation, you can do quite a bit. You can (and should) install packages from PyPI, but many day-to-day tasks can be accomplished with just the built-in data structures, functions, and methods.

What I’ve discovered over the years is that some of the functions and methods have useful parameters that can make our code shorter and more elegant. Here are some of the most elegant ones that I’ve found and use in my work:

1. str.split  (part 1)

One of the methods I use most often in my work is str.split. This method always returns a list, breaking the string into different elements. For example:

In [1]: s = 'abc,def,ghi'

In [2]: s.split(',')
Out[2]: ['abc', 'def', 'ghi']

In [3]: s = 'abc::def::ghi'

In [4]: s.split('::')
Out[4]: ['abc', 'def', 'ghi']

In [5]: s = 'this is a bunch of words'

In [6]: s.split(' ')
Out[6]: ['this', 'is', 'a', 'bunch', 'of', 'words']

All of this is great, and works just fine. But what if I do the following:

In [7]: s = 'a   b   c   d'  # Note: Three spaces between each letter

In [8]: s.split(' ')
Out[8]: ['abc', '', '', 'def', '', '', 'ghi', '', '', 'jkl']

Yuck!  Of course, this is one of those times that the computer does what we tell it, not what we want: We said that every time it encounters a space character, it should give us a new element in the output list. Sure enough, by having multiple space characters between the letters, we end up having lots of empty strings in our resulting list.

What’s worse is that I often use str.split to take input from users, or from files, and break it into individual elements. I’d love to break on one or more whitespace characters, ideally without reverting to the “re” module’s re.split(‘\s’).

Solution: Don’t pass any argument. The first parameter, named “sep”, has a default value of None. And when it has a value of None, str.split does indeed use one or more whitespace characters.  That’s right — str.split, when called with zero arguments, actually does more (and is often more useful) then when called with an argument:

In [10]: s = 'abc \n\n def \n\t ghi jkl\n\n'

In [11]: s.split()
Out[11]: ['abc', 'def', 'ghi', 'jkl']

2. str.split (part 2)

Let’s say you ask a user to enter their name, which you want to split into first and last names. For example:

In [13]: person = raw_input("Enter your name: ") # "input" in Python 3
Enter your name: Reuven Lerner

In [14]: first_name, last_name = person.split()

In [15]: print("First name is '{}', last name is '{}'".format(first_name, 
First name is 'Reuven', last name is 'Lerner'

Line 14 uses Python’s unpacking; since we know that the list produced by person.split() will contain two elements, I can safely assign those two elements into two variables (first_name and last_name).

But what if the person also enters a third name?

In [16]: person = raw_input("Enter your name: ")
Enter your name: Reuven Moshe Lerner

In [17]: first_name, last_name = person.split()
ValueError Traceback (most recent call last)
<ipython-input-17-cf257c8e7997> in <module>()
----> 1 first_name, last_name = person.split()

ValueError: too many values to unpack

Yikes! str.split() returned a list of three elements. And you cannot assign three elements into two variables. (Fine, Python 3 does allow for this, but we won’t discuss this here.)

We can, however, tell str.split how many times it should split. This is done by passing the second, optional argument. If you want to split on all whitespace, as we saw above, then you must explicitly pass None as the first argument:

In [18]: first_name, last_name = person.split(None, 1)

In [19]: print("First name is '{}', last name is '{}'".format(first_name,
 ...: last_name))
First name is 'Reuven', last name is 'Moshe Lerner'

Remember that the second parameter is called “maxsplits”, meaning the number of times str.split should do its thing. This means that the number you give will be the index of the final element in the returned list. In other words: I call person.split(None, 1), which means that I’ll get back a list with two elements — the latter of which has an index of 1.

What if I want to split things in the other direction, such that my first and middle names are in the first variable, and just my last name in the second variable? We can use a variant of str.split called str.rsplit (“right-side split”):

In [20]: first_name, last_name = person.rsplit(None, 1)

In [21]: print("First name is '{}', last name is '{}'".format(first_name,
 ...: last_name))
First name is 'Reuven Moshe', last name is 'Lerner'

3. enumerate

“enumerate” is a built-in function that exists to let us number things as we iterate over them. For example, let’s assume that I have a string, and want to print the letters of the string:

In [22]: s = 'abc'

In [23]: for one_letter in s:
 ...: print(one_letter)

What if I want to get the index of each letter, too? I can do this manually:

In [24]: s = 'abc'

In [25]: index = 0

In [26]: for one_letter in s:
 ...: print("{}: {}".format(index, one_letter))
 ...: index += 1
0: a
1: b
2: c

Because this is such a common thing that people want to do, we can instead use enumerate, which returns an iterator that produces tuples. Each tuple contains two elements, the first of which is the index and the second of which is the element from the enumerated sequence. Because we know that each tuple will contain two elements, we can grab them with unpacking:

In [27]: for index, one_letter in enumerate(s):
…: print(“{}: {}”.format(index, one_letter))
0: a
1: b
2: c

But wait, what if you are presenting this information to a non-programmer, for whom it seems weird to start numbering with zero? One solution is to send them to a programming course, but if there’s no time, then you can pass “enumerate” a second argument, the number with which numbering should start:

In [28]: for index, one_letter in enumerate(s, 1):
 ...: print("{}: {}".format(index, one_letter))
1: a
2: b
3: c

Of course, we can start with any number we want:

In [29]: for index, one_letter in enumerate(s, 72):
 ...: print("{}: {}".format(index, one_letter))
72: a
73: b
74: c

4. int()

“int” is the integer type, widely used in Python to represent whole numbers. Many Python developers know that we can turn strings into integers by invoking the “int” function.

Of course, there’s not really an “int function.”  Instead, we’re using “int” as a class to create a new instance of int. So when I say


I get a new instance of “int” back, with the value 5.  And when I say


I get back a different instance of “int”, with the value 12345.

But “int” takes a second argument, which lets us tell Python the base of the source data. For example, if I say

int('12345', 16)

then we get back 74565, because we asked Python to give us the value of 0x12345.

You can actually use any number base you want, from 1 through 36 — which is particularly useful for those of us with 36 fingers. But it’s not uncommon for my clients to be reading from files containing hexadecimal numbers. For example, let’s assume that we want to sum the hex numbers on each line of the following file:

10 20 30 4a 5b 6c
ff ef fa 00 20 3b
ab cd ef af be cd

We can do something like this:

In [35]: for one_line in open('hexnums.txt'):
 ...: print(sum([int(one_number, 16)
 ...: for one_number in one_line.split()]))

In other words:

  • We open the file, and read it line by line
  • We split each line on whitespace, resulting in a list
  • We interpret each number in hex
  • The resulting list of integers can then be passed to the “sum” function
  • We print the sum for each line

If you work with files that contain binary, octal, or hex numbers, this can really be handy. To be honest, I don’t do this very much — but I work with a number of companies that do, and for whom this is a real lifesaver.

5. dict.get

Dictionaries are everywhere in Python. They’re easy to define, and easy to work with. For example:

In [36]: d = {'a':1, 'b':2, 'c':3}

In [37]: d['a']
Out[37]: 1

In [38]: d['z']
KeyError Traceback (most recent call last)
<ipython-input-38-da9eddaf4274> in <module>()
----> 1 d['z']

KeyError: 'z'

Oh, right — but if you request a key that doesn’t exist, you’re going to get a KeyError exception.

Let’s write a little program that lets the user repeatedly query a dictionary. If the user gives us an empty string, then we’ll exit from the loop, but otherwise we’ll either print the value associated with the key, or give an error:

In [42]: while True:
 ...:         k = raw_input("Enter key: ")
 ...:         if not k:
 ...:             break
 ...:         elif k in d:
 ...:             print("d[{}] is {}".format(k, d[k]))
 ...:         else:
 ...:             print("{} isn't a key in d".format(k))
Enter key: a
d[a] is 1
Enter key: b
d[b] is 2
Enter key: c
d[c] is 3
Enter key: d
d isn't a key in d
Enter key: <enter>

This works fine, and is a pretty standard way that I’ve seen people check for keys in order to avoid exceptions. But often, the dict.get method will work even better.  Basically, dict.get does the same thing as square brackets ([ ]), except that if the key doesn’t exist, it returns None. For example:

In [43]: while True:
 ...:         k = raw_input("Enter key: ")
 ...:         if not k:
 ...:             break
 ...:         else:
 ...:             print("value of d[{}] is {}".format(k, d.get(k)))
Enter key: a
value of d[a] is 1
Enter key: b
value of d[b] is 2
Enter key: c
value of d[c] is 3
Enter key: d
value of d[d] is None
Enter key:

Now, you might not want to display None to your users. But you can always trap for None in your code, and then tell the user that the key doesn’t exist.

But dict.get takes a second, optional parameter. If you pass a second argument, you can change the default value from None to something else. For example:

In [44]: p = {'first':'Reuven', 'last':'Lerner'}

In [45]: p.get('first')
Out[45]: 'Reuven'

In [46]: p.get('last')
Out[46]: 'Lerner'

In [47]: p.get('middle', '')
Out[47]: ''

Now I can query the dict for the “middle” key, getting an empty string if the key doesn’t exist.  Another example:

In [48]: countries = {'New York':'USA', 'London':'England', 'Moscow':'Russia'}

In [51]: countries.get('Amsterdam')

In [52]: countries.get('Amsterdam', "I don't know")
Out[52]: "I don't know"

In [53]: countries.get('New York', "I don't know")
Out[53]: 'USA'

In [54]: countries.get('Moscow', "I don't know")
Out[54]: 'Russia'

Notice that when the key does exist, there’s no difference between d[k] and d.get(k).  The question is how you want to deal with a key that doesn’t exist, and dict.get often makes life much easier in these cases.

What optional parameters do you find most useful in Python?

Also: I’m teaching three live Python courses (on functional programming, advanced objects, and decorators) in the next few weeks; early-bird tickets are still available for a limited time. Grab them now, and improve your Python fluency from your own computer.


One-day birthday sale — on July 14th, get 47% off my courses, books, and subscriptions

Today is my birthday!

To celebrate, I’m offering a one-day 47% sale on many of my products:

Just enter the “birthday” coupon code when buying any of these, and you’ll get 47% off. These discounts are good for one day only — Friday, July 14th.


Announcing: Three live Python courses

If you’re like many of the Python developers I know, the basics are easy for you: Strings, lists, tuples, dictionaries, functions, and even objects roll off of your fingers and onto your keyboard. Your day-to-day tasks have become significantly easier as a result of Python, and you’re comfortable using it for tasks at work and home.

But some parts of Python remain difficult, mysterious, and outside of your comfort zone:

  • When you want to use a list comprehension, you have to go to Stack Overflow to remember how they work — to say nothing of set and dict comprehensions.
  • You know that there is a difference between functions and methods, but you can’t quite put your foot on what that difference is, or how Python rewrites “self” to be the first argument to every method.
  • You keep hearing about “decorators,” and how they allow you to do all sorts of magical things to functions and classes — but every time you start reading about them, you get confused or distracted.

Sound familiar? If so, then I want to help.

As you probably know, I spend just about every day at one of the world’s best companies — Apple, Cisco, IBM, PayPal, VMWare, and Western Digital, among others — teaching their engineers how to use Python.

The engineers who learn these techniques benefit by having more “tools in their toolbox,” as I like to put it; when a problem presents itself, they have more options at their disposal. I help them to solve new types of problems, or to solve existing problems more quickly. These engineers become more valuable to their employers, and more valuable on the larger job market.

I’m announcing three courses that you can take, from the comfort of your home or office, using the content I’ve presented to these companies:

  • Tuesday, July 25: Functional programming in Python
    • comprehensions
    • custom sorting
    • passing functions as arguments
    • lambda expressions
    • map, filter, and reduce
  • Wednesday, August 2: Advanced Python objects
    • attributes
    • methods vs. functions
    • class attributes
    • inheritance
    • methods vs. functions
    • descriptors
    • dunder methods
  • Thursday, August 3: Python decorators
    • properties and other built-in decorators
    • writing decorators
    • decorating functions, objects, and methods

Each of these classes will run live, for five hours (with two 15-minute breaks):

  • New York: 7 a.m. – 2 p.m.
  • London: 12 noon – 5 p.m.
  • Israel: 2 p.m. – 7 p.m.
  • Mumbai: 5:30 p.m. – 10:30 p.m.

Each will be packed with lectures, accompanied by tons of live-coding examples, many exercises that you’ll be expected to solve (and which we’ll review together when you’re done), and plenty of time for interactions and questions.  Indeed, please come with lots of questions, to make the class more interesting and relevant.

Each course costs $350, and will give you:

  • Access to the live audio/video/chat feed,
  • PDFs of my slides,
  • the Jupyter notebook I use during my live-coding demos,
  • and solutions to all of the exercises

I’m offering discounts to people who buy more than one course:

  • Buy two courses, and save $100, for a total of $600.  Just use the “2sessions” coupon code when purchasing each one.
  • Buy all three courses, and save $250, for a total of $800.  Just use the “3sessions” coupon code when purchasing each one.

As always, I’m also offering a discount to students; e-mail me, and I’ll send you the appropriate discount code.

Convinced?  I hope so!  View the full course descriptions here, and then register for them:

But wait!  If you register before Monday, July 18th, then you can save 15% more, by purchasing an early-bird ticket.

I’m very excited to be offering these courses.  They won’t be my last ones — but I’ll next be teaching other topics, so if these subjects interest you, you should definitely attend.

I hope that you can join me for these live, online courses.


Raw strings to the rescue!

Whenever I teach Python courses, most of my students are using Windows. And thus, when it comes time to do an exercise, I inevitably end up with someone who does the following:

for one_line in open('c:\abc\def\ghi'):


The above code looks like it should work. But it almost certainly doesn’t. Why? Because backslashes (\) in Python strings are used to insert special characters. For example, \n inserts a newline, and \t inserts a tab. So when we create the above string, we think that we’re entering a simple path — but we’re actually entering a string containing ASCII 7, the alarm bell.

Experienced programmers are used to looking for \n and \t in their code.  But \a (alarm bell) and \v (vertical tab), for example, tend to surprise many of them. And if you aren’t an experienced programmer? Then you’re totally baffled why the pathname you’ve entered, and copied so precisely from Windows, results in a “file not found” error.

One way to solve this problem is by escaping the backslashes before the problematic characters.  If you want a literal “\n” in your text, then put “\\n” in your string.  By the same token, you can say “\\a” or “\\v”.  But let’s be honest; remembering which characters require a doubled backslash is a matter of time, experience, and discipline.

(And yes, you can use regular, Unix-style forward slashes on Windows.  But that is generally met by even more baffled looks than the notion of a “vertical tab.”)

You might as well double all of the backslashes — but doing that is really annoying.  Which is where “raw strings” come into play in Python.

A “raw string” is basically a “don’t do anything special with the contents” string — a what-you-see-is-what-you-get string. It’s actually not a different type of data, but rather a way to enter strings in which you want to escape all backslashes. Just preface the opening quote (or double quotes) with the “r” character, and your string will be defined with all backslashes escaped. For example, if you say:


then you’ll see




But if you say:


then you’ll see


I suggest using raw strings whenever working with pathnames on Windows; it allows you to avoid guessing which characters require escaping. I also use them whenever I’m writing regular expressions in Python, especially if I’m using \b (for word boundaries) or using backreferences to groups.

Raw strings are one of those super-simple ideas that can have big implications on the readability of your code, as well as your ability to avoid problems.  Avoid such problems; use raw strings for all of your Windows pathnames, and you’ll be able to devote your attention to fixing actual bugs in your code.


Announcing: Weekly Python Exercise

Weekly Python Exercise logoI have a great job.  Really, I’m quite fortunate: For many years, I’ve spent nearly every day helping engineers at some of the world’s largest and best companies (including Apple, Cisco, Ericsson, HP, PayPal, SanDisk, and VMWare) to become better Python developers.

My students are are experienced, intelligent engineers who are using Python in their work. Some of them are at the start of their Python journey, and others have been using the language for several years. But all of them feel that if they were just more fluent with Python, they could express themselves — and thus solve problems — more quickly and easily.

How do I help these engineers to become more fluent? The answer is simple: Practice. Lots and lots of practice.

Programming languages are like natural languages: When you’re fluent, the ideas flow out of you without having to think twice about vocabulary and syntax. When you’re speaking in your native language, you’re not thinking “third person, male, past tense.”  Rather, you just say what you’re thinking.  The same is true for programming languages; at a certain point, you don’t have to think about how to create a dispatch table with functions and a dictionary.  You just do it.

The only way to really learn a new natural language is through practice. You’ll make mistakes along the way, but those are an important part of the learning process. Over time, and with repeated practice, you’ll slowly but surely internalize the language’s structure. You’ll become fluent, able to express your ideas naturally and effortlessly.

Weekly Python Exercise is designed to help you gain that fluency, over time, through regular and repeated practice. Every Tuesday, I’ll send you a new Python exercise.  I want you to struggle with it, but not too much; my hope is that the exercises will never take more than 30 minutes to solve. As you work through the exercise, and then as you review the solution and explanation that I send each Friday, you’ll be growing and improving as a Python programmer. You’ll be gaining understanding and knowledge that will make you more efficient and confident.

Moreover, you’ll find yourself spending far less time on Stack Overflow. I’m a big fan of Stack Overflow, but going there too often interrupts your flow, impedes your productivity, and adds to your frustration.

  • If you use Python on a regular basis, but feel like you’re consulting Stack Overflow a bit too frequently, then these exercises are for you.
  • If you’re getting your work done in Python, but think that your code could look more “Pythonic,” then these exercises are for you.
  • If you want to feel more comfortable with the standard library and well-known packages on PyPI, then these exercises are for you.
  • If you want to explore more advanced aspects of Python, be they functional programming techniques, objects, generators, decorators, or async code, then these exercises are for you.

Weekly Python Exercise is a subscription service: You can purchase it using PayPal or a credit card (with more options coming soon). You can pay each month, or at a reduced rate with an annual subscription. You can also buy a subscription for your team at work, ensuring that all of you will become more fluent and effective.

Along with your subscription, you’ll have free access to a private forum in which we’ll discuss the exercises (and answers). I’ve even set up a special forum area for people interested in pair programming with one another to find a solution.

I’m proud and excited to be launching Weekly Python Exercise.  But I’ll be doubly proud and excited to know that I’m helping developers around the world, not just those who participate in my courses, to become more fluent and effective Python developers.

Any questions?  Just e-mail me.  Otherwise, sign up, and start getting more fluent in Python!

Click here to subscribe to Weekly Python Exercise!


Boolean indexing in NumPy and Pandas: A free e-mail course for aspiring data scientists

NumPyFor nearly two years, I have been teaching my introductory course in data science and machine learning to companies around the world. On the one hand, participants are excited by data science, and all of the potential that it has to change our world. On the other hand, there is a lot to learn in order to be effective — a mix of theory and practice. While Python’s tools reduce the learning curve, you still end up having to learn about NumPy, Pandas, Matplotlib, scikit-learn, and a variety of other tools, such as the Jupyter notebook.

One of the things that most surprises and frustrates my students is the notion of “boolean indexing.” If you want to retrieve all of the odd elements from a NumPy array, you don’t use a “for” loop or even a list comprehension. Rather, you use a boolean index, as follows:


In order to cut down on the confusion, and help people to understand and use this powerful tool, I have created a free, 15-part e-mail course.  Each day, you’ll be introduced to another aspect of boolean indexing.  In every lesson, I give exercises, to help you practice what you’ve learned, and internalize the ideas.  By the end, you’ll know how to work with data in NumPy and Pandas (both series and data frames).

The course is 100% free, and a new lesson arrives in your e-mail inbox every day. So if you, or a colleague of yours, has been frustrated trying to understand how boolean indexing works, this is your chance!  The more fluently you can use these tools, the better you can be as a data scientist. Which, I would argue, is better for the world.

Sign up here: 

2+2 might be 4, but what is True+True?

In Python, we know that we can add two integers together:

>> 2 + 2

And of course, we can add two strings together:

>> 'a' + 'b'

We can even add two lists together:

>> [1,2,3] + [4,5,6]
[1, 2, 3, 4, 5, 6]

But what happens when you add two booleans together?

>> True + True

Huh?  What’s going on here?

This would seem to be the result of several different things going on.

First of all, booleans in Python are actually equal to 1 and 0. We know that in a boolean context (i.e., in an “if” statement or “while” loop), 1 is considered to be True, and 0 is considered to be False. But then again, 10, 100, and ‘abc’ are also considered to be True, while the empty string is considered to be False. All values are turned into booleans (using __nonzero__ in Python 2, and __bool__ in Python 3) when they’re in a place that requires a boolean value.

But whereas 10 and the empty string are turned into True and False (respectively) in a boolean context, 1 and 0 are special. True and False are really equal to 1 and 0, as we can see here:

>> True == 1
>>> True == 10
>>> False == 0
>>> False == ''

So, it would seem that while it’s preferable to use True and False, rather than 1 and 0, the difference is a semantic one, for programmers.  Behind the scenes, they’re really numbers.

But how can that be?  Why in the world would Python consider two objects of completely different types to have equal values?

The answer is that in Python, the “bool” type is actually a subclass of “int”, as we can see here:

>> bool.__bases__
(<class 'int'>,)

So while it’s still a bit weird for two values with different types to be considered equal, it’s not totally weird, given that booleans are basically a specialized form of integers.   And indeed, if we look at the documentation for booleans, we find that it implements very few methods of its own; most are inherited from int.  One of the many inherited operators is __eq__, the method that determines whether two objects are equal. Which means that when we’re comparing 1 and True, we’re really just comparing 1 (of type int) and 1 (of type bool, a subclass of int, which means that it behaves like an int).

This, then, explains what’s going on: When we say “True + True”, the int.__add__ operator is called. It gets two instances of bool, each with a value of 1, and which know how to behave as if they were regular ints. So we get 2 back.

Just for kicks, what happens if we say

True += 1

in Python 3, we get an error, saying that we can’t assign to a keyword. That’s because in Python 3, True and False are true keywords, rather than names defined in the “builtins” namespace. But in Python 2, they’re just built-in names. Which means that this is what happens in Python 2:

>>> True += 1
>>> True
>>> type(True)
<type 'int'>

Look at what we did: Not only did we add True + 1, but we then assigned this integer back to True! Which means that True is no longer an instance of “bool” in the “builtins” namespace, but rather an integer.

How did this happen? Because of Python’s scoping rules, known as “LEGB” for local-enclosing-globals-builtins.  When we assigned to True with +=, we didn’t change the value of True in builtins. Rather, we created a new global variable. The builtin value is still there, but because the global namespace gets priority, its value (2) masks the actual value (True, or 1) in the builtins namespace.

Indeed, look at this:

>>> True += 1
>>> True == __builtins__.True

How can we get out of this situation? By embracing nihilism, deleting truth (i.e., the True name we’ve defined in the global namespace).  Once we’ve done that, True in the builtins namespace will still exist:

>>> del(True)
>>> True

I’d like to think that none of us will ever be doing mathematical operations with booleans, or trying to assign to them. But understanding how these things work does provide a clearer picture of Python scoping rules, which govern how everything else in the language functions.

What other weird boolean/integer behavior have you noticed? I’m always curious to find, and then understand, these somewhat dark (and useless) aspects of Python!

Unfamiliar with Python scoping rules, or just want to brush up on them? Take my free, five-part “Python variable scoping” e-mail course:


Python function brain transplants

What happens when we define a function in Python?

The “def” keyword does two things: It creates a function object, and then assigns a variable (our function name) to that function object.  So when I say:

def foo():
      return "I'm foo!"

Python creates a new function object.  Inside of that object, we can see the bytecodes, the arity (i.e., number of parameters), and a bunch of other stuff having to do with our function.

Most of those things are located inside of the function object’s __code__ attribute.  Indeed, looking through __code__ is a great way to learn how Python functions work.  The arity of our function “foo”, for example, is available via foo.__code__.co_argcount.  And the byte codes are in foo.__code__.co_code.

The individual attributes of the __code__ object are read-only.  But the __code__ attribute itself isn’t!  We can thus be a bit mischievous, and perform a brain transplant on our function:

def foo():
      return "I'm foo!"

def bar():
      return "I'm bar!"

foo.__code__ = bar.__code__

Now, when we run foo(), we’re actually running the code that was defined for “bar”, and we get:

"I'm in bar!"

This is not likely to be something you want to put in your actual programs, but it does demonstrate some of the power of the __code__ object, and how much it can tell us about our functions.  Indeed, I’ve found over the last few years that exploring the __code__ object can be quite interesting!


The five-minute guide to setting up a Jupyter notebook server

JupyterNearly every day, I teach a course in Python.  And nearly every day, I thus use the Jupyter notebook: I do my live-coding demos in it, answer students’ questions using it, and also send it to my students at the end of the day, so that they can review my code without having to type furiously or take pictures of my screen.

I also encourage my students to install the notebook — not just so that they can open the file I send them at the end of each day, but also so that they can learn to experiment, iterate, and play with their code using a modern interactive environment.

My two-day class in regular expressions uses Python, but as a means to an end: I teach the barest essentials of Python in an hour or so, just so that we can read from files and search inside of them using regular expressions.  For this class, it’s overkill to ask the participants to install Python, especially if they plan to use other languages in their work.

When I teach this class, then, I set up a server that is only used for my course. Students can all log into the Jupyter notebook, create and use their own notebook files, and avoid installing anything new on their own computers.

Moreover, this whole setup now takes me just a few minutes. And since the class lasts only two days, I’m paying a few dollars for the use of a server that doesn’t contain any valuable data on it. So if someone happens to break into the server or ruin it, or if one of my students happens to mess around with things, nothing from my own day-to-day work will be ruined or at risk.

How do I do this? It’s pretty simple, actually:

  • Create a new server.  I use Digital Ocean, because they’re cheap and easy to set up. When I teach a class in Israel or Europe, I use one of their European data centers (or should I say “centres”), because it’ll be a bit faster.  If a large number of students will participate, then I might get a server with a lot of RAM, but I normally get one of the most basic DO plans.  I normally set up an Ubuntu server, but you’re welcome ot use any version of Linux you want.
  • I log in as root, and add a bunch of packages, starting with “python3-pip”. I now use Python 3 in my courses, and because Ubuntu still comes with Python 2, the “pip3” command is what you need for Python 3’s “pip”.  I install a bunch of things, including “pip3”:
apt-get install unzip emacs wget curl python3-pip
  • I then use “pip3” to install the “jupyter” package, which in turn downloads and installs everything else I’ll need:
pip3 install -U jupyter
  • Next, I create a “student” user.  It is under this user that I’ll run the notebook.
  • As the “student” user, I start the notebook with the “–generate-config” option. This creates a configuration directory (~/student/.jupyter) with a configuration file ( inside of it:
jupyter notebook --generate-config
  • Open the file, which is (as the suffix indicates) a Python file, defining a number of variables.  You’ll need to set a few of these in order for things to work; from my experience, changing four lines is sufficient:
    c.NotebookApp.open_browser = False    # no browser needed on a server
    c.NotebookApp.ip = ''          # listen on the network
    c.NotebookApp.password = ''            # don't require a password
    c.NotebookApp.token = ''              # don't require a security token
  • These final two instructions, to remove password protection and the security token, are debatable.  On the one hand, my server will exist for two days at most, and won’t have any important data on it.  On the other hand, you could argue that I’ve provided a new entry point for bad people in the world to attack the rest of the Internet.
  • Once that’s done, go back to the Unix shell, and launch the notebook using “nohup”, so that even if/when you log out, the server will still keep running:
    nohup jupyter notebook

    Once this is done, your server should be ready!  If you’re at the IP address, then you should at this point be able to point your browser to, and you’ll see the Jupyter notebook.  You can, of course, change the port that’s being used; I’ve found that when working from a cafe, non-standard ports can sometimes be blocked. Do remember that low-numbered ports, such as 80, can only be used by root, so you’ll need something higher, such as 8000 or 8080.

Also note that working this way means that all users have identical permissions. This means that in theory, one student can delete, change, or otherwise cause problems in another student’s notebook.  In practice, I’ve never found this to be a problem.

A bigger problem is that the notebook’s back end can sometimes fail. In such cases, you’ll need to ssh back into the server and restart the Jupyter notebook process. It’s frustrating, but relatively rare.  By using “nohup”, I’ve been able to avoid the server going down whenever my ssh connection died and/or I logged out.

I’ve definitely found that using the Jupyter notebook has improved my teaching.  Having my students use it in my courses has also improved their learning!   And as you can see, setting it up is a quick operation, requiring no more than a few minutes just before class starts.  Just don’t forget to shut down the server when you’re done, or you’ll end up paying for a server you don’t need.

Comprehensive documentation for setting up a Jupyter server, including security considerations, are at the (excellent) Jupyter documentation site.

1 10 11 12 13 14 17