July 24, 2018 , bY reuven
Python

I’m a Unix guy, but the participants in my Python classes overwhelmingly use Windows. Inevitably, when we get to talking about working with files in Python, someone will want to open a file using the complete path to the file.  And they’ll end up writing something like this:

filename = 'c:\abc\def\ghi.txt'

But when my students try to open the file, they discover that Python gives them an error, indicating that the file doesn’t exist!  In other words, they write:

for one_line in open(filename):    print(one_line)

What’s the problem?  This seems like pretty standard Python, no?

Remember that strings in Python normally contain characters. Those characters are normally printable, but there are times when you want to include a character that isn’t really printable, such as a newline.  In those cases, Python (like many programming languages) includes special codes that will insert the special character.

The best-known example is newline, aka ‘\n’, or ASCII 10. If you want to insert a newline into your Python string, then you can do so with ‘\n’ in the middle.  For example:

s = 'abc\ndef\nghi'

When we print the string, we’ll see:

>>> print(s)

abc

def

ghi

What if you want to print a literal ‘\n’ in your code? That is, you want a backslash, followed by an “n”?  Then you’ll need to double the backslash:The “\\” in a string will result in a single backslash character. The following “n” will then be normal. For example:

s = 'abc\\ndef\\nghi'

When we say:

>>> print(s)

abc\ndef\nghi

It’s pretty well known that you have to guard against this translation when you’re working with \n. But what other characters require it? It turns out, more than many people might expect:

  • \a — alarm bell (ASCII 7)
  • \b — backspace (ASCII
  • \f — form feed
  • \n — newline
  • \r — carriage return
  • \t — tab
  • \v — vertical tab
  • \ooo —  character with octal value ooo
  • \xhh — character with hex value hh
  • \N{name} — Unicode character {name}
  • \uxxxx — Unicode character with 16-bit hex value xxxx
  • \Uxxxxxxxx — Unicode character with 32-bit hex value xxxxxxxx

In my experience, you’re extremely unlikely to use some of these on purpose. I mean, when was the last time you needed to use a form feed character? Or a vertical tab?  I know — it was roughly the same day that you drove your dinosaur to work, after digging a well in your backyard for drinking water.

But nearly every time I teach Python — which is, every day — someone in my class bumps up against one of these characters by mistake. That’s because the combination of the backslashes used by these characters and the backslashes used in Windows paths makes for inevitable, and frustrating, bugs.

Remember that path I mentioned at the top of the blog post, which seems so innocent?

filename = 'c:\abc\def\ghi.txt'

It contains a “\a” character. Which means that when we print it:

>>> print(filename)
c:bc\def\ghi.txt

See? The “\a” is gone, replaced by an alarm bell character. If you’re lucky.

So, what can we do about this? Double the backslashes, of course. You only need to double those that would be turned into special characters, from the table I’ve reproduced above: But come on, are you really likely to remember that “\f” is special, but “\g” is not?  Probably not.

So my general rule, and what I tell my students, is that they should always double the backslashes in their Windows paths. In other words:

>>> filename = 'c:\\abc\\def\\ghi.txt'

>>> print(filename)
c:\abc\def\ghi.txt

It works!

But wait: No one wants to really wade through their pathnames, doubling every backslash, do they?  Of course not.

That’s where Python’s raw strings can help. I think of raw strings in two different ways:

  • what-you-see-is-what-you-get strings
  • automatically doubled backslashes in strings

Either way, the effect is the same: All of the backslashes are doubled, so all of these pesky and weird special characters go away.  Which is great when you’re working with Windows paths.

All you need to do is put an “r” before the opening quotes (single or double):

>>> filename = r'c:\abc\def\ghi.txt'

>>> print(filename)
c:\abc\def\ghi.txt

Note that a “raw string” isn’t really a different type of string at all. It’s just another way of entering a string into Python.  If you check, type(filename) will still be “str”, but its backslashes will all be doubled.

Bottom line: If you’re using Windows, then you should just write all of your hard-coded pathname strings as raw strings.  Even if you’re a Python expert, I can tell you from experience that you’ll bump up against this problem sometimes. And even for the best of us, finding that stray “\f” in a string can be time consuming and frustrating.

PS: Yes, it’s true that Windows users can get around this by using forward slashes, like we Unix folks do. But my students find this to be particularly strange looking, and so I don’t see it as a general-purpose solution.

Enjoyed this article? Join more than 11,000 other developers who receive my free, weekly “Better developers” newsletter. Every Monday, you’ll get an article like this one about software development and Python:



 

July 23, 2018 , bY reuven
Open source, Unix and Linux

One of the most powerful Unix command-line utilities is “find” — but it also has a huge number of options, and most of the documentation I’ve read on “find” is hard to follow and understand.  That’s a shame, because once you understand what “find” does and how it works, you can accomplish quite a bit.  I hope that this post will show you some of the basics of “find”, so that you can take advantage of it in your day-to-day work.

The basic idea is that “find” looks through a directory (and all of its subdirectories), applying one or more filters when deciding which files are interesting, and executing one or more actions on matching files.

So, what can you do with “find”?

  • Move any backup log older than 30 days to /tmp/
  • Find all of the MP4 files larger than 100MB
  • Find all of the documents with either “doc” or “docx” extensions anywhere in your home directory
  • In a directory of text files, find those containing the phrase “budget” which have not been touched in the last 30 days

(In these examples, I’m going to use the GNU version of find, which is standard on Linux machines and available for the Mac via Homebrew.  Note that if you use Homebrew on the Mac, then GNU “find” will be installed as “gfind” by default.  Use the –with-default-names option to “brew install” if you want to avoid this prefix.)

Note: There is a big difference between “find” and “locate”, which are often confused for one another:

  • “find” looks for files according to a number of criteria, and performs an action on the files matching those criteria. The search takes place when you run the program.
  • “locate” uses a database (typically created with the “updatedb” command) for filenames matching a pattern, and returns those filenames.

So if you know that you have a file named “important.txt” somewhere on your system, then you probably want to use “locate” — assuming, of course, you have been updating your filename database on a regular basis, typically via “cron”.

If you don’t remember the name of the file, but do remember that you modified it in the last 14 days, and that it contains the phrase “very important”, then you can use “find”.

For example, let’s say that I just want to find all of the files in the current directory and all of its subdirectories.  I can say:

find . -print

This means: Look at all files and directories in the current directory (.) and contained within its subdirectories, and then print them.

Now, in GNU find, both of these arguments are optional; you can just say

find

but I don’t recommend doing so, if only because it’s a bit ambiguous.  Moreover, the longer version emphasizes that “find” looks through a directory, filters through the results (although we don’t have any filters here), and then executes something (in this case, “print”).  The filters and actions are specified using command-line arguments; thus, we say “-print” if we want to print the name of the file.  Note that it’s not “–print” (i.e., with two “-” characters before “print”), which we might expect.

Also notice that the result includes all files, including directories and special Unix files (e.g., device files).  If you want to only look at files, then you can specify the “-type” filter.  For example, the following command shows all files (i.e., not subdirectories, symbolic links, or the like) under the current directory:

find . -type f -print    # find regular files

What if you want to find directories?  Then instead of using “-type f”, specify “-type d”:

find . -type d -print    # find directories

What if I only want to find files that match a certain pattern?  Then I can filter using the “-name” test and the shell’s standard characters.  For example, let’s say I want to find all of the files that end with “.txt”.  I can then say:

find . -type f -name "*.txt" -print

The above applies two tests —only regular files (i.e., not directories or the like) that match the pattern “*.txt” will match and be printed.

What if I want to find files that end with “.txt” or “.text”?  In such cases, it might be easiest to use the “or” option, written as “-o”, that combines two tests.  For example:

find . -type f \( -name '*.txt' -o -name '*.text' \) -print

The “-o” option (for logical “or” — and yes, there is also a “-a” option that’s logical “and”) allows either of the tests to succeed in order for it to declare success. However, the items on either side of “-o” must be inside of parentheses.  Since parentheses in the Unix shell have their own uses, we need to preface them with backslashes, to avoid clashes between the levels of parsing.  But wait — if the “\(” and “\)” are touching the arguments, then you’ll get hard-to-understand errors.  So make sure that “\(” and “\)” are surrounded by whitespace, if you want to avoid trouble.

Let’s say that I want to find old files on my system. Unix filesystems keep track of file ages in three different ways:

  • ctime (creation time) — when was the file first created?
  • mtime (modification time) — when was the file last modified?
  • atime (access time) — when was the file last accessed/read?

Let’s say that I want to find files in the current directory (and below) that were last accessed 7 days ago.  I can say:

find . -atime 7 -print

The “atime” is measured in 24-hour increments, starting with midnight of the current day. So “-atime 7” means, “last accessed 7*24 hours before midnight today.”

But wait a second — when was the last time you wanted to find files that were accessed exactly 7 days ago?  It’s far more likely that you want to find files that were last accessed less than 7 days ago. In order to do that, you need to preface the number with a “-” sign:

find . -atime -7 -print

By contrast, if you want to find all of those files that were accessed more than 7 days ago, you’ll want to preface the number with a “+” sign:

find . -atime +7 -print

And of course, if you want to find files that were accessed more than 2 days ago, but less than 9 days ago, you can say:

find . -atime +2 -atime -9 -print

Depending on your needs, it might well be better to use “mtime” rather than “atime”. I’m often interested in finding files I changed recently, rather than those I read recently. The same rules apply; here’s how I would find all of those files that I last modified more than two days ago but less than 9 days ago:

find . -mtime +2 -mtime -9 -print

Notice that I’m able to combine two rules (i.e., two “atime” or “mtime” rules) without using “-a” to join them together with a logical “and”.

Another useful thing to look for is big files. What files, for example, are bigger than 2 GB? I can say the following:

$ find . -size +2G -print

(I believe that this “-size” option only works this way on GNU find. Other versions might well require that you specify the file size in blocks. It has been a while since I used non-GNU versions.)

Look familiar? That’s right; the “+2” means “greater than”, and the “G” suffix means “GB”.  You can use a bunch of suffixes to the number, to indicate just how big the file should be.  As you might have guessed, you can say “-2M” to mean “less than 2 MB”, which on a modern computer is just about everything, to be honest.

We can also combine these, just as we did with “atime” and “mtime”: What files are bigger than 500 MB and smaller than 5 GB?

find . -size +500M -size -5G -print

We can combine these filters with others. What files are bigger than 500 MB and smaller than 5 GB, and were last accessed no more than 30 days ago?

find . -size +500M -size -5G -atime -30 -print

You can imagine using this sort of command to find large, unused files, such as old videos that you had forgotten are on your filesystem. Indeed, what if I’m only interested in finding MP4 files that are larger than 500 MB, smaller than 5 GB, and accessed in the last 30 days? I  can add another condition:

find . -size +500M -size -5G -atime -30 -name "*.mp4" -print

There are lots of other filters you can apply, and GNU find is especially full of them. There are alternative ways to specify dates. You can search for particular types of special files.  You can search for certain permissions. And so forth.  But the ones I’ve shown you are the ones I’ve used most often.

But the tests are only the first part of using “find”: Once you’ve gotten a list of files, what can you do with them?

So far, we’ve seen a single action, namely “-print”.  There are a few others that you might find useful.

The first is “-ls”, which runs the Unix “ls” command (with a few options that’ll show size and permissions):

find . -size +500M -size -5G -atime -30 -name "*.mp4" -ls

The above will not only print the filename (like “-print”), but will also show lots of other information about the files we’ve found. What if you want to write this list to a file? Then just use the “-fls” option, and give it a filename:

find . -size +500M -size -5G -atime -30 -name "*.mp4" -fls big-movies.txt

It’s pretty common to want to delete files. So you can use the “-delete” option to do so.  Warning: Running a program that automatically deletes files can be very dangerous. I almost never do this, because I’m always so worried that something will go wrong.  Here’s how I can remove all of the backup files in my Linux /var/log directory that are more than 21 days old:

find . -name '*.gz' -mtime +21 -delete -print

Note that you can have more than one action; in this case, my first action was “-delete”, and my second was “-print”.

It’s pretty common for me to want to search through an entire directory for a file that contains particular text. In other words, I want to run the “grep” utility on each file. I can do that by using the all-purpose “-exec” action.  The basic idea is as follows: You hand “-exec” a command, and the command is then ended with \; (yes, backlash + semicolon). In between, you can write whatever Unix command you want, including options. The current filename can be put into the command with the special formula {} (i.e., empty curly braces).  For example, I can say:

find . -name "*.txt" -exec grep Reuven {} \;

The above will show all lines from all files containing my name. (Of course, a regular expression can be far more complex than this; if you aren’t familiar with grep or regexps, you can take my free “regular expressions crash course.”)    But the output only shows the lines we would get from “grep”, which (by default) doens’t show the name of the current file if you’re running it one file at a time. For this reason, we would be wise to include the “-H” option:

$ find . -name "*.txt" -exec grep -H Reuven {} \;

While “grep” is the most common command that I run via “-exec”, you can use any program you want, including programs that you’ve written.  In this way, you can really make “find” work for you, and execute custom code for each file that fits a criteria. Combine “find” with “cron”, and you have an easy way to identify files that need your attention, or that should be removed, or that you’ve been looking for and otherwise cannot find.

If there’s one drawback to “find”, it’s that the search happens in real time. There is no database through which it runs. Which means that if you’re going through a very large directory structure, you might discover that “find” takes quite a while.

And that’s about it! If you’re like me, then you’ll find (no pun intended) that these use cases cover most of what you need with the “find” utility. The documentation is extremely long, but only because “find” has many other tests and actions that you can mix and match in a variety of ways.

 

 

 

June 8, 2018 , bY reuven
Python

If you have children, then you probably remember them learning to walk, and then to read. If you’re like me, you were probably amazed by how long it took to do things that we don’t even think about. Things that we take for granted in our day-to-day lives, and which seem so obvious to us, take a long time to master.

You can often see and experience this when you compare how you learn a language as a native speaker, from how you learn as a second language. I grew up speaking English, and never learned all sorts of rules that my non-native-speaking friends learned in school. Similarly, I learned all sorts of rules for Hebrew grammar that my children never learned in school.

It’s thus super easy to take things for granted when you’re an expert. Indeed, that’s almost the definition of an expert — someone who understands a subject so well, that for them things are obvious.

To many developers, and especially Python developers, it’s obvious not only that there are different types of parentheses in Python, but that each type has multiple uses, and do completely different things. But to newcomers, it’s far from obvious when to use round parentheses, square brackets, and/or curly braces.

I’ve thus tried to summarize each of these types of parentheses, when we use them, and where you might get a surprise as a result.  If you’re new to Python, then I hope that this will help to give you a clearer picture of what is used when.

I should also note that the large number of parentheses that we use in Python means that using an editor that colorizes both matching and mismatched parentheses can really help. On no small number of occasions, I’ve been able to find bugs quickly thanks to the paren-coloring system in Emacs.

Regular parentheses — ()

Callables (functions and classes)

Perhaps the most obvious use for parentheses in Python is for calling functions and creating new objects. For example:

x = len('abcd')

i = int('12345')

It’s worth considering what happens if you don’t use parentheses. For example, I see the following code all the time in my courses:

d = {'a':1, 'b':2, 'c':3}
for key, value in d.items:
    print(f"{key}: {value}")

When you try to run this code, you can an error message that is true, but whose meaning isn’t completely obvious:

TypeError: 'builtin_function_or_method' object is not iterable

Huh?  What the heck does this mean?

It’s worth remembering how “for” loops work in Python:

  • “for” turns to the object at the end of the line, and asks whether it’s iterable
  • if so, then “for” asks the object for its next value
  • whenever the object says, “no more!” the loop stops

In this case, “for” turns to the method “d.items” and asks if it’s iterable. Note that we’re not asking whether the output from “d.items” is iterable, but rather whether the method itself is iterable.

That’s because there’s a world of difference between “d.items” and “d.items()”. The first returns the method. The second returns an iterable sequence of name-value pairs from the dictionary “d”.

The solution to this non-working code is thus to add parentheses:

d = {'a':1, 'b':2, 'c':3}
for key, value in d.items():
    print(f"{key}: {value}")

Once we do that, we get the desired result.

I should note that we also need to be careful in the other direction: Sometimes, we want to pass a function as an argument, and not execute it. One example is when we’re in the Jupyter notebook (or other interactive Python environment) and ask for help on a function or method:

help(len)

help(str.upper)

In both of the above cases, we don’t want to get help on the output of those functions; rather, we want to get help on the functions themselves.

Prioritizing operations

In elementary school, you probably learned the basic order of arithmetic operations — that first we multiply and divide, and only after do we add and subtract.

Python clearly went to elementary school as well, because it follows this order.  For example:

In [1]: 2 + 3 * 4
Out[1]: 14

We can change the priority by using round parentheses:

In [2]: (2 + 3) * 4
Out[2]: 20

Experienced developers often forget that we can use parentheses in this way, as well — but this is, in many ways, the most obvious and natural way for them to be used by new developers.


Creating tuples

Of course, we can also use () to create tuples. For example:

In [8]: t = (10,20,30)

In [9]: type(t)
Out[9]: tuple

What many beginning Python developers don’t know is that you actually don’t need the parentheses to create the tuple:

In [6]: t = 10,20,30

In [7]: type(t)
Out[7]: tuple

Which means that when you return multiple values from a function, you’re actually returning a tuple:

In [3]: def foo():
...: return 10, 20, 30
...:
...:

In [4]: x = foo()

In [5]: x
Out[5]: (10, 20, 30)

What surprises many newcomers to Python is the following:

In [10]: t = (10)

In [11]: type(t)
Out[11]: int

“Wait,” they say, “I used parentheses. Shouldn’t t be a tuple?”

No, t is an integer.  When Python’s parser sees something like “t = (10)”, it can’t know that we’re talking about a tuple.  Otherwise, it would also have to parse “t = (8+2)” as a tuple, which we clearly don’t want to happen, assuming that we want to use parentheses for prioritizing operations (see above).  And so, if you want to define a one-element tuple, you must use a comma:

In [12]: t = (10,)

In [13]: type(t)
Out[13]: tuple

Generator expressions

Finally, we can use round parentheses to create “generators,” using what are known as “generator expressions.” These are a somewhat advanced topic, requiring knowledge of both comprehensions and iterators. But they’re a really useful tool, allowing us to describe a sequence of data without actually creating each element of that sequence until it’s needed.

For example, if I say:

In [17]: g = (one_number * one_number
...: for one_number in range(10))

The above code defines “g” to be a generator, the result of executing our generator expression. “g” is than an iterable, an object that can be placed inside of a “for” loop or a similar context.   The fact that it’s a generator means that we can have a potentially infinite sequence of data without actually needing to install an infinite amount of RAM on our computers; so long as we can retrieve items from our iterable one at a time, we’re set.

The above generator “g” doesn’t actually return 10 numbers. Rather, it returns one number at a time. We can retrieve them all at once by wrapping it in a call to “list”:

In [18]: list(g) 
Out[18]: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

But the whole point of a generator is that you don’t want to do that. Rather, you will get each element, one at a time, and thus reduce memory use.

Something funny happens with round parentheses when they’re used on a generator expression in a function call.  Let’s say I want to get a string containing the elements of a list of integers:

In [19]: mylist = [10, 20, 30]

In [20]: '*'.join(mylist)

This fails, because the elements of “mylist” are integers.  We can use a generator expression to turn each integer into a string:

In [21]: '*'.join((str(x)
...: for x in mylist))
Out[21]: '10*20*30'

Notice the double parentheses here; the outer ones are for the call to str.join, and the inner ones are for the generator expression. Well, it turns out that we can remove the inner set:

In [22]: '*'.join(str(x)
...: for x in mylist)
Out[22]: '10*20*30'

So the next time you see a call to a function, and a comprehension-looking thing inside of the parentheses, you’ll know that it’s a generator expression, rather than an error.

Cheating Python’s indentation rules

Python is famous for its use of indentation to mark off blocks of code, rather than curly braces, begin/end, or the like. In my experience, using indentation has numerous advantages, but tends to shock people who are new to the language, and who are somewhat offended that the language would dictate how and when to indent code.

There are, however, a few ways to cheat (at least a little) when it comes to these indentation rules. For example, let’s say I have a dictionary representing a person, and I want to know if the letter ‘e’ is in any of the values.  I can do something like this:

In [39]: person = {'first':'Reuven', 'last':'Lerner', 'email':'reuven@lerner.co.il'}

In [40]: if 'e' in person['first'] or 'e' in person['last'] or 'e' in person['email']:
...: print("Found it!")
...:
Found it!

That “if” line works, but it’s far too long to be reasonably readable.  What I’d love to do is this:

In [40]: if 'e' in person['first'] or 
            'e' in person['last'] or 
            'e' in person['email']:
...: print("Found it!")

The problem is that the above code won’t work; Python will get to the end of the first “or” and complain that it reached the end of the line (EOL) without a complete statement.

The solution is to use parentheses. That’s because once you’ve opened parentheses, Python is much more forgiving and flexible regarding indentation. For example, I can write:

In [41]: if ('e' in person['first'] or
...:         'e' in person['last'] or
...:         'e' in person['email']):
...:         print("Found it!")
...:
...:
Found it!

Our code is now (in my mind) far more readable, thanks to the otherwise useless parentheses that I’ve added.

By the way, this is true for all parentheses. So if I want to define my dict on more than one line, I can say:

In [42]: person = {'first':'Reuven',
...:               'last':'Lerner',
...:               'email':'reuven@lerner.co.il'}

Python sees the opening { and is forgiving until it finds the matching }. In the same way, we can open a list comprehension on one line and close it on another.  For years, I’ve written my list comprehensions on more than one line, in the belief that they’re easier to read, write, and understand. For example:

[one_number * one_number

for one_number in range(10)]

Square brackets — []

Creating lists

We can create lists with square brackets, as follows:

mylist = [ ]  # empty list

mylist = [10, 20, 30]  # list with three items

Note that according to PEP 8, you should write an empty list as [], without any space between the brackets.  I’ve found that with certain fonts, the two brackets end up looking like a square, and are hard for people in my courses to read and understand.  So I always put a space between the brackets when creating an empty list.

(And yes, I’m that rebellious in real life, not just when programming.)

We can use square brackets not just to create lists with explicitly named elements, but also to create lists via list comprehensions:

In [16]: [one_number * one_number
...: for one_number in range(10)]
...:
Out[16]: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

The square brackets tell Python that this is a list comprehension, producing a list.  If you use curly braces, you’ll get either a set or a dict back, and if you use regular parentheses, you’ll get a generator expression (see above).

Requesting individual items

Many people are surprised to discover that in Python, we always use square brackets to retrieve from a sequence or dictionary:

In [23]: mylist = [10, 20,30]

In [24]: t = (10, 20, 30)

In [25]: d = {'a':1, 'b':2, 'c':3}

In [26]: mylist[0]
Out[26]: 10

In [27]: t[1]
Out[27]: 20

In [28]: d['c']
Out[28]: 3

Why don’t we use regular parentheses with tuples and curly braces with dictionaries, when we want to retrieve an element? The simple answer is that square brackets, when used in this way, invoke a method — the __getitem__ method.

That’s right, there’s no difference between these two lines of code:

In [29]: d['c']
Out[29]: 3

In [30]: d.__getitem__('c')
Out[30]: 3

This means that if you define a new class, and you want instances of this class to be able to use square brackets, you just need to define __getitem__.  For example:

In [31]: class Foo(object):
...:         def __init__(self, x):
...:             self.x = x
...:         def __getitem__(self, index):
...:             return self.x[index]
...:

In [32]: f = Foo('abcd')

In [33]: f[2]
Out[33]: 'c'

See?  When we say f[2], that’s translated into f.__getitem__(2), which then returns “self.x[index]”.

The fact that square brackets are so generalized in this way means that Python can take advantage of them, even on user-created objects.

Requesting slices

You might also be familiar with slices. Slices are similar to individual indexes, except that they describe a range of indexes. For example:

In [46]: import string

In [47]: string.ascii_lowercase[10:20]
Out[47]: 'klmnopqrst'

In [48]: string.ascii_lowercase[10:20:3]
Out[48]: 'knqt'

As you can see, slices are either of the form [start:end+1] or [start:end+1:stepsize].  (If you don’t specify the stepsize, then it defaults to 1.)

Here’s a little tidbit that took me a long time to discover: You can get an IndexError exception if you ask for a single index beyond the boundaries of a sequence. But slices don’t have in such problems; they’ll just stop at the start or end of your string:

In [50]: string.ascii_lowercase[500]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-50-fad7a1a4ec3e> in <module>()
----> 1 string.ascii_lowercase[500]

IndexError: string index out of range


In [51]: string.ascii_lowercase[:500]
Out[51]: 'abcdefghijklmnopqrstuvwxyz'

How do the square brackets distinguish between an individual index and a slice? The answer: They don’t. In both cases, the __getitem__ method is being invoked. It’s up to __getitem__ to check to see what kind of value it got for the “index” parameter.

But wait: If we pass an integer or string (or even a tuple) to square brackets, we know what type will be passed along. What type is passed to our method if we use a slice?

In [55]: class Foo(object):
...:         def __getitem__(self, index):
...:             print(f"index = {index}, type(index) = {type(index)}")
...:


In [56]: f = Foo()

In [57]: f[100]
index = 100, type(index) = <class 'int'>

In [58]: f[5:100]
index = slice(5, 100, None), type(index) = <class 'slice'>

In [59]: f[5:100:3]
index = slice(5, 100, 3), type(index) = <class 'slice'>

Notice that in the first case, as expected, we get an integer. But in the second and third cases, we get a slice object. We can create these manually, if we want; “slice” is in the “bulitin” namespace, along with str, int, dict, and other favorites. And as you can see from its printed representation, we can call “slice” much as we do “range”, with start, stop, and step-size arguments.  I haven’t often needed or wanted to create slice objects, but you certainly could:

In [60]: s = slice(5,20,4)

In [61]: string.ascii_lowercase[s]
Out[61]: 'fjnr'

In [62]: string.ascii_uppercase[s]
Out[62]: 'FJNR'

In [63]: string.ascii_letters[s]
Out[63]: 'fjnr'

In [64]: string.punctuation[s]
Out[64]: '&*.<'

Curly braces — {}

Creating dicts

The classic way to create dictionaries (dicts) in Python is with curly braces. You can create an empty dict with an empty pair of curly braces:

In [65]: d = {}

In [66]: len(d)
Out[66]: 0

Or you can pre-populate a dict with some key-value pairs:

In [67]: d = {‘a’:1, ‘b’:2, ‘c’:3}

In [68]: len(d)
Out[68]: 3

You can, of course, create dicts in a few other ways. In particular, you can use the “dict” class to create a dictionary based on a sequence of two-element sequences:

In [69]: dict(['ab', 'cd', 'ef'])
Out[69]: {'a': 'b', 'c': 'd', 'e': 'f'}

In [70]: d = dict([('a', 1), ('b', 2), ('c', 3)])

In [71]: d
Out[71]: {'a': 1, 'b': 2, 'c': 3}

In [72]: d = dict(['ab', 'cd', 'ef'])

In [73]: d
Out[73]: {'a': 'b', 'c': 'd', 'e': 'f'}

But unless you need to create a dict programmatically, I’d say that {} is the best and clearest way to go. I remember reading someone’s blog post a few years ago (which I cannot find right now) in which it was found that {} is faster than calling “dict” — which makes sense, since {} is part of Python’s syntax, and doesn’t require a function call.

Of course, {} can also be used to create a dictionary via a dict comprehension:

In [74]: { one_number : one_number*one_number
...:       for one_number in range(10) }
...:
Out[74]: {0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}

In the above code, we create a dict of the number 0-9 (keys) and their values to the second power (values).

Remember that a dict comprehension creates one dictionary, rather than a list containing many dictionaries.

Create sets

I’ve become quite the fan of Python’s sets. You can think of sets in a technical sense, namely that they are mutable, and contain unique, hashable values. As of Python 3.6, they are stored in insertion order.

But really, it’s just easiest to think of sets as dictionaries without any values. (Yes, this means that sets are nothing more than immoral dictionaries.)  Whatever applies to dict keys also applies to the elements of a set.

We can create a set with curly braces:

In [75]: s = {10,20,30}

In [76]: type(s)
Out[76]: set

As you can see, the fact that there is no colon (:) between the name-value pairs allows Python to parse this code correctly, defining ‘s” to be a set, rather than a dict.

Nearly every time I teach about sets, someone tries to create an empty set and add to it, using set.add:

In [77]: s = {}

In [78]: s.add(10)
—————————————————————————
AttributeError Traceback (most recent call last)
<ipython-input-78-721f80ddfefc> in <module>()
—-> 1 s.add(10)

AttributeError: ‘dict’ object has no attribute ‘add’

The error indicates that “s” is a dict, and that dicts lack the “add” method. Which is fine, but didn’t I define “s” to be a set?

Not really: Dicts came first, and thus {} is an empty dict, not an empty set. If you want to create an empty set, you’ll need to use the “set” class:

s = set()

s.add(10)

This works just fine, but is a bit confusing to people starting off in Python.

I often use sets to remove duplicate entries from a list. I can do this with the “set” class (callable), but I can also use the “* argument” syntax when calling a function:

In [79]: mylist = [10, 20, 30, 10, 20, 30, 40]

In [80]: s = {*mylist}

In [81]: s
Out[81]: {10, 20, 30, 40}

Note that there’s a bit difference between {*mylist} (which creates a set from the elements of mylist) and {mylist} which will try to create a set with one element, the list “mylist”, and will fail because lists are unhashable.

Just as we have list comprehensions and dict comprehensions, we also have set comprehensions, which means that we can also say:

In [84]: mylist = [10, 20, 30, 10, 20, 30, 40]

In [85]: {one_number
...: for one_number in mylist}
...:
Out[85]: {10, 20, 30, 40}

str.format

Another place where we can use curly braces is in string formatting. Whereas Python developers used to use the printf-style “%” operator to create new strings, the modern way to do so (until f-strings, see below) was the str.format method. It worked like this:

In [86]: name = ‘Reuven’

In [87]: “Hello, {0}”.format(name)
Out[87]: ‘Hello, Reuven’

Notice that str.format returns a new string; it doesn’t technically have anything to do with “print”, although they are often used together. You can assign the resulting string to a new variable, write it to a file, or (of course) print it to the screen.

str.format looks inside of the string, searching for curly braces with a number inside of them. It then grabs the argument with that index, and interpolates it into the resulting string.  For example:

In [88]: 'First is {0}, then is {1}, finally is {2}'.format(10, 20, 30)
Out[88]: 'First is 10, then is 20, finally is 30'

You can, of course, mix things up:

In [89]: 'First is {0}, finally is {2}, then is {1}'.format(10, 20, 30)
Out[89]: 'First is 10, finally is 30, then is 20'

You can also repeat values:

In [90]: 'First is {0}. Really, first is {0}. Then is {1}'.format(10, 20, 30)
Out[90]: 'First is 10. Really, first is 10. Then is 20'

If you’ll be using each argument once and in order, you can even remove the numbers — although I’ve been told that this makes the code hard to read.  And besides, it means you cannot repeat values, which is sometimes annoying:

In [91]: 'First is {}, then is {}, finally is {}'.format(10, 20, 30)
Out[91]: 'First is 10, then is 20, finally is 30'

You cannot switch from automatic to manual numbering in curly braces (or back):

In [92]: 'First is {0}, then is {}, finally is {}'.format(10, 20, 30)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-92-f00f3adf93eb> in <module>()
----> 1 'First is {0}, then is {}, finally is {}'.format(10, 20, 30)

ValueError: cannot switch from manual field specification to automatic field numbering

str.format also lets you use names instead of values, by passing keyword arguments (i.e., name-value pairs in the format of key=value):

In [93]: 'First is {x}, then is {y}, finally is {z}'.format(x=10, y=20, z=30)
Out[93]: 'First is 10, then is 20, finally is 30'

You can mix positional and keyword arguments, but I beg that you not do that:

In [94]: 'First is {0}, then is {y}, finally is {z}'.format(10, y=20, z=30)
Out[94]: 'First is 10, then is 20, finally is 30'

f-strings

As of Python 3.6, we have an even more modern way to perform string interpolation, using “f-strings”. Putting “f” before the opening quotes allows us to use curly braces to interpolate just about any Python expression we want — from variable names to operations to function/method calls — inside of a string:

In [99]: name = 'Reuven'

In [100]: f"Hello, {name}"
Out[100]: 'Hello, Reuven'

In [101]: f"Hello, {name.upper()}"
Out[101]: 'Hello, REUVEN'

In [102]: f"Hello, {name.split('e')}"
Out[102]: "Hello, ['R', 'uv', 'n']"

I love f-strings, and have started to use them in all of my code.  bash, Perl, Ruby, and PHP have had this capability for years; I’m delighted to (finally) have it in Python, too!

from __future__ import braces

Do you sometimes wish that you could use curly braces instead of indentation in Python? Yeah, you’re not alone.  Fortunately, the __future__ module is Python’s way of letting you try new features before they’re completely baked into your current Python version. For example, if you’re still using Python 2.7 (and I hope you’re not), you can say

from __future__ import division

and division will always return a float, rather than an integer, even if the two operands are integers.

So go ahead, try:

from __future__ import braces

(Yes, this is part of Python, albeit a silly part.  And no, don’t expect to be able to use curly braces instead of indentation any time soon.)

May 11, 2018 , bY reuven
Python

Hi!  I’m in Cleveland, Ohio, for PyCon 2018.  I’ve already heard back from others who are here for the conference, and will be coordinating with all of you privately to meet up in person.  I’m excited to meet (and learn from) lots of other Python developers from around the world!

To celebrate the conference, I’m offering 20% off of all my books and courses.  Just use the coupon code PYCON2018 at checkout from my online store, or click on any of the following links:

This coupon code will only work through the end of the conference, on Sunday night.

May 7, 2018 , bY reuven
Python

I’ll be attending PyCon 2018 in Cleveland, Ohio later this week — from Friday morning through the first day of sprints on Monday.

I’m hoping to meet lots of people from the worldwide Python community — as well as readers of Linux Journal, listeners to the “Freelancers Show” podcast, people who have attended my training courses over the years, and subscribers to my “Better developers” newsletter.

I’m planning to host a get-together, hopefully on Friday; I’ll post to my blog and Twitter account once I know when and where.  I’m already planning to host an “open space” for Python trainers on Friday.

It’s my first PyCon, so I’m both excited and nervous about how overwhelming it’ll likely be.  But I’m mainly interested in meeting people — so don’t be shy, and please do make an effort to find me!  I’ll be delighted to meet you and chat about all things Python (and not).  You can contact me on Twitter as @reuvenmlerner and WeChat as ReuvenLerner.

See you there!

May 4, 2018 , bY reuven
Python

I’ve been teaching Python for many years now, and it never ceases to amaze me to discover just how many people are able to use Python without ever having really learned it.

In particular, there are lots of people writing and using Python objects who assume that classes and instances in Python work just like those in Java and C#, but with less syntax.

For the most part, they can make those assumptions and not suffer too much.  But then, if something goes wrong… well, they’re at a loss to explain how and why things didn’t work.

And then little things nag them: Why does “self” need to be the first parameter in a method? And when we set a variable inside of a class definition, are we setting the default for the instances? (Answer: No, because you’re setting a class attribute.)  How does inheritance really work?  And how does super() work in Python?

Knowing the answers to these and other questions are the key to becoming a more fluent Python developer. That’s important, because fluency will allow you to do more in less time, to write more expressive and powerful code, and to write code that can be changed and maintained more easily.

I’ve taken my lessons and exercises from my intro Python class and turned them into a new online class, “Object oriented Python.”  The course assumes that you’re familiar with basic Python data structures and functions, but otherwise starts with the basics of object-oriented programming, through inheritance and operator overloading.

In more than 4 hours of video lectures, and with 15 exercises, I walk you through how Python’s object system works, and how you can make it work for you.

This course is for you if:

  • You’re a Python developer unfamiliar with objects
  • You’ve been cutting, pasting, and editing Python code for years from Stack Overflow and blogs, without really understanding how objects work
  • You’re new to Python, but experienced with objects in another language, such as C#, C++, or Java

This is the same material I teach at companies like Apple, Cisco, IBM, Intel, PayPal, Western Digital, and VMWare.  I’m booked solid 8 months in advance with training at these companies  — but you can get these lessons, and improve your Python coding, in the coming minutes.

The course includes:

  • More than four hours of lecture/screencasts
  • 15 exercises, including extensive walk-throughs of the solutions
  • PDFs of the slides I use when teaching about objects
  • The Jupyter notebook I used when live-coding the course
  • Any and all updates I make to this course in the future, via my “forever free” policy

Also: You get access to my online office hours, given every 3-4 weeks, online, to students in all of my classes.  Come and ask your Python questions!  Or submit them, and I’ll answer them for you in video.

Not sure?  Go to the course page; a number of the lessons are available for free preview.

Click here to order “Object-oriented Python”

This course, like all of my others, comes with a 100% money-back guarantee.  I also offer discounts for students and pensioners, group discounts, and parity pricing for people living in any country outside the 30 wealthiest in the world.  Just e-mail me to get the appropriate coupon codes.

If you’re a Python developer and have been unsure about how to use Python objects, this is your chance.  Order “Object oriented Python,” and become a more fluent Python developer today.

Still not sure?  Send me e-mail, at reuven@lerner.co.il, and let me know.  I’ll be delighted to answer your questions.

May 4, 2018 , bY reuven
Business, Python, Training

As you may know, I’ve been a panelist on the Freelancers Show podcast for a few years.  It’s one of the high points of my week to chat with my co-panelists, discuss various aspects of freelancing/consulting, and to interview interesting people who can help us improve our freelancing careers.

I’ve been consulting since 1995, but I think that my business has improved greatly thanks to the advice I’ve gotten from the show, and from the discussions we’ve had.

This coming Tuesday, we’ll be recording our 300th show. To celebrate, we’re making it a live webinar, in which we’ll take questions from … well, anyone who wants to ask!

(The show will also be recorded and available as usual.)

So, if you’re a freelancer/consultant, or if you’re thinking about taking the plunge, or just curious, or just want to see what I look like — well, join us!  You can sign up at the Crowdcast site here:

https://www.crowdcast.io/e/tfs300

If you can’t make it, then we would still love to get your questions.  Write them on the Crowdcast page, or leave them as comments here.  I promise we’ll try to answer them all!

April 19, 2018 , bY reuven
Python, Training

What’s the hardest part of Python to understand?

For nearly 20 years, I’ve been teaching Python to engineers at companies around the world. And if I had to say what most confuses my students, it’s list comprehensions.

And yes, if you’re wondering, set and dict comprehensions are equally confusing.

Comprehensions are both powerful and compact. The syntax, however, is far from obvious. Moreover, it’s not always clear just when or why you should use comprehensions, and when you should instead use a regular “for” loop.

Experienced Python developers know that there is a difference between comprehensions and “for” loops, even if it isn’t obvious to newcomers. Those experts also know that comprehensions are an essential part of a Python developer’s toolbox. It’s a rare day on which I won’t use a comprehension in my Python programming.

Among other things, comprehensions can:

  • help you to convert sequences from one type to another
  • make it easy to extract information from logfiles
  • rearrange data inside of complex data structures
  • create lists, sets, and dicts from files

So, how can we help newcomers, who see comprehensions as some combination of weird, unnecessary, and hard to understand? If you know me, then you already know the answer: Clear explanations, followed by lots of practice.

I’m thus delighted to announce the launch of “Comprehending comprehensions,” an online course that teaches Python developers how and why to use comprehensions.

This is an Internet version of a class which I have taught more than 100 times at companies around the world. Through nearly two hours of video and more than 15 exercises, you’ll learn how, when, and why to write and use comprehensions.

If you’re familiar with Python’s basic data types (strings, lists, tuples, dicts, and sets), reading from files and writing simple functions, but don’t yet feel comfortable using comprehensions, then this course is for you: You’ll come out of the course knowing how to write faster, cleaner, and more robust Python code. You’ll know how to handle common situations with data structures and files. And you’ll be better prepared to read and debug code written by others.

In addition to the videos and exercises, this course comes with the slides I use at my in-person training, and the input files you’ll need to solve the exercises.

My goal, as always, is to make you a more fluent Python programmer. If you take this course, you will have made a major step forward on that journey.

As always, I offer discounts to students, pensioners, and people living outside of the 30 wealthiest countries in the world — just e-mail me to request an appropriate coupon code. There are also group discounts, if you want to buy five or more copies for your organization.

Click here to sign up for “Comprehending Comprehensions”!

Questions? Comments? Just leave a comment on this blog or send me e-mail; I’ll respond with an answer right away.

March 29, 2018 , bY reuven
Education, Open source, Python

Ah, Git.  It’s one of the best and most important tools I use as a software developer.   Git is everything I want in a version-control system: It’s fast. It lets me collaborate. I can work without an Internet connection.  I can branch and merge easily, using a variety of techniques.  I can take a personal project and turn it into a large, collaborative one with minimal effort. And it’s cross platform, meaning that I know my clients and colleagues will be able to use it.

So, what’s the problem?  Git’s learning curve is extremely steep.  Until you understand what Git is doing, and how it works, you cannot use it effectively.  Moreover, until you understand what Git is doing, you will likely be puzzled and frustrated by its commands, messages, and documentation.

I’m thus delighted to unveil my latest online course: Understanding and mastering Git.  This course, which I have taught to numerous companies all around the world for more than a decade, includes:

  • Nearly 80 video lectures, for a total of more than 7 hours of videos (preview a number of videos here, on the course page)
  • Dozens of exercises, to help you practice and understand working with Git
  • 11 slide decks (in PDF format), the same ones I use when teaching.

If you have been frustrated by Git, or consider the commands you’ve been using to be a form of black magic, then this course is for you.  It walks you through Git’s commands, objects, and methods for collaboration.

This course has been battle-tested for a decade at some of the world’s best-known companies.  If you want to get the most out of Git, I’m sure that my course will help.  And if it doesn’t?  E-mail me, and I’ll give you a 100% refund.

Don’t let Git frustrate you any more.  Understand it.  Master it.  Tame it.   Learn from my “Understanding and mastering Git” course, today:

http://store.lerner.co.il/understanding-and-mastering-git

Not sure if this course is for you?  That’s fine: You can preview a number of the course videos for free from the course sales page.

Also: If you’re a student or pensioner, then you qualify for a discount on the course.  Just e-mail me, and I’ll send you a special discount code.

And finally: If you live in a country outside of the top 30 per-capita GDP countries in the world, then e-mail me and I’ll send you a special discount code to make the course more affordable to you.

You can and should learn Git, and I want to help you to learn it.  Try my course, and discover why so many software engineers won’t even think about using something else.

March 20, 2018 , bY reuven
Python

Within minutes of starting to learn Python, everyone learns how to define a variable. You can say:

x = 100

and voila!  Your You have created a variable “x”, and assigned the integer value 100 to it.  It couldn’t be simpler than that.

But guess what?  This isn’t the only way to define variables, let alone assign to them, in Python. And by understanding the other ways that we can define variables, and the nuances of the “simple” variable assignment that I described above, you can get a better appreciation for Python’s consistency, along with the idea that “everything is an object.”

Method 1: Plain ol’ assignment

The first way to define a variable is with the assignment operator, =.  Once again, I can say

x = 100

and I have assigned to x. If x didn’t exist before, then it does now. If it did exist before, then x now points to a new and different object.

What if the new, different object is also of a new and different type? We don’t really care; the nature of a dynamic language is such that any variable can point to an object of any type. When we say “type(x)” in our code, we’re not really asking what type of object the variable “x” can hold; rather, we’re asking what type of variable “x” is pointing to right now.

One of the major misconceptions that I come across in my Python courses and newsletter is just how Python allocates and stores data.  Many of my students come from a C or C++ background, and are thus used to the “box model” of variable assignment: When we declare the variable “x”, we have to declare it with a type.  The language then allocates enough memory to store that type, and gives that memory storage an alias, “x”.  When I say “x=100”, the language puts the value 100 inside of the box named “x”.  If the box isn’t big enough, bad news!

In Python, though, we don’t use the box model of variable assignment. Rather, we have the dictionary model of variable assignment: When we assign “x=100”, we’re creating (or updating) a key-value pair in a dictionary. The key’s name is “x”, and the value is 100, or anything else at all. Just as we don’t care what type of data is stored in a dictionary’s values, we also don’t care what type of value is stored inside of a Python variable’s value.

(I know, it’s a bit mind-bending to say that Python variables are stored in a dictionary, when dictionaries are themselves Python values, and are stored in Python variables. It’s turtles all the way down, as they say.)

Don’t believe me?  Just run the “globals” function in Python. You’ll get a dictionary back, in which the keys are all of the global variables you’ve defined, and the values are all of the values of those variables.  For example:

>>> x = 100

>>> y = [10, 20, 30]

>>> globals()
{'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, 'x': 100, 'y': [10, 20, 30]}

>>> x = 200

>>> y = {100, 200, 300}

>>> globals()

{'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, 'x': 200, 'y': {200, 100, 300}}

But wait, we can do even better:

>>> globals()['x'] = 300
>>> x
300
>>> globals()['y'] = {'a':1, 'b':2}
>>> y
{'a': 1, 'b': 2}

That’s right; the result of invoking “globals” is not only a dictionary showing us all of the global variables, but is something we can modify — and whose modifications not only reflect, but also affect, our global variables.

Now, you might be thinking, “But not all Python variables are globals.” That’s true; Python’s scoping rules describe four levels: Local, Enclosing, Global, and Builtins.  Normally, assignment creates/updates either a global variable or a local one.  How does Python know which one to use?

The answer is pretty simple, actually: When you assign inside of a function, you’re working with a local variable. When you assign outside of a function, you’re working with a global variable.  It’s pretty much as simple as that.  Sure, there are some exceptions, such as when you use the “global” or “nonlocal” keyword to force Python’s hand, and create/update a variable in a non-local scope when you’re within a function.   But the overwhelming majority of the time, you can assume that assignment within a function creates or updates a local variable, and outside of a function creates or updates a global variable.

Note that this means assignment inside of a “for” loop or “if” statement doesn’t create a local variable; it creates a global one.  It’s common for my students to believe that because they are inside of an indented block, the variable they are creating is not global.  Not true!

If you’re in a function and want to see the variables that have been created, you can use the “locals” function, which operates just like the “globals” function we looked at earlier.  Even better, you can use the “vars” function, which invokes “locals” inside of a function and “globals” outside of a function.

So the first type of variable assignment in Python is also the most common, and can be subdivided into two basic categories, local and global.

I should add that I’m only talking here about variable assignment, not attribute assignment. Attributes (i.e., anything that comes after a “.” character, as “b” in the expression “a.b”) are a totally different kettle of fish, and have their own rules and quirks.

Method 2: def

If you want to create a function in Python, you use the “def” keyword, as in:

>> def hello(name):
...     return f"Hello, {name}"    # I love f-strings!

While we often think that “def” simply defines a function, I think it’s easier to think of “def” as actually doing two things:

  • Creating a new function object
  • Assigning that function object to the name immediately following the “def”

Thinking about “def” this way, as object creation + assignment, makes it easier to understand a variety of different situations that Python developers encounter.

First of all, “def” defines variables in the same scope as a simple variable assignment would. So if you’re in the global scope, you’re creating a global variable. Remember that Python doesn’t have separate namespaces for data and functions. This means that if you’re not careful, you can accidentally destroy your function or variable definition:

>> x = 100

>>> def x():
...     return "Hello!"

>>> print(x*2)
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for *: 'function' and 'int'

Also:

> def x():
... return "Hello!"

>>> x = 100

>>> print(x())
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
TypeError: 'int' object is not callable

In both of these cases, the programmer assumed that setting a variable and defining a function wouldn’t interfere with one another, with clearly disastrous results.

The fact that “def” assigns to a variable also explains why you cannot define a function multiple times in Python, each with its own function signature (i.e., argument count and types). Some people, for example, expect to be able to do the following:

>> def hello(name):
...     return f"Hello, {name}"

>>> def hello(first, last):
...     return f"Hello, {first} {last}"

They then assume that we can invoke “hello” with either one argument (and invoke the first version) or two arguments (and invoke the second).  However, this isn’t true; the fact that “def” defines a variable means that the second “def” obliterates whatever you did with the first.  Consider this code, in which I define “x” twice in a row with two different values:

>>> x = 100
>>> x = 200

You cannot realistically think that Python will know which “x” to choose according to context, right?  In the same way, Python cannot choose a function definition for you.  It looks up the function’s name in “globals()”, gets the function object from that dictionary, and then invokes it (thanks to the parentheses).

A final aspect of seeing “def” as variable assignment has to do with inner functions — that is, functions defined within other functions.  For example:

>> def foo():
...    def bar():
...        print("I'm in bar!")
...    return bar

What’s happening here?  Well, let’s assume that we execute function “foo”.    Now we’re inside of a function, which means that every time we define a variable, it’ll be a local variable, rather than a global one. The next thing we hit is “def”, which defines a variable.  Well, that means “bar” is a local variable inside of the “foo” function.

Then, in the next line of “foo”, after defining “bar”, we return a local variable to whoever called “foo”.  We can return any kind of local variable we like from a function; in this particular case, though, we’re returning a function object.

This just scratches the surface of inner functions, but the logic — we’re defining and then returning a local variable in “foo” — is consistent, and should make more sense if you think of “def” as just assigning variables.

Method 3: import

One of the most powerful aspects of Python is its very mature standard library. When you download and install Python, you have hundreds (thousands?) of modules available to you and your programs, just by using the “import” statement.

Well, guess what? “import” is defining a variable, too. Indeed, it often helps to think about “import” as defining a single variable, namely the one whose name you give when you invoke it.  For example:

>> import os

When I invoke the above, I am doing three things:

  • Python finds the module file os.py (or some variation thereof) and executes it
  • Python defines the variable “os” in the current scope
  • All of the global variables defined in the module are turned into attributes on the “os” variable.

It’s pretty easy to see that “import” defines a variable:

>> type(os)
<class 'module'>

What’s a bit harder to understand is the idea that the global variables defined inside of our module all become attributes on the module object. For example, in “os.py”, there is a top-level definition of the “walk” function. Inside of the module, it’s a global variable (function).  But to whoever imports “os”, “walk” isn’t a variable.  Rather, it’s an attribute, “os.walk”.  That’s why if we want to invoke it, we need to use the complete name, “os.walk”.

That’s fine, and works pretty well overall. But if you’re going to be using “os.walk” a lot, then you might not want the overhead of saying “os.walk” each and every time. Instead, you might want to create a global variable whose value is the same as “os.walk”.  You might even say something like this:

walk = os.walk

Since we’re assigning a variable, and since we’re not inside of a function, we’re creating a global variable here.  And since “os.walk” is already defined, we’re simply adding a new reference (name) to “os.walk”.

A faster, easier, and more Pythonic way to do this is with:

from os import walk

Although to be honest, this isn’t quite the same as what I did before. That’s because “from os import walk” does find the “os.py” module file, and does execute its contents — but it doesn’t define “os” as a variable in our global namespace.  Rather, it only creates “walk” as a variable, pointing to the value of what the module knows as “os.walk”.

Does that mean that “from … import …” doesn’t actually load the module? That would be pretty silly, in that future imports of “os” would then be less effective and efficient.

Python is actually pretty clever in this case: When it executes the module file, it creates the module object in sys.modules — a dictionary whose keys are the names of the modules we have loaded.  That’s how “import” knows to import a file only once; it can run “in” on sys.modules, check to see if a module has already been loaded, and only actually import it if the module is missing from that dict.

What if the module has been loaded?  Then “import” still defines the variable in the current namespace.  Thus, if your project has “import os” in 10 different files, then only the first invocation of “import os” actually loads “os.py”.  The nine following times, you just get a variable definition that points to the module object.

In the case of “from-import”, the same thing happens, except that instead of assigning the module’s name as a variable in the current namespace, you get the name(s) you explicitly asked for.

In theory, you can use “import” and “from-import” inside of a function, in which case you’ll define a local variable.  I’m sure that there is some use for doing so, but I can’t think of what it would be.  However, here Python is also remarkably consistent, allowing you to do things that wouldn’t necessarily be useful.

In Python 2.7, there were two different implementations of the “pickle” module, one written in Python (“pickle”) and one written in C (“cPickle”).  The latter executed much faster, but wasn’t available on all platforms.  It was often recommended that programmers do this, to get the best possible version:

try:
    import cPickle as pickle
except ImportError:
    import pickle

>>> pickle
<module 'cPickle' from '/usr/local/Cellar/python@2/2.7.14_3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/cPickle.so'>

This technique (which isn’t necessary with Python 3) only worked because “import” is executed at runtime, along with other Python code, and because “import” defines a variable.  What the above code basically says is, “Try to load cPickle but if you succeed, use the name pickle.   If you can’t do that, then just use the regular ol’ pickle library.”

Method 4: class

The fourth and final way to define a variable in Python is with the “class” statement, which we use to create (big surprise) classes.  We can create a class as follows:

class Foo(object):
    def __init__(self, x):
        self.x = x

Newcomers to Python often think of classes as blueprints, or plans, or descriptions, of the objects that will be created. But they aren’t that at all — classes in Python are objects, just like anything else.  And since they’re objects, they have both a type (which we can see with the “type” function) and attributes (which we can see with the “dir” function):

>> type(Foo)
<class 'type'>

>>> dir(Foo)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__']

The “class” keyword defines a new variable — Foo, in this case — which is assigned to a class object. A class object is just like everything else, except that it is callable, meaning that we can execute it (with parentheses) and get a new object back:

>> Foo(10)
<__main__.Foo object at 0x10da5fc18>

Defining a class means that we’re defining a variable — which means that if we have a previous class of the same name, we’re going to overwrite it, just as was the case with functions.

Class objects, like nearly all objects in Python, can have attributes added to them at runtime.  We can thus say:

>> Foo.abc = 100

And now I’ve added the “abc” attribute to the “Foo” object, which is a class.  I can also do that inside of a class definition:

>>> class Foo(object):
...     abc = 100
...     def __init__(self, x):
...         self.x = x

>>> Foo.abc
100

How is this possible?  Haven’t we, in our class definition, created a variable “abc”?  Nope — it looks like a variable, but it’s actually an attribute on the “Foo” class.  And it must be an attribute, not only because we can retrieve it with “Foo.abc” later, but because all assignments inside of a class definition aren’t creating variables, but rather attributes.  Inside of “class Foo”, we’re thus creating two attributes on the class — not only “abc”, but also “__init__”, the method.

Does this seem familiar, the idea global variables we define in one context are seen as attributes in another context?  Yes, that’s right — we saw the same thing with modules.  Global variables defined in a module are seen, once the module is imported, as attributes on that module.

You can thus think of classes, in some ways, as modules that don’t require a separate file. There are other issues as well, such as the difference between methods and functions — but in general, understanding that whatever you do inside of a class definition is treated as a  module can help to improve your understanding.

I don’t do this very often, but what does it mean, then, if I define a class within another class?  In some languages, such “inner classes” are private, and only available for use within the outer class. Not so in Python; since “class” defines a variable and any variable assignments within a class actually create attributes, an inner class is available to us via the outer class’s attribute:

>>> class Foo(object):
...     def __init__(self, x):
...         self.x = x
...     class Bar(object):
...         def __init__(self, y):
...             self.y = y
...
>>> b = Foo.Bar(20)
>>> b.y
20

Variables in Python aren’t just for storing data; they store our functions, modules, and classes, as well.  Understanding how various keywords define and update these variables can really help to understand what Python code is doing — and also provides for some cool tricks you can use in your own code.

Enjoyed this article? Join more than 11,000 other developers who receive my free, weekly “Better developers” newsletter. Every Monday, you’ll get an article like this one about software development and Python: