Understanding nested list comprehensions in Python

July 23, 2015. By Reuven

In my last blog post, I discussed list comprehensions, and how to think about them. Several people suggested (via e-mail, and in comments on the blog) that I should write a follow-up posting about nested list comprehensions.

I must admit that nested list comprehensions are something that I’ve shied away from for years. Every time I’ve tried to understand them, let alone teach them, I’ve found myself stumbling for words, without being clear about what was happening, what the syntax is, or where I would want to use them. I managed to use them on a few occasions, but only after a great deal of trial and error, and without really understanding what I was doing.

Fortunately, the requests that I received, asking how to work with nested list comprehensions, forced me to get over my worries. I’ve figured out what’s going on, and I even think that I understand why they confused me before.

The key thing to remember is that in a list comprehension, we’re dealing with an iterable. So when I say:

[ len(line) 
for line in open('/etc/passwd') ]

I’m saying that I want to iterate over the file object we got from opening /etc/passwd. There will be one element in the output list for each element in the input iterable; that is, one for every line in the file.

That’s great if I want my list comprehension to return something based on each line of /etc/passwd. But each line of /etc/passwd is a string, and thus also iterable. Maybe I want to return something not based on the lines of the file, but on the characters of each line.
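
To make the "strings are iterable, too" point concrete, here is a minimal sketch. It assumes a small hard-coded list of lines (the name sample_lines and its contents are made up for illustration) in place of a real /etc/passwd, so that it runs on any machine:

```python
# Hypothetical sample data, standing in for the lines of /etc/passwd
sample_lines = ["root:x:0:0\n",
                "daemon:x:1:1\n"]

# One output element per line in the input iterable
line_lengths = [len(line) for line in sample_lines]

# Each line is a string, and iterating over a string yields its characters
first_line_chars = [char for char in sample_lines[0]]

print(len(line_lengths) == len(sample_lines))   # True
print(first_line_chars[:4])                     # ['r', 'o', 'o', 't']
```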

Were I to use a “for” loop to process the file, I would use a nested loop (i.e., one loop inside of the other), with the outer loop iterating over lines and the inner loop iterating over the characters in each line. It turns out that we can use a nested list comprehension, too. Here’s a simple example of a nested list comprehension:

[(x,y) for x in range(5) for y in range(5)]

If your reaction to this is, “What in the blazes does that mean?!?” then you’re not alone. Until just recently, that’s what I thought, too.

However: If we rewrite the above nested list comprehension using my preferred (i.e., multi-line) list-comprehension style, I think that things become a bit clearer:

 [(x,y)  
 for x in range(5)  
 for y in range(5)]

Let’s take this apart:

  • Our output expression is the tuple (x,y). That is, this list comprehension will produce a list of two-element tuples.
  • We first run over the source range(5), giving x the values 0 through 4.
  • For each value of x, we run through the source range(5), giving y the values 0 through 4.
  • The number of values in the output is the total number of times the final (second) “for” line runs: here, 5 × 5 = 25.
  • The output, not surprisingly, will be all of the two-element tuples from (0,0) to (4,4).
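
One way to check this reading is to write the same thing as ordinary nested "for" loops. The following is a sketch of that equivalence; the variable names pairs and pairs_from_loops are my own:

```python
# The nested list comprehension from the text
pairs = [(x, y)
         for x in range(5)
         for y in range(5)]

# The same logic as explicit loops: the "for" lines appear in the
# same top-to-bottom order as they do in the comprehension
pairs_from_loops = []
for x in range(5):
    for y in range(5):
        pairs_from_loops.append((x, y))

print(pairs == pairs_from_loops)   # True
print(len(pairs))                  # 25
print(pairs[0], pairs[-1])         # (0, 0) (4, 4)
```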

Now, let’s mix things up by changing them a bit:

 [(x,y)  
  for x in range(5)  
  for y in range(x+1)]

Notice that now, the maximum value of y will vary according to the value of x. So we’ll get from (0,0) to (4,4), but we won’t see such things as (2,4) because y will never be larger than x.

Again, it’s important to understand several things here:

  • Our “for y” loop will execute once for each iteration over x.
  • In our “for y” loop, we have access to the variable x.
  • In our “for x” loop, we don’t have access to y (unless you consider the last value of y to be useful, but you really shouldn’t).
  • Our (x,y) tuple is output once for each iteration of the *final* loop, at the bottom.
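
These points can be verified with a short sketch that compares the comprehension against explicit loops (again, the variable names are just for illustration):

```python
# The comprehension in which the inner range depends on x
pairs = [(x, y)
         for x in range(5)
         for y in range(x + 1)]

# Equivalent explicit loops
loop_pairs = []
for x in range(5):
    for y in range(x + 1):          # the inner loop can see x
        loop_pairs.append((x, y))   # runs once per inner iteration

print(pairs == loop_pairs)   # True
print(len(pairs))            # 15, i.e., 1 + 2 + 3 + 4 + 5
print((2, 4) in pairs)       # False, because y is never larger than x
print((4, 2) in pairs)       # True
```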

Here’s another example: Assume that we have a few friends over, and that we have decided to play several games of Scrabble. Being Python programmers, we have stored our scores in a dictionary:

scores = {'Reuven':[300, 250, 350, 400],
          'Atara':[200, 300, 450, 150],
          'Shikma':[250, 380, 420, 120],
          'Amotz':[100, 120, 150, 180] }

I want to know each player’s average score, so I write a little function:

def average(scores):
    # // is integer division, so the result is truncated to a whole number
    return sum(scores) // len(scores)

If we want to find out each individual’s average score, we can use our function and a standard comprehension — in this case, a dict comprehension, to preserve the names:

 >>> { name : average(score)  
       for name, score in scores.items() }

{'Amotz': 137, 'Atara': 275, 'Reuven': 325, 'Shikma': 292}

But what if I want to get the average score, across all of the players? In such a case, I will need to grab each of the scores from inside of the inner lists. To do that, I can use a nested list comprehension:

>>> average([ one_score  
              for one_player_scores in scores.values()  
              for one_score in one_player_scores ])

257
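
If you want to convince yourself that the nested comprehension really does flatten the inner lists, here is a sketch that compares it with itertools.chain.from_iterable from the standard library, which flattens one level in the same order. The post itself doesn't use itertools; I bring it in only as a cross-check, and I use integer division so that the result matches the whole number shown above:

```python
from itertools import chain

scores = {'Reuven': [300, 250, 350, 400],
          'Atara': [200, 300, 450, 150],
          'Shikma': [250, 380, 420, 120],
          'Amotz': [100, 120, 150, 180]}

def average(values):
    # Integer division, to match the whole-number results in the text
    return sum(values) // len(values)

# Flatten the per-player lists with a nested list comprehension
all_scores = [one_score
              for one_player_scores in scores.values()
              for one_score in one_player_scores]

# Cross-check: chain.from_iterable also flattens one level
all_scores_chained = list(chain.from_iterable(scores.values()))

print(all_scores == all_scores_chained)   # True
print(len(all_scores))                    # 16
print(average(all_scores))                # 257
```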

What if I’m only interested (for whatever reason) in including scores that were above 200? As with all list comprehensions, I can use the “if” clause to weed out values that I don’t want. That condition can use any and all of the values that I have picked out of the various “for” lines:

>>> [ one_score      
      for one_player_scores in scores.values()     
      for one_score in one_player_scores
      if one_score > 200]

[300, 250, 350, 400, 300, 450, 250, 380, 420]

If I want to put these above-200 scores into a CSV file of some sort, I could do the following:

>>> ','.join([ str(one_score)  
               for one_player_scores in scores.values() 
               for one_score in one_player_scores  
               if one_score > 200])

'300,250,350,400,300,450,250,380,420'
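
For anything beyond a single line of numbers, the standard library's csv module is probably a safer bet than str.join, because it takes care of quoting and line endings. Here is a minimal sketch, assuming that we write to an in-memory io.StringIO buffer rather than to a real file (with a real file, you would pass the file object to csv.writer instead):

```python
import csv
import io

scores = {'Reuven': [300, 250, 350, 400],
          'Atara': [200, 300, 450, 150],
          'Shikma': [250, 380, 420, 120],
          'Amotz': [100, 120, 150, 180]}

high_scores = [one_score
               for one_player_scores in scores.values()
               for one_score in one_player_scores
               if one_score > 200]

# io.StringIO stands in for a file opened for writing
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(high_scores)    # non-string values are converted with str()

csv_text = buffer.getvalue()
print(csv_text.strip())         # 300,250,350,400,300,450,250,380,420
```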

Here’s one final example that I hope will drive these points home: Let’s assume that I have information about a hotel. The hotel has stored its information in a Python list. The list contains lists (representing rooms), and each sublist contains one or more dictionaries (representing people). Here’s our data structure:

rooms = [[{'age': 14, 'hobby': 'horses', 'name': 'A'},  
          {'age': 12, 'hobby': 'piano', 'name': 'B'},  
          {'age': 9, 'hobby': 'chess', 'name': 'C'}],  
         [{'age': 15, 'hobby': 'programming', 'name': 'D'}, 
          {'age': 17, 'hobby': 'driving', 'name': 'E'}],  
         [{'age': 45, 'hobby': 'writing', 'name': 'F'},  
          {'age': 43, 'hobby': 'chess', 'name': 'G'}]]

What are the names of the people staying at our hotel?

 >>> [ person['name']      
       for room in rooms
       for person in room ]

['A', 'B', 'C', 'D', 'E', 'F', 'G']

How about the names of people staying in our hotel who enjoy chess?

>>> [ person['name']  
      for room in rooms  
      for person in room  
      if person['hobby'] == 'chess' ]

['C', 'G']

Basically, every “for” line flattens the items over which you’re iterating by one more level, and gives you access to that level in both the output expression (i.e., first line) and in the condition (i.e., optional final line).
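
Here is a sketch of what "access to each level" can look like in practice, using the hotel data from above. The use of enumerate to get a room number, and the particular questions asked, are my own additions for illustration:

```python
rooms = [[{'age': 14, 'hobby': 'horses', 'name': 'A'},
          {'age': 12, 'hobby': 'piano', 'name': 'B'},
          {'age': 9, 'hobby': 'chess', 'name': 'C'}],
         [{'age': 15, 'hobby': 'programming', 'name': 'D'},
          {'age': 17, 'hobby': 'driving', 'name': 'E'}],
         [{'age': 45, 'hobby': 'writing', 'name': 'F'},
          {'age': 43, 'hobby': 'chess', 'name': 'G'}]]

# The output expression uses both levels: the room (via its index)
# and the person inside that room
chess_players = [(room_number, person['name'])
                 for room_number, room in enumerate(rooms)
                 for person in room
                 if person['hobby'] == 'chess']

# The condition can use both levels, too: people older than 16
# who are staying in a two-person room
adults_in_doubles = [person['name']
                     for room in rooms
                     for person in room
                     if len(room) == 2 and person['age'] > 16]

print(chess_players)        # [(0, 'C'), (2, 'G')]
print(adults_in_doubles)    # ['E', 'F', 'G']
```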

I hope that this helps you to understand nested list comprehensions. If it did, please let me know! (And if it didn’t, please let me know that, as well!)

  • Thank you! This is very helpful – I’m new to Python and I’ve been struggling to understand nested comprehensions. This explanation helps a lot with that concept.

    I have one question:

    When I run the hotel example with the conditional statement, I get “TypeError: list indices must be integers or slices, not str” flagged for the line _if person[‘hobby’] == ‘chess’_.
    How would I turn person[‘hobby’] into person[int] in this kind of statement so the lookup will go through properly? This is probably something basic I have not internalized yet! It has me stumped.

    • That error message typically comes if you’ve defined a list, but then try to access it as if it’s a dict. I’m guessing that you used square brackets to define each person in the hotel, where curly braces (i.e., a dict) were needed. Double-check the types of brackets you’re using. I just checked the code on my machine, and it worked fine, so I’m guessing it’s a typo somewhere on your end.

  • Ahmer Khan says:

    This was very informative. Can I take this a bit further by asking a question that will return a trimmed dict structure? Taking your example of Scrabble scores as below:
    scores = {‘Reuven’:[300, 250, 350, 400],
    ‘Atara’:[200, 300, 450, 150],
    ‘Shikma’:[250, 380, 420, 120],
    ‘Amotz’:[100, 120, 150, 180] }

    Say I wanted to return the entire structure *except* the entry whose list contains, say, 380. That is, can I get a dict back containing everything excluding ‘Shikma’, since the list value for that key contains 380? I would like to get the following back.
    scores = {‘Reuven’:[300, 250, 350, 400],
    ‘Atara’:[200, 300, 450, 150],
    ‘Amotz’:[100, 120, 150, 180] }
    I know that this is possible using a nested expression but I’m not able to figure this out.

    • Ahmer Khan says:

      For the question I posted above, I know the following will work.
      {x:y for x,y in scores.iteritems() if 380 not in y}

      But the following does not work and returns even the list that has 380 in it.
      {x:y for x,y in scores.iteritems() for z in y if z != 380}

      I’m not sure why the above does not work since if I do the following, it will return the key and list that includes 380
      {x:y for x,y in scores.iteritems() for z in y if z == 380}
      Any ideas?

      • The first one is indeed the solution you want, and is identical to the solution I just posted.

        The second one will produce a dict (since it’s a dict comprehension). But why is it basically giving us a copy of the original dict, without removing one of them? The reason is that a nested list comprehension will produce output for every element in the *final* “for” line. So yes, you’re ignoring the case of z being 380… but then you’re outputting a full dictionary, which means that you’re getting everything. You’re outputting the same key multiple times, but with a different value some of those times.

        To make it clearer, I suggest using a list comprehension that outputs a tuple, rather than a dict comprehension:

        [(key,value)
        for key, value in scores.items()
        for z in value
        if z != 380]

        The output, if you run this, will show you that you’re getting a *lot* of elements on that list. Only the last one is important when you’re using a dict comprehension. So in this case, you really didn’t want a nested comprehension.

    • Well, let’s think about it this way: The “scores” dict contains keys (names) and values (lists of scores). We can use a dict comprehension to create a new dict based on this one:

      { key:value
      for key, value in scores.items() }

      Not very exciting, but it does the trick. Now, let’s say that we want to create a new dict whose keys and values are identical to “scores” — but which ignores any pair in which the value contains 380. How about this:

      { key: value
      for key, value in scores.items()
      if 380 not in value }

      In other words, you don’t need a nested list comprehension to accomplish what you’re trying to do.

  • Great content! As Python is not my main language of choice, I found this blog post very valuable. Thank you!

  • Michael Howitz says:

    Great post. Thank you very much.
    I’ve been using Python for years but nested list comprehensions always seemed strange to me. I think that I got them now. 🙂

  • For the scores I get different results; however, if I change Shikma’s 80 to 380, the values I get are the same as yours.

    • Looks like I erased a character when moving things from Emacs to WordPress; I’ll fix that. Thanks!
