Playing with Python strings, lists, and variable names — or, a complex answer to a simple question

June 13, 2019. By Reuven

I recently received a question from a reader of my “Better developers” list. He asks:

Is there any way to turn a str type into a list type? For example, I have a list of elements, and want to turn that element into a separate list. For example, if I have

test = ['a', 'b', 'c']

I want the output to be

a=[], b=[], c=[]

One of the mantras of Python is that there should be one, and preferably only one, obvious way to do something. Reality has a way of being more complex than that, though, and in this particular case, the problem that my reader described in words and what he put in code weren’t exactly the same thing. (This is a common problem in the professional software world: the specifications say one thing, but the client’s intentions say another.)

Let’s start with what my reader says he wants to do, and then get to what he actually seems to want:

He says that he wants to turn a string into a list. Well, there are a few ways to do that. The easiest is to use the “list” class, and call it on the string:

>>> s = 'abc'
>>> mylist = list(s)
>>> mylist
['a', 'b', 'c']

In such a case, the “list” class (which can be called, like a function, and is thus known as a “callable” in the Python world) iterates over the elements of our string. Each element is turned into a separate element in a new list that it returns.

This is fine if you want to create a new list with the same number of elements as there are characters in the string. After all, both strings and lists are Python sequences; when you create a list in this way, based on a string, you’ll find that the new list’s length and elements are identical. So s[0] and mylist[0] will return the same result, as will “len(s)” and “len(mylist)” even though “s” and “mylist” are different types.
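As a quick check of that claim, here is a minimal sketch (reusing the names “s” and “mylist” from above) that compares the two sequences and then shows the one big difference between them, namely that the list can be modified in place:

```python
s = 'abc'
mylist = list(s)

# Same length, and the same element at every index
same_length = len(s) == len(mylist)
same_items = all(s[i] == mylist[i] for i in range(len(s)))

# Lists are mutable, so item assignment works here;
# the same assignment on a str would raise TypeError
mylist[0] = 'z'

print(same_length, same_items, mylist, s)
```

The original string is untouched, because “list” built a new object rather than a view onto the string.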

Another way to create a list from a string is via the “str.split” method. I use this method all the time, especially when taking input from a user and iterating over the words, or fields, that the user provides. For example:

>>> words = 'here are some words'
>>> words.split(' ')
['here', 'are', 'some', 'words']

The result of “str.split” is always a list of strings. And as you can see in the above example, we can tell “str.split” what string should be used as a field delimiter; “str.split” removes all occurrences of that string, returning a list of strings.

What happens if our string is a bit weird, though, such as:

>>> words = 'here    are some     words'

Now we’re going to get an equally weird result:

>>> words.split(' ')
['here', '', '', '', 'are', 'some', '', '', '', '', 'words']

This happens because “str.split” has taken our instructions very literally, as computers do: Whenever you encounter a space character, create a new element in the output list. However, this is rarely the solution that you want, and thus “str.split” has a great default: If you don’t pass anything (or pass “None” explicitly), then any length of whitespace characters will be treated as a single delimiter. Which means that we can say:

>>> words = 'here    are some     words'
>>> words.split()
['here', 'are', 'some', 'words']
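To see the two behaviors side by side, here is a small sketch. The string “line” is made up for illustration, and it adds leading and trailing spaces, which the default also takes care of:

```python
line = '  here    are some     words  '

# No argument: runs of whitespace count as one delimiter,
# and leading/trailing whitespace is ignored
default_split = line.split()

# Explicit ' ' separator: every single space is a delimiter,
# so empty strings show up in the result
literal_split = line.split(' ')

# Dropping the empty strings gets us back to the default result
cleaned = [word for word in literal_split if word]

print(default_split)
print(cleaned == default_split)
```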

This is quite useful… and yet, while this is how I interpreted the question I got, it’s not what the user wants.

Rather, what he seems to want is to create new variables based on the elements of the string. So if the string is “abc”, then we want to create new variables “a”, “b”, and “c”, each of which references an empty list.

This is certainly possible, but I’ll admit it’s a bit odd. However, it gives us a chance to delve into some of Python’s more rarely used capabilities. (At least, I almost never use them — maybe other people are different!)

My first reaction to creating variables dynamically is to say, “No, you don’t really want to do that,” and to suggest that we create a dictionary, instead. You can think of a dict as your own private namespace, one which can’t and won’t interfere with the variables created elsewhere.

We could create an empty dictionary, and then iterate over the string, adding new key-value pairs to it, with each value being an empty list:

>>> d = {}
>>> for one_letter in 'abc':
        d[one_letter] = []

>>> d
{'a': [], 'b': [], 'c': []}

There is, however, a better way to do what we did here, and that is by using the “dict.fromkeys” class method. This is a great shortcut to creating a dictionary whose keys are known but whose values aren’t, at least not at the start. So we can say:

>>> dict.fromkeys('abc')
{'a': None, 'b': None, 'c': None}

As you can see, the value associated with each key here is “None”. We don’t want that; instead, we want to have an empty list. So we can pass an empty list as a second, optional argument to “dict.fromkeys”:

>>> dict.fromkeys('abc', [])
{'a': [], 'b': [], 'c': []}

However, you should be a bit nervous before working with the dictionary I’ve created here, because every single one of the values now refers to the same list! For example:

>>> d = dict.fromkeys('abc', [])
>>> d
{'a': [], 'b': [], 'c': []}
>>> d['a'].append(1)
>>> d['b'].append(2)
>>> d['c'].append(3)
>>> d
{'a': [1, 2, 3], 'b': [1, 2, 3], 'c': [1, 2, 3]}

In many ways, this is similar to the problem of mutable defaults, in that we have a single value referenced in multiple places. It’s pretty obvious to experienced Python developers that this will happen, but it’s far from obvious to newcomers.
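One way to convince yourself that there is only one list involved is to compare the values with “is”, which checks object identity rather than equality. A minimal sketch:

```python
d = dict.fromkeys('abc', [])

# All three keys refer to one and the same list object
shared = d['a'] is d['b'] is d['c']

# So a change made via one key is visible via every key
d['a'].append(1)

print(shared, d)
```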

Another way to do this would be to use a dict comprehension:

>>> {one_letter : []
     for one_letter in 'abc'}
{'a': [], 'b': [], 'c': []}

“Wait,” you might be saying, “Maybe we have to worry about these lists also all referring to the same thing?”

Nope:

>>> d = {one_letter : []
         for one_letter in 'abc'}
>>> d['a'].append(1)
>>> d['b'].append(2)
>>> d['c'].append(3)
>>> d
{'a': [1], 'b': [2], 'c': [3]}

What’s the difference between this, and our previous use of “dict.fromkeys”? The difference is that here, the “[]” empty list is evaluated anew with each iteration over the string. Thus, we get a new empty list each time. By contrast, passing the same empty list as a second argument to “dict.fromkeys” gave us the same list each time.
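Applied to the list from the original question, the same comprehension might look like the following sketch. The name “lists_by_name” is my own invention, not something from the reader’s code:

```python
test = ['a', 'b', 'c']

# One new, independent empty list per element of `test`
lists_by_name = {name: [] for name in test}

lists_by_name['a'].append(10)

# The other lists are unaffected, because each [] was evaluated separately
all_distinct = len({id(value) for value in lists_by_name.values()}) == len(test)

print(lists_by_name, all_distinct)
```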

So if you want to use a dict — and that’s my recommendation — then you are good to go! But if you really and truly want to create variables based on the values in the string, then we’ll have to use a few more tricks.

One is to take advantage of the fact that global variables are actually stored in a dictionary. Yes, that’s right — you might think that when you write “x=100” that you’re storing things in some magical location. But actually, Python turns your variable name into a string, and uses that string as a key into a dictionary.

We don’t have direct access to this dictionary, but we can retrieve it using the “globals” builtin function. Here’s what happens when I invoke “globals” in a brand-new Python 3 interactive shell:

>>> globals()
{'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>}

See what happens now, after I assign some variables:

>>> x = 100
>>> y = [10, 20, 30]
>>> z = {'a':1, 'b':2}
>>> globals()
{'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, 'x': 100, 'y': [10, 20, 30], 'z': {'a': 1, 'b': 2}}

Take a look at the end, and you’ll see our three newly assigned variables.
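Reading works the way you would expect from a dictionary, too. Here is a minimal sketch, assuming it runs at the top level of a script or interactive session; “no_such_variable” is just a made-up name that has never been assigned:

```python
x = 100

# The variable name is simply a string key in the globals() dict,
# and the value stored there is the very same object
same_object = globals()['x'] is x

# A name that was never assigned is just a missing key
missing_is_present = 'no_such_variable' in globals()

print(same_object, missing_is_present)
```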

It turns out that we can also define (or update the values of) global variables by assigning to keys in this dictionary:

>>> globals()['x'] = 234
>>> globals()['y'] = [9,8,7,6]
>>> globals()['z'] = 'hello out there'
>>> globals()
{'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, 'x': 234, 'y': [9, 8, 7, 6], 'z': 'hello out there'}

I don’t really recommend this in actual code, but if you’re absolutely, positively sure that you want to do this, then you can accomplish this task in the following way:

>>> for one_letter in 'abc':
    globals()[one_letter] = []

Sure enough:

>>> a
[]
>>> b
[]
>>> c
[]

Again, you almost certainly don’t want to have this sort of code in production. But it does work, as we see here.
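If you do go down this road, it is probably worth checking the names before binding them. The following is only a sketch of one possible guard, not a recommendation. The function “bind_empty_lists” and the dict “scratch” are hypothetical names of mine; “scratch” stands in for the dict returned by “globals()” so that the example doesn’t touch any real namespace:

```python
import keyword


def bind_empty_lists(names, namespace):
    """Bind each name in `names` to a new empty list inside `namespace` (a dict)."""
    for name in names:
        # Reject strings that could never be used as a variable name
        if not name.isidentifier() or keyword.iskeyword(name):
            raise ValueError(f'{name!r} is not a usable variable name')
        # Refuse to silently clobber something that already exists
        if name in namespace:
            raise ValueError(f'{name!r} is already defined')
        namespace[name] = []


scratch = {}  # stand-in for globals()
bind_empty_lists('abc', scratch)
print(scratch)
```

Passing “globals()” as the second argument would create real global variables, with all of the caveats mentioned above.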

Something else we could do is use the “exec” function, which lets us run any string as a tiny Python program. We could thus say:

>>> for one_letter in 'abc':
        exec(f'{one_letter} = []')

>>> a
[]
>>> b
[]
>>> c
[]

As you can see, it worked: We used an f-string to create a tiny (one-statement) Python program, and then used “exec” to run it. Note that we wouldn’t be able to use the related “eval” function here, because “eval” expects an expression, and an assignment statement such as “a = []” isn’t an expression.
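Here is a short sketch of both points. It passes “exec” an explicit dictionary (called “namespace” here, a name I picked for illustration) as its second argument, so that the new names land in that dictionary rather than among our globals, and it shows that “eval” rejects the same assignment:

```python
namespace = {}
for one_letter in 'abc':
    # exec accepts statements; assignments are stored in `namespace`
    exec(f'{one_letter} = []', namespace)

# exec also adds a '__builtins__' entry to the dict, so filter it out
created = {key: value for key, value in namespace.items()
           if key != '__builtins__'}

# eval only accepts expressions, so an assignment statement fails to compile
try:
    eval('a = []')
    eval_error = None
except SyntaxError as e:
    eval_error = e

print(created)
print(type(eval_error).__name__)
```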

Finally, I’d generally argue that it’s a good idea not to create or manipulate global variables whose names are created dynamically from the user’s input. It’s probably best (as I wrote above) to use a dictionary. However, if you really insist on doing this, then you should probably do it in a module.

But wait: aren’t modules normally defined in files? Yes, but you can create a module on the fly by calling the “module” class, just as we did above with the “list” class. There’s just one hitch, namely that “module” isn’t one of Python’s builtin names, so we can’t refer to the class directly.

That’s OK: We can grab the class via another module (e.g., __builtins__), and then invoke it, passing it the name of the module we want to create. Then we can use the builtin “setattr” function to assign a new attribute to the module. Here’s how that would look:

>>> mymod = type(__builtins__)('mymod')
>>> for one_letter in 'abc':
        setattr(mymod, one_letter, [])

>>> vars(mymod)
{'__name__': 'mymod', '__doc__': None, '__package__': None, '__loader__': None, '__spec__': None, 'a': [], 'b': [], 'c': []}

Sure enough, we’ve managed to do it!

By the way, remember how I mentioned, all the way back, that it would probably be best to use a dictionary, rather than create actual variables? Well, as you can see here, a module is actually just a fancy wrapper around… a dictionary.
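To make that last point concrete, here is a small sketch. It gets the module class from the standard library’s “types” module, where it is exposed as “types.ModuleType”, and then shows that “vars” hands back the module’s own dictionary rather than a copy:

```python
import types

mymod = types.ModuleType('mymod')
for one_letter in 'abc':
    setattr(mymod, one_letter, [])

# vars() returns the module's own __dict__, not a copy of it
same_dict = vars(mymod) is mymod.__dict__

# So storing a key in that dict is the same as setting an attribute
vars(mymod)['d'] = []

print(same_dict, mymod.a, mymod.d)
```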

This seemingly simple question raised all sorts of interesting Python functionality, none of which (I’m guessing) was ever intended by the person who asked the question. But I hope that this has given you a glimpse into the ways in which Python is implemented, and how a dynamic language allows us to play with our environment in ways that stretch not only our minds, but sometimes even the boundaries of good taste.

Splendid!
Thanks.

hey! awesome post. Just one question: why does the code:

d = {one_letter : []
     for one_letter in 'abc'}

automatically alphabetize the list?

reuven says:

The dict’s keys aren’t actually being alphabetized; they only look that way. As of Python 3.6 (and officially, 3.7), dict keys are kept in the order in which they were inserted. Because the string was 'abc', the keys were inserted in that order, which happens to be alphabetical.
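A quick way to check this is to feed the comprehension a string that isn’t in alphabetical order. A minimal sketch, assuming Python 3.7 or later:

```python
d = {one_letter: [] for one_letter in 'cab'}

# Keys come back in insertion order, not alphabetical order
keys_in_dict_order = list(d)

# Sorting is a separate, explicit step
keys_sorted = sorted(d)

print(keys_in_dict_order, keys_sorted)
```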
