Let’s say that we create a new, empty Python dictionary:
>>> d = {}
How much memory does this new, empty dict consume? We can find out with “sys.getsizeof”:
>>> import sys
>>> sys.getsizeof(d)
240
In other words, our dictionary, with nothing in it at all, consumes 240 bytes. Not bad; given how often dictionaries are used in Python, it’s good to know that they don’t normally consume that much memory.
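For comparison, here’s what some other empty built-in containers report. (The exact numbers are implementation details and vary across Python versions and builds, so your results may differ from mine:)

```python
import sys

# Sizes are CPython implementation details; expect different
# numbers on different versions
for obj in ({}, [], (), set(), ''):
    print(f'{type(obj).__name__:5} {sys.getsizeof(obj)}')
```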
What if I add something to the dict? What will happen to the memory usage?
>>> d['a'] = 1
>>> sys.getsizeof(d)
240
Something seems a bit fishy here, right? How can it be that our newly created dictionary, with zero key-value pairs, takes up the same space in memory as our dictionary with one key-value pair?
The answer is that “sys.getsizeof” is returning the size of the dictionary as a data structure, not the data inside of it. In other words: When we first create a dictionary, it contains eight slots that can be filled with key-value pairs. Only when the dictionary needs to grow, because it has too many key-value pairs for its current size, does it allocate more memory.
Moreover, the key-value pairs themselves aren’t stored in the dict itself. Rather, just a reference to the place in memory that holds the keys and values is stored there. So neither the type nor the size of the data is kept in the dictionary, and it certainly doesn’t affect the result of “sys.getsizeof” for the dictionary. Indeed, watch this:
>>> d['a'] = 'a' * 100000
>>> sys.getsizeof(d)
240
Even when the value is 100,000 characters long, our dictionary only needs 240 bytes.
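This also means that “sys.getsizeof” is a shallow measurement. If you want a rough sense of a dict’s total footprint, you can add the sizes of its keys and values yourself. Here’s a quick sketch (the function name is mine, and it deliberately ignores nested containers and objects referenced from more than one place):

```python
import sys

def rough_dict_size(d):
    """Approximate a dict's footprint: the dict structure itself,
    plus the (shallow) sizes of its keys and values."""
    return sys.getsizeof(d) + sum(
        sys.getsizeof(k) + sys.getsizeof(v)
        for k, v in d.items()
    )

d = {'a': 'a' * 100000}
print(sys.getsizeof(d))     # just the dict structure
print(rough_dict_size(d))   # dominated by the 100,000-character string
```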
What happens as we expand our dictionary? When does it request more memory? Let’s take a look:
>>> d = {}
>>> for one_letter in 'abcdefghijklmnopqrstuvwxyz':
        d[one_letter] = one_letter
        print(f'{len(d)}, sys.getsizeof(d) = {sys.getsizeof(d)}')
1, sys.getsizeof(d) = 240
2, sys.getsizeof(d) = 240
3, sys.getsizeof(d) = 240
4, sys.getsizeof(d) = 240
5, sys.getsizeof(d) = 240
6, sys.getsizeof(d) = 368
7, sys.getsizeof(d) = 368
8, sys.getsizeof(d) = 368
9, sys.getsizeof(d) = 368
10, sys.getsizeof(d) = 368
11, sys.getsizeof(d) = 648
12, sys.getsizeof(d) = 648
13, sys.getsizeof(d) = 648
14, sys.getsizeof(d) = 648
15, sys.getsizeof(d) = 648
16, sys.getsizeof(d) = 648
17, sys.getsizeof(d) = 648
18, sys.getsizeof(d) = 648
19, sys.getsizeof(d) = 648
20, sys.getsizeof(d) = 648
21, sys.getsizeof(d) = 648
22, sys.getsizeof(d) = 1184
23, sys.getsizeof(d) = 1184
24, sys.getsizeof(d) = 1184
25, sys.getsizeof(d) = 1184
26, sys.getsizeof(d) = 1184
As you can see, as the dictionary gains more key-value pairs, it needs more memory. But it doesn’t grow with each addition; each time it needs more space, it allocates more than it immediately needs, so that allocations can be relatively rare.
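If you don’t want to eyeball a transcript like the one above, you can have Python point out the resize moments for you. A small sketch (the exact thresholds and byte counts depend on your CPython version):

```python
import sys

d = {}
previous = sys.getsizeof(d)
for i in range(100):
    d[i] = i
    current = sys.getsizeof(d)
    if current != previous:
        # the dict just reallocated its internal table
        print(f'resized at {len(d)} items: {previous} -> {current} bytes')
        previous = current
```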
What happens if we remove items from our dictionary? Will it return memory to the system? Let’s find out:
>>> for key in list(d.keys()):
        d.pop(key)
>>> len(d)
0
Notice that in the above code, I didn’t iterate over “d” or “d.keys()”. Doing so would have raised a RuntimeError, because you can’t change a dictionary’s size while iterating over it. I thus created a list based on the keys, and iterated over that.
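If you’re curious what that failure looks like, here’s the direct version, which CPython rejects partway through:

```python
d = {'a': 1, 'b': 2}
try:
    for key in d:       # iterating over the dict directly...
        d.pop(key)      # ...while removing keys from it
except RuntimeError as e:
    print(e)            # dictionary changed size during iteration
```

The first pop actually succeeds; it’s the attempt to fetch the next key that triggers the error.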
You can also see that after removing these key-value pairs from my dict, it is indeed empty. And its memory usage?
>>> sys.getsizeof(d)
1184
In other words: Even though we’ve removed items from our dict, it hasn’t released the memory that it previously allocated. Of course, given how rarely I find myself removing items from dicts in actual Python code, I’m not hugely surprised that this happens. After all, why return memory to the system if you’re unlikely to need to do that? But it means that if you do allocate tons of memory to a dict, then you’re unlikely to get it back until the program ends, even if you remove items.
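If you do need that memory back before the program ends, one workaround is to rebuild the dict: a brand-new dict is sized for the items it actually holds. A quick sketch:

```python
import sys

d = {i: i for i in range(1000)}
for i in range(990):
    d.pop(i)

before = sys.getsizeof(d)   # still sized for roughly 1,000 entries
d = dict(d)                 # rebuild from the 10 surviving items
after = sys.getsizeof(d)
print(before, after)        # the rebuilt dict is far smaller
```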
But wait: What if I remove everything from the dict? There’s a method, “dict.clear”, that does this. I don’t use it very often, but it might at least provide us with some useful data:
>>> d.clear()
>>> len(d)
0
>>> sys.getsizeof(d)
72
Wait a second here: After running “dict.clear”, our dict’s length is indeed 0, just as it was before. But we’re somehow using less memory than we did at the start, when we created a brand-new empty dict! How can that be?
It would seem that when you run “dict.clear”, it removes not only all of the key-value pairs, but also that initial allocation of memory that is done for new, empty dictionaries. Meaning that we now have an “emptier than new” dictionary, taking up a paltry 72 bytes in our system.
If we add a new key-value pair to our dict, then if my theory is right, we should get back to the original size of 240 bytes:
>>> d['a'] = 1
>>> len(d)
1
>>> sys.getsizeof(d)
240
Sure enough, adding that one key-value pair to “d” forced the dictionary to allocate the same amount of memory it had before, back when we first created it.
Thank you very much for the explanation. I had just started noticing the size of a dict object that I was using, and was puzzled by what sys.getsizeof was returning. Your post clearly explains what was going on!
There are alternatives to dictionaries. Tuples and objects can save a lot of memory, see https://strangemachines.io/articles/performant-python
Oh, there are definitely alternatives to dictionaries! And I know that many people use (and love) named tuples precisely because they use so much less memory. That said, I was curious to know just how the memory works for dicts. But your point is a good one.
Although I agree that tuples and objects can save memory, there are things I would expect dictionaries to do faster (in particular, inserts and lookups), and thus dictionaries cannot be replaced altogether.
Sometimes we have to trade memory use for execution speed! (And vice versa).