Python's built-in collections module is a treasure trove of useful tools. I will focus on the two structures that I find most useful: Counter and defaultdict. Understanding them will help you make your code more concise, readable, and easy to debug.

Counter

The Counter object takes an iterable and aggregates the items into counts of unique values in the iterable. The results are stored in a dictionary-like structure where the unique items are keys and the counts are values. For example, the following code takes a list of words and returns the counts of each word:

from collections import Counter

text = "apple banana orange apple apple orange"
counts = Counter(text.split())

print(counts.most_common())
# [('apple', 3), ('orange', 2), ('banana', 1)]
print(counts['apple'])
# 3
print(counts['pear'])
# 0

You can retrieve values from a Counter just as you would from a normal Python dictionary. Counter also has the very nice property that if you query it for a key that doesn’t exist (like ‘pear’ above), it returns 0 rather than raising a KeyError.
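To make the contrast with a plain dictionary concrete, here is a minimal sketch that reuses the fruit words from the example above. The variable names are my own choices for illustration.

```python
from collections import Counter

counts = Counter("apple banana orange apple apple orange".split())
plain = dict(counts)

# A plain dict raises KeyError when the key is missing...
try:
    plain["pear"]
    raised = False
except KeyError:
    raised = True
print(raised)  # True

# ...while a Counter returns 0 and does not add the key as a side effect.
print(counts["pear"])    # 0
print("pear" in counts)  # False
```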

Another very useful feature of Counter objects is that they can be merged with a simple + operator. This makes combining counts of items from different locations/files a breeze.
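As a rough sketch of merging with +, assume we have already counted the words in two separate files. The two word lists below are made up for illustration.

```python
from collections import Counter

# Hypothetical word counts gathered from two separate files.
counts_a = Counter("apple banana apple".split())
counts_b = Counter("banana orange banana".split())

# + adds the counts key by key and returns a new Counter.
total = counts_a + counts_b
print(total["apple"], total["banana"], total["orange"])  # 2 3 1

# update() adds counts in place, which suits a loop over many files.
running = Counter()
for part in (counts_a, counts_b):
    running.update(part)
print(running == total)  # True
```

One thing to keep in mind: the + operator drops any entries whose resulting count is zero or negative, whereas update() keeps them.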

This saves a lot of time and lines of code. I use Counter a lot in text-processing/NLP tasks, and it definitely makes my life easier. Here are a few final tips and tricks for working with Counter:

  • Use dict() to convert a Counter into a plain Python dictionary.
  • Call the most_common() method with no arguments to get a list of (item, count) tuples, sorted in descending order by count.
  • Count the characters in a string by passing it straight to Counter. This works because a string is an iterable in Python.
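The three tips above can be sketched in a few lines. The sample strings are arbitrary, and passing a number to most_common() is an extra I added to show the top-n form.

```python
from collections import Counter

counts = Counter("apple banana orange apple apple orange".split())

# Tip 1: convert to a plain dictionary.
as_dict = dict(counts)
print(type(as_dict) is dict)  # True

# Tip 2: most_common() sorts by count; an optional argument keeps the top n.
print(counts.most_common())   # [('apple', 3), ('orange', 2), ('banana', 1)]
print(counts.most_common(1))  # [('apple', 3)]

# Tip 3: a string is an iterable of characters, so Counter counts letters.
letters = Counter("banana")
print(letters["a"], letters["n"], letters["b"])  # 3 2 1
```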

defaultdict

This is a great alternative to the basic dictionary data structure when you don’t want to worry about KeyErrors and special cases. You simply create a defaultdict with a default of your choice, and the data structure will automatically assign the default value to any previously unseen key the first time you access it. The important thing to understand is that the argument to the defaultdict constructor should be a callable that takes no arguments and returns the default value. Common choices include the following:

  • list : default is an empty list
  • int : default is 0
  • lambda expressions: very flexible, since a lambda can return any default value you like
  • set : default is empty set

This is a very useful data structure, because it removes the need to check if an item exists in a dictionary before incrementing/modifying its value.
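A quick before-and-after sketch shows what that saves. This is my own illustration with a made-up word list, counting the same words first with a plain dict and then with a defaultdict.

```python
from collections import defaultdict

words = "apple banana orange apple apple orange".split()

# Plain dict: we have to check for the key before incrementing.
counts_plain = {}
for w in words:
    if w not in counts_plain:
        counts_plain[w] = 0
    counts_plain[w] += 1

# defaultdict(int): unseen keys start at 0, so the check disappears.
counts_dd = defaultdict(int)
for w in words:
    counts_dd[w] += 1

print(counts_plain == dict(counts_dd))  # True
```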

Let’s see a practical example of when we could use a defaultdict to write really elegant, concise code. With defaultdict, the training loop of a bigram language model can be implemented from scratch in 5 lines of code! For more context on n-gram language models, you can check out a previous post I wrote, in which I did NOT use defaultdict. Notice the amount of code we save by using defaultdict!

The trick is the nested use of defaultdict: the outer defaultdict's default value is itself a defaultdict. A language model is trained to learn the probabilities of words in context. We want a nested data structure where the outer-layer key specifies the context (i.e. the previous word in the case of a bigram model) and the inner-layer key specifies the current word. We want to be able to ask questions like: “In the training data, how many times was the word ‘the’ followed by the word ‘cat’?”
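To make the nested structure concrete, here is a minimal sketch of such a bigram counter. It is not the original code from this post: the class name, the method names, the "<s>" start token and the toy sentences are all assumptions of mine. Only the attribute name self.d is taken from the article.

```python
from collections import defaultdict


class BigramModel:
    """Hypothetical minimal bigram counter, written for illustration only."""

    def __init__(self):
        # Outer key: previous word (the context). Inner key: current word.
        # Unseen contexts get a fresh defaultdict(int); unseen words get 0.
        self.d = defaultdict(lambda: defaultdict(int))

    def train(self, sentences):
        for sentence in sentences:
            # "<s>" is an assumed start-of-sentence marker.
            tokens = ["<s>"] + sentence.split()
            for prev, curr in zip(tokens, tokens[1:]):
                self.d[prev][curr] += 1

    def prob(self, prev, curr):
        # Relative frequency of curr among all words seen after prev.
        total = sum(self.d[prev].values())
        return self.d[prev][curr] / total if total else 0.0


model = BigramModel()
model.train(["the cat sat", "the cat ran", "the dog sat"])

# "How many times was the word 'the' followed by the word 'cat'?"
print(model.d["the"]["cat"])  # 2
print(model.d["the"]["dog"])  # 1
```

Neither the training loop nor the lookup needs an "if key in dict" check, at either level of nesting.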

Note that the inner defaultdict is doing exactly the same thing as a Counter, so we can replace the nested defaultdict with the following line and get the same result:

self.d = defaultdict(Counter)
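As a quick check of that claim, the sketch below fills both variants from the same made-up list of (previous word, current word) pairs and compares them. The pair list and variable names are my own.

```python
from collections import Counter, defaultdict

# Hypothetical (previous word, current word) pairs.
pairs = [("the", "cat"), ("the", "cat"), ("the", "dog"), ("cat", "sat")]

nested = defaultdict(lambda: defaultdict(int))
with_counter = defaultdict(Counter)
for prev, curr in pairs:
    nested[prev][curr] += 1
    with_counter[prev][curr] += 1

# Both structures hold the same counts for every context.
same = nested.keys() == with_counter.keys() and all(
    dict(nested[k]) == dict(with_counter[k]) for k in nested
)
print(same)  # True

# Bonus of the Counter version: the inner values get most_common() for free.
print(with_counter["the"].most_common(1))  # [('cat', 2)]
```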

Conclusion

Thank you for reading this far! I hope you try using the Counter and defaultdict structures in your next Python project. Let me know in the comments if you have any other good use cases for them! If you found this interesting, check out my other Python-related articles.

Author

Check out my podcast Modus Mirandi https://podcasts.apple.com/us/podcast/modus-mirandi-podcast-with-thomas-hikaru-clark/id1551675175?uo=4.