As with the articles, we kick off the hands-on section with the word count problem.
In this exercise, you will need to implement the count_words function. It should take the text string as input and should return a dict (or dict subclass) as output.
Just type the code into the editor below, and click the Run button to run the code. The output will appear below the editor. Scroll down below the editor to get a quick refresher on the problem as well as the different approaches that we discussed in the articles. Try out all the approaches in the code editor.
Code Editor
def count_words(text):
    counts = {}
    words = text.split()
    for word in words:
        try:
            counts[word] = counts[word] + 1
        except KeyError:
            counts[word] = 1
    return counts
Problem Statement
Given a string of words, count how many times each word appears in the string.
For example: "The quick dog and the quick fox, ran the quick race and the fox ran quick"
should give the output {"the": 4, "quick": 4, "dog": 1, "and": 2, "fox": 2, "ran": 2, "race": 1}
This article contains the detailed explanation of the word count problem.
Overview of the word count algorithm
This is what the word count algorithm should do:
1. Convert the sentence into a list of words using text.split()
2. Convert all the words to lowercase with word.lower()
3. Remove leading and trailing commas using word.strip(',')
4. Create an empty dictionary to store the counts
5. Loop through the words and keep track of the count in a dictionary: if the word is already present in the dictionary then increment its count, otherwise add it to the dictionary with a count of one.
6. Return the output
You can try implementing this in the code editor above, or scroll down all the way to the bottom of the page for an implementation of the above algorithm (without steps 2 and 3).
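The six steps above can be sketched as a single function. This is a minimal sketch, not the reference solution: it applies steps 2 and 3 to each word inside the loop rather than as separate passes, and the name count_words_normalized is made up here to keep it apart from the count_words function in the exercise.

```python
def count_words_normalized(text):
    # Hypothetical name, chosen so it does not clash with count_words.
    counts = {}                             # step 4: empty dictionary
    for word in text.split():               # step 1: split into words
        word = word.lower().strip(",")      # steps 2 and 3: lowercase, drop commas
        if word in counts:                  # step 5: increment or initialise
            counts[word] = counts[word] + 1
        else:
            counts[word] = 1
    return counts                           # step 6: return the output


sentence = "The quick dog and the quick fox, ran the quick race and the fox ran quick"
result = count_words_normalized(sentence)
print(result)
```

Run on the example sentence from the problem statement, this should reproduce the expected counts, with "The" and "the" merged and "fox," counted as "fox".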
Look Before You Leap
In this coding style, you would write step 5 of the above algorithm like this.
for word in words:
    if word in counts:
        counts[word] = counts[word] + 1
    else:
        counts[word] = 1
View the full article on this coding style.
Easier to Ask for Forgiveness than Permission
In this coding style, you would write step 5 of the above algorithm like this.
for word in words:
    try:
        counts[word] = counts[word] + 1
    except KeyError:
        counts[word] = 1
View the full article on this coding style.
Using the get method of dictionaries
Using the get method of dict for step 5 would get us this. No need for if or try.
for word in words:
    counts[word] = counts.get(word, 0) + 1
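For reference, here is a sketch of the whole function with the get version of step 5 dropped in. The name count_words_get is hypothetical. The call counts.get(word, 0) returns the stored count, or the default 0 when the word has not been seen yet, so one line covers both cases.

```python
def count_words_get(text):
    # Hypothetical name, used only for this illustration.
    counts = {}
    for word in text.split():
        # get returns the current count, or 0 for a word not yet in the dict
        counts[word] = counts.get(word, 0) + 1
    return counts


result = count_words_get("the fox and the dog")
print(result)  # {'the': 2, 'fox': 1, 'and': 1, 'dog': 1}
```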
View the full article on this technique.
Using defaultdict
Here is how we can do it using defaultdict.
First, import defaultdict at the top:
from collections import defaultdict
In step 4, create defaultdict(int) instead of a regular dict. Then step 5 would become
for word in words:
    counts[word] = counts[word] + 1
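Put together, a sketch of the defaultdict version might look like this (count_words_defaultdict is a made-up name). The first time a missing key is read, defaultdict(int) calls int() to produce 0 and stores it, which is why the loop body needs no special case for new words.

```python
from collections import defaultdict


def count_words_defaultdict(text):
    # Hypothetical name, used only for this illustration.
    counts = defaultdict(int)   # missing keys start at int() == 0
    for word in text.split():
        counts[word] = counts[word] + 1
    return counts


result = count_words_defaultdict("the fox and the dog")
print(dict(result))  # {'the': 2, 'fox': 1, 'and': 1, 'dog': 1}
```

The return value is a defaultdict, which is a dict subclass, so it still meets the requirement stated in the exercise.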
View the full article on using defaultdict.
Using Counter
And here is how to make it use Counter.
First, import Counter:
from collections import Counter
Then we don't even need a loop. Since Counter builds the dictionary itself, we can replace steps 4 and 5 with
counts = Counter(words)
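As a sketch, the whole function then shrinks to a single expression (count_words_counter is a hypothetical name). Counter is a dict subclass, so it satisfies the exercise, and it also offers extras such as most_common.

```python
from collections import Counter


def count_words_counter(text):
    # Hypothetical name, used only for this illustration.
    # Counter tallies every item of the iterable it is given.
    return Counter(text.split())


result = count_words_counter("the fox and the dog")
print(result["the"])          # 2
print(result.most_common(1))  # [('the', 2)]
```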
View the full article on using Counter.
Implementing word transformations
If we want to count the same word with different cases (e.g. The / the) together, then we need to transform all the words to the same case before we start counting:
words = [word.lower() for word in words]
If we want to remove a trailing comma from a word (e.g. "for," / "for") and count both forms as the same word, then we need to add this transformation before the counting:
words = [word.strip(",") for word in words]
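To see what the two transformations change, here is a small sketch that counts the example sentence from the problem statement before and after applying them. It uses Counter for the counting step, but any of the approaches above would work; the variable names raw_counts and clean_counts are made up for this illustration.

```python
from collections import Counter

text = "The quick dog and the quick fox, ran the quick race and the fox ran quick"

words = text.split()
raw_counts = Counter(words)                    # no transformations applied

words = [word.lower() for word in words]       # merge "The" and "the"
words = [word.strip(",") for word in words]    # merge "fox," and "fox"
clean_counts = Counter(words)

print(raw_counts["the"], raw_counts["The"], raw_counts["fox,"])  # 3 1 1
print(clean_counts["the"], clean_counts["fox"])                   # 4 2
```

Note that strip(",") only removes commas at the start and end of a word; other punctuation such as full stops would need to be added to the argument.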
View the full article on implementing word transformations.
Implementation for the original problem
Here is the code for the base algorithm. You can type this out (or copy-paste if you want, but I would recommend typing it out) on the editor on top and try it out.
def count_words(text):
    counts = {}
    words = text.split()
    for word in words:
        try:
            counts[word] = counts[word] + 1
        except KeyError:
            counts[word] = 1
    return counts