Hands On: The Word Count Problem

We've integrated a Python execution environment into the Playful Python site! Not only can you read code, you can now type it out and execute it online, right here on the site 😎

As with the articles, we kick off the hands-on section with the word count problem.

In this exercise, you will need to implement the count_words function. It should take the text string as input and should return a dict (or dict subclass) as output.

Just type the code into the editor below, and click the Run button to run the code. The output will appear below the editor. Scroll down below the editor to get a quick refresher on the problem as well as the different approaches that we discussed in the articles. Try out all the approaches in the code editor.

Code Editor

def count_words(text):
    counts = {}
    words = text.split()
    for word in words:
        try:
            counts[word] = counts[word] + 1
        except KeyError:
            counts[word] = 1
    return counts


Problem Statement

Given a string of words, count how many times each word appears in the string.

For example: "The quick dog and the quick fox, ran the quick race and the fox ran quick" should give the output {"the": 4, "quick": 4, "dog": 1, "and": 2, "fox": 2, "ran": 2, "race": 1}

This article contains the detailed explanation of the word count problem.

Overview of the word count algorithm

This is what the word count algorithm should do:

  1. Convert the sentence into a list of words using text.split()
  2. Convert all the words to lowercase with word.lower()
  3. Remove leading and trailing commas using word.strip(',')
  4. Create an empty dictionary to store the counts
  5. Loop through the words and keep track of the count in a dictionary. If the word is already present in the dictionary then increment its count, otherwise add it to the dictionary with a count of one.
  6. Return the output
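
To see why steps 2 and 3 matter, here is a small sketch that runs only steps 1 to 3 on the example sentence from the problem statement. The variable names (sentence, raw_words, clean_words) are made up for this illustration.

```python
# Example sentence from the problem statement
sentence = "The quick dog and the quick fox, ran the quick race and the fox ran quick"

# Step 1: split on whitespace. Punctuation stays attached to the words.
raw_words = sentence.split()
print(raw_words[:7])
# ['The', 'quick', 'dog', 'and', 'the', 'quick', 'fox,']

# Steps 2 and 3: lowercase each word, then strip leading/trailing commas
clean_words = [word.lower().strip(",") for word in raw_words]
print(clean_words[:7])
# ['the', 'quick', 'dog', 'and', 'the', 'quick', 'fox']
```

Without these two steps, "The" and "the" would be counted separately, and so would "fox," and "fox".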

You can try implementing this in the code editor above, or scroll down all the way to the bottom of the page for an implementation of the above algorithm (without steps 2 and 3).

Look Before You Leap

In this coding style, you would write step 5 of the above algorithm like this.

for word in words:
    if word in counts:
        counts[word] = counts[word] + 1
    else:
        counts[word] = 1

View the full article on this coding style.

Easier to Ask for Forgiveness than Permission

In this coding style, you would write step 5 of the above algorithm like this.

for word in words:
    try:
        counts[word] = counts[word] + 1
    except KeyError:
        counts[word] = 1

View the full article on this coding style.
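
Both styles give the same counts; they differ only in how the missing-key case is handled. As a quick sanity check, here is a sketch that wraps each loop in a function and compares the results. The names count_lbyl and count_eafp are made up for this example.

```python
def count_lbyl(words):
    """Look Before You Leap: check for the key before using it."""
    counts = {}
    for word in words:
        if word in counts:
            counts[word] = counts[word] + 1
        else:
            counts[word] = 1
    return counts


def count_eafp(words):
    """Easier to Ask for Forgiveness: just try it, handle the KeyError."""
    counts = {}
    for word in words:
        try:
            counts[word] = counts[word] + 1
        except KeyError:
            counts[word] = 1
    return counts


words = "the fox and the dog".split()
print(count_lbyl(words))  # {'the': 2, 'fox': 1, 'and': 1, 'dog': 1}
print(count_lbyl(words) == count_eafp(words))  # True
```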

Using the get method of dictionaries

Using the get method of dict, step 5 becomes a single line, with no need for if or try:

for word in words:
    counts[word] = counts.get(word, 0) + 1

View the full article on this technique.
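
This works because get returns its second argument when the key is missing, instead of raising KeyError. Here is a small sketch of that behaviour, using a made-up counts dictionary:

```python
counts = {"quick": 2}

# Key present: get returns the stored value
print(counts.get("quick", 0))  # 2

# Key missing: get returns the default (0) and does not modify the dict
print(counts.get("fox", 0))  # 0
print("fox" in counts)  # False

# So the same line handles both the first and later occurrences of a word
for word in ["fox", "quick", "fox"]:
    counts[word] = counts.get(word, 0) + 1

print(counts)  # {'quick': 3, 'fox': 2}
```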

Using defaultdict

Here is how we can do it using defaultdict.

First, import defaultdict at the top:

from collections import defaultdict

In step 4, create defaultdict(int) instead of a regular dict. Then step 5 would become:

for word in words:
    counts[word] = counts[word] + 1

View the full article on using defaultdict.
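
Putting these pieces together, a complete version might look like the sketch below. Since defaultdict is a subclass of dict, it satisfies the return type asked for in the exercise. The += shorthand is a substitution for counts[word] = counts[word] + 1; the two behave the same here.

```python
from collections import defaultdict


def count_words(text):
    # int() returns 0, so missing words start with a count of 0
    counts = defaultdict(int)
    for word in text.split():
        counts[word] += 1
    return counts


counts = count_words("the fox and the dog")
print(counts["the"])  # 2
print(isinstance(counts, dict))  # True

# Careful: merely reading a missing key adds it with a count of 0
print(counts["cat"])  # 0
print("cat" in counts)  # True
```

The last two lines show a side effect worth keeping in mind: looking up a word that was never counted inserts it into the defaultdict.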

Using Counter

And here is how to make it use Counter.

First, import

from collections import Counter

and then we don't even need a loop. Counter creates the dictionary and does the counting in one go, so we can replace steps 4 and 5 with

counts = Counter(words)

View the full article on using Counter.
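
As a sketch of how this looks as a complete function: Counter is also a subclass of dict, so it can be returned directly. The calls after the function just illustrate a couple of ways Counter differs from a plain dict.

```python
from collections import Counter


def count_words(text):
    # Counter counts every item of the iterable in one go
    return Counter(text.split())


counts = count_words("the fox and the dog")
print(counts["the"])  # 2
print(isinstance(counts, dict))  # True

# Unlike a plain dict, a missing word gives 0 instead of raising KeyError
print(counts["cat"])  # 0

# most_common(n) returns the n highest counts as (word, count) pairs
print(counts.most_common(1))  # [('the', 2)]
```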

Implementing word transformations

If we want to count the same word with different cases (e.g. The / the) together, then we first need to transform all the words to the same case before we start counting:

words = [word.lower() for word in words]

If we want to remove punctuation at the end of a word (e.g. for / for,) and count those words as the same, then we need to add this transformation before the counting:

words = [word.strip(",") for word in words]

View the full article on implementing word transformations.
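
Combining both transformations with any of the counting approaches gives the full six-step algorithm from the overview. Here is one possible sketch using the get technique. It only handles commas, exactly as step 3 describes, so other punctuation would still stay attached to the words.

```python
def count_words(text):
    # Step 1: split the text into words
    words = text.split()
    # Step 2: lowercase so that "The" and "the" are counted together
    words = [word.lower() for word in words]
    # Step 3: strip leading and trailing commas ("fox," becomes "fox")
    words = [word.strip(",") for word in words]
    # Steps 4 and 5: count each word
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    # Step 6: return the output
    return counts


text = "The quick dog and the quick fox, ran the quick race and the fox ran quick"
print(count_words(text))
# {'the': 4, 'quick': 4, 'dog': 1, 'and': 2, 'fox': 2, 'ran': 2, 'race': 1}
```

This matches the expected output given in the problem statement.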

Implementation for the original problem

Here is the code for the base algorithm, without the lowercase and comma-stripping transformations. You can type this out (or copy-paste if you want, but I would recommend typing it out) in the editor at the top and try it out.

def count_words(text):
    counts = {}
    words = text.split()
    for word in words:
        try:
            counts[word] = counts[word] + 1
        except KeyError:
            counts[word] = 1
    return counts