Big O Notation: Do you need to know this?

I came across this tweet recently and thought this topic would make for an interesting article.

Hey programmers, has O notation ever mattered to you in a professional context?
— studio giblets (@moonpolysoft) November 16, 2022

Big-O notation is commonly seen as one of those things that is taught in college computer science, and is a favourite interview question, that has little practical application in day to day work.

I would like to argue that this is not true and you probably use it instinctively everyday (and if not, then you should).

Big O Notation

If you have never come across this term, then here is a small explanation. If you are already familiar with it then feel free to skip over to the next section.

Big O represents the time complexity (or space complexity) of an algorithm. That's quite abstract, so lets take an example.

Say we have an unsorted list of numbers, each number appearing once, and we want to find the position of a number. How would we write a function to do this?

Well, we will have to go through each item one by one, and when we find the item, we return the position. Here is the code to do that:

def search(lst, target):
    for index, item in enumerate(lst):
        if item == target:
            return index
    return None

You will realise that in the worst case (assume the item we want is the last item in the list) we will have to go through the entire list once. So, if there are n items in the list, then the search time will be proportional to n. It the list is double the size, then the search time is also double. We can say this algorithm is order n represented as O(n)

Now, let us assume that the list of numbers is always sorted. Then how do we find the number?

This time we don't need to search every item one by one. We can check the number in the midpoint of the list. If it is target number is more that the midpoint number, then we know that the target is somewhere after the midpoint. If it is lesser, then the target is before the midpoint. Straight away we narrow the search by half and don't need to search the other half. We repeat the process of halving and searching in the part of the list where we narrow down the item. Here is the code:

def search_sublist(lst, target, start, end):
    if start > end:
        return None
    mid = (start + end) // 2
    value = lst[mid]
    if target == value:
        return mid
    elif target > value:
        return search_sublist(lst, target, mid + 1, end)
    else:
        return search_sublist(lst, target, start, mid - 1)


def search(lst, target):
    return search_sublist(lst, target, 0, len(lst))

This way of searching is called binary search and it has a time complexity of O(log n) which is faster than the earlier O(n) algorithm. If the list is doubled in size, it just takes one extra comparison to find the item. Of course, this one has the constraint that the list has to be sorted!

✏️

Pro Tip: The python standard library has a module bisect that has functions to easily maintain a sorted list and search items in it. So you don't have to code it yourself!

The case against Big O

Now, most of the time programmers don't write data structure code by hand. We use libraries, and we don't need to know or understand how the libraries are implemented. It's a black box and we just use it.

99% of real life professional programming is working with libraries and frameworks like this (unless you are the one developing the library), so the argument goes that a modern, professional programmer doesn't need to know or understand Big O since all the data structure decisions are handled by the library author.

The case for Big O

The argument that I want to make is that every programmer is subconsciously making Big O decisions all the time, whether they know it or not.

Let us take an example of storing some numbers with the view to find whether some number is present or not. You might do it like this

nums = [10, -2, 24, 7, 55]

I would say that you have made a Big O decision right here. Why? Because Python also gives you the option to store those numbers in a set, like this

nums = {10, -2, 24, 7, 55}

So which should you choose? The list or the set?

Well, if you want to search for the presence of items, then the set is by far the better choice. It is O(1) to find items in the set, which means to say it takes roughly the same amount of time, no matter how many numbers are in there. Even if you double the size, it takes the same amount of time to do the search. Searching items in an unsorted list, by contrast, is O(n) as we saw previously. And even a sorted list is O(log n) which is worse performance than the set.

So, assuming you are familiar with the set data type, every time you choose to use a list over a set or vice versa, you have implicitly thought about Big O.

You may not have formally taken a piece of paper and derived the formula. And if you ask me, that's one part of Big O that we can leave to the algorithm designers. But understanding that a set can search items much faster is a implicit knowledge of Big O. And making small decisions like that is something that every professional programmer is making day in and day out.

Even when you use libraries, you sometimes need to understand what is going on in order to use the library. For example, python helpfully gives us a bisect module which implements the binary search algorithm for us. So, we don't need to implement the algorithm, we can just use the library. But unless you know that binary search is more efficient you probably wont use that module at all, and will stick to searching all items one-by-one even if the data is already sorted.

Summary

When programmers don't understand Big O, they are highly likely to just use a list for anything and everything. This is fine when you are learning python, or if you are writing small scripts. But to improve your code to the next level, it is important to understand what are all the data structures provided by python, and when to use each one. You don't need to be able to derive Big O values, but at least understand options relative to each other.

This is the first article in a series of articles that we will be publishing about data structures in python. In the weeks to come, we will take a look at this topic in more detail.

If you are interested in following this topic, be sure to subscribe to the site below (it's free) so you get notified as new articles are posted 👇🏾

Did you like this article?

If you liked this article, consider subscribing to this site. Subscribing is free.

Why subscribe? Here are three reasons:

You will get every new article as an email in your inbox, so you never miss an article
You will be able to comment on all the posts, ask questions, etc
Once in a while, I will be posting conference talk slides, longer form articles (such as this one), and other content as subscriber-only

Tagged in:

series-data-structures

Big O Notation: Do you need to know this?

Big O Notation

The case against Big O

The case for Big O

Summary

Siddharta Govindaraj

Other Stories

Python 3.11: The tomllib library

Book Review: The Big Book of Small Python Projects

AI: What does a machine learning algorithm do anyway?

Building AI Applications from first principles

My experience at PyCon Italia

Press ESC to close

Or check our Popular Categories...

Big O Notation

The case against Big O

The case for Big O

Summary

Share Article:

Related Articles

Other Stories

Python 3.11: The tomllib library

Book Review: The Big Book of Small Python Projects