Pure Functions

31 Aug 2024

The hardest problem in software engineering (aside from choosing which program to write) is keeping your program simple enough for maintainers to confidently read, understand, and make changes to. This problem is called “managing complexity,” and there are lots of famous quotes about it¹. Managing complexity is easy when a program is small, but it gets exponentially harder as years pass, the program gets bigger, the engineering organization gets bigger too, and the program’s original authors leave.

Pure functions are my favorite tool for managing complexity. Let’s talk about what they are and why they’re so effective. Note: I’ll be showing examples in Python, but you can write pure functions in any programming language.

What is a pure function?

We call a function “pure” if it follows these two rules:

It always returns the same outputs when given the same inputs.
It performs no side effects.

Here are some examples of what I mean when I say “side effects”:

Mutating one of the function’s arguments
Mutating a global variable
Reading from or writing to a database
Making an HTTP request
Sending an email
Sending a push notification
Firing a missile

Examples

This is a pure function:
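Here's a hypothetical illustration in Python. The name `with_discount` and its integer-cent prices are made up for this sketch:

```python
def with_discount(prices_cents: list[int], percent_off: int) -> list[int]:
    """Hypothetical pure function: returns new discounted prices, leaves the input alone."""
    # Integer arithmetic keeps results exact; floor division rounds down.
    return [price * (100 - percent_off) // 100 for price in prices_cents]
```

`with_discount([1000, 250], 10)` returns `[900, 225]` every time, and nothing outside the function changes.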

This function is impure:
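A hypothetical example of the kind of function meant here: `greeting` (an illustrative name) reads the system clock, so calling it twice with the same name can return different strings, which breaks the first rule.

```python
import datetime


def greeting(name: str) -> str:
    """Hypothetical impure function: the result depends on the current time, not just on `name`."""
    hour = datetime.datetime.now().hour  # hidden input: the system clock
    if hour < 12:
        return f"Good morning, {name}!"
    return f"Good afternoon, {name}!"
```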

This one’s super impure:

Primary Benefit

We have to put away some of our tools when we write pure functions: we can’t read from the database, we can’t check what time it is, we can’t pull an API key from a global config object. What do we get in return?

The primary benefit of pure functions is that they are simple enough to fit into your head. In order to understand what a pure function does, you just need to look at these things:

What are the function’s inputs?
What are the function’s outputs?

By contrast, here are some of the things that you need to think about when you’re reading an impure function:

What are the function’s inputs?
- When the function has finished running, what state will the inputs be in?
- Will some of the inputs have been mutated? Which ones?
- Will the inputs be mutated every time the function runs, or only sometimes?
What are the function’s outputs? Does it have any?
What global variables does the function read from?
- Have those global variables been initialized the way that we expect by the time that this function is called?
- What happens if they haven’t?
What global variables does the function write to?
- How does that affect the other parts of the program that read those variables?
What if the database is unreachable?
What if the API we’re calling is down?
What day of the week is it?
What is the phase of the moon as seen from Mars?
- Which moon?

When working with pure functions, you can think about the function in isolation and don’t have to worry about fitting the rest of the program into your head. I’ve heard this described as “local reasoning” (which pure functions enable you to do), as opposed to “global reasoning” (which impure code forces you to do).

Secondary Benefits

As if that weren’t enough, you also get these things for free:

Pure functions are safe to cache, since the same inputs always give the same outputs.
Pure functions are safe to parallelize, since they don’t mutate anything.
Pure functions are trivial to test, since you don’t need to mock anything.
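The caching point maps directly onto the standard library's `functools.lru_cache`. A minimal sketch with a hypothetical recursive `fib`: memoizing it is only correct because its result depends on nothing but `n`.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n: int) -> int:
    """Pure: the answer depends only on n, so a cached result can never go stale."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```

`fib(30)` returns `832040`, and `fib.cache_info()` reports how many calls were answered from the cache. Putting the same decorator on a function that reads the clock or a database would quietly serve stale answers.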

Let’s zoom in on that last bullet point, because the difference is really unbelievable. Here’s a test for the pure function I showed you earlier:
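A sketch of what such a test can look like, using pytest-style plain `assert`s and a hypothetical `with_discount` function (defined inline so the snippet runs on its own): no setup, no mocks, just inputs and expected outputs.

```python
def with_discount(prices_cents: list[int], percent_off: int) -> list[int]:
    """Hypothetical pure function: returns discounted prices without touching the input."""
    return [price * (100 - percent_off) // 100 for price in prices_cents]


def test_with_discount() -> None:
    prices = [1000, 250]
    assert with_discount(prices, 10) == [900, 225]
    assert with_discount(prices, 0) == [1000, 250]
    assert prices == [1000, 250]  # the input is unchanged
```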

Here’s a test for one of the impure functions:
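For contrast, here's a hedged sketch of testing a hypothetical impure function, `send_morning_digest`, with the standard library's `unittest.mock`. The clock and the SMTP client both have to be patched out before the function can be exercised at all, and each behavior needs its own round of patching.

```python
import datetime
import smtplib
from unittest import mock


def send_morning_digest(address: str, headlines: list[str]) -> None:
    """Hypothetical impure function: reads the clock and talks to an SMTP server."""
    if datetime.datetime.now().hour >= 12:
        return  # only sends before noon
    with smtplib.SMTP("localhost") as smtp:
        smtp.sendmail("digest@example.com", [address], "\n".join(headlines))


def test_send_morning_digest_sends_before_noon() -> None:
    # Build the fake "now" before datetime.datetime is patched out.
    nine_am = datetime.datetime(2024, 8, 31, 9, 0)
    with mock.patch("datetime.datetime") as fake_datetime:
        with mock.patch("smtplib.SMTP") as fake_smtp:
            fake_datetime.now.return_value = nine_am
            send_morning_digest("ada@example.com", ["Pure functions", "Are great"])
    fake_smtp.assert_called_once_with("localhost")
    # SMTP(...) is used as a context manager, so the client is __enter__'s return value.
    client = fake_smtp.return_value.__enter__.return_value
    client.sendmail.assert_called_once_with(
        "digest@example.com", ["ada@example.com"], "Pure functions\nAre great"
    )


def test_send_morning_digest_skips_after_noon() -> None:
    two_pm = datetime.datetime(2024, 8, 31, 14, 0)
    with mock.patch("datetime.datetime") as fake_datetime:
        with mock.patch("smtplib.SMTP") as fake_smtp:
            fake_datetime.now.return_value = two_pm
            send_morning_digest("ada@example.com", ["Pure functions"])
    fake_smtp.assert_not_called()
```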

Which of these two worlds would you rather live in?

General Advice

It’s easiest to write pure functions when you’re working with “plain data,” i.e. stuff that doesn’t have an active connection to the database. You can still make a function pure even if it’s operating on database models, though: as long as your function follows the two rules we talked about earlier, it’s pure!

I’ll bet that a lot of functions in your codebase are just one or two tweaks away from purity. As you get in the habit of looking for side effects, you’ll get better at identifying them and eliminating them. For example: does your function really need to modify its inputs and return None? What if it left its inputs unmodified and returned a value instead?
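That tweak might look like this, with a hypothetical tag-cleaning helper written both ways:

```python
def clean_tags_in_place(tags: list[str]) -> None:
    """Impure: overwrites the caller's list and returns nothing."""
    for i, tag in enumerate(tags):
        tags[i] = tag.strip().lower()


def clean_tags(tags: list[str]) -> list[str]:
    """Pure: builds and returns a new list, leaving `tags` untouched."""
    return [tag.strip().lower() for tag in tags]
```

With `raw = ["  Python", "FP "]`, `clean_tags(raw)` returns `["python", "fp"]` and `raw` still holds its original strings; the caller decides whether to keep the old list.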

Of course, not every side effect can be removed. We write and run programs because they do stuff! For now, just focus on finding and eliminating the side effects that don’t need to happen (and the ones that don’t need to happen in this specific function). Next time we’ll talk about how to handle the side effects that remain!
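One common way to drop a side effect that doesn't need to happen in a particular function is to pass in the value it was fetching. A minimal sketch, assuming a hypothetical `is_expired` check that originally read the clock itself:

```python
import datetime


def is_expired_impure(expires_at: datetime.datetime) -> bool:
    # Reads the clock, so the same argument can give different answers over time.
    return datetime.datetime.now(datetime.timezone.utc) >= expires_at


def is_expired(expires_at: datetime.datetime, now: datetime.datetime) -> bool:
    # Pure: the caller decides what "now" is.
    return now >= expires_at
```

The clock still gets read somewhere, e.g. `is_expired(expiry, datetime.datetime.now(datetime.timezone.utc))` at the call site, but `is_expired` itself can now be tested with fixed timestamps.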

Appendix: Smells To Watch Out For

If a function takes no inputs, it’s probably impure.
If a function has no output, it’s probably impure.
If a function is async, it’s probably impure.
If you need to use mocks when testing a function, it’s probably impure.

References

Hoist Your I/O
Functional Core, Imperative Shell (Scott Wlaschin version)
This refactoring exercise from “Solving Problems the Clojure Way”: I love the visualization technique the presenter uses, which makes it really easy to follow the side effects as they get moved or eliminated.
Functional Programming in C++ (by John Carmack!!)

¹ Edsger Dijkstra: “The computing scientist’s main challenge is not to get confused by the complexities of his own making.” Steve McConnell: “Managing complexity is the most important technical topic in software development.” Ben Moseley and Peter Marks: “Complexity is the single major difficulty in the successful development of large-scale software systems.” Dr. Pamela Zave: “The purpose of software engineering is to control complexity, not to create it.” Bruce Eckel: “Programming is about managing complexity: the complexity of the problem, laid upon the complexity of the machine. Because of this complexity, most of our programming projects fail.”

jrheard's blog