I have played with pure JavaScript functions in the previous blog post, Test if a function is pure. I know that pure functions are simpler to test and use in my code, but telling whether a given function is pure is NOT easy. Let us look at a few examples where it is hard to tell if a function is pure or not, and then I will show how to run a given function in a very controlled fashion, in complete isolation from its lexical (written) environment, to test if it is practically pure.
Definition of a pure function
Eric Elliott in a blog post gives the following definition of pure functions
- Given the same input, will always return the same output.
- Produces no side effects.
- Relies on no external state.
I can explain each property by giving a counterexample.
1: A pure function should return the same output given the same inputs, unlike this function
function random() { return Math.random() }
2: A pure function produces no side effects (leaves the environment unaffected), unlike this function
1 | function add(a, b) { |
Common side effects are changing properties on the global object, writing to the console, or changing the HTML of the page the script is executed on.
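As a minimal sketch of such a side effect, assume a hypothetical counter named `total` that lives outside the function (both `total` and `addAndRecord` are illustrative names, not code from the original example):

```javascript
// Hypothetical example: `total` and `addAndRecord` are illustrative names.
let total = 0

function addAndRecord(a, b) {
  const result = a + b
  total += result // side effect: mutates state that lives outside the function
  return result
}

addAndRecord(2, 3)
addAndRecord(2, 3)
// Both calls return the same value, yet each one changes the environment.
```

The return value alone would pass rule #1; only by looking at `total` before and after a call does the side effect become visible.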
3: A pure function should not rely on external state, unlike this function
1 | const K = 10 |
Update: after this blog post went live, a lot of people objected to the example above. Most people agree that var K = 10 makes add10 impure, but the const K can be safely substituted into its use inside add10 and thus keeps the function pure. I agree with allowing the use of external constants and pure functions.
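The difference between the two declarations can be sketched as follows. The names `M`, `add10Const` and `add10Var` are assumptions made for this illustration:

```javascript
const K = 10 // the binding cannot be reassigned
var M = 10   // any code in scope can reassign this

function add10Const(a) { return a + K }
function add10Var(a) { return a + M }

const before = [add10Const(1), add10Var(1)]
M = 100 // legal, and silently changes what add10Var returns
const after = [add10Const(1), add10Var(1)]
```

Note that const only prevents rebinding. If the constant held an object, its properties could still be mutated, so the substitution argument is safest for primitive values.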
We have these simple three properties, but applying them in practice is anything but simple.
Pure or not?
I have been adding more examples to the list, asking if each example is pure or not. It seems even humans sometimes have trouble drawing the line. For example, everyone agrees the following example is pure
function sum(a, b) { return a + b }
and this example is NOT
1 | const k = 2 |
Yet, practically speaking, programmers consider a function pure if it only uses other pure functions, even without receiving them as arguments. In the code below, both sum and sub are pure by human consensus.
1 | function sum(a, b) { return a + b } |
If one considers sub NOT pure because it breaks rule #3, then a pure function is only allowed to use functions passed in via arguments.
1 | function sum(a, b) { return a + b } |
The above example then shows several things
- Programming with pure functions is a pain because we lose lexical scope (all the source around the function), and have to drag a huge number of functions around.
- A function can remain pure but use impure functions as arguments.
More on the latter point. We can mark each line to better see the "purity"
1 | function sum(a, b) { return a + b } // definitely pure |
The sub(10, 2, sum) source line is not pure because the result has to go somewhere, and in this case, if we just execute it in the Node environment, it is printed in the console (a side effect, which breaks rule #2).
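The split between pure definitions and an impure call site can be sketched with the sum and sub functions discussed above. The explicit `console.log` stands in for whatever environment prints the result; that substitution is an assumption of this sketch:

```javascript
function sum(a, b) { return a + b }                 // pure
function sub(a, b, summer) { return summer(a, -b) } // pure: only uses its arguments

const result = sub(10, 2, sum) // computing the value has no side effect
console.log(result)            // impure: writing to the console is a side effect
```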
The above example shows that the same program can have parts that are pure and parts that have side effects. One of our goals when refactoring is to grow the first part and shrink the second. Some techniques help with this, for example using an immutable data library or isolating the environment (Hi Cycle.js!)
If a pure function executes a function passed as an input argument, does the input function have to be pure? It seems so, otherwise the "pure" function can break rule #1. Imagine
sub(10, 2, Math.random) // 8.59
It is not enough to follow the above 3 rules then. A pure function had better be passed pure functions as arguments; rule #1 (consistency) seems to be a stronger requirement than rule #3 (only using the input arguments).
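To make the inconsistency reproducible, here is a small sketch that swaps Math.random for a hypothetical impure function `driftingSum`, whose result depends on how many times it has been called:

```javascript
function sub(a, b, summer) { return summer(a, -b) }

// Hypothetical impure "summer": it reads and updates a counter outside itself.
let calls = 0
function driftingSum(a, b) {
  calls += 1
  return a + b + calls
}

const first = sub(10, 2, driftingSum)  // 9
const second = sub(10, 2, driftingSum) // 10
```

The body of sub never touches anything but its arguments, yet two identical calls disagree, so the caller decides whether sub behaves as a pure function.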
Breaking the rules?
We can make the above example a little more contrived again. What about a pure function returning a non-pure function?
1 | function sumK(a, b) { |
The function inner is not pure: it breaks rule #3. Function sumK seems pure to me: it always produces the same result, does not change the environment and does not rely on external state.
sumK(2, 3)() // 15
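One way such a function could look is sketched below. It assumes the constant K is declared inside sumK; that assumption is consistent with the result 15 and with sumK not relying on external state, but the exact body is a guess:

```javascript
function sumK(a, b) {
  const K = 10 // assumption: K is local to sumK
  // inner takes no arguments and reads a, b and K from the enclosing scope,
  // so judged on its own it relies on state outside itself (rule #3)
  return function inner() {
    return a + b + K
  }
}

const inner = sumK(2, 3)
const value = inner()
```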
Let us compare the two examples side by side. First, we have a function that only uses its arguments, but does NOT know if the function passed inside is pure or not.
1 | function sub(a, b, summer) { return summer(a, -b) } // pure? |
Second, we have a function that uses outside state, yet ALWAYS produces the same output
1 | const K = 10 |
This is a dilemma: do I pass functions around so that no function relies on external state, leaving each caller to decide which particular function is pure, or do I use a simple constant external state that always produces the same result?
In fact, I would argue that using the const keyword is preferred, because we can replace each use of a constant (unless it is an expression involving other variables) inside a function with its value to get an obviously pure function
1 | const K = 10 |
The simple replacement goes back to functions using external functions (not function expressions). Most programmers consider both functions pure when declared like this
1 | function sum(a, b) { return a + b } |
but not when declared as function expressions
1 | var sum = function sum(a, b) { return a + b } |
The main problem with using function expressions above is that we are NOT using a function; instead we are using the variable sum to call the function sum. Thus we are breaking rule #3: we use an outside variable, and that variable is not constant.
1 | var sum = function sum(a, b) { return a + b } |
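A short sketch of why the variable matters: any code with access to sum can point it at a different function, and sub changes behavior without being edited. The replacement function here is a made-up example:

```javascript
var sum = function sum(a, b) { return a + b }
function sub(a, b) { return sum(a, -b) }

const before = sub(10, 2) // 8

// Any code that can see the variable is free to reassign it
sum = function replaced() { return 42 }

const after = sub(10, 2) // 42
```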
The distinction between functions and the variables pointing at them is important. In JavaScript functions can be passed around, which means unless two functions are declared in the same file, they will be passed around, even via require('./sum'), in which case all bets on consistency are off! It is really hard to lock down loaded modules in Node to avoid someone changing the code under your feet.
1 | const calc = require('./common-calculations') |
Is myFunction still pure? No, yet it was hard to foresee this due to JavaScript's dynamic nature.
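The effect can be sketched without a second file. In the snippet below a plain object named `calc` stands in for the exports of a shared module; Node caches loaded modules, so every file requiring the same module receives the same exports object. The method name `add` is an assumption for illustration:

```javascript
// `calc` stands in for the exports object of a shared module
const calc = {
  add: function (a, b) { return a + b }
}

function myFunction(a, b) { return calc.add(a, b) }

const before = myFunction(2, 3) // 5

// Somewhere else in the program, other code patches the shared export
calc.add = function (a, b) { return a * b }

const after = myFunction(2, 3) // 6
```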
Simple purity testing
Let us leave the questions about consistency and functions using other functions aside and just see how to test if a simple function is pure or not. I showed a bunch of tests before, but those relied on refactoring the source to always export lists of functions for simple replacement. Here is how to test a function in a slightly simpler way by just rewriting its source to run in a very isolated context.
Let us take a simple purity test: a function should be isolated from its (lexical) context, and thus not use any outside variable. We take the add function as the test subject:
function add(a, b) { return a + b }
Let us see how we can confirm properties 2 and 3: the function add does not leave any traces in the environment, and it does not use any outside state. We need to isolate the function from anything around it. Node.js already has a pretty good isolation mechanism that we can use to execute a piece of JavaScript without leaving traces: the vm.runInContext methods.
1 | function add(a, b) { return a + b } |
In this example we took the function add, but instead of running it directly, we created a function inside a new context "sandbox". When the function sandbox.add executes, it only has access to its "sandbox" object (the "sandbox" becomes "global" while the function runs). For example, if the function leaves traces, they become properties on the "sandbox" object
1 | const vm = require('vm') |
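A self-contained sketch of this behavior, using a hypothetical function `leakyAdd` that assigns to an undeclared name (`lastResult` is likewise an illustrative name):

```javascript
const vm = require('vm')

// Hypothetical impure function: assigning to an undeclared name in sloppy
// mode creates a property on whatever object is the global at that moment.
function leakyAdd(a, b) {
  lastResult = a + b
  return lastResult
}

const sandbox = vm.createContext({})
// Re-create the function from its source text inside the sandbox context
vm.runInContext('leakyAdd = ' + leakyAdd.toString(), sandbox)

const result = sandbox.leakyAdd(2, 3)
const traces = Object.keys(sandbox) // names that ended up on the sandbox
```

The stray lastResult lands on the sandbox object, while the real global object of the surrounding program stays clean.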
We can go one step further and limit even this already minimal access. Let us wrap the function add.toString() in a closure with strict mode turned on. This will force the "this" context to be undefined, preventing the function from leaving any traces even inside the "sandbox" context.
First, let us wrap a given function 'add' in a closure in strict mode
1 | function add(a, b) { return a + b } |
If we run it, limitedAdd is a simple source string
1 | add = (function (){ |
Next, we can call it inside the "sandbox"
1 | const vm = require('vm') |
Now let us try using and polluting the lexical / "sandbox" environment from inside add
1 | var K = 0 |
which prints
1 | add = (function (){ |
and if we actually try to run it, it gives us an error
1 | const vm = require('vm') |
1 | evalmachine.<anonymous>:4 |
Excellent! We have been able to take a function and separate it from its lexical (source) environment. This proves that we can run the function and it cannot affect or use the global (outside) state.
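The whole experiment fits into one self-contained sketch. The body of add below is an assumed example of a function that reads and writes an outer variable K, and the wrapper string is built inline rather than by a helper:

```javascript
const vm = require('vm')

var K = 0 // outer state the function tries to use

function add(a, b) {
  K += 1
  return a + b + K
}

// Wrap the function source in a strict-mode closure
const src = 'add = (function () {\n' +
  '  "use strict";\n' +
  '  return ' + add.toString() + '\n' +
  '}())'

const sandbox = vm.createContext({})
vm.runInContext(src, sandbox)

let errorName = null
try {
  sandbox.add(2, 3)
} catch (err) {
  // The error object is created inside the sandbox context,
  // so it is compared by name rather than with instanceof
  errorName = err.name
}
```

After the call fails, the outer K still holds 0: the isolated copy could neither read nor change it.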
Why not use the simple 'eval' to achieve the same separation? Mostly because it is easier to reuse the context and to pass additional properties. For example, we might want to use the standard JavaScript facilities inside the "sandbox", like "console.log". By default this is impossible.
1 | var K = 0 |
1 | evalmachine.<anonymous>:4 |
But we can just pass a reference to the console object to the "sandbox"!
1 | const sandbox = { |
1 | adding 2 and 3 |
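A minimal sketch of this whitelisting, assuming an add function that logs its arguments before returning the sum:

```javascript
const vm = require('vm')

function add(a, b) {
  console.log('adding %d and %d', a, b)
  return a + b
}

const src = 'add = (function () {\n' +
  '  "use strict";\n' +
  '  return ' + add.toString() + '\n' +
  '}())'

// Only the properties listed here are visible inside the sandbox
const sandbox = vm.createContext({ console: console })
vm.runInContext(src, sandbox)

const sum = sandbox.add(2, 3) // logs through the host console object
```

Everything the function may touch is spelled out in the object passed to vm.createContext, which makes the list of allowed dependencies easy to review.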
Using the "sandbox" allows us to keep adding desired properties in a very controlled manner, much safer and more convenient than using eval.
Rewrite function's inner code
Instead of replacing the entire function with a sandbox, let us replace the code block inside the function and isolate it from the environment. This will limit the access to the lexical scope
1 | const K = 10 |
We can easily rewrite the source using an abstract syntax tree parser like falafel. Let us see it in action. Given the source of a file, it parses the code and then allows us to find every function declaration and rewrite it. We need both the block inside the function and the names of the function's arguments.
1 | // add.js |
This prints the expected values
1 | $ node rewrite.js |
Now let us put the block statement source inside an isolated "sandbox", adding "a" and "b" too. Our goal is to replace the inside of the function with the following code. (I removed the console statement, leaving only the function add(a, b) { return a + b } code.)
1 | function add(a, b) { |
Note the original function block is now a string src with "use strict" enabled. Only the small portion return a + b is actually the original code from the 'add' function.
Once we have wrapped the function's code, we need to create the sandbox, set the arguments from the function's signature on the sandbox and return the result of executing the src inside the sandbox. The entire rewrite algorithm looks like this
1 | const falafel = require('falafel') |
The source file add.js, for example, gets transformed into the following (reindented for clarity)
1 | function add(a, b) { |
1 | function add(a, b) { |
Note the multi line string literal for simplicity, assigned to the variable "src".
The add-test.js still works the same.
1 | $ node |
This is working exactly as expected: the console object is not in the "sandbox" context, thus it cannot be used, just like any other variable NOT inside the original add function. Thus the function add is not pure. Let us make it pure and try again.
1 | function add(a, b) { |
1 | $ node rewrite.js |
Great, the inside of the function has been isolated from anything outside. If the function is pure, it will keep on working. If the function is not pure, it will raise an exception.
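The same behavior can be approximated at run time without an AST rewrite. The sketch below is not the falafel-based algorithm: it swaps in Function.prototype.toString and a regular expression, so it only handles plain function declarations with simple parameter lists. The helper name `pureInside` and the impure test function are assumptions:

```javascript
const vm = require('vm')

// Hypothetical helper: returns a version of fn whose body runs in a fresh
// sandbox that contains nothing but the call arguments.
function pureInside(fn) {
  const match = /^function\s*[\w$]*\s*\(([^)]*)\)\s*\{([\s\S]*)\}\s*$/
    .exec(fn.toString())
  if (!match) {
    throw new Error('unsupported function shape')
  }
  const params = match[1].split(',')
    .map(function (s) { return s.trim() })
    .filter(Boolean)
  // The original body becomes a strict-mode closure held in a source string
  const src = '(function () {\n"use strict";\n' + match[2] + '\n}())'

  return function isolated(...args) {
    const sandbox = {}
    params.forEach(function (name, i) { sandbox[name] = args[i] })
    return vm.runInNewContext(src, sandbox)
  }
}

const K = 10
function add(a, b) { return a + b }
function addK(a, b) { return a + b + K } // relies on outer K

const isolatedAdd = pureInside(add)
const isolatedAddK = pureInside(addK)
// isolatedAdd(2, 3) returns 5, while isolatedAddK(2, 3) throws because
// K does not exist inside the sandbox
```

A regular expression is a fragile stand-in for a parser (default parameters, arrow functions and methods are not covered), which is a good reason to prefer the AST approach for real code.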
I have placed the rewriting code in the bahmutov/pure-inside repository, together with this example.
We can even automate the rewrite using a Node require hook, for example see projects node-hook and really-need.
Is the function consistent?
We know from the automated rewriting that the function follows rules #2 and #3: no polluting the environment and no reading from the environment. But what about consistency? This is much harder to test or prove. Consider a function that returns the current day of the month. It will return the same number for 23 hours, 59 minutes and 59 seconds. Then, all of a sudden, it will return something else! In my opinion, trying to prove the same return value is equivalent to solving the halting problem: trying to prove that the output changes is the same as trying to prove that the program finishes running. Thus it is theoretically undecidable.
What about practically decidable, meaning, can we test the function a few times and make a decision? I think so. Previously, I wrote Rocha, a BDD test runner similar to Mocha but with test randomization on each run. To test if a function is consistent, I would need to write "RochaN", where the order of tests is randomized and each test / function is run a random number of times N, and the result should be the same every time. It might not catch the "day of month" problem right away, but eventually it will appear one midnight!
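The repeated-run idea can be sketched with a small helper. This is not Rocha or "RochaN"; `isConsistent` is a hypothetical function that calls its target a fixed number of times with the same arguments and compares serialized results:

```javascript
// Hypothetical helper: runs fn with the same arguments `runs` times and
// reports whether every result serializes to the same JSON string.
function isConsistent(fn, args, runs) {
  const expected = JSON.stringify(fn.apply(undefined, args))
  for (let i = 1; i < runs; i += 1) {
    if (JSON.stringify(fn.apply(undefined, args)) !== expected) {
      return false
    }
  }
  return true
}

function add(a, b) { return a + b }

const addLooksConsistent = isConsistent(add, [2, 3], 100)
const randomLooksConsistent = isConsistent(Math.random, [], 100)
```

A true result is only evidence, not proof: a function that depends on the date would pass every run made before midnight.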
Conclusion
Testing if a function is pure is difficult in JavaScript, especially if we take the more relaxed definition and allow functions to use other pure functions in their lexical scope. Notice that my source rewriting does NOT allow using other pure functions, since everything is isolated. The method would need an extension: functions already verified by the unit tests would be added to the "sandbox" environment of the functions that use them, and then the tests would run again.
1 | // we can test add using the rewriting |
In general, I think rule #1 is the most important: returning the same result given the same set of inputs is the defining property of pure functions. Relaxing the rules and allowing access to other pure functions or constant values is fine, as long as the function is still consistent.
If a function is pure, then we can replace every call to the function with its body, slowly unrolling the entire program. In the above example, once we have tested the code and determined that add is pure, we can replace the reference to add(a, -b) with the inside of add.
1 | function add(a, b) { return a + b } // pure |
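The substitution step can be sketched as follows. The "before" version of sub appears only in a comment, and the name `subUnrolled` is an assumption used to keep both versions side by side:

```javascript
function add(a, b) { return a + b } // pure

// Before unrolling: sub depends on add from its lexical scope
function sub(a, b) { return add(a, -b) }

// After unrolling: the call is replaced with the body of add,
// so the function uses nothing but its own arguments
function subUnrolled(a, b) { return a + (-b) }

const same = sub(10, 2) === subUnrolled(10, 2)
```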
If we run the purity test again, sub is now pure!
Noteworthy
Other people are also really excited about (really) pure functions, for example Mykola Bilokonsky wants to mark them in code as isolated to allow copy/paste of functions for testing.