Apr 7 2016

Test if a function is pure revisited

Test function for purity using isolated v8 execution context.

I have played with pure JavaScript functions in the previous blog post Test if a function is pure. I know that pure functions are simpler to test and use in my code, but telling if a particular function is pure is NOT easy. Let us look at a few examples where it is hard to tell if a function is pure or not, and then I will show how to run a given function in a very controlled fashion, in complete isolation from its lexical (written) environment, to test if it is practically pure.

Definition of a pure function

Eric Elliott in a blog post gives the following definition of pure functions

Given the same input, will always return the same output.
Produces no side effects.
Relies on no external state.

I can explain each property by giving a counter example

1: Pure function should return the same output given same inputs, unlike this function

function random() { return Math.random() }

2: Pure function produces no side effects (leaves the environment unaffected), unlike this function

function add(a, b) {
    global.something = 'added'
    return a + b
}

Common side effects are changing properties on the global object, writing to the console, changing the HTML of the page the script is executed on.

3: Pure function should not rely on the external state, unlike this function

const K = 10
function add10(a) { return a + K }

Update

After this blog post went live, a lot of people objected to the example above. Most people agree that var K = 10 makes add10 impure, but the const K can be safely substituted into its use inside add10 and thus keeps the function pure. I agree with allowing the use of external constants and pure functions.

We have these simple three properties, but applying them in practice is anything but simple.

Pure or not?

I have been adding more examples to the list asking if each example is pure or not. Seems even humans sometimes have trouble drawing a line. For example, everyone agrees the following example is pure

function sum(a, b) { return a + b }

and this example is NOT

const k = 2
function addK(a) { return a + k }

Yet, practically speaking, programmers consider a function pure if it only uses other pure functions without passing them as arguments. In the code below, both sum and sub are pure by the human consensus.

function sum(a, b) { return a + b }
function sub(a, b) { return sum(a, -b) }

If one considers sub NOT pure, because it is breaking rule #3, then a pure function is only allowed to use functions passed inside via arguments.

function sum(a, b) { return a + b }
function sub(a, b, summer) { return summer(a, -b) }
sub(10, 2, sum) // 8

The above example then shows several things:

Programming with pure functions is a pain because we lose lexical scope (all the source around the function) and have to drag a huge number of functions around.
A function can remain pure but use impure functions as arguments.

More on the latter point. We can mark each line to better see the "purity"

function sum(a, b) { return a + b } // definitely pure
function sub(a, b, summer) { return summer(a, -b) } // pure?
sub(10, 2, sum) // not pure

The sub(10, 2, sum) source line is not pure because the result has to go somewhere. In this case, if we just execute it in the Node environment (for example, in the REPL), the result is printed to the console, which is a side effect and breaks rule #2.

The above example shows that the same program can have parts that are pure and parts that have side effects. One of our goals when refactoring is to increase the first part, shrinking the second one. Some techniques help with this, for example using an immutable data library or isolating the environment (Hi Cycle.js!).

If a pure function executes a function passed as the input argument, does the input function have to be pure? Seems so, otherwise the "pure" function can break rule #1. Imagine

// Math.random ignores its arguments and returns a number in [0, 1)
sub(10, 2, Math.random) // for example 0.59
sub(10, 2, Math.random) // for example 0.01
sub(10, 2, Math.random) // something else

It is not enough to follow the above 3 rules then. A pure function should only be passed pure functions as arguments; rule #1 (consistency) seems to be a stronger requirement than rule #3 (only using the input arguments).

Breaking the rules?

We can make the above example a little more contrived again. What about a pure function returning non-pure function?

function sumK(a, b) {
  const K = 10
  return function inner() {
    return a + b + K
  }
}

The function inner is not pure - it is breaking rule #3. Function sumK seems pure to me - it always produces the same result, does not change the environment, and does not rely on external state.

sumK(2, 3)() // 15
sumK(2, 3)() // 15
sumK(2, 3)() // 15
// same result until the end of times

Let us compare the two examples side by side. First, we have a function that only uses its arguments, but does NOT know if the function passed inside is pure or not.

function sub(a, b, summer) { return summer(a, -b) } // pure?
sub(10, 2, fn) // hmm, is fn pure?

Second, we have a function that uses outside state, yet ALWAYS produces the same output

const K = 10
function addK(a, b) {
  return a + b + K
}

This is a dilemma - do I rather code passing functions around to avoid any function relying on the external state, making the decision about which particular function is pure, or do I use a simple constant external state that always produces the same result?

In fact, I would argue that using the const keyword is preferred, because we can replace each use of a constant (unless it is an expression involving other variables) inside a function with its value to get an obviously pure function

const K = 10
function addK(a, b) {
  return a + b + K
}
// replace K inside addK with simple value
function addK(a, b) {
  return a + b + 10
}

The same substitution argument applies to functions that call other functions declared with function declarations (not function expressions). Most programmers consider both functions pure when declared like this

function sum(a, b) { return a + b }
function sub(a, b) { return sum(a, -b) }

but not when declared as function expressions

var sum = function sum(a, b) { return a + b }
function sub(a, b) { return sum(a, -b) }

The main problem with using function expressions above is that we are NOT using a function directly; instead we are using the variable sum to call the function sum. Thus we are breaking rule #3 - using an outside variable, and that variable is not constant.

var sum = function sum(a, b) { return a + b }
function sub(a, b) { return sum(a, -b) }
sum = function () { return 42 } // some other function
sub(10, 2) // who knows!
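
A small sketch of the other side of this argument: if the function expression is bound with const instead of var, the reassignment itself fails. This is only an illustration of the "constant variable" point, not a purity proof, and the replacement function here is made up.

```javascript
const sum = function sum(a, b) { return a + b }
function sub(a, b) { return sum(a, -b) }

let error
try {
  // Reassigning a const binding throws at runtime.
  sum = function () { return 42 }
} catch (err) {
  error = err
}

console.log(error instanceof TypeError) // true
console.log(sub(10, 2)) // 8
```

The binding cannot change, so sub keeps calling the original sum, which is why the const version is much easier to accept as pure.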

The distinction between functions and the variables pointing at them is important. In JavaScript, functions are values that can be passed around. Unless two functions are declared in the same file, one reaches the other through some reference, for example via require('./sum'), in which case all bets on consistency are off! It is really hard to lock down loaded modules in Node to prevent someone from changing the code under your feet.

const calc = require('./common-calculations')
function myFunction(a, b) {
    return calc.sum(a, b) // seems pure and consistent
}
// somewhere else
calc.sum = function (a, b) { return a + b + Math.random() }

Is myFunction still pure? No, yet it was hard to foresee this due to JavaScript's dynamic nature.
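
One partial defense, sketched below, is for the module to freeze what it exports. The calc object here is a stand-in for require('./common-calculations'), which is not shown in this post, and I am assuming the module author is the one calling Object.freeze. The freeze is shallow, so it only protects the object's own properties.

```javascript
// Stand-in for the exports of './common-calculations' (hypothetical contents).
const calc = Object.freeze({
  sum: function (a, b) { return a + b }
})

function myFunction(a, b) {
  return calc.sum(a, b)
}

// Somewhere else: this write is silently ignored in sloppy mode
// and throws a TypeError in strict mode.
try {
  calc.sum = function (a, b) { return a + b + Math.random() }
} catch (err) {
  console.log('could not replace calc.sum:', err.name)
}

console.log(myFunction(2, 3)) // 5
```

This does not stop someone from handing myFunction a different calc object altogether, so it narrows the problem rather than solving it.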

Simple purity testing

Let us leave the questions about consistency and functions using other functions aside and just see how to test if a simple function is pure or not. I showed a bunch of tests before, but those relied on refactoring the source to always export lists of functions for simple replacement. Here is how to test a function in a slightly simpler way by just rewriting its source to run in a very isolated context.

Let us take a simple purity test: a function should be isolated from its context (lexical), thus not use any outside variable. We take the add function as the test subject:

function add(a, b) { return a + b }
console.log('2 + 5 =', add(2, 5))
// 2 + 5 = 7

Let us see how we can confirm properties 2 and 3 - function add does not leave any traces in the environment, and it does not use any outside state. We need to isolate the function from anything around it. Node.js already has a pretty good isolation mechanism that we can use to execute a piece of JavaScript without leaving traces: the vm.runInContext method.

function add(a, b) { return a + b }
const vm = require('vm')
const sandbox = {}
vm.createContext(sandbox)
vm.runInContext(add.toString(), sandbox)
console.log(sandbox)
// { add: [Function: add] }
console.log(sandbox.add(2, 3))

In this example we took function add but instead of running it directly, we created a function inside a new context "sandbox". When function sandbox.add executes, it only has access to its "sandbox" object (the "sandbox" becomes "global" while the function runs). For example, if the function leaves traces, they become properties on the "sandbox" object

const vm = require('vm')
const sandbox = {}
vm.createContext(sandbox)
vm.runInContext(add.toString(), sandbox)
var K = 0
function add(a, b) {
  K = -1
  return a + b
}
console.log(sandbox)
// { add: [Function: add] }
console.log(sandbox.add(2, 3))
// 5
console.log(K)
// 0
console.log(sandbox)
// { add: [Function: add], K: -1 }

We can go one step further and limit even this already minimal access. Let us wrap the add.toString() source in a closure with strict mode turned on. In strict mode, assigning to an undeclared variable throws a ReferenceError instead of silently creating a global, and "this" is undefined in plain function calls, which prevents leaving traces even inside the "sandbox" context.

First, let us wrap a given function 'add' in a closure in strict mode

function add(a, b) { return a + b }
const pre = 'add = (function (){\n"use strict"\n'
const post = '\n}())'
const limitedAdd = pre + 'return ' + add.toString() + post
console.log(limitedAdd)

If we run it, limitedAdd is a simple source string

add = (function (){
"use strict"
return function add(a, b) { return a + b }
}())

Next, we can call it inside the "sandbox"

const vm = require('vm')
const sandbox = {}
vm.createContext(sandbox)
// limitedAdd from above
vm.runInContext(limitedAdd, sandbox)
function add(a, b) { return a + b }
console.log(sandbox.add(2, 3))
// 5

Now let us try using and polluting lexical / "sandbox" environment from inside add

var K = 0
function add(a, b) {
  K = -1
  return a + b
}
const pre = 'add = (function (){\n"use strict"\n'
const post = '\n}())'
const limitedAdd = pre + 'return ' + add.toString() + post
console.log(limitedAdd)

which prints

add = (function (){
"use strict"
return function add(a, b) {
  K = -1
  return a + b
}
}())

and if we actually try to run it, it gives us an error

const vm = require('vm')
const sandbox = {}
vm.createContext(sandbox)
// limitedAdd from above
vm.runInContext(limitedAdd, sandbox)
console.log(sandbox.add(2, 3))

evalmachine.<anonymous>:4
  K = -1
    ^
ReferenceError: K is not defined
    at Object.add (evalmachine.<anonymous>:4:5)

Excellent! We have been able to take a function and separate it from its lexical (source) environment. This proves that we can run the function and it cannot affect or use the global (outside) state.
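
The steps above can be folded into a single check. The sketch below is a hypothetical helper (the name runsIsolated is mine, not something from a library): it rebuilds a function from its source inside a fresh, empty context and reports whether the call survives. It assumes the function's toString() output is a valid expression, which holds for function declarations and arrow functions but not for method shorthand.

```javascript
const vm = require('vm')

// Hypothetical helper: returns true when fn can be re-created from its source
// and called inside an empty sandbox without throwing.
function runsIsolated(fn, ...args) {
  const src = '(function () {\n"use strict"\nreturn ' + fn.toString() + '\n}())'
  try {
    const isolated = vm.runInContext(src, vm.createContext({}))
    isolated(...args)
    return true
  } catch (err) {
    return false
  }
}

function add(a, b) { return a + b }

var K = 0
function leakyAdd(a, b) {
  K = -1
  return a + b
}

console.log(runsIsolated(add, 2, 3)) // true
console.log(runsIsolated(leakyAdd, 2, 3)) // false
console.log(K) // 0
```

Two caveats: a function that throws for an unrelated reason is also reported as false, and built-ins like Math and Date still exist inside a new context, so this check says nothing about rule #1.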

Why not use the simple 'eval' to achieve the same separation? Mostly because it is easier to reuse the context and to pass additional properties. For example, we might want to use the standard JavaScript facilities inside the "sandbox", like "console.log". By default this is impossible.

var K = 0
function add(a, b) {
  console.log('adding', a, 'and', b)
  K = -1
  return a + b
}
// the rest of the above example

evalmachine.<anonymous>:4
  console.log('adding', a, 'and', b)
  ^
ReferenceError: console is not defined

But we can just pass a reference to the console object to the "sandbox"!

const sandbox = {
  console: console
}
vm.createContext(sandbox)
vm.runInContext(limitedAdd, sandbox)
console.log(sandbox.add(2, 3))

adding 2 and 3
evalmachine.<anonymous>:5
  K = -1
    ^
ReferenceError: K is not defined

Using the "sandbox" allows us to keep adding desired properties in a very controlled manner, much safer and more convenient than using eval.
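
The same mechanism lets us hand the function a fake instead of the real thing. Here is a minimal sketch, assuming we only care about console.log: the sandbox gets a stand-in console that records each call, so we can see the side effect the function attempted without anything being printed by it.

```javascript
const vm = require('vm')

function add(a, b) {
  console.log('adding', a, 'and', b)
  return a + b
}

// Stand-in console that records calls instead of printing them.
const calls = []
const sandbox = {
  console: { log: (...args) => calls.push(args) }
}
vm.createContext(sandbox)

const limitedAdd = 'add = (function (){\n"use strict"\nreturn ' + add.toString() + '\n}())'
vm.runInContext(limitedAdd, sandbox)

console.log(sandbox.add(2, 3)) // 5
console.log(calls) // [ [ 'adding', 2, 'and', 3 ] ]
```

A non-empty calls list is a direct record of rule #2 being broken.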

Rewrite function's inner code

Instead of replacing the entire function with a sandbox, let us replace the code block inside the function and isolate it from the environment. This will limit the access to the lexical scope

const K = 10
function add(a, b) { return a + b + K}
// to
const K = 10
function add(a, b) {
    // create sandbox vm context
    // evaluate "return a + b + K" inside the sandbox
    // return the evaluated result
}

We can easily rewrite the function body using an abstract syntax tree parser like falafel. Let us see it in action. Given the source text, it parses the code and then allows us to find every function declaration and rewrite it. We need both the block inside the function and the names of the function's arguments.

// add.js
function add(a, b) {
  console.log('adding', a, 'and', b)
  return a + b
}
module.exports = add
// rewrite.js
const falafel = require('falafel')
const fs = require('fs')
const source = fs.readFileSync('./add.js', 'utf8')
const output = falafel(source, function (node) {
  if (node.type === 'BlockStatement' && node.parent.type === 'FunctionDeclaration') {
    console.log(node.type, node.source())
    console.log('parent vars', node.parent.params.map((node) => node.name))
  }
});

This prints the expected values

$ node rewrite.js 
BlockStatement {
  console.log('adding', a, 'and', b)
  return a + b
}
parent vars [ 'a', 'b' ]

Now let us put the block statement source inside an isolated "sandbox", but adding "a" and "b" too. Our goal is to replace the inside of the function with the following code. (I removed the console statement, leaving only the function add(a, b) { return a + b } code)

function add(a, b) {
  const vm = require('vm')
  const sandbox = {}
  vm.createContext(sandbox)
  const src = '(function (){\n"use strict"\nreturn (function (){\nreturn a + b\n}())}())'
  sandbox.a = a
  sandbox.b = b
  return vm.runInContext(src, sandbox)
}
module.exports = add

Note the original function block is now a string src with "use strict" enabled. Only the small portion return a + b is actually the original code from the 'add' function. Once we have wrapped the function's code, we need to create the sandbox, set the arguments from the function's signature on the sandbox and return the result of executing the src inside the sandbox. The entire rewrite algorithm looks like this

const falafel = require('falafel')
const fs = require('fs')
const source = fs.readFileSync('./add.js', 'utf8')
const output = falafel(source, function (node) {
  if (node.type === 'BlockStatement' && node.parent.type === 'FunctionDeclaration') {
    const vars = node.parent.params.map((node) => node.name)
    // wrap function in 'use strict' closure
    const pre = '`(function (){\n"use strict"\nreturn (function () '
    const post = '\n())}())`'
    const limitedBlock = pre + node.source() + post
    // wrap in VM context
    const preVm = 'const vm = require("vm")\nconst sandbox = {}\nvm.createContext(sandbox)\n const src = '
    // add all arguments to the sandbox
    var postVm = ''
    vars.forEach((name) => {
      postVm += '\nsandbox.' + name + ' = ' + name
    })
    postVm += '\nreturn vm.runInContext(src, sandbox)\n'
    const innerCode = preVm + limitedBlock + postVm
    node.update('{\n' + innerCode + '\n}')
  }
});
// falafel returns a string-like object, so convert it to a string explicitly
fs.writeFileSync('./add-test.js', String(output), 'utf8')

The source file add.js for example gets transformed into the following (reindented for clarity)

add.js

function add(a, b) {
  console.log('adding', a, 'and', b)
  return a + b
}
module.exports = add

add-test.js

function add(a, b) {
  const vm = require("vm")
  const sandbox = {}
  vm.createContext(sandbox)
  const src = `(function (){
    "use strict"
    return (function () {
      console.log('adding', a, 'and', b)
      return a + b
    }
    ())}())`
  sandbox.a = a
  sandbox.b = b
  return vm.runInContext(src, sandbox)
}
module.exports = add

Note the multi-line template literal, used for simplicity and assigned to the variable "src". Let us load add-test.js and call the function.

$ node
> var add = require('./add-test.js')
undefined
> add(2, 3)
ReferenceError: console is not defined

This is working exactly as expected - the console object is not in the "sandbox" context, thus it cannot be used, just like any other variable NOT inside the original add function. Thus the function add is not pure. Let us make it pure and try again.

function add(a, b) {
  return a + b
}
module.exports = add

$ node rewrite.js 
$ node
> var add = require('./add-test.js')
undefined
> add(2, 3)
5

Great, the inside of the function has been isolated from anything outside. If the function is pure, it will keep on working. If the function is not pure, it will raise an exception.

I have placed the rewriting code in the bahmutov/pure-inside repository, together with this example.

We can even automate the rewrite using a Node require hook, for example see projects node-hook and really-need.

Is the function consistent?

We know from the automated rewriting that the function follows rules #2 and #3 - no polluting the environment and no reading from the environment. But what about consistency? This is much harder to test or prove. Consider a function that returns the current day of the month. It will return the same number for 23 hours, 59 minutes and 59 seconds. Then, all of a sudden, it will return something else! In my opinion, trying to prove that the return value never changes is equivalent to solving the halting problem - trying to prove that the output changes is the same as trying to prove that the program finishes running. Thus it is theoretically undecidable.

What about practically decidable - meaning, can we test the function a few times and make a decision? I think so. Previously, I have written Rocha - a BDD test runner similar to Mocha but with test randomization on each run. To test if a function is consistent, I would need to write "RochaN" where the order of tests is randomized and then each test / function is run a random number N times and the result should be the same. Might not catch the "day of month" problem right away, but eventually it will appear one midnight!
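
The "run it N times" part of that idea is easy to sketch. This is not RochaN, just a hypothetical helper that calls a function repeatedly with the same arguments and compares every result against the first one using Node's util.isDeepStrictEqual.

```javascript
const util = require('util')

// Hypothetical helper: call fn with the same arguments `times` times
// and report whether every result deep-equals the first one.
function looksConsistent(fn, args, times) {
  const first = fn(...args)
  for (let k = 1; k < times; k += 1) {
    if (!util.isDeepStrictEqual(fn(...args), first)) {
      return false
    }
  }
  return true
}

function add(a, b) { return a + b }
function dayOfMonth() { return new Date().getDate() }

console.log(looksConsistent(add, [2, 3], 100)) // true
// false, unless Math.random repeats itself 100 times in a row
console.log(looksConsistent(Math.random, [], 100))
// true, unless the run happens to cross midnight
console.log(looksConsistent(dayOfMonth, [], 100))
```

A true result here is only evidence, never proof, which is exactly the "day of month" problem.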

Conclusion

Testing if a function is pure is difficult in JavaScript, especially if we take the more relaxed definition and allow functions to use other pure functions in their lexical scope. Notice that my source rewriting does NOT allow using other pure functions, since everything is isolated. The method would need an extension: functions that have already passed the purity tests would be added to the "sandbox" environment of the functions that use them, and then the tests would run again.

// we can test add using the rewriting
function add(a, b) { return a + b }
// we do not import "add" to the sandbox,
// thus our rewriting will say "sub" is not pure
function sub(a, b) { return add(a, -b) }
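
Here is a minimal sketch of what that extension could look like at runtime, without the source rewriting step. The helper names wrap and callInSandbox are hypothetical. Each dependency we already trust is rebuilt from its source inside the same sandbox, under the key the calling function uses to refer to it.

```javascript
const vm = require('vm')

// Wrap a function's source in a strict-mode closure, as earlier in the post.
function wrap(fn) {
  return '(function () {\n"use strict"\nreturn ' + fn.toString() + '\n}())'
}

// Hypothetical helper: rebuild fn and its known-pure dependencies
// inside one fresh sandbox, then call fn with the given arguments.
function callInSandbox(fn, args, pureDeps = {}) {
  const sandbox = vm.createContext({})
  Object.keys(pureDeps).forEach((name) => {
    sandbox[name] = vm.runInContext(wrap(pureDeps[name]), sandbox)
  })
  const isolated = vm.runInContext(wrap(fn), sandbox)
  return isolated(...args)
}

function add(a, b) { return a + b }
function sub(a, b) { return add(a, -b) }

console.log(callInSandbox(sub, [10, 2], { add })) // 8

try {
  callInSandbox(sub, [10, 2]) // "add" is not in the sandbox
} catch (err) {
  console.log(err.name) // ReferenceError
}
```

Because add is rebuilt from source rather than passed by reference, it goes through the same isolation as sub.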

In general, I think rule #1 is the most important - returning the same result given the same set of inputs is the defining property of pure functions. Relaxing the rules and allowing access to other pure functions or constant values is fine, as long as the function is still consistent.

If a function is pure, then we can replace every call to the function with its body, slowly unrolling the entire program. In the above example, once we have tested the code and determined that add is pure, we can replace the call add(a, -b) with the body of add.

function add(a, b) { return a + b } // pure
function sub(a, b) { return (function (a, b){ return a + b })(a, -b) }

If we run the purity test again, sub is now pure!

Noteworthy

Other people are also really excited about (really) pure functions, for example Mykola Bilokonsky wants to mark them in code as isolated to allow copy/paste of functions for testing.

Better world by better software

Gleb Bahmutov PhD
