Pyramid of lexical scope Doom

How to avoid creeping lexical scope in asynchronous code.

Nested callbacks used to manage asynchronous tasks are bad. They quickly form a "pyramid of doom", overwhelming the developer with boilerplate code. If you also try to handle errors at each step, forget it: the pyramid of doom will win. There is one more side effect of using nested callbacks: the creeping enlargement of the lexical scope. Let us take a small example with a twist: add two numbers and print the result, where each operation (getting a number, summing, printing) is asynchronous. For example, to get a number we could initially use this function:

```javascript
// get an async number
function getNumber(cb) {
  setTimeout(function () {
    cb(10);
  }, 0);
}
```

Similarly, we can add and print numbers using asynchronous functions:

```javascript
function add(a, b, cb) {
  setTimeout(function () {
    cb(a + b);
  }, 0);
}
function print(n) {
  setTimeout(function () {
    console.log(n);
  }, 0);
}
```

Let us use these three functions to add 10 to 10 and print the result:

```javascript
getNumber(function gotA(a) {
  getNumber(function gotB(b) {
    add(a, b, print);
  });
});
```

When we run the above code, we get the expected result, 20. Notice that the function gotB calls the add function, but where does it get the values of a and b? The value b is passed directly into gotB, but the value a is accessed via the lexical scope. Thus gotB is not a pure function: it depends on a variable it does not receive as an argument. Small impure functions like gotB mixed with a complicated control flow are an excellent breeding ground for errors. Even if you understand the logic right now, someone who updates the code later can miss a detail and introduce a hard-to-debug problem.
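To make the risk concrete, here is a sketch of one such later change. The reassignment `a = 0` is a hypothetical edit a maintainer might add inside gotA; because gotB reads a from the enclosing scope only when it finally runs, it sees the new value and the program prints 10 instead of 20. The callback helpers are repeated from above so the block runs on its own.

```javascript
// Callback helpers repeated from above so this block is self-contained.
function getNumber(cb) {
  setTimeout(function () {
    cb(10);
  }, 0);
}
function add(a, b, cb) {
  setTimeout(function () {
    cb(a + b);
  }, 0);
}
function print(n) {
  setTimeout(function () {
    console.log(n);
  }, 0);
}

getNumber(function gotA(a) {
  getNumber(function gotB(b) {
    // a is looked up in gotA's scope when gotB runs, not when gotB is created
    add(a, b, print);
  });
  // hypothetical later edit: this line runs before gotB is called
  a = 0;
});
// prints 10, not 20
```

Nothing in gotB itself changed, which is exactly why this kind of bug is hard to spot in review.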

Break the lexical scope

The creeping lexical scope in the pyramid of doom comes from writing the inner function inside the outer callback. We can simply move the inner function outside: this minimizes its scope and leads to simpler code.

```javascript
// the refactored code
function gotB(b) {
  add(a, b, print); // ReferenceError: a is not defined
}
function gotA(a) {
  getNumber(gotB);
}
getNumber(gotA);
```

Of course, the above code does not run: the variable a is not defined inside the standalone gotB function. We have to pass all arguments to gotB explicitly.

```javascript
function gotB(a, b) {
  add(a, b, print);
}
function gotA(a) {
  // bind pre-fills the first argument of gotB with the current value of a
  getNumber(gotB.bind(null, a));
}
getNumber(gotA);
```
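The `bind` call above is partial application: `Function.prototype.bind(thisArg, ...args)` returns a new function with its leading arguments fixed. A minimal sketch with a hypothetical `sum` function (a stand-in for gotB that returns its result so it can be checked) shows that the bound function behaves like a hand-written wrapper. `null` is passed as `thisArg` because the function never uses `this`.

```javascript
// Hypothetical stand-in for gotB that returns its result instead of calling add/print.
function sum(a, b) {
  return a + b;
}

var a = 10;
// bind stores the current value of a (10) as the first argument;
// the returned function expects only b.
var addA = sum.bind(null, a);

// Hand-written equivalent of the bound function at this moment.
var addTenManually = function (b) {
  return sum(10, b);
};

a = 0; // reassigning the variable does not affect the stored argument
console.log(addA(5), addTenManually(5)); // 15 15
```

Unlike the closure in the nested version, bind copies the value of a at the moment the bound function is created, so later changes to the variable cannot leak into the callback.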

This code is a lot clearer, but it still hides the callbacks. For example, the reference to print inside gotB is hidden from the outside world: looking at the call getNumber(gotA), we have no idea the result will be printed. Thus I prefer to refactor asynchronous code using promises, so that the top-level algorithm can specify each step without pushing the callbacks deep into the functions.

Clear control flow using promises

I will use the Q library to wrap all logic into promise-returning steps.

```javascript
var Q = require('q');
function getNumber() {
  return Q(10);
}
function add(a, b) {
  return Q(a + b);
}
function print(n) {
  return Q()
    .then(function () {
      console.log(n);
    });
}
```
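If Q is not installed, the same three steps can be sketched with the built-in ES2015 `Promise` instead; this is a substitution, not the Q code above. `Promise.resolve(value)` plays the role of `Q(value)`, and a `.then` callback always runs asynchronously, so print still logs on a later tick.

```javascript
// Sketch using native Promises in place of Q (assumption: any ES2015+ runtime).
function getNumber() {
  return Promise.resolve(10);
}
function add(a, b) {
  return Promise.resolve(a + b);
}
function print(n) {
  // the log happens inside a then callback, i.e. asynchronously
  return Promise.resolve().then(function () {
    console.log(n);
  });
}

// each step is independent; only the caller decides how they are combined
add(10, 10).then(print); // prints 20
```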

We can then chain the desired asynchronous steps without adding any dependencies among getNumber, add and print:

```javascript
getNumber()
  .then(function gotA(a) {
    return getNumber()
      .then(function gotB(b) {
        return add(a, b);
      });
  })
  .then(print);
```

Notice that the top-level algorithm has nicely separated print, but it still relies on the shared lexical scope of gotA and gotB to assemble both a and b before passing them to the add function. Can we do better?

Yes, we can pass more than a single value from one promise step to the next by returning an array and using Q's spread method:

```javascript
getNumber()
  .then(function gotA(a) {
    return [a, getNumber()];
  })
  .spread(function (a, b) {
    return add(a, b);
  })
  .then(print);
```

Notice that inside gotA we return an array containing both the plain value a and the promise returned by the getNumber() call. With an ordinary .then, the next step would receive exactly that array, a value and a promise:

```javascript
getNumber()
  .then(function (a) {
    return [a, getNumber()];
  })
  .then(function (r) {
    // r = [a, Promise]
  });
```
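A quick check with native promises (standing in for Q here, which is an assumption) confirms what the next step receives: promise resolution only unwraps thenables, and an array is not one, so the array is passed through as-is with its second element still a promise rather than a number.

```javascript
// Native Promise stand-in for the Q-based getNumber (assumption: ES2015+ runtime).
function getNumber() {
  return Promise.resolve(10);
}

var received = getNumber()
  .then(function (a) {
    return [a, getNumber()];
  })
  .then(function (r) {
    // r[0] is the plain number, r[1] is still a promise
    console.log(r[0], r[1] instanceof Promise); // 10 true
    return r;
  });
```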

In our case we want to resolve all the promises and then receive the separate values as arguments, so we use .spread. It waits until every element of the array is fulfilled (rejecting if any of them rejects) and then calls apply on the callback we provided, passing the values as separate arguments.

We can go one step further: remove the wrapper function that merely passes a and b to add, and go point-free:

```javascript
getNumber()
  .then(function (a) {
    return [a, getNumber()];
  })
  .spread(add)
  .then(print);
```
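Native ES2015 promises have no spread method. Assuming a runtime without Q, a minimal sketch with a hypothetical `spread` helper (not a built-in or library function) reproduces the same point-free pipeline: `Promise.all` waits for every element of the array, plain values included, and `apply` passes the fulfilled values to add as separate arguments.

```javascript
// Sketch with native Promises instead of Q (assumption: ES2015+ runtime).
function getNumber() {
  return Promise.resolve(10);
}
function add(a, b) {
  return Promise.resolve(a + b);
}
function print(n) {
  return Promise.resolve().then(function () {
    console.log(n);
  });
}

// Hypothetical helper, not a built-in: mimics Q's .spread for native promises.
function spread(fn) {
  return function (array) {
    // wait for every element (values or promises); rejects if any element rejects
    return Promise.all(array).then(function (values) {
      // pass the fulfilled values as separate arguments
      return fn.apply(null, values);
    });
  };
}

var done = getNumber()
  .then(function (a) {
    return [a, getNumber()];
  })
  .then(spread(add))
  .then(print); // prints 20
```

Because spread(add) is just a function that accepts an array, the chain stays free of wrapper functions, just like the Q version.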

Beautiful, nesting-free code with pure steps. What can go wrong?