Concurrency can bite you even in Node

Ordering of execution can be tricky in the JavaScript event loop.

Last night I ran into a bug caused by concurrency in Node.js.

Wait, what? Node.js was supposed to be the single-threaded nirvana, where all the problems common to multi-threaded apps were gone!

Not quite. There is still a problem with shared data accessed by separate execution stacks. Let me explain with examples.

Suppose we need to add 100 items to an array, print the number of items in the array (should be 100), and then repeat.

First: the correct sync code that behaves as expected

var n = 100;
var items;
function addItems() {
  items = []; // 1
  for (var k = 0; k < n; k += 1) {
    items.push(k);
  }
  return items.length;
}
console.log('sync 1', addItems()); // 2
console.log('sync 2', addItems()); // 3
// prints
sync 1 100
sync 2 100

The call to addItems at // 2 resets the items array (// 1), then adds each item. The second call to addItems at // 3 starts only after the first call finishes. items is declared outside addItems, but that poses no threat. Node.js gives each module its own scope, so for all practical purposes the variable items is local.
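As a quick illustration of that module scoping, here is a hypothetical two-file sketch (the file names a.js and b.js are made up for the example):

// a.js -- a top-level var stays local to this module
var items = [1, 2, 3];
exports.count = items.length;

// b.js -- requiring a.js does not leak its top-level vars
var a = require('./a');
console.log(a.count);      // 3
console.log(typeof items); // 'undefined' -- not visible here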

Second: let's simulate event-driven processing by adding each item via code scheduled to run later. This is closer to normal asynchronous event processing, common to, for example, an HTTP server responding to requests.

var n = 100;
var items;
function addItems() {
  items = []; // 1
  function add(j) {
    items.push(j);
    if (j === n - 1) {
      console.log('async', items.length); // 2
    }
  }
  for (var k = 0; k < n; k += 1) {
    process.nextTick(add.bind(null, k)); // 3
  }
}
addItems(); // 4
addItems(); // 5
// prints
async 100
async 200

The output is unexpected: the second result is wrong! Did it not initialize the items array properly? Yes, it did, but the execution order here is not what you would expect. Instead, it goes something like this:

  • the variables n and items are created at the top
  • the first call to addItems // 4 executes
  • items is set to an empty array in // 1
  • the add(0) code block is placed onto the event queue by // 3
  • the add(1) code block is placed onto the event queue by // 3
    • same for add(2), ..., add(99)
  • the for loop completes. The current stack continues to run! It exits addItems and gets to the next addItems // 5 (see the minimal demo after this list)
  • items (still empty) is set to an empty array again in // 1
  • another 100 calls to add are placed on the event queue, making 200 code blocks there
  • addItems returns and the main block exits
  • the first 100 add code blocks execute before printing the first items.length in // 2
  • the second 100 add code blocks execute
  • the unexpected 200 is printed in // 2 because items was never reset between the two batches
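The key point is that process.nextTick callbacks never run until the current call stack fully unwinds. A minimal demonstration:

console.log('start');
process.nextTick(function () {
  console.log('tick'); // runs only after the current stack exits
});
console.log('end');
// prints
start
end
tick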

Fix: make the items variable private to each call:

var n = 100;
function addItems() {
  var items = []; // 1
  function add(j) {
    items.push(j);
    if (j === n - 1) {
      console.log('async', items.length); // 2
    }
  }
  for (var k = 0; k < n; k += 1) {
    process.nextTick(add.bind(null, k)); // 3
  }
}
addItems(); // 4
addItems(); // 5
// prints
async 100
async 100  // excellent!

By making items in // 1 private to addItems, we create a separate copy for each call of addItems. The first 100 calls to add are thus bound to the first copy, while the second 100 are bound to the second copy.
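This is ordinary closure behavior: each invocation of the outer function gets its own copy of its local variables. A minimal sketch (the makeCounter name is made up for the example):

function makeCounter() {
  var count = 0; // private to this particular call
  return function () {
    count += 1;
    return count;
  };
}
var c1 = makeCounter();
var c2 = makeCounter();
console.log(c1(), c1()); // 1 2
console.log(c2());       // 1 -- a separate copy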

Third: let's write async code using promises

I use my favorite Q library, but feel free to use any implementation.

var Q = require('q');
var n = 100;
var items;
function addItems() {
  items = []; // 1
  function add(j) {
    items.push(j);
    return items.length; // 2
  }
  var chain = Q();
  for (var k = 0; k < n; k += 1) {
    chain = chain.then(add.bind(null, k)); // 3
  }
  return chain;
}
addItems().then(console.log); // 4
addItems().then(console.log); // 5
// prints
199
200

Now this result is even weirder than the async example!

Here is what happens:

The two promise chains take turns, each executing one link at a time and then scheduling its next link onto the event queue. This is classic concurrency, except that by design each code block cannot be preempted in the middle. A stripped-down demo follows the walkthrough below.

  • a chain of 100 promises is constructed by addItems() in // 4
    • the first link is placed onto the event queue
    • the last link will return the length of the items array
  • a chain of 100 promises is constructed by addItems() in // 5
    • the first link is placed onto the event queue
    • the last link will return the length of the items array
  • neither chain starts executing until the current execution stack exits
  • the first link of the first chain executes, then adds the second link of the first chain to the event queue. Then it exits.
  • the first link of the second chain executes, adding its second link to the event queue
  • the second link of the first chain executes, and so on
  • the last link (number 100) of the first chain executes. By this time 99 links of the second chain have finished, adding 99 extra items to the array. Thus the first console.log prints 199.
  • the last link of the second chain finishes, printing 200
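You can watch the two chains take turns with a stripped-down version (the chain and logLink helpers are made up for the illustration; the interleaved output is exactly what the walkthrough above predicts):

var Q = require('q');
function logLink(name, i) {
  return function () { console.log(name, i); };
}
function chain(name) {
  var p = Q();
  for (var i = 0; i < 3; i += 1) {
    p = p.then(logLink(name, i));
  }
  return p;
}
chain('A');
chain('B');
// prints
A 0
B 0
A 1
B 1
A 2
B 2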

The solution, again, is to move the items variable into the addItems function, creating two separate copies for the two promise chains.
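Applying the same fix as before (reusing n and Q declared earlier), the promise version would look something like this:

function addItems() {
  var items = []; // private to this call
  function add(j) {
    items.push(j);
    return items.length;
  }
  var chain = Q();
  for (var k = 0; k < n; k += 1) {
    chain = chain.then(add.bind(null, k));
  }
  return chain;
}
addItems().then(console.log); // 4
addItems().then(console.log); // 5
// prints
100
100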

Conclusion

Node.js makes event-driven programming so easy that it sometimes hides the turn-by-turn nature of its event queue. When starting with sync code and then changing it to async (using promises, for example), it is easy to overlook the changed access order to shared data.

My advice for avoiding this problem is to keep data in as small a scope as possible, minimizing the chances that separate code blocks with access to it will run in an unexpected order.