May 19 2014

Keeping async data non-shared in singletons

Experiments with thread-local storage in Node.

It is easy to accidentally overwrite data in your program, especially when the program starts handling multiple requests concurrently. Node is great at async, event-driven programming using a single event loop, but even in Node there can be unexpected results.

Here is a shared variable example that is common when using a singleton middleware responding to multiple requests.

step 1 - sync

Let us start with simple initial code that works as expected: it logs a field from a request object.

function server(req) {
  console.log(req.name);
}
server({ name: 'foo' });
server({ name: 'bar' });
server({ name: 'baz' });
// output
foo
bar
baz

This is a simple synchronous sequence of steps.

step 2 - async log

Instead of writing the request name immediately, let us queue it up for logging later. This might be necessary to handle the request as quickly as possible, postponing logging until the server is less busy.

function server(req) {
  process.nextTick(function () {
    console.log(req.name); // 1
  });
}
server({ name: 'foo' }); // 2
server({ name: 'bar' }); // 3
server({ name: 'baz' }); // 4
console.log('finished requests');
// output
finished requests
foo
bar
baz

I added the finished requests message to show the sequence of events. The changed sequence causes no problems. In particular, the req.name property has the expected value, because each callback function closes over the req variable of its own server call, and each req points at a different argument object (lines // 2, // 3, // 4). These objects are allocated on the heap. I described the difference between stack and heap in this blog post.
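To make the distinction between the req variable and the object it points at concrete, here is a small sketch. It is a variation on the code above, not part of the original experiment: the shared variable and the printed array are names introduced only for illustration. Each callback keeps its own req reference, but when two calls receive the same heap object, a later mutation is visible to both callbacks.

```javascript
// Hypothetical variation: `shared` and `printed` are illustrative names.
var printed = []; // collects what the callbacks log, for inspection

function server(req) {
  process.nextTick(function () {
    // req is this call's own variable, but it may point at a shared object
    printed.push(req.name);
    console.log(req.name);
  });
}

var shared = { name: 'foo' };
server(shared);
shared.name = 'bar'; // mutates the same heap object before any callback runs
server(shared);
server({ name: 'baz' }); // a separate object, unaffected by the mutation
// output
// bar
// bar
// baz
```

The first two callbacks print bar because they reference one object, while the third has its own object and prints baz.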

step 3 - async logger singleton

Let us now move the logging feature into a separate object. It makes sense to use the singleton pattern, since all messages ultimately go into the same console object (which could be any message sink).

var logger = {
  data: null, // 1
  queue: function (msg) {
    this.data = msg;
    console.log('queue msg', this.data);
  },
  flush: function () {
    process.nextTick(function () {
      console.log(this.data);
    }.bind(this));
  }
};
function server(req) {
  logger.queue(req.name);
}
server({ name: 'foo' });
logger.flush();
server({ name: 'bar' });
logger.flush();
server({ name: 'baz' });
logger.flush();
console.log('finished requests');
// output
queue msg foo
queue msg bar
queue msg baz
finished requests
baz
baz
baz

We used a single property data (line // 1) to temporarily hold the message between the queue and flush calls. Because the flush callbacks run only after all three queue calls have finished, each one reads the last value written, so baz is printed 3 times, even though we scheduled the first 2 flush calls before it was written. This is an example where an object-oriented approach with objects containing inner state causes problems, while functional programming with passing values around (as in step 2) does not.
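For contrast, here is a minimal sketch of a logger singleton that keeps no state between calls. The log method name and the printed array are assumptions made for this illustration, not part of the original logger. The message travels as an argument and is captured by the closure, so a later call cannot overwrite it.

```javascript
// Hypothetical stateless variant: `log` and `printed` are illustrative names.
var printed = []; // collects what was logged, for inspection

var logger = {
  log: function (msg) {
    // msg belongs to this call only; the closure below captures it
    process.nextTick(function () {
      printed.push(msg);
      console.log(msg);
    });
  }
};

function server(req) {
  logger.log(req.name);
}

server({ name: 'foo' });
server({ name: 'bar' });
server({ name: 'baz' });
console.log('finished requests');
// output
// finished requests
// foo
// bar
// baz
```

The logger is still a singleton, but since it holds no data property, there is nothing shared to overwrite.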

step 4 - local storage

Obviously this is a very contrived example, but what if we wanted to make it more realistic and schedule the printing to happen after the server handles the request? Or pass information between the middleware layers specific to a particular execution stack? We have 5 choices:

pass all values around as arguments. This is safe, but becomes very verbose.
store values as properties in the request object. It is passed around anyway, so we could just use it. I am against this method because I like keeping the request pristine.
implement a queue data structure inside the logger.
keep extra info in a singleton hashtable with some request property used to generate the hash. First middleware adds the hashtable record, last middleware deletes it.
use the robust implementation of the previous approach using continuation-local-storage.
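The singleton hashtable choice can be sketched as follows. This is only an illustration under stated assumptions: every request is assumed to carry a unique id property (the original examples have none), and the storage object, its add, get and remove methods, and the printed array are hypothetical names. It is a hand-rolled version of the idea, not the continuation-local-storage API.

```javascript
// Hypothetical per-request storage keyed by an assumed unique `req.id`.
var storage = {
  records: {},
  add: function (req, value) { this.records[req.id] = value; },
  get: function (req) { return this.records[req.id]; },
  remove: function (req) { delete this.records[req.id]; }
};

var printed = []; // collects what was logged, for inspection

var logger = {
  queue: function (req, msg) {
    storage.add(req, msg); // first layer adds the record
  },
  flush: function (req) {
    process.nextTick(function () {
      var msg = storage.get(req); // look up this request's own message
      printed.push(msg);
      console.log(msg);
      storage.remove(req); // last layer deletes the record
    });
  }
};

function server(req) {
  logger.queue(req, req.name);
  logger.flush(req);
}

server({ id: 1, name: 'foo' });
server({ id: 2, name: 'bar' });
server({ id: 3, name: 'baz' });
// output
// foo
// bar
// baz
```

Each request reads back its own record, so the messages no longer overwrite each other. The cost is that every layer must remember to pass req along and the last one must delete the record, otherwise the hashtable grows without bound.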

Gleb Bahmutov PhD
