Faster Node app require

Speed up Node.js application startup by caching require path resolutions.

An interesting observation came across my desk a few days ago: according to "Node's require is dog slow", the Node require "hunts" for files to load when resolving third-party names. For example, when you call require("express") in your application source file, Node first tries to load node_modules/express.js and fails, then tries node_modules/express.json and fails, then tries node_modules/express.node and fails again. Finally it "gives up" and loads node_modules/express/package.json to read the proper main filename. Only then does it read node_modules/express/index.js from the disk!
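
To make the probing order concrete, here is a minimal sketch that just lists the paths described above for a given module name. It is a simplification for illustration, not Node's actual resolver, and candidateProbes is a made-up helper name.

```javascript
// Simplified illustration of the lookup order described in the text.
// This is NOT Node's real resolution algorithm; candidateProbes is a
// hypothetical helper that only builds the list of paths to be tried.
var path = require('path');

function candidateProbes(modulesDir, name) {
  var base = path.join(modulesDir, name);
  return [
    base + '.js',                    // tried first, usually missing
    base + '.json',                  // tried second
    base + '.node',                  // tried third
    path.join(base, 'package.json')  // finally read to find the "main" file
  ];
}

console.log(candidateProbes('node_modules', 'express'));
```

Every entry before the last one is a file system call that usually fails for a package installed as a folder.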

You can see this for yourself if you profile your own Node application using the dtruss program (included with Mac OS). Just start the profiling in the first terminal:

sudo dtruss -d -n 'node' > /tmp/require.log 2>&1

Then go to the second terminal window and start the application. For example, I will load express and nothing else. Because require calls are synchronous, I can simply time the call using a high-resolution timer:

index.js
function time(fn) {
  var t = process.hrtime();
  var result = fn();
  t = process.hrtime(t);
  var nanoToMs = 1e-6;
  console.log('benchmark took %d seconds and %d milliseconds',
    t[0], Math.round(t[1] * nanoToMs));
  console.log('benchmark function was');
  console.log(fn.toString());
}
function load() {
  var exp = require('express');
}
time(load);

Run this program and then load /tmp/require.log in a text editor. The result shows lots of calls just to find the right source file before Node can start loading the express library!

# microseconds call
664730 stat64(".../test/node_modules/express\0", 0x7FFF5FBFECF8, 0x204)        = 0 0
664784 stat64(".../test/node_modules/express.js\0", 0x7FFF5FBFED28, 0x204)         = -1 Err#2
664834 stat64(".../test/node_modules/express.json\0", 0x7FFF5FBFED28, 0x204)       = -1 Err#2
664859 stat64(".../test/node_modules/express.node\0", 0x7FFF5FBFED28, 0x204)       = -1 Err#2
664969 open(".../test/node_modules/express/package.json\0", 0x0, 0x1B6)        = 11 0
664976 fstat64(0xB, 0x7FFF5FBFEC38, 0x1B6)         = 0 0
665022 read(0xB, "{\n  \"name\": \"express\", ...}", 0x103D)        = 4157 0
665030 close(0xB)      = 0 0

The first column shows the timestamp in microseconds. Each wasted file system call takes only tens of microseconds in the trace above, but the tiny delays add up to hundreds of milliseconds and finally seconds for larger frameworks.
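
If you want to count the misses instead of eyeballing the log, a small script can do it. The sketch below assumes the line format shown in the excerpt above (timestamp, call name, arguments, then "= return errno") and treats "Err#2" (ENOENT, file not found) as a failed lookup. The summarizeTrace name is made up for this example.

```javascript
// Sketch: count failed lookups in a dtruss log excerpt.
// Assumes lines look like "<microseconds> <syscall>(...) = <ret> <errno>".
function summarizeTrace(logText) {
  var first = null, last = null, failed = 0, total = 0;
  logText.split('\n').forEach(function (line) {
    var m = /^\s*(\d+)\s+\w+\(/.exec(line);
    if (!m) { return; }                  // skip the header and blank lines
    var timestamp = Number(m[1]);
    if (first === null) { first = timestamp; }
    last = timestamp;
    total += 1;
    // Err#2 is ENOENT: the probed file does not exist
    if (/=\s*-1\s+Err#2\s*$/.test(line)) { failed += 1; }
  });
  return { calls: total, failed: failed, elapsedMicros: total ? last - first : 0 };
}

// First five lines of the trace shown above
var sample = [
  '# microseconds call',
  '664730 stat64(".../test/node_modules/express\\0", 0x7FFF5FBFECF8, 0x204)        = 0 0',
  '664784 stat64(".../test/node_modules/express.js\\0", 0x7FFF5FBFED28, 0x204)         = -1 Err#2',
  '664834 stat64(".../test/node_modules/express.json\\0", 0x7FFF5FBFED28, 0x204)       = -1 Err#2',
  '664859 stat64(".../test/node_modules/express.node\\0", 0x7FFF5FBFED28, 0x204)       = -1 Err#2',
  '664969 open(".../test/node_modules/express/package.json\\0", 0x0, 0x1B6)        = 11 0'
].join('\n');

console.log(summarizeTrace(sample));
// { calls: 5, failed: 3, elapsedMicros: 239 }
```

Three of the five calls are misses, and that is for a single require of a single package.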

I will show the end-to-end timing results later.

Cache path resolution

Luckily we can easily hook into the Node.js loader, override the require calls and cache the resolved filenames. I wrote cache-require-paths that does this. The entire source is only about 30 lines, and here is the main gist: wrap Module.prototype.require and save the resolved filenames into an object on the first run. On the second run, if the name cache already has the resolution for the given name, load that module directly from the cached path.

cache-require-paths
var Module = require('module');
var fs = require('fs');
var exists = fs.existsSync;
var _require = Module.prototype.require;
// SAVE_FILENAME is the path of the JSON cache file; its definition is not part of this excerpt
var nameCache = exists(SAVE_FILENAME) ? JSON.parse(fs.readFileSync(SAVE_FILENAME, 'utf-8')) : {};
Module.prototype.require = function cachePathsRequire(name) {
  var pathToLoad;
  // resolutions are stored per requiring file, because the same name
  // can resolve to different paths from different modules
  var currentModuleCache = nameCache[this.filename];
  if (!currentModuleCache) {
    currentModuleCache = {};
    nameCache[this.filename] = currentModuleCache;
  }
  if (currentModuleCache[name]) {
    // cache hit: skip the file system hunt
    pathToLoad = currentModuleCache[name];
  } else {
    // cache miss: resolve the usual way and remember the result
    pathToLoad = Module._resolveFilename(name, this);
    currentModuleCache[name] = pathToLoad;
  }
  return _require.call(this, pathToLoad);
};
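
The gist above leaves out where the cache file lives and when it gets written. Here is a self-contained sketch of the same idea that fills in those two pieces. The installRequireCache function, its return value and the file name in the usage comment are assumptions made for this example, not necessarily what the published module does.

```javascript
var Module = require('module');
var fs = require('fs');

// Hypothetical helper: wraps Module.prototype.require with a path cache
// that is loaded from and saved to saveFilename (a JSON file).
function installRequireCache(saveFilename) {
  var originalRequire = Module.prototype.require;
  var hasOwn = Object.prototype.hasOwnProperty;
  var nameCache = fs.existsSync(saveFilename)
    ? JSON.parse(fs.readFileSync(saveFilename, 'utf-8'))
    : {};

  Module.prototype.require = function cachePathsRequire(name) {
    // one sub-cache per requiring file
    if (!hasOwn.call(nameCache, this.filename)) {
      nameCache[this.filename] = {};
    }
    var forThisModule = nameCache[this.filename];
    if (!hasOwn.call(forThisModule, name)) {
      // first time: do the slow lookup once and remember the answer
      forThisModule[name] = Module._resolveFilename(name, this);
    }
    return originalRequire.call(this, forThisModule[name]);
  };

  function save() {
    fs.writeFileSync(saveFilename, JSON.stringify(nameCache, null, 2), 'utf-8');
  }
  // 'exit' handlers must be synchronous, which writeFileSync is
  process.once('exit', save);

  return { nameCache: nameCache, save: save };
}

// Usage (the file name is an assumption):
// installRequireCache('./.cache-require-paths.json');
```

Writing the file once at exit keeps the hot path free of extra disk writes; the price is that a crashed process does not update the cache.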

One can simply load this as the first line in the application and get the cache benefits:

npm install --save cache-require-paths
// first line of your app.js
require('cache-require-paths');

The cache mechanism avoids a lot of wasted file system calls (always slow!) and generates the following results for a couple of popular libraries.

Using node 0.10.37

require('X')    |  standard (ms)  |  with cache (ms)  |  speedup (%)
------------------------------------------------------------------
[email protected]  |        72       |       46          |     36
[email protected]   |       230       |      170          |     26
[email protected]     |       120       |       95          |     20
[email protected]    |       170       |      120          |     29

Using node 0.12.2 - all startup times became slower.

require('X')    |  standard (ms)  |  with cache (ms)  |  speedup (%)
------------------------------------------------------------------
[email protected]  |        90       |       55          |     38
[email protected]   |       250       |      200          |     20
[email protected]     |       150       |      120          |     20
[email protected]    |       200       |      145          |     27

Interesting, isn't it? A large startup performance boost just by using a single require!
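
For reference, the speedup column appears to be the share of startup time saved, cut down to a whole number. A tiny sketch (speedupPercent is a made-up name) reproduces the values in both tables under that assumption:

```javascript
// Percentage of startup time saved by the cache, truncated to an integer.
// Truncation (rather than rounding) is an assumption that happens to match
// every row in the tables above.
function speedupPercent(standardMs, cachedMs) {
  return Math.floor((standardMs - cachedMs) * 100 / standardMs);
}

console.log(speedupPercent(72, 46));   // 36
console.log(speedupPercent(230, 170)); // 26
console.log(speedupPercent(90, 55));   // 38
```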

Of course, I need to add cache invalidation, for example if the module's dependencies changed. Luckily this is simple to do: just look at the list of dependencies in the package.json file!
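
One way this could work, sketched under a few assumptions: fingerprint the dependency lists from package.json, store the fingerprint next to the cached names, and throw the cache away when the fingerprint changes. The function names and the { fingerprint, names } cache shape are invented for this example. Note that it only notices changes to the declared version ranges, not to what is actually installed in node_modules.

```javascript
var crypto = require('crypto');

// Hypothetical helper: stable hash of the dependency lists of a parsed package.json.
function dependencyFingerprint(pkg) {
  var entries = [];
  ['dependencies', 'devDependencies'].forEach(function (section) {
    var deps = pkg[section] || {};
    Object.keys(deps).forEach(function (name) {
      entries.push(section + ':' + name + '@' + deps[name]);
    });
  });
  entries.sort(); // key order in package.json should not matter
  return crypto.createHash('sha256').update(entries.join('\n')).digest('hex');
}

// Returns the saved cache if it was built for the same dependencies,
// otherwise a fresh empty cache carrying the new fingerprint.
function loadValidCache(saved, pkg) {
  var fingerprint = dependencyFingerprint(pkg);
  if (saved && saved.fingerprint === fingerprint) {
    return saved;
  }
  return { fingerprint: fingerprint, names: {} };
}
```

With this shape, the require wrapper would read and write resolved paths under the names property, and the whole object would be saved to disk as before.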