Jan 4 2015

Hacking Node require

Replace NodeJS require to add cache busting, pre- and post-processing, mocking, arguments.

I often thought the Node require function was limited. Many times I needed to manually bust the cache before loading a module (during unit testing, for example). Recently I looked at how require could be improved and wrote really-need. It offers two kinds of cache control (before and after loading), a source transformation callback, a callback for transforming the exported object, and a way to pass additional arguments to the module instead of relying on global properties or the config function pattern.

require = require('really-need');
// require(name, options)
require('./foo', {
    bust: true, // load foo.js even if previously cached
    keep: false, // remove loaded value from cache
    pre: function (source, filename) {
        // transform source
        return source;
    },
    post: function (exported, filename) {
        // modify exported value
        return exported;
    },
    args: {
        __dirname: '/diff/path',
        require: function (name) {
            console.log('foo is trying to load', name);
            return require(name);
        },
        // or even custom variables instead of globals
        a: 10
    }
});

You can use really-need for module replacement, mocking, code transformation and similar tasks; see the usage examples.

How it was done

The really interesting thing about this project is how simple it was to hack around Node's require function. It turns out require is not a system call; it is a JavaScript function, part of the system modules that ship with Node. Every file loaded using require(name) becomes an instance of the Module class; you can find its source in module.js. In general, when you make a require(name) call, the following things happen:

The name argument to require(name) is mapped to the full filename using the Module._resolveFilename(name, this) method.
If cache[fullName] exists, then cache[fullName].exports is returned. This speeds up module loading, because each module's source is evaluated only once. You can delete cache[fullName] yourself before calling require(name) to bust the cache.
Otherwise, the source from fullName is loaded, and the corresponding callback is run to preprocess the source, see Module.prototype.load. I have a separate project node-hook for setting up these hooks.
Finally, the transformed source is compiled (evaluated) and the module.exports value is returned to the user, see Module.prototype._compile.
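The caching step above can be tried by hand. Here is a minimal sketch, assuming a throwaway module written to a temporary folder (the file name counter.js and its contents are made up for the demo). It uses only require.resolve and require.cache, not really-need itself.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// hypothetical throwaway module, written to a temp folder for this demo only
var dir = fs.mkdtempSync(path.join(os.tmpdir(), 'bust-'));
var file = path.join(dir, 'counter.js');
fs.writeFileSync(file, 'module.exports = { loadedAt: Date.now() };');

var first = require(file);
var second = require(file);          // served from require.cache

// the cache is keyed by the fully resolved filename
var fullName = require.resolve(file);
delete require.cache[fullName];      // manual cache bust

var third = require(file);           // the file is evaluated again

console.log(first === second); // true, same cached exports object
console.log(first === third);  // false, a fresh exports object
```

This is the manual version of what the bust option is meant to save you from writing before every load.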

So what is the require function? I thought it was a property of the global object, but could not find it there.

console.log(global.require); // undefined

Interestingly, if you start the Node REPL, global.require does exist:

$ node
> global.require
{ [Function: require]
  resolve: [Function],
  main: undefined,
  extensions:
   { '.js': [Function],
     '.json': [Function],
     '.node': [Function: dlopen] },
  registerExtension: [Function],
  cache: {} }

So inside the file index.js, where does require come from? It turns out that before compiling index.js, Module.prototype._compile wraps its source in an IIFE. You can see (and even overwrite) the wrapper code:

var Module = require('module');
console.log(Module.wrapper);
[ '(function (exports, require, module, __filename, __dirname) { ',
  '\n});' ]

Each CommonJS module's source executes inside a function that gets its own require as an argument. The actual require is prepared inside Module.prototype._compile (line // 1)

Module.prototype._compile = function(content, filename) {
  var self = this;
  function require(path) { // 1
    return self.require(path);
  }
  ...
  var wrapper = Module.wrap(content);
  var compiledWrapper = runInThisContext(wrapper, { filename: filename });
  var args = [self.exports, require, self, filename, dirname];
  return compiledWrapper.apply(self.exports, args);
};
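To see that require really is just a function argument, the wrapping can be reproduced by hand. The following is a rough sketch, not what Node does internally line for line: the module source, the fakeModule object, the tracingRequire function and the '/demo.js' filename are all made up for illustration. Only Module.wrap and vm.runInThisContext are Node's own.

```javascript
var Module = require('module');
var vm = require('vm');

// hypothetical module source; it only sees the `require` we hand to it
var source = 'var os = require("os"); module.exports = os.platform();';

// wrap the source the same way _compile does, then compile the wrapper
var wrapped = Module.wrap(source);
var compiledWrapper = vm.runInThisContext(wrapped, { filename: 'demo.js' });

// stand-ins for the real Module instance and the prepared require
var fakeModule = { exports: {} };
var seen = [];
function tracingRequire(name) {
  seen.push(name);          // record every module the source asks for
  return require(name);
}

compiledWrapper.call(fakeModule.exports,
  fakeModule.exports, tracingRequire, fakeModule, '/demo.js', '/');

console.log(seen);               // [ 'os' ]
console.log(fakeModule.exports); // the platform string returned by os.platform()
```

Whatever function is passed in the second position becomes the module's require, which is why patching the function prepared inside _compile is enough to change what every loaded file sees.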

When writing really-need I wanted to be able to pass an options object in addition to the name to be loaded

var foo = require('./foo', { bust: true });

I started with modifying Module.prototype.require and added the second argument.

var Module = require('module');
var _require = Module.prototype.require;
Module.prototype.require = function reallyNeedRequire(name, options) {
    options = options || {};
    var nameToLoad = Module._resolveFilename(name, this);
    if (options.bust) {
        delete require.cache[nameToLoad];
    }
    return _require.call(this, nameToLoad);
};

But here is the problem: if I load another module, then the second module still gets the unary require through the IIFE

// foo.js
require('./bar', { bust: true });
// index.js
require = require('really-need');
// require is now our binary function
require('./foo', { bust: true });

When index.js loads foo.js, and foo.js tries to load bar.js, it again uses the original unary require. In essence, Module.prototype._compile tries to evaluate the following JavaScript

// comes from _compile
var self = this;
function require(path) { // 1
  return self.require(path);
}
(function (exports, require, module, __filename, __dirname) {
    // foo.js
    require('./bar', { bust: true }); // 2
});

The second options argument in line // 2 is ignored, because the require from _compile is used (// 1). How do we hack Module.prototype._compile so that the prepared require function passes the options argument along?

Using eval and fake lexical scope!

We can replace Module.prototype._compile with the following code, which is 99% the original _compile.

// really-need index.js
var _compileStr = Module.prototype._compile.toString();
// pass all arguments from loaded module to our self.require
_compileStr = _compileStr.replace('self.require(path);',
  'self.require.apply(self, arguments);'); // 1
/* jshint -W061 */
var patchedCompile = eval('(' + _compileStr + ')'); // 2
Module.prototype._compile = function(content, filename) {
  return patchedCompile.call(this, content, filename);
};

We grab the full _compile source code using Module.prototype._compile.toString() and replace the unary require call with self.require.apply(self, arguments). The transformation is a simple string replacement, so there is no need for an abstract syntax tree. Then we evaluate the patched source code and get the patchedCompile function in line // 2. Finally, we create our own Module.prototype._compile that calls the patched version. The full source does more work inside this wrapper, which is why the call is kept explicit.
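The same toString, replace and eval trick can be tried on a toy function without touching Node internals. This is a minimal sketch under made-up names: makeLoader stands in for the part of _compile that prepares the unary require, and target.load stands in for Module.prototype.require.

```javascript
// hypothetical stand-in for the unary require prepared inside _compile
function makeLoader(self) {
  function load(path) {
    return self.load(path);
  }
  return load;
}

// grab the source, forward all arguments instead of just the first one
var src = makeLoader.toString();
src = src.replace('self.load(path);', 'self.load.apply(self, arguments);');
/* jshint -W061 */
var patchedMakeLoader = eval('(' + src + ')');

// hypothetical binary loader, playing the role of the patched require
var target = {
  load: function (name, options) {
    return { name: name, options: options };
  }
};

var original = makeLoader(target)('./foo', { bust: true });
var patched = patchedMakeLoader(target)('./foo', { bust: true });

console.log(original.options); // undefined, the second argument was dropped
console.log(patched.options);  // { bust: true }
```

The replacement only works while the target string appears verbatim in the function source, so a patch like this is tied to the exact text of the function it rewrites.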

When I tried to eval the patched source, the interpreter crashed. The original _compile uses three variables from its lexical scope that are missing in my index.js. They were simple to add to the lexical scope:

// really-need index.js
// these variables are needed inside eval _compile
/* jshint -W098 */
var runInNewContext = require('vm').runInNewContext;
var runInThisContext = require('vm').runInThisContext;
var path = require('path');
...
var patchedCompile = eval('(' + _compileStr + ')'); // 2

When eval evaluates _compileStr and finds a reference to path, for example, it looks at the lexical scope where eval is located and finds var path = require('path');, thus everything is peachy. I call it faking lexical scope, because I could easily substitute my own object for the true path module.
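Here is a small sketch of that substitution, again with made-up names (baseName, rebuildWithFakePath and the fake path object are purely illustrative). A function that relies on path from its own scope is re-evaluated inside another function, where a local variable with the same name shadows the real module.

```javascript
var path = require('path');

// relies on `path` found through its lexical scope
function baseName(filename) {
  return path.basename(filename);
}

function rebuildWithFakePath(fn) {
  // this local `path` shadows the real module, but only for code
  // evaluated by the direct eval call below
  /* jshint -W098 */
  var path = {
    basename: function (filename) {
      return 'fake:' + filename;
    }
  };
  /* jshint -W061 */
  return eval('(' + fn.toString() + ')');
}

var fakeBaseName = rebuildWithFakePath(baseName);

console.log(baseName('/a/b.js'));     // b.js
console.log(fakeBaseName('/a/b.js')); // fake:/a/b.js
```

The rebuilt function has the same source text as the original; only the scope it was evaluated in differs, and that alone decides which path it talks to.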

Why do we need the left-hand assignment in the original call require = require('really-need'); then? Because I cannot overwrite the original variable require in the calling code. If I did not need to make any more require calls after loading really-need, I could have omitted the assignment.

Better world by better software

Gleb Bahmutov PhD
