I have often found Node's require function limiting. Many times I needed to bust the cache manually
before loading a module (during unit testing, for example). Recently I looked at how require could be
improved and wrote really-need. It offers two types of cache control (before and after loading),
a source transformation callback, an exported-object transformation callback, and even a way to pass additional arguments
to the module instead of using global properties or the config function pattern.
require = require('really-need');
You can use really-need for module replacement, mocking, code transformation and similar tasks; see the usage examples.
How it was done
The really interesting thing about this project is how simple it was to hack around Node's require function.
It turns out require is not a system call; it is a plain JavaScript function that is part of the
system modules bundled with Node.
Every file loaded using require(name) becomes an instance of the Module class;
you can find its source in module.js. In general, when you make a require(name) call,
the following things happen:
- The name argument to require(name) is mapped to the full filename using the Module._resolveFilename(name, this); method.
- If cache[fullName] exists, then cache[fullName].exports is returned. This speeds up module loading, because a module can only be loaded once. You can delete cache[fullName] yourself before calling require(name) to bust the cache.
- Otherwise, the source from fullName is loaded, and the corresponding callback is run to preprocess the source; see Module.prototype.load. I have a separate project, node-hook, for setting up these hooks.
- Finally, the transformed source is compiled (evaluated) and the module.exports value is returned to the user; see Module.prototype._compile.
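The cache steps above can be tried out with the public require.resolve and require.cache properties. This is a minimal sketch, not code from really-need: the temporary file and the names counterFile, cachedHit and reloaded are made up for the demonstration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical throwaway module, written to a temp directory for the demo.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'bust-cache-'));
const counterFile = path.join(dir, 'counter.js');
fs.writeFileSync(counterFile, 'module.exports = {};');

const first = require(counterFile);
const second = require(counterFile); // second call is served from the cache
const cachedHit = first === second;

// Resolve the name to the full filename, which is the key used by the cache.
const fullName = require.resolve(counterFile);

// Bust the cache so the next require() evaluates the source again.
delete require.cache[fullName];
const third = require(counterFile);
const reloaded = third !== first;

console.log({ cachedHit, reloaded });
```

Both flags should come out true: the second call returns the very same exports object, while the call after the delete produces a fresh one.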
So what is the require function? I thought it was a property of the global object, but could not find
it there.
console.log(global.require);
Interestingly, if you start the Node REPL, global.require does exist:
$ node
> global.require
{ [Function: require]
resolve: [Function],
main: undefined,
extensions:
{ '.js': [Function],
'.json': [Function],
'.node': [Function: dlopen] },
registerExtension: [Function],
cache: {} }
So inside the file index.js, where does require come from? It turns out that, before compiling index.js,
Module.prototype._compile wraps it in an IIFE. You can see (and even overwrite) the wrapper code:
var Module = require('module');
Each CommonJS module's source executes inside a function that gets its own require as an argument.
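To see that require really is just a function argument, here is a small sketch that wraps a one-line source with Module.wrap, compiles the result with the vm module, and calls it by hand. The names fakeRequire and fakeModule are hypothetical stand-ins, and this only imitates what Node does internally.

```javascript
const Module = require('module');
const vm = require('vm');

// Wrap a one-line module source in the same function header Node uses.
const source = "module.exports = require('anything');";
const wrapped = Module.wrap(source);
console.log(wrapped);

// Compile the wrapper text into a real function.
const compiled = vm.runInThisContext(wrapped);

// Hypothetical stand-ins for the objects Node would normally pass in.
const fakeModule = { exports: {} };
function fakeRequire(name) {
  return 'fake:' + name;
}

// Call the wrapper with our own require; the module code cannot tell the difference.
compiled.call(fakeModule.exports, fakeModule.exports, fakeRequire, fakeModule, 'fake.js', '.');

console.log(fakeModule.exports);
```

The wrapped source calls whatever function it is handed as its second argument, so fakeModule.exports ends up holding the value returned by fakeRequire.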
The actual require is prepared inside Module.prototype._compile (line // 1)
Module.prototype._compile = function(content, filename) {
When writing really-need, I wanted to be able to pass an options object in addition to the name to be loaded:
var foo = require('./foo', { bust: true });
I started by modifying Module.prototype.require to accept a second argument.
var Module = require('module');
But here is the problem: if the loaded module loads another module, the second module still
gets the unary require through the IIFE:
// foo.js
When index.js loads foo.js, and foo.js tries to load bar.js, it again uses the original unary require.
In essence, Module.prototype._compile tries to evaluate the following JavaScript:
// comes from _compile
The second options argument in line // 2 is ignored, because the require from _compile is used (// 1).
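The dropped argument can be observed directly. The sketch below is an illustration rather than the really-need source: it patches Module.prototype.require to record its second argument, then loads a hypothetical foo.js that requests bar.js twice, once through the local require and once through module.require. It assumes the local require forwards only the name, which is the behaviour described above.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const Module = require('module');

// Hypothetical foo.js / bar.js pair written to a temp directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'unary-require-'));
fs.writeFileSync(path.join(dir, 'bar.js'), "module.exports = 'bar';");
fs.writeFileSync(path.join(dir, 'foo.js'), [
  "require('./bar', { bust: true });        // local require from the wrapper",
  "module.require('./bar', { bust: true }); // prototype method, called directly"
].join('\n'));

const seenOptions = [];
const originalRequire = Module.prototype.require;

// Patched version: accepts a second argument and records it for './bar'.
Module.prototype.require = function (name, options) {
  if (name === './bar') {
    seenOptions.push(options);
  }
  return originalRequire.call(this, name);
};

try {
  require(path.join(dir, 'foo.js'));
} finally {
  Module.prototype.require = originalRequire; // always undo the patch
}

console.log(seenOptions);
```

The first recorded entry is undefined, because the wrapper's require passed only the name. The second entry is the options object, because module.require reaches the patched prototype method without going through the unary function.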
How do we hack Module.prototype._compile so that the require function it prepares
passes the options argument along?
Using eval and a fake lexical scope!
We can replace Module.prototype._compile with the following code, which is
99% the original _compile.
// really-need index.js
We grab the full _compile source code using Module.prototype._compile.toString(); and replace the unary
require call with apply(self, arguments);. The source transformation is simple enough that we can avoid
an abstract syntax tree. Then we evaluate the patched source code and get the patchedCompile function
in line // 2. We create our own Module.prototype._compile that calls the patched version.
The full source does more at this point, which is why we keep the call explicit.
When I tried to eval the patched source, the interpreter crashed. The original _compile uses 3 variables
via lexical scope that are missing in my index.js. They were simple to add to the lexical scope:
// really-need index.js
When eval evaluates _compileStr and finds a reference to path, for example, it
looks at the lexical scope where eval is located and finds var path = require('path');, so
everything is peachy. I call it faking lexical scope, because I could just as easily substitute my own
object for the true path module.
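The same trick works on any function, so here is a standalone sketch that leaves Node's internals alone. The lib object, its load and compile functions, and the replaced string are all hypothetical; they only mimic the shape of the problem (a unary inner require that drops its second argument, and a free variable that must exist where eval runs).

```javascript
// Hypothetical library: compile() builds a unary require around load().
const lib = (function () {
  function load(name, options) {
    return name + ':' + JSON.stringify(options);
  }
  return {
    load: load,
    compile: function compile() {
      function require(name) {
        return load(name); // unary: options are dropped
      }
      return require;
    }
  };
}());

// Grab the source text and patch the unary call with a plain string replace.
const compileStr = lib.compile.toString()
  .replace('return load(name);', 'return load.apply(null, arguments);');

// Fake lexical scope: the patched source refers to `load`, so a variable
// with that name has to be visible at the place where eval is called.
const load = lib.load;
const patchedCompile = eval('(' + compileStr + ')');

const before = lib.compile()('foo', { bust: true });
const after = patchedCompile()('foo', { bust: true });
console.log(before, after);
```

Without the const load line, calling the patched function would throw a ReferenceError, which mirrors the crash described above. Assigning a different function to load would silently swap the implementation, which is the "fake" part.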
Why do we need the assignment in the original call require = require('really-need'); then?
Because really-need cannot overwrite the variable require in the calling code. If I did not need to make
any more require calls after loading really-need, I could have omitted the assignment.