I often thought the Node require function was limited. Many times I needed to manually bust the cache
before loading a module (during unit testing, for example). Recently I looked at how require could be
improved and wrote really-need. It has two types of cache control (before and after loading),
a source transformation callback, an exported object transformation callback, and even supports passing
additional arguments to the module instead of using global properties or the config function pattern.
require = require('really-need');
You can use really-need for module replacements, mocking, code transformations and similar tasks; see the usage examples.
How it was done
The really interesting thing about this project is how simple it was to hack around Node's require function.
It turns out require is not a system call; it is a JavaScript function and part of the
system modules included with Node.
Every file loaded using require(name) becomes an instance of the Module class;
you can find its source in module.js. In general, when you make a require(name) call,
the following things happen:
- The name argument to require(name) is mapped to the full filename using the Module._resolveFilename(name, this); method.
- If cache[fullName] exists, then cache[fullName].exports is returned. This speeds up module loading, because each module is loaded only once. You can delete cache[fullName] yourself before calling require(name) to bust the cache.
- Otherwise, the source from fullName is loaded, and the corresponding callback is run to preprocess the source; see Module.prototype.load. I have a separate project, node-hook, for setting up these hooks.
- Finally, the transformed source is compiled (evaluated) and the module.exports value is returned to the user; see Module.prototype._compile.
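The resolve and cache steps above can be tried with the public require.resolve and require.cache properties. This is only a minimal sketch: the temporary counter.js file and its contents are made up for the demo, and it does not touch any of the internal Module methods.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical throwaway module, written to a temp folder just for this demo.
var dir = fs.mkdtempSync(path.join(os.tmpdir(), 'bust-'));
var file = path.join(dir, 'counter.js');
fs.writeFileSync(file, 'module.exports = { loadedAt: process.hrtime() };');

// Map the name to the full filename; this is also the key used in the cache.
var fullName = require.resolve(file);

var first = require(file);
var second = require(file);      // served from require.cache[fullName]

// Bust the cache, so the next require call loads and evaluates the file again.
delete require.cache[fullName];
var third = require(file);

console.log(first === second);   // true: same cached exports object
console.log(first === third);    // false: the module was evaluated again
```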
So what is the require function? I thought it was a property of the global object, but could not find
it there.
console.log(global.require);
Interestingly, if you start the Node REPL, global.require is there:
$ node
> global.require
{ [Function: require]
resolve: [Function],
main: undefined,
extensions:
{ '.js': [Function],
'.json': [Function],
'.node': [Function: dlopen] },
registerExtension: [Function],
cache: {} }
So inside the file index.js, where is require coming from? It turns out that before compiling index.js,
Module.prototype._compile wraps it in an IIFE. You can see (and even overwrite) the wrapper code:
var Module = require('module');
Each CommonJS module's source executes inside a function that gets its own require as an argument.
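To make the wrapping concrete, here is a small sketch using Module.wrapper and Module.wrap from the built-in module package. The one-line source string, fakeRequire and fakeModule are stand-ins invented for the illustration; the real _compile does considerably more work than this.

```javascript
var Module = require('module');
var vm = require('vm');

// The two strings placed before and after every module's source.
console.log(Module.wrapper);

// Module.wrap(source) returns the source surrounded by that wrapper.
var wrapped = Module.wrap('module.exports = typeof require;');
console.log(wrapped);

// Evaluate the wrapped text to get a plain function, then call it with
// our own stand-in arguments (hypothetical, for illustration only).
var fn = vm.runInThisContext(wrapped);
var fakeModule = { exports: {} };
function fakeRequire() {}
fn(fakeModule.exports, fakeRequire, fakeModule, 'demo.js', '.');

// Inside the wrapper, `require` is simply whatever was passed as an argument.
console.log(fakeModule.exports);
```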
The actual require is prepared inside Module.prototype._compile (line // 1):
Module.prototype._compile = function(content, filename) {
When writing really-need I wanted to be able to pass an options object in addition to the name to be loaded:
var foo = require('./foo', { bust: true });
I started by modifying Module.prototype.require and adding the second argument.
var Module = require('module');
But here is the problem: if I load another module, then the second module still
gets the unary require through the IIFE.
// foo.js
When index.js loads foo.js, and foo.js tries to load bar.js, it again uses the original unary require.
In essence, Module.prototype._compile tries to evaluate the following JavaScript:
// comes from _compile
The second options argument in line // 2 is ignored, because the require from _compile is used (// 1).
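The effect can be reproduced without touching Node internals. The sketch below is a simplified model, not the actual Node source: FakeModule, makeLocalRequire and makeForwardingRequire are hypothetical names that only mimic the shape of the problem.

```javascript
// Simplified, hypothetical model of a module and its per-module require.
function FakeModule(id) {
  this.id = id;
}

// The patched loader accepts an options object as its second argument.
FakeModule.prototype.require = function (name, options) {
  return { name: name, options: options };
};

// Stand-in for what _compile prepares: a unary function tied to one module.
function makeLocalRequire(mod) {
  return function require(path) {
    return mod.require(path);   // only the first argument is forwarded
  };
}

// Variant that forwards every argument, which is what the patch has to achieve.
function makeForwardingRequire(mod) {
  return function require() {
    return mod.require.apply(mod, arguments);
  };
}

var foo = new FakeModule('foo.js');
var unary = makeLocalRequire(foo)('./bar', { bust: true });
var forwarding = makeForwardingRequire(foo)('./bar', { bust: true });

console.log(unary.options);       // undefined
console.log(forwarding.options);  // { bust: true }
```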
How do we hack Module.prototype._compile so that the require function it prepares
passes the options argument along?
Using eval and a fake lexical scope!
We can replace Module.prototype._compile with code that is almost identical to the original _compile;
in fact, it is 99% the original code.
// really-need index.js
We grab the full _compile source code using Module.prototype._compile.toString(); and replace the unary
require call with apply(self, arguments);. The source transformation is simple enough that we can avoid
using an abstract syntax tree. Then we evaluate the patched source code and get the patchedCompile
function in line // 2. We create our own Module.prototype._compile that makes a call to the patched version.
The full source does more than this, so we keep the call explicit.
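The same toString, replace and eval sequence can be shown on a toy function. This is a sketch of the technique only: compile and load below are hypothetical stand-ins, not the real _compile or the really-need source.

```javascript
// Hypothetical stand-in for a function we cannot edit in place.
function compile(load, name) {
  function require(path) {
    return load(path);          // unary: extra arguments are lost
  }
  return require(name, { bust: true });
}

// 1. Grab the source text of the function.
var compileStr = compile.toString();

// 2. Patch the unary call with a plain string replacement (no AST needed).
var patchedStr = compileStr.replace('load(path)', 'load.apply(null, arguments)');

// 3. Evaluate the patched source; the parentheses turn it into an expression.
var patchedCompile = eval('(' + patchedStr + ')');

function load(path, options) {
  return [path, options];
}

console.log(compile(load, './foo'));         // [ './foo', undefined ]
console.log(patchedCompile(load, './foo'));  // [ './foo', { bust: true } ]
```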
When I tried to eval the patched source, the interpreter crashed. The original _compile uses 3 variables
via lexical scope, and they are missing in my index.js. They were simple to add to the lexical scope:
// really-need index.js
When eval evaluates _compileStr and finds a reference to path, for example, it
looks at the lexical scope where eval is located and finds var path = require('path');, thus
everything is peachy. I call it faking lexical scope, because I could easily substitute my own
object for the true path module.
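Here is a small sketch of that scope lookup, under the assumption that a toy closure is a fair model of _compile and its outer variables. The join function and the sep variable are hypothetical.

```javascript
// Hypothetical library function; `sep` lives in a closure we cannot reach.
var joinOriginal = (function () {
  var sep = '/';
  return function join(a, b) {
    return a + sep + b;
  };
}());

// The source text does not carry the closure with it.
var joinStr = joinOriginal.toString();

// Indirect eval runs in the global scope, where `sep` does not exist,
// so calling the rebuilt function fails.
var broken = (0, eval)('(' + joinStr + ')');
var failure = null;
try {
  broken('a', 'b');
} catch (err) {
  failure = err.name;
}
console.log(failure);   // ReferenceError

// Direct eval resolves free variables in the scope where eval is written,
// so whatever `sep` we provide here is the one the rebuilt function sees.
function rebuild(sep) {
  return eval('(' + joinStr + ')');
}

console.log(rebuild('/')('a', 'b'));    // a/b  (same behavior as the original)
console.log(rebuild('::')('a', 'b'));   // a::b (our own substitute object)
```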
Why do we need the left-hand assignment in the original call require = require('really-need'); then?
Because really-need cannot overwrite the require variable in the calling code from the inside. If I did not
need to make any more require calls after loading really-need, I could have omitted the assignment.