The boundary between testing and production code should be taken down. We write production code to run in the user's browser or on the server. We write testing code to run only in a controlled environment before deployment. Yet this separation is leaky even today:
- Monitoring code - we often have third-party services hitting our website / API at periodic intervals. Is the code supporting these requests production code or part of testing?
- Performance measurements - collecting and reporting page load times, for example, could be considered load testing rather than production code.
- Environment and system start-up diagnostic code - I often detect certain browsers at page load and take separate code paths for some of them (see the sketch after this list).
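For example, such a start-up check might detect a missing feature (or a particular browser) and branch accordingly - a hypothetical sketch, not tied to any particular project:
// hypothetical start-up diagnostic: detect a missing feature at page load
// and take a separate code path (load a polyfill, report the environment, etc.)
if (typeof window.Promise === 'undefined') {
  console.warn('Promise is not supported in this environment');
  // e.g. load a polyfill here
}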
All types of testing or monitoring are just different means to achieve one goal: the software should work. How we get there is up to us. Judging by the crappy websites, faulty games, and hanging applications we see everywhere, the current methods are imperfect at achieving this goal. Let us think outside the box and compare software to the electronic systems inside modern cars.
When you start a car, it goes through a series of self-diagnostic tests. If any component reports a failure, one of several dashboard lights turns on - usually the cryptic "Check Engine" light. During normal driving, the car constantly monitors itself via hundreds of sensors, storing any error codes in its internal computer, and maybe lighting up the "Check Engine" light again.
Did the car go through unit and system tests at the factory? Yes, of course. But it still has two features that are mostly lacking in software: start-up self-diagnostic tests and constant internal monitoring. We might have some server-side resource and performance monitoring via SDKs like New Relic, but that is very separate from the production code, and is mostly a black-box approach.
As far as I can see, developers are not using unit tests at start-up to diagnose problems. At most, start-up is used to load shims and polyfills for missing environment features. I take a simple approach: first load a client-side error reporting library (like Sentry), then check whether all expected JavaScript libraries have been loaded. I find this useful because some libraries are bundled with our code and some are loaded from a CDN, depending on the setup.
function checkLoaded(name) {
  if (!window[name]) {
    throw new Error('Cannot find library ' + name);
  }
}
['angular', 'd3', 'd3h'].forEach(checkLoaded);
Since we do not want to pollute the global namespace, I wrap this check in a closure that executes itself:
(function checkLoadedLibraries() {
  function checkLoaded(name) {
    if (!window[name]) {
      throw new Error('Cannot find library ' + name);
    }
  }
  ['angular', 'd3', 'd3h'].forEach(checkLoaded);
}());
Is the function checkLoaded itself correct, or should we use typeof window[name] === "undefined" instead?
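The two checks only disagree when a global happens to be falsy; for library objects this rarely matters, but here is a quick illustration of the difference (my own example, not part of the original check):
// hypothetical example: the two checks disagree for falsy globals
window.flag = 0;
console.log(!window.flag);                        // true  - checkLoaded would throw
console.log(typeof window.flag === 'undefined');  // false - the typeof check would pass
delete window.flag;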
We cannot say for sure. The function is inside a closure, which makes it hard to unit test. We could move it to a separate file, but that would add complexity for such a simple task.
We can try testing it in place using test-mole, which reverses the testing logic: instead of moving the function outside the closure so it can be reached by a unit testing framework, it puts a test mole into the closure:
(function checkLoadedLibraries() {
  function checkLoaded(name) {
    if (!window[name]) {
      throw new Error('Cannot find library ' + name);
    }
  }
  testMole.it('detects a function', function () {
    window.foo = function foo() {};
    checkLoaded('foo');
    delete window.foo;
  });
  ['angular', 'd3', 'd3h'].forEach(checkLoaded);
}());
Unfortunately, this still does not work as a unit test, because it requires the entire environment to run successfully! The function checkLoadedLibraries that contains the test mole will run the unit test detects a function, but it will also check all the expected global names when it executes the line ['angular', 'd3', 'd3h'].forEach(checkLoaded);. Additionally, different browsers might have bugs and return different results depending on valueOf being overwritten.
What we really need is to run the unit test detects a function in the production environment. This might sound crazy, but consider unit tests running in production to be the equivalent of the car's start-up self-diagnostic tests and part of its monitoring. As long as the start-up time is not affected, and the car does not slow down in the middle of the road to test itself, the user does not even notice.
So I wrote lazy-test. It is a tiny test runtime that collects individual unit tests and then executes them one by one, using the event loop to schedule each unit test after the previous one completes. By scheduling each test via the event loop, lazy-test allows other production code to run, keeping the performance impact to a minimum. You can also schedule the first unit test to start after N seconds, and put pauses of M seconds between individual tests.
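Conceptually, the scheduling works something like the following minimal sketch (my own illustration of the idea, not the actual lazy-test source):
// minimal sketch of the idea behind lazy-test (not its actual source)
var tests = [];

function it(name, fn) {
  // collect tests instead of running them right away
  tests.push({ name: name, fn: fn });
}

function start(initialDelay, pauseBetween) {
  function runNext() {
    var test = tests.shift();
    if (!test) { return; }
    try {
      test.fn();
    } catch (err) {
      // rethrow asynchronously so the global error handler / failure reporter sees it
      setTimeout(function () { throw err; }, 0);
    }
    // yield back to the production code before running the next test
    setTimeout(runNext, pauseBetween);
  }
  setTimeout(runNext, initialDelay);
}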
The test engine only supports a single BDD call: it. It has no built-in matchers; I recommend using lazy assertions. I also assume there is a global exception handler, or you can configure your own failed test reporter:
(function checkLoadedLibraries() {
  function checkLoaded(name) {
    if (!window[name]) {
      throw new Error('Cannot find library ' + name);
    }
  }
  lazyTest.it('detects a function', function () {
    window.foo = function foo() {};
    checkLoaded('foo');
    delete window.foo;
  });
  ['angular', 'd3', 'd3h'].forEach(checkLoaded);
}());
// somewhere in the code, start testing
lazyTest.options.reporters.fail = Raven.captureException;
// the first test runs 10 seconds after the .start call
// each test runs at least 1 second after the previous one
lazyTest.start(10000, 1000);
Every failed test will be sent to Sentry, so we will know whether our assumptions hold in the production environment.
We can also execute lazy-test during regular unit tests, making it equivalent to test-mole; see redirect to BDD.
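For example, during regular unit testing one could redirect lazyTest.it to the BDD framework's own it, so that the embedded tests run immediately - a hypothetical sketch; see the lazy-test documentation for the actual mechanism:
// hypothetical sketch: when running under a BDD framework (e.g. Mocha),
// point lazyTest.it at the framework's it so embedded tests run as normal unit tests
if (typeof window.describe === 'function' && typeof window.it === 'function') {
  lazyTest.it = window.it;
}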
Conclusion
I am taking baby steps with unit testing in the production environment, and it remains to be seen whether it is useful. So far it has allowed me to run unit tests on different platforms without using cross-browser testing solutions like SauceLabs. There are objections, of course: increased code size, and so on. All such objections can be mitigated with careful performance measurement and tuning.