Test Runners

Interesting ideas in JavaScript test runners.

Even old things can be improved. Consider a wood axe. Simple, right? Humans have been chopping trees for thousands of years, yet there was still room for improvement. This guy has improved the axe to use all that angular momentum to chop wood more easily and quickly. Notice the very unusual shape of the blade - it is meant to cut and then rotate to push the sides of the split apart.

What does this teach us? There is room to improve everything. JavaScript test runners especially have a lot of room to grow! Consider a simple runner like QUnit or Mocha. Each runner does two things: collects all of the user's test functions, then executes them. Yet there is so much more we can do to make test writing and running more pleasant and productive.

I will take a look at 5 test runners and the new features they bring: Ava, Jest, Rocha, Focha and Locha. Ava and Jest are well known (and I love them both), and the latter 3 are my own wrappers around my go-to test runner Mocha. Each test runner has something interesting to offer, and I hope that through cross-pollination of ideas the testing experience in JavaScript will improve.

Ava

Ava came onto the scene suddenly and with a splash. It introduced ES6 code transpiling by default, allowing everyone to unit test modern JavaScript code. It also introduced a nice feature that was unavailable in other test runners (as far as I know).

Each spec file runs in its own Ava test runner instance.

This means that whatever one spec file does cannot affect the other spec files. There is no shared memory or module system - every test runner is spawned as a separate process, isolating the tests in one file from the tests in another.

The isolation helps with finding interdependencies among tests, and it also allows running tests in parallel (which really means "faster"). And it almost happened ;) But the requirement to transpile everything in each subprocess back in the Node 0.12 and Node 4 days meant the parallel speed advantages were kind of moot for small "trial" projects.

Luckily, today with Node 6/8 the transpile is almost never necessary, and Ava test runs are super fast.
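To get a feel for the idea, here is a minimal sketch (not Ava's actual implementation, and the spec file names are made up) of running every spec file in its own Node process:

// a minimal sketch of the "process per spec file" idea: each spec file gets
// its own Node process, so no globals or module cache can leak between files
const { spawn } = require('child_process')

// hypothetical spec file names
const specs = ['a-spec.js', 'b-spec.js']

specs.forEach((spec) => {
  // the processes are independent and can run in parallel
  spawn('node', [spec], { stdio: 'inherit' })
})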

Jest

Snapshot testing

Jest test runner has introduced a bunch of features that I love. In particular I am awed by its snapshot testing feature. No longer do I have to write many assertions to compare the entire result with the expected value; I don't even have to compute the expected value.

Instead I just need to say that the computation should match the snapshot.

For larger things like DOM component rendering, creating the expected value by hand is almost impossible!

import React from 'react';
import Link from '../Link.react';
import renderer from 'react-test-renderer';

it('renders correctly', () => {
  const tree = renderer.create(
    <Link page="http://www.facebook.com">Facebook</Link>
  ).toJSON();
  expect(tree).toMatchSnapshot();
});

The snapshot assertion expect(tree).toMatchSnapshot() will try to load the previous value from a snapshot file. If Jest cannot find the snapshot file, that means the test has never run before. Jest will save whatever the computed tree object is, and you should commit the snapshot file to the code repository, just like a regular test fixture file. It is a plain JavaScript file after all.

// snapshot file
exports[`renders correctly 1`] = `
<a
  className="normal"
  href="http://www.facebook.com"
  onMouseEnter={[Function]}
  onMouseLeave={[Function]}
>
  Facebook
</a>
`

The next time it runs, locally or on CI, if the tree is rendered differently, Jest can show you a beautiful error message.

Jest snapshot mismatch

I loved snapshot testing so much, I really wanted it inside my Mocha tests. While a test runner like Ava just grabbed the Jest snapshot module (see what I mean about test ecosystem tools borrowing ideas from each other?), I had a strong case of "Not Invented Here" syndrome. So I wrote my own snap-shot library that can work with pretty much any test framework as a zero-configuration add-on.

const snapshot = require('snap-shot')
// Mocha
it('is 42', () => {
  snapshot(42)
})

snap-shot works without integrating with the test runner, and thus it had to overcome a major problem: when a test calls snapshot(value), how do you know the test file and the test name so you can look up the previously saved snapshot? snap-shot works by inspecting the stack trace when it is called to find the spec file, and then by inspecting the file's AST to find the it(name, cb) statement. You can find details in this blog post and in these slides.
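Here is a rough sketch of the stack trace part of that approach (hypothetical code, not the actual snap-shot source): create an Error inside the snapshot call and parse the caller's file name out of its stack.

// hypothetical sketch: find the spec file that called snapshot() by parsing
// the stack trace of a freshly created Error
function callerFile () {
  const lines = new Error().stack.split('\n')
  // lines[0] is the message, [1] is this function, [2] is snapshot(),
  // [3] is the test callback that called snapshot()
  const match = /\(([^)]+):\d+:\d+\)/.exec(lines[3] || '')
  return match && match[1]
}

function snapshot (value) {
  const specFile = callerFile()
  // ...here snap-shot would parse the spec file's AST to find the
  // surrounding it(name, cb) statement and load the saved snapshot
  console.log('snapshot called from', specFile)
}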

This works 90% of the time, but has problems finding the right test in heavily transpiled JavaScript code or other languages like CoffeeScript and TypeScript. I spent some time trying to solve this problem, but then decided to limit myself to BDD frameworks (like Jest, Mocha, etc). These test runners make a couple of standard methods available to test code, like beforeEach and afterEach.

beforeEach(() => {
  console.log('runs before each test')
})

it('works', () => {
  console.log('works')
})

it('does not', () => {
  throw new Error('oops')
})

afterEach(() => {
  console.log('runs after each test')
})

Result

runs before each test
works
✓ works
runs after each test
runs before each test
1) does not
runs after each test

1 passing (14ms)
1 failing

1) does not:
Error: oops
at Context.it (spec.js:8:9)

By relying on global functions like beforeEach, I could write a snapshot utility that works in any language - because it finds its "owner" test at runtime and not by static source inspection. So I made snap-shot-it - it registers a beforeEach callback to grab the current test about to be executed. If the test calls snapshot, then snap-shot-it can find the test's name, spec file, etc. without any hunting.
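The core of that trick fits in a few lines. Here is a minimal sketch (not the actual snap-shot-it source, and it assumes the code is loaded from a spec file so the global beforeEach hook exists) that uses Mocha's hook context to remember the test about to run:

// minimal sketch: remember the currently running test via a global hook
let currentTest

beforeEach(function () {
  // Mocha exposes the test about to run on the hook's context
  currentTest = this.currentTest
})

function snapshot (value) {
  const name = currentTest.fullTitle() // for example "example is 42"
  const file = currentTest.file        // the spec file path
  // ...load the snapshot saved for (file, name) and compare it with value
}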

Beautiful, but why spend so much time writing a utility that already exists? Because I want a simple one-page module that does not rely on a particular framework. I also want to learn and experiment, and snap-shot and snap-shot-it have produced another cool collection of tools. By factoring out saving, loading and comparing snapshot values into snap-shot-core I have been able to implement additional features.

Have data you want to snapshot, but the actual values change? Only the shape of the data stays the same? Example: the top selling item returned by the API - the name and SKU numbers change, but the object must have a name and a SKU. No problem - schema-shot to the rescue. Have a list that keeps growing, so the snapshot should only check a subset? No problem - subset-shot has you covered. Have a function that produces a lot of data and want to use that as a snapshot? Perfect opportunity to use data-driven snapshots.

// checks if n is prime
const isPrime = n => ...

it('tests prime', () => {
  snapshot(isPrime, 1, 2, 3, 4, 5, 6, 7, 8, 9)
})

This produces a snapshot that has

exports['tests prime 1'] = {
  "name": "isPrime",
  "behavior": [
    {
      "given": 1,
      "expect": false
    },
    {
      "given": 2,
      "expect": true
    },
    {
      "given": 3,
      "expect": true
    },
    {
      "given": 4,
      "expect": false
    },
    {
      "given": 5,
      "expect": true
    },
    ...
  ]
}

In summary, snapshot testing is really useful, and now there is a variety of snapshot tools to choose from; I have described the alternatives above.

Code coverage for faster testing

In addition, Jest has another cool feature. It collects code coverage by default (yay, zero config!) and thus is able to track which test files cover which source files. When a test or a source file changes, Jest can rerun just the affected test files, which means a super fast feedback loop.

Jest can run tests for changed files by using collected code coverage

Full proud disclosure - I wrote untested 5 years ago (Jan 2013). untested orders your unit tests by code coverage so you can test faster; it even supports browser-based tests through lasso. It is kind of cool to see ideas that were prototypes for a long time now used in production to test millions of files.

Update: as this Twitter thread notes, I was mistaken in thinking that Jest uses code coverage to track test dependencies. Instead, Jest uses file-to-file dependencies. If the test file "a-spec.js" loads "a.js", then when "a.js" changes, the test file "a-spec.js" will rerun all its tests. On the other hand, a test runner like Wallaby.js actually does track code coverage for each test and can accurately rerun only the affected individual tests.
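A very naive version of such file-to-file dependency detection could just scan the spec file for require() calls (a hypothetical sketch; Jest's real module resolver is far more sophisticated):

// hypothetical sketch: does the spec file require() the changed source file?
const fs = require('fs')
const path = require('path')

function specRequires (specFile, changedFile) {
  const source = fs.readFileSync(specFile, 'utf8')
  const required = [...source.matchAll(/require\(['"](.+?)['"]\)/g)]
    .map((m) => path.resolve(path.dirname(specFile), m[1]))
  const changed = path.resolve(changedFile)
  return required.some((p) => p === changed || `${p}.js` === changed)
}

// if specRequires('a-spec.js', 'a.js') is true, rerun all tests in "a-spec.js"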

Rocha

Each unit test should be independent of the other unit tests. Easy to say, right? But it is so easy for one test to leave behind changed global state, affecting the result of another test. In this file, one of the tests changes the value foo, making the third test pass.

describe('example', function () {
  var foo
  it('runs test 1', function () {
    foo = 42
    console.log('polluted the environment')
  })
  it('runs test 2', function () {})
  it('runs test 3', function () {
    console.assert(foo === 42, 'foo is 42', foo)
  })
})

Yet if we run the third test by itself, it stops working, because nothing sets foo = 42 before it runs. Such flaky tests are hard to debug, because isolating the test literally breaks it or removes the source of the flake.

This is why I wrote rocha - a "random" Mocha test runner. Before running the tests, Rocha randomly changes the order of unit tests, hopefully breaking the "happy test order" and instead flushing out inter-test dependencies. The tests above show the difference between Mocha and Rocha.

Running tests using Mocha

> mocha spec/tricky-spec.js
example
polluted the environment
✓ runs test 1
✓ runs test 2
✓ runs test 3
3 passing (8ms)

Rocha shuffles your tests to flush out inter-test dependencies

Running tests using Rocha

> rocha spec/tricky-spec.js
shuffling 3 unit tests in "example"
example
1) runs test 3
polluted the environment
✓ runs test 1
✓ runs test 2
2 passing (10ms)
1 failing
1) example runs test 3:
AssertionError: foo is 42 undefined

Perfect, we caught the flaky test. Maybe not right away, maybe only after a few runs, each run using a different reshuffle. But what happens when we try to investigate the problem - will it disappear because the tests will be shuffled again? No. When tests fail, Rocha saves the failing test order, and on the next run uses the same order again. A developer can rerun the "bad" test order until the problem is discovered and fixed.
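The shuffle itself is straightforward. Here is a minimal sketch (not Rocha's actual source) that reorders the tests in every Mocha suite with a Fisher-Yates shuffle, assuming we have a reference to the root suite:

// minimal sketch: shuffle the unit tests inside every Mocha suite in place
function shuffleSuite (suite) {
  const tests = suite.tests
  for (let i = tests.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1))
    ;[tests[i], tests[j]] = [tests[j], tests[i]]
  }
  // recurse into nested describe blocks
  suite.suites.forEach(shuffleSuite)
}

Remembering a failing order is then just a matter of saving the shuffled test titles to a file and reapplying that order on the next run.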

Focha

Imagine you have 100 tests. If each test runs for 10 seconds, that's 1000 seconds, which is almost 17 minutes. That's a long time to wait to find out if all tests are passing. What usually happens is:

  • a few tests break on CI
  • you push a fix

and now you wait for CI to finish running through all 100 tests just to find out if test #66 that was failing before passes again. Wouldn't it be more useful to run the previously failing tests first?

Focha runs tests that previously failed first so you find out if you have fixed them sooner

Similar to Rocha, Focha is a wrapper around Mocha that concentrates on collecting failing tests (the "F" in "Focha"). When all tests finish, Focha saves the failing tests (if any have failed) in a JSON file or sends them to a REST API endpoint.

The next time Focha runs, it loads and runs just the failing tests. Thus you find out if test #66 has been fixed in 10 seconds rather than in 17 minutes. If the previously failing tests pass, then you call focha --all to run all tests.

{
  "scripts": {
    "test": "focha *-spec.js && focha --all *-spec.js"
  }
}

Useful!
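To show the general shape of such a wrapper, here is a rough sketch (not Focha's actual source, with a made-up spec file name) that collects failed test titles through Mocha's programmatic API and saves them for the next run:

// rough sketch: run Mocha programmatically and remember the failed tests
const fs = require('fs')
const Mocha = require('mocha')

const mocha = new Mocha()
mocha.addFile('spec/tricky-spec.js') // hypothetical spec file

const failed = []
const runner = mocha.run((failures) => {
  if (failures) {
    // save the failing test titles so the next run can start with them
    fs.writeFileSync('failed-tests.json', JSON.stringify(failed, null, 2))
  }
})
runner.on('fail', (test) => failed.push(test.fullTitle()))

The next run could load failed-tests.json and pass the titles to mocha.grep() to execute only the previously failing tests.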

Locha

Finally, there is Locha - the "L"oud Mocha. Imagine a test that exercises a complex piece of code. That code probably has a lot of logging statements. I love using the debug module, so I can enable log messages through an environment variable.

DEBUG=my-module npm test

Being able to easily turn on verbose logging leads to a dilemma - do you enable all logging in CI by default just in case a test fails? That's not good - each test can generate 10, 100, even 1000 log messages! In our testing at Cypress the CI test log output was overwhelming the CircleCI and TravisCI UI, and we could only download the raw text file if we wanted to see it! But if we disabled the log messages, when a test failed we had absolutely no idea what went wrong, which is also not good.

Locha offers a happy compromise. It runs all your tests with minimal default logging, but if any test fails, it reruns only the failing tests with extra environment variables set. Take a look at this test file.

const debug = require('debug')('failing')

describe('failing spec', () => {
  it('A', function () {
    console.log(`in "${this.test.fullTitle()}"`)

    debug('a lot of')
    debug('verbose')
    debug('messages')
    debug('in debug')
    debug('mode in test A')
  })

  it('B', function () {
    console.log(`in "${this.test.fullTitle()}"`)

    debug('a lot of')
    debug('verbose')
    debug('messages')
    debug('in debug')
    debug('mode in test B')
    throw new Error('B fails')
  })
})

It has a lot of debug statements, but they will only output messages to the console if we run the tests with the DEBUG=failing npm test command. By default, the tests are pretty quiet. One of the tests is failing. Here is the output from the Locha test runner:

$ npm run demo
> $(npm bin)/locha test.js --env DEBUG:failing

failing spec
in "failing spec A"
✓ A
in "failing spec B"
1) B

1 passing (14ms)
1 failing

1) failing spec B:
Error: B fails
at Context.<anonymous> (test.js:21:11)

mocha finished with 1 failure
Failed first time, rerunning 1 test

failing spec
in "failing spec B"
failing a lot of +0ms
failing verbose +1ms
failing messages +1ms
failing in debug +0ms
failing mode in test B +0ms
1) B

0 passing (5ms)
1 failing

1) failing spec B:
Error: B fails
at Context.<anonymous> (test.js:21:11)
mocha finished with 1 failure

Do you see the two test runs? The first one executed two unit tests, and only the console.log statements were visible. During the first run, test "B" failed, and Locha executed just this test in the second round. During this round Locha added the environment variables we passed via the CLI flag --env DEBUG:failing. Thus the second round is pretty "loud" and allows us to debug the failure, or at least get an idea why it happens.

Locha keeps passing tests' output to a minimum and makes failing tests very verbose
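The overall control flow can be sketched in a few lines (a simplified, hypothetical version, not Locha's actual source - it reruns the whole file instead of just the failing tests, and it assumes mocha is available on the PATH):

// simplified sketch: run Mocha quietly, and on failure run it again with
// extra environment variables so the debug logging becomes visible
const { spawnSync } = require('child_process')

function runMocha (extraEnv = {}) {
  return spawnSync('mocha', ['test.js'], {
    stdio: 'inherit',
    env: Object.assign({}, process.env, extraEnv)
  })
}

const first = runMocha()
if (first.status !== 0) {
  console.log('failed first time, rerunning with verbose logging')
  runMocha({ DEBUG: 'failing' })
}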

Final thoughts

Making a useful testing tool is tricky, but there is definitely room for improvement. The entire testing and quality assurance process in JavaScript is still a chore and a hindrance. We must do better and get more useful information from our tests faster to avoid introducing bugs into the code.