Nov 22 2014

Measuring semver adherence

Measuring how closely a library follows semver semantics when publishing new releases.

Semantic versioning replaces a single software version with a major.minor.patch triple.

major is incremented when there are API breaking changes
minor is incremented when a new feature is added without breaking existing API users
patch is incremented when a bug is fixed without changing the API

If we think about a library's user, semver means

major - possible API break, we cannot upgrade without changing our code.
minor - there is a new feature in the package, but my stuff should still work.
patch - there is a bug fix, the API stays the same, my code should work without changes.
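To make the three cases concrete, here is a minimal sketch that names the kind of change an upgrade signals. The helper names parseVersion and changeType are my own for illustration, and the sketch assumes plain major.minor.patch strings without pre-release tags.

```javascript
// Parse "major.minor.patch" into numbers; pre-release tags are not handled here.
function parseVersion(version) {
  var parts = version.split('.').map(Number);
  if (parts.length !== 3 || parts.some(Number.isNaN)) {
    throw new Error('expected major.minor.patch, got ' + version);
  }
  return { major: parts[0], minor: parts[1], patch: parts[2] };
}

// Name the kind of change an upgrade from one version to another signals.
function changeType(from, to) {
  var a = parseVersion(from);
  var b = parseVersion(to);
  if (b.major !== a.major) return 'major';
  if (b.minor !== a.minor) return 'minor';
  if (b.patch !== a.patch) return 'patch';
  return 'none';
}

console.log(changeType('0.8.1', '1.0.0')); // major
console.log(changeType('1.1.1', '1.2.0')); // minor
console.log(changeType('1.1.0', '1.1.1')); // patch
```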

There is an excellent introduction to semver that you should read.

This implicit contract is not enforceable - the library's author increments the numbers. Different developers approach the problem with different attitudes. For example, underscore does NOT follow semantic versioning, while its fork lodash does. There is a long argument between the two developer camps. In essence, I agree that

Following semver is a good thing.
It shows long range planning and caring about your users.

Unfortunately, the great majority of public modules on the NPM registry either do not follow semver, or still have major version 0, meaning the API can be broken at any time.

If I am looking for a 3rd party library on the NPM registry and I see several choices, a library that follows semver is more appealing to me.

Testing each version before updating

How can I tell if a library really follows semver and does not just claim to do so? The only way I see is for every user of the library to test whether their project works with different versions of the library in isolation. Run a project with version x.y.z, then install x.y.z+1 and run the project again. If your project works without problems, then this particular patch increment follows semver principles. Perform the same over all patch and minor releases, and if nothing breaks, then the library adheres to semver closely.
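The procedure can be sketched as a simple loop. This is only an illustration of the idea, not how any particular tool implements it: checkUpgrades and installAndTest are hypothetical names, and the real install and test steps are replaced by a stand-in function.

```javascript
// Try each candidate version in order and record whether the project's tests pass.
// `installAndTest` is a hypothetical callback: it should install the given
// version of the dependency, run the project's tests, and return true on success.
function checkUpgrades(versions, installAndTest) {
  return versions.map(function (version) {
    var passed = false;
    try {
      passed = installAndTest(version) === true;
    } catch (err) {
      passed = false; // a crashing test run counts as a broken upgrade
    }
    return { version: version, passed: passed };
  });
}

// Stand-in for a real test run: pretend only 0.2.0 works.
function fakeInstallAndTest(version) {
  return version === '0.2.0';
}

var results = checkUpgrades(['0.2.0', '0.2.1'], fakeInstallAndTest);
results.forEach(function (r) {
  console.log(r.version, r.passed ? 'PASS' : 'FAIL');
});
```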

I have already released a tool to make testing a project against different versions of any dependency a snap - next-update. It works for NPM and bower modules by installing each available version of every dependency and running your project's unit tests. For example, if my project declared the following dependencies

"dependencies": {
    "lodash": "~1.2.0",
    "async": "~0.2.5"
}

and lodash and async release new versions, next-update can test each release to see if it breaks your unit tests

next updates:
lodash
    1.2.1 PASS
async
    0.2.6 PASS
    0.2.7 PASS
    0.2.8 PASS

Finally, next-update tells you which dependencies can be installed safely

npm install --save lodash@1.2.1 async@0.2.8

Example

Let us write a simple program that uses the first public release of lodash, lodash@0.1.0

index.js

var _ = require('lodash');
var result = _.difference(['foo', 'bar'], ['foo']);
// ['bar']
console.assert(_.isEqual(result, ['bar']));

Our test command will be node index.js. We can check every available version of lodash against this program

next-update --latest false // check every version, not just latest
next updates:
lodash 0.2.0, 0.2.1, ..., 2.4.0, 2.4.1
Use the following command to install working versions
npm install --save lodash@2.4.1

So both _.difference and _.isEqual work for this particular test case across all lodash versions.

What if we write another small program using a feature available in lodash@0.1.0?

index.js

var _ = require('lodash');
var result = _.intersect(['foo', 'bar'], ['foo']);
// ['foo']
console.assert(_.isEqual(result, ['foo']));

We run the same next-update command but get a very different result

next-update --latest false
testing lodash@0.2.0
lodash@0.2.0 works
testing lodash@0.2.1
npm test returned 1
test errors:
/index.js:2
var result = _.intersect(['foo', 'bar'], ['foo']);
TypeError: Object function lodash(value) {
    // allow invoking `lodash` without the `new` operator
    return new LoDash(value);
  } has no method 'intersect'

lodash@0.2.1 removed the _.intersect alias for the _.intersection method, so my second example can upgrade to lodash@0.2.0 but no further. If we use the original method _.intersection instead, all versions of lodash work fine.

This small example shows that even when the library's author tries to follow semver very closely, small mistakes can crop up (removing an existing public method in a patch release is an API-breaking change).

Collecting test data across all users

While testing your project over and over against different dependencies, next-update sends anonymous statistics to my server application running on Heroku. Only the dependency name, the from and to versions, and the test result are sent.

For example, if your project already uses lodash@1.2.0 and next-update successfully tested lodash@1.2.1, then it will send

{
    "name": "lodash",
    "from": "1.2.0",
    "to": "1.2.1",
    "status": true
}

Your project might use only a small set of features from lodash@1.2.0, but taken across multiple users, the results give more complete data for lodash@1.2.0 vs lodash@1.2.1 (you can see this data at http://next-update.herokuapp.com/). If all or almost all projects that depend on lodash upgrade successfully, then lodash follows semver at 1.2.1. Integrate this information across every minor and patch release and you can see how well the library adheres to semver in general.
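As a rough sketch of this aggregation, the reports from many users can be grouped by dependency and version pair, then reduced to a success percentage. The function name successRates and the sample reports are made up for illustration; only the report fields (name, from, to, status) come from the message shown above.

```javascript
// Group reports by "name from -> to" and compute the percentage that passed.
function successRates(reports) {
  var totals = {};
  reports.forEach(function (report) {
    var key = report.name + ' ' + report.from + ' -> ' + report.to;
    if (!totals[key]) {
      totals[key] = { passed: 0, total: 0 };
    }
    totals[key].total += 1;
    if (report.status) {
      totals[key].passed += 1;
    }
  });
  var rates = {};
  Object.keys(totals).forEach(function (key) {
    rates[key] = Math.round(100 * totals[key].passed / totals[key].total);
  });
  return rates;
}

// Made-up sample: four projects report the same upgrade, one of them breaks.
var reports = [
  { name: 'lodash', from: '1.2.0', to: '1.2.1', status: true },
  { name: 'lodash', from: '1.2.0', to: '1.2.1', status: true },
  { name: 'lodash', from: '1.2.0', to: '1.2.1', status: false },
  { name: 'lodash', from: '1.2.0', to: '1.2.1', status: true }
];
console.log(successRates(reports)); // { 'lodash 1.2.0 -> 1.2.1': 75 }
```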

Computing semver adherence

If we take every update success status for a particular library together, we get an upper triangular matrix, as shown below for check-types.js.

next update table

Each table row reflects the same starting version. Every column shows success when upgrading from the starting version to this particular version. The bottom triangle of cells is empty, because we do not test downgrading libraries.

Not every cell is filled, because no one has upgraded this particular combination of from / to versions. As more people use next-update, the table should fill up and show a more complete picture.

What if we take only the upgrade statistics for each pair of immediately successive versions? For example, what if we consider only the pairs 0.2.0 to 0.2.1, 0.2.1 to 0.2.2, 0.3.1 to 0.4.0, etc.? The update success information about every pair of releases might not be available yet, so let us only consider the pairs with existing information. We get the following numbers, highlighted in red

next update pairs
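Picking those pairs out of the full table could look like the sketch below. The table shape (table[from][to] holding a success percentage) and the names compareVersions and successivePairs are assumptions for illustration, and the version numbers and percentages are made up.

```javascript
// Compare two "major.minor.patch" strings numerically.
function compareVersions(a, b) {
  var pa = a.split('.').map(Number);
  var pb = b.split('.').map(Number);
  for (var i = 0; i < 3; i += 1) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

// `versions`: every released version; `table[from][to]`: success percent, if known.
function successivePairs(versions, table) {
  var sorted = versions.slice().sort(compareVersions);
  var pairs = [];
  for (var i = 0; i + 1 < sorted.length; i += 1) {
    var from = sorted[i];
    var to = sorted[i + 1];
    var row = table[from];
    if (row && typeof row[to] === 'number') {
      pairs.push({ from: from, to: to, success: row[to] });
    }
  }
  return pairs;
}

// Made-up data for a hypothetical library.
var versions = ['2.1.0', '2.0.0', '2.2.0', '2.0.1'];
var table = {
  '2.0.0': { '2.0.1': 100, '2.2.0': 80 },
  '2.0.1': { '2.1.0': 90 }
  // nobody has reported 2.1.0 -> 2.2.0 yet, so that pair is skipped
};
console.log(successivePairs(versions, table));
```

Sorting numerically matters here: a plain string sort would place 0.10.0 before 0.9.0.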

In this case, let us look at the numbers plus the type of the change signalled by the semver triple

type      from     to      successful %
patch     0.2.0   0.2.1        100
patch     0.4.0   0.4.1        100
patch     0.6.0   0.6.1        100 
patch     0.6.4   0.6.5        100
minor     0.6.5   0.7.0        100
major     0.8.1   1.0.0          0
minor     1.0.0   1.1.0        100
patch     1.1.0   1.1.1         94
minor     1.1.1   1.2.0        100
patch     1.2.0   1.2.1        100
minor     1.2.1   1.3.0        100
patch     1.3.0   1.3.1        100
patch     1.3.1   1.3.2        100
minor     1.3.2   1.4.0        100

check-types.js adheres to semver principles very closely. It never broke its users when upgrading its minor or patch numbers, aside from the 6% of upgrades that failed during the patch upgrade from 1.1.0 to 1.1.1. All dependent projects were broken when the library released the new major version 1.0.0, but this is allowed by the semver principles.

We could take the average percentage of successful pair updates from the above table, but I think this unnecessarily penalizes projects that did not stabilize until later versions. Instead, I prefer to chart this information using highcharts to preserve the historical trend.

check-types semver

This shows how small the quality dip was in the 1.1.1 patch release.
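For comparison, the single-number average mentioned earlier can be computed from the table's values. This is only a sketch of that alternative: the mean helper is my own, and leaving out the major row is one possible choice, since a break on a major release is allowed by semver.

```javascript
// Success percentages copied from the check-types.js table above.
var pairs = [
  { type: 'patch', to: '0.2.1', success: 100 },
  { type: 'patch', to: '0.4.1', success: 100 },
  { type: 'patch', to: '0.6.1', success: 100 },
  { type: 'patch', to: '0.6.5', success: 100 },
  { type: 'minor', to: '0.7.0', success: 100 },
  { type: 'major', to: '1.0.0', success: 0 },
  { type: 'minor', to: '1.1.0', success: 100 },
  { type: 'patch', to: '1.1.1', success: 94 },
  { type: 'minor', to: '1.2.0', success: 100 },
  { type: 'patch', to: '1.2.1', success: 100 },
  { type: 'minor', to: '1.3.0', success: 100 },
  { type: 'patch', to: '1.3.1', success: 100 },
  { type: 'patch', to: '1.3.2', success: 100 },
  { type: 'minor', to: '1.4.0', success: 100 }
];

function mean(values) {
  var sum = values.reduce(function (total, value) { return total + value; }, 0);
  return sum / values.length;
}

var all = pairs.map(function (pair) { return pair.success; });
var nonMajor = pairs
  .filter(function (pair) { return pair.type !== 'major'; })
  .map(function (pair) { return pair.success; });

console.log('all pairs:', mean(all).toFixed(1)); // 92.4
console.log('minor and patch only:', mean(nonMajor).toFixed(1)); // 99.5
```

Either number hides when the dip happened, which is what the chart preserves.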

You can grab the original data for each library from the next-update-stats API endpoint

http://next-update.herokuapp.com/package/<name>

Conclusion

Following the semver principles makes upgrading a library predictable. This project uses anonymous upgrade stats to compute a full historical picture of a module, helping other developers decide if the library is reliable.

dont-break

Update

A comment below brought to my attention the unfairness of punishing lodash for breaking its public API before the 1.0.0 release. While I agree that this is one of the semantic versioning principles listed at semver.org, I strongly disagree with this principle in practice. A lot of very important tools in the NPM ecosystem never reach a "stable" 1.0.0 release. Think grunt (the current stable release has version 0.4.5) or even Node itself (0.10.x). With this in mind, I think the boundary between the development period (< 1.0.0) and stable support (>= 1.0.0) is arbitrary. I would rather see every package start with its API tagged 1.0.0 and reach 100.*.*!

Better world by better software

Gleb Bahmutov PhD
