Dec 12 2013

Paranoid coding

Checking input arguments before doing any computation helps to quickly debug problems.

Defensive coding: checking inputs before each computation, usually at the start of each function (preconditions), and sometimes checking the result value before returning (postconditions).

Paranoid coding: checking inputs before each computation, as if the caller were evil and trying to break the function on purpose.
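To make the two kinds of checks concrete, here is a minimal sketch in plain JavaScript. The average function and its messages are hypothetical, made up for illustration; any function could be checked the same way.

```javascript
// Hypothetical example: average of a non-empty array of numbers.
function average(values) {
    // preconditions: validate the input before doing any work
    if (!Array.isArray(values) || values.length === 0) {
        throw new Error('expected a non-empty array, have ' + JSON.stringify(values));
    }
    values.forEach(function (value, k) {
        if (typeof value !== 'number' || isNaN(value)) {
            throw new Error('expected a number at index ' + k + ', have ' + value);
        }
    });

    var sum = values.reduce(function (total, value) {
        return total + value;
    }, 0);
    var result = sum / values.length;

    // postcondition: the result must be a finite number
    // (the sum of very large inputs could overflow to Infinity)
    if (!isFinite(result)) {
        throw new Error('computed average is not finite: ' + result);
    }
    return result;
}
```

The preconditions protect the function from its callers; the postcondition protects the callers from the function.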

Software is a living organism. It is constantly evolving and shifting shape. Functions and modules are reorganized, reused or replaced. The ones that survive are far removed from their original purpose. The survivors also interact with an environment that the programmer probably never imagined.

In practical day-to-day coding I practice paranoid coding. My typical function verifies its arguments, trying to check that everything is just right BEFORE doing anything meaningful. For example, a typical addition function:

function add(a, b) {
    check.verify.number(a, 'expected first argument to be a number, have', a);
    check.verify.number(b, 'expected second argument to be a number, have', b);
    return a + b;
}

When doing this, I prepare add to be reused reliably in other situations. If it is used on the webserver, it will NOT cause database corruption when a user enters strings, or null, or something worse (like a script to attack other users).

Here is another example, replacing alternative names with the conventional one:

function toInternalType(name) {
    check.verify.unemptyString(name, 'missing type name');
    var names = {
        'js': 'js',
        'javascript': 'js'
    };
    return names[name];
}

I use paranoid coding for several reasons and find it an extremely useful practice, especially as a documentation tool.

documentation

I do the checks because the developer using my function has not read the API docs (past the intro example). Even if he has read the API docs, he has forgotten the details. I want the code to fail quickly and let the developer know exactly why it has failed. I have seen too many TypeError: Cannot read property 'length' of undefined errors, thrown when a function tried to read the length of an array that was (maybe) null, to trust unchecked code. Hunting for the error's location inside minified code is a pain. Even worse, by looking at the function, one often cannot say if the function is supposed to work with null or an empty array as input.
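A small sketch of the difference, using two hypothetical functions. The unchecked version fails with whatever generic message the JavaScript engine produces; the checked version names the function and states what it expected.

```javascript
// Unchecked: fails with the engine's generic TypeError when items is missing.
function countItemsUnchecked(items) {
    return items.length;
}

// Checked: the check doubles as documentation.
// An empty array is fine, anything that is not an array is rejected.
function countItems(items) {
    if (!Array.isArray(items)) {
        throw new Error('countItems expected an array of items, have ' + items);
    }
    return items.length;
}

// Helper for the demo: returns the error message a call produces.
function messageOf(fn) {
    try {
        fn();
        return 'no error';
    } catch (err) {
        return err.message;
    }
}

console.log(messageOf(function () { countItemsUnchecked(undefined); }));
console.log(messageOf(function () { countItems(undefined); }));
```

The first message depends on the engine and says nothing about the caller's mistake; the second one points straight at it.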

Explicit argument and state checks also document my assumptions and code pretty well. I hate running the code just to find out what the assumed type of a variable is. Is it a string selector or is it already wrapped in a jQuery object? Can the function handle null or empty arrays? Can I run the object's init method several times?

I prefer to use either descriptive macros (for C/C++) or well named functions (JavaScript), for example

check.verify.array(arg, 'expected an array');
check.verify.webUrl(url, 'expected an url, got ' + url);

There are many assertion libraries for Node.js; I prefer using Phil Booth's check-types.
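For readers who have not used such a library, here is a rough sketch of how well-named checks can be built by hand. The is and verify objects below are hypothetical and are not how check-types is implemented; they only illustrate pairing each predicate with a throwing counterpart.

```javascript
// Hypothetical predicates: each returns true or false and never throws.
var is = {
    number: function (x) { return typeof x === 'number' && !isNaN(x); },
    array: function (x) { return Array.isArray(x); },
    unemptyString: function (x) { return typeof x === 'string' && x.length > 0; }
};

// Build a throwing "verify" counterpart for every predicate.
var verify = {};
Object.keys(is).forEach(function (name) {
    verify[name] = function (value, message) {
        if (!is[name](value)) {
            throw new Error(message || ('expected ' + name));
        }
    };
});

verify.number(42, 'expected a number');       // passes silently
verify.unemptyString('js', 'missing name');   // passes silently
```

Keeping the predicates separate from the throwing versions lets the same logic serve in if statements and in assertions.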

Here is another example, where I explicitly name the input coming over the wire untrusted, to remember not to use it directly:

update: function (req, res) {
    var untrusted = req.body;
    verify.object(untrusted, 'expected JSON update info object');
    // very exhaustive checking
    validate(untrusted);

    // do not use the untrusted object directly,
    // select properties to use instead
    var query = {
        name: untrusted.name,
        from: untrusted.from,
        to: untrusted.to
    };
    ...
}
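The "select properties" step can be sketched as a small helper. This is an illustration under assumptions: pickAllowed and the isAdmin property are hypothetical names, and the JSON string stands in for a request body.

```javascript
// Hypothetical helper: copy only the allowed own properties.
function pickAllowed(untrusted, allowed) {
    if (untrusted === null || typeof untrusted !== 'object' || Array.isArray(untrusted)) {
        throw new Error('expected an object, have ' + JSON.stringify(untrusted));
    }
    var picked = {};
    allowed.forEach(function (name) {
        if (Object.prototype.hasOwnProperty.call(untrusted, name)) {
            picked[name] = untrusted[name];
        }
    });
    return picked;
}

// Stand-in for req.body with one property the caller should not control.
var untrusted = JSON.parse('{"name": "trip", "from": "A", "to": "B", "isAdmin": true}');
var query = pickAllowed(untrusted, ['name', 'from', 'to']);
console.log(Object.keys(query)); // isAdmin was not copied
```

Listing the allowed properties explicitly means that any extra property an attacker adds is dropped before it reaches the rest of the code.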

performance

Doing checks like these might seem expensive. But it is still cheaper than a developer's time spent debugging! Most of my checks are variable type, property presence, and other sanity checks that should be fast. I would avoid checking input / output (like file existence) or anything time expensive for no reason. If you suspect that assertions are expensive, please measure if they are really the bottleneck before removing them.
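One crude way to measure is to time the same loop with and without the checks. The sketch below uses hypothetical addChecked and addUnchecked functions and Date.now for timing; the numbers depend entirely on the machine and the engine, so treat them as a rough indication only.

```javascript
function addChecked(a, b) {
    if (typeof a !== 'number' || typeof b !== 'number') {
        throw new Error('expected two numbers, have ' + a + ' and ' + b);
    }
    return a + b;
}

function addUnchecked(a, b) {
    return a + b;
}

// Calls fn n times, accumulating 0 + 1 + ... + (n - 1),
// and reports the elapsed milliseconds together with the total.
function time(fn, n) {
    var started = Date.now();
    var total = 0;
    for (var k = 0; k < n; k += 1) {
        total = fn(total, k);
    }
    return { ms: Date.now() - started, total: total };
}

console.log('checked   ms:', time(addChecked, 1000000).ms);
console.log('unchecked ms:', time(addUnchecked, 1000000).ms);
```

If the two timings are close, the checks are not the bottleneck and can stay.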

really fail

It is important to check inputs before any processing to avoid partial state updates and data corruption. It is best to throw an exception when a condition does not pass. If your code runs in both Node.js and the browser, remember the difference: console.assert throws an error in Node.js (at least in versions before 10; later versions only print the message), but inside a browser it prints the message and continues.

function foo() {
    console.assert(false, 'foo cannot continue');
    console.log('foo continues');
}
foo();

// Node
> foo();
AssertionError: foo cannot continue
    at Console.assert (console.js:102:23)

// Chrome
foo();
Assertion failed: foo cannot continue VM45:3
    foo VM45:3
    (anonymous function) VM45:6
foo continues
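Given that difference, one option is a tiny helper that always throws, regardless of the environment. The sketch below is an assumption-laden illustration: verify, withdraw and the account object are hypothetical names. It also shows why the checks come first: when a check fails, nothing has been modified yet.

```javascript
// Hypothetical helper: throws in Node.js and in the browser alike.
function verify(condition, message) {
    if (!condition) {
        throw new Error(message);
    }
}

function withdraw(account, amount) {
    // all checks run before the first mutation
    verify(typeof amount === 'number' && amount > 0,
        'expected a positive amount, have ' + amount);
    verify(amount <= account.balance,
        'cannot withdraw ' + amount + ' from balance ' + account.balance);

    // only now update the state, in two steps
    account.balance -= amount;
    account.history.push(-amount);
}

var account = { balance: 100, history: [] };
try {
    withdraw(account, 500);
} catch (err) {
    console.log(err.message);
}
console.log(account.balance, account.history.length); // account is unchanged
```

If the balance check ran between the two updates instead, a failure would leave the balance and the history out of sync.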

clear your name

Once your code protects its boundary, whenever a runtime exception happens, the blame is on the 3rd party user. Instead of digging into your code to discover the unsupported input format, the exception lets the user know they used your module incorrectly. This shifts the blame immediately away from you to the user.

better than static typing

A lot of people assume that paranoid coding can be replaced with a good static type system. Nothing is further from the truth. Even a barely useful type system that can check conditions like the ones below

la(check.unemptyString(names), ...);
la(check.not.email(loginName), ...);
la(check.maybe.string(name) || check.sha(name), ...);

will be extremely complicated and inflexible. Watch The Unreasonable Effectiveness of Dynamic Typing for Practical Programs by Robert Smallshire to get a sense of how static typing is easily confused and starts producing meaningless results for anything but the simplest cases.

Imagine you have to connect to a REST JSON api. Will your type system understand that a user object has an email field? Or that it might have an email field? I don't think so.
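A runtime check, by contrast, can state the "might have an email" rule directly. The sketch below assumes a hypothetical user shape (a required name and an optional email) and a deliberately naive email test; a real API would dictate its own rules.

```javascript
// Hypothetical shape: "name" is required, "email" is optional.
function verifyUser(user) {
    if (user === null || typeof user !== 'object') {
        throw new Error('expected a user object, have ' + JSON.stringify(user));
    }
    if (typeof user.name !== 'string' || user.name.length === 0) {
        throw new Error('expected user to have a name, have ' + user.name);
    }
    // the email may be absent, but if present it has to look like one
    if (user.email !== undefined &&
        (typeof user.email !== 'string' || user.email.indexOf('@') === -1)) {
        throw new Error('expected a valid user email, have ' + user.email);
    }
    return user;
}

// The JSON strings stand in for responses from the REST api.
var withoutEmail = verifyUser(JSON.parse('{"name": "gleb"}'));
var withEmail = verifyUser(JSON.parse('{"name": "gleb", "email": "gleb@example.com"}'));
console.log(withoutEmail.name, withEmail.email);
```

The check runs on the data that actually arrived, which is the only place the answer to "does this user have an email?" is known.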

Coming from a C++ background and having had to deal with static type checking via templates, I encourage you to try reading the Boost Type Traits library docs. This is a pretty good feature-rich library for specifying type information in C++ code. My brain hurts just from reading the docs. Now read the docs of a typical type checking library like check-types or is; these libraries just work in practice.

exception to the rule

I only skip checks for inner functions that are not exposed to the outside. Even in this case, the outer scope code has to fit onto a single page (50-100 lines of code maximum). Anything longer might get refactored for clarity, accidentally exposing the function to the outside callers.
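A short sketch of that exception, with hypothetical names: the outer function verifies everything, and the inner helper, visible only inside it, trusts what it is given.

```javascript
function sumOfSquares(values) {
    // the exposed function checks its arguments
    if (!Array.isArray(values)) {
        throw new Error('expected an array of numbers, have ' + values);
    }
    values.forEach(function (value, k) {
        if (typeof value !== 'number' || isNaN(value)) {
            throw new Error('expected a number at index ' + k + ', have ' + value);
        }
    });

    // inner helper: not visible outside sumOfSquares,
    // only ever called with numbers verified above, so it skips checks
    function square(x) {
        return x * x;
    }

    return values.reduce(function (total, x) {
        return total + square(x);
    }, 0);
}

console.log(sumOfSquares([1, 2, 3])); // 14
```

If square were ever moved out to module scope, it would need its own check.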

Paranoid coding might seem like overkill to most people. Other functions are not coded by evil hackers trying to break your code, wipe your data and wreak havoc on your system.

Or are they?

Better world by better software

Gleb Bahmutov PhD
