Agile software quality

A small team can produce high-quality software without long, exhaustive testing cycles.

This is an essay version of the presentation I delivered at the Automated Testing Boston meetup. You can see the original slides and bonus material.

Imagine you are a small company, trying to capture a piece of the market. You have to compete with much larger, established companies. They have everything: finished products, paying customers, and future plans.

You have only some software, maybe a few customers, and a world domination plan. How do you compete? How do you find enough resources to deliver a product that always works? You probably have a single chance to impress a customer. Anything that does not work damages the relationship, since there is no proven track record to fall back on.

I have worked at large, medium, and small companies. Working at a startup is definitely the most exciting experience one can have. The appeal to the engineer is clear: one can pick the latest tools, skip legacy system support, and design a system that solves the exact problem.

For the customer, a small company can deliver a solution tailored to the problem. A large supplier will never customize the product to match your needs. The startup's success thus depends on quickly finding what precisely solves the customer's problem. One cannot afford long delivery cycles, yet the software must work. How does one deliver software that works, yet is flexible and can be changed quickly?

We are talking about agile quality: software that works, yet can be modified quickly. There are different paths to software quality. Two traditional techniques to achieve high quality are:

  1. longer planning and design phases in the waterfall model
  2. extensive and even exhaustive testing during quality phase

Neither technique works in an agile environment, because you are investing in designing and testing a feature the customer might not need. One can design and build a beautiful walkway only to discover that the customer takes a shortcut.

Figure: desire paths.

At Kensho we approached the problem differently. First, we noticed that every step toward higher quality falls into one of three categories: product, process, and people. Picking a different testing framework is a product improvement. Following test-driven development might increase the output quality and is a process improvement. Finally, training every software engineer to write better testing code is an example of improving people.

It is simple to introduce a new product; it is much harder to introduce a new process into the software development life cycle. Changing people is the hardest approach one can take, yet the payoff is the largest. Next, I will give an example in each category that allowed us to increase the output quality.

Product

Every piece of software has bugs, and users typically do not report every crash or unexpected behavior. Our first quality improvement was to use a client-side crash reporting library, Sentry. Instead of trying to predict users' behavior and test exhaustively, we started getting real-time error reports. These reports surfaced several hard-to-reproduce bugs we could never have caught in testing without investing months into writing test code. Yet users hit these bugs, and we fixed them quickly.
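As a hedged sketch, here is what wiring up client-side error reporting can look like with today's @sentry/browser package; the DSN is a placeholder and riskyOperation is a hypothetical function standing in for any application code:

```js
import * as Sentry from '@sentry/browser';

// Initialize once at application startup. The DSN below is a placeholder.
Sentry.init({ dsn: 'https://examplePublicKey@o0.ingest.sentry.io/0' });

// Unhandled exceptions are reported automatically. For handled errors,
// report explicitly and still crash early rather than limp along.
try {
  riskyOperation(); // hypothetical application code
} catch (err) {
  Sentry.captureException(err);
  throw err;
}
```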

You can read more about using Sentry (or any client-side error reporting tool) in these blog posts.

Process

Whether one relies on manual testing or on production error reporting, the error reports start coming in. The typical report from a user or tester describes only the surface visible to them. Some bugs are transient, happen very infrequently, and are impossible to recreate.

In order to quickly diagnose the root cause of an error, we started using assertions in production. Typically, assertions are used during local development and are turned off in production. We decided to turn the process upside down: the production code has lots of assertions, an approach we call paranoid coding. The larger the distance between two pieces of code that call each other, the more assertions we use, a technique we call defensive distance.
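A minimal sketch of paranoid coding, assuming a plain assertion helper that stays enabled in production; computePriceChange and its quote argument are hypothetical names used only for illustration:

```js
// A plain assertion helper that stays enabled in production.
function assert(condition, message) {
  if (!condition) {
    throw new Error('Assertion failed: ' + message);
  }
}

// Defensive distance: this function is called from distant modules,
// so it checks its inputs aggressively instead of trusting the caller.
function computePriceChange(quote) {
  assert(quote, 'missing quote');
  assert(typeof quote.price === 'number', 'price should be a number');
  assert(typeof quote.previousClose === 'number', 'previousClose should be a number');
  return quote.price - quote.previousClose;
}
```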

Whenever an assertion fails in production, we crash early and get an error report via Sentry. We want all relevant information right there in the report, because it might be hard to recreate the bug later. Thus we wrote a JavaScript library of lazy assertions. These assertions allow passing as many arguments as needed to fully capture the environment at the moment of the crash, without paying the performance penalty every time an assertion runs.
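The following is a minimal sketch of the idea, not our library's actual API: extra arguments are serialized only when the condition fails, so passing large objects costs nothing on the happy path. The response object in the usage line is hypothetical.

```js
// Lazy assertion sketch: arguments after the condition are only
// evaluated and serialized when the assertion fails.
function lazyAssert(condition, ...args) {
  if (condition) {
    return; // fast path: no formatting, no JSON.stringify
  }
  const details = args
    .map((a) => (typeof a === 'function' ? a() : a)) // call lazy suppliers
    .map((a) => (typeof a === 'string' ? a : JSON.stringify(a)))
    .join(' ');
  throw new Error('Assertion failed: ' + details);
}

// The full response object reaches the error report,
// but is only stringified if the check fails.
lazyAssert(response.status === 200, 'unexpected response', response);
```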

Using lazy assertions became second nature. We even started using them in our testing code instead of the traditional matchers provided by the unit testing framework; see the Testing without matchers blog post.
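As a hedged illustration of that style, here is the lazyAssert and computePriceChange sketch from above reused inside a Mocha-style test:

```js
// Matcher style would read: expect(change).to.equal(2);
// The same check with a lazy assertion:
it('computes the price change', () => {
  const change = computePriceChange({ price: 102, previousClose: 100 });
  lazyAssert(change === 2, 'wrong change', change);
});
```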

People

Finally, we raised the overall software quality by improving communication among the engineers. Whenever a new person joins an existing team to work on a project, they are required to read a list of resources for that specific project: typically one or two programming books and maybe a few tutorials relevant to the project. We list these resources in the project's README file and update the list when needed. Having the same books under our belts allows us to write similar code, use the same patterns, and understand each other's work faster and with fewer errors.