Recently I watched the Unorthodox Performance presentation by lodash author John-David Dalton. Some claims were more unorthodox than others, but one stood out to me. At some level, it sounded too good to be true: refactoring code into smaller, simpler functions allows the engine to optimize some of them, leading to better performance.
Let me restate that claim (as I understood it from the video) in more detail:
When designing functions for performance, we must allow the JavaScript engine to optimize hot functions that execute often. In general, the engine can optimize a function that is small and does not modify its arguments object. This conflicts with designing a robust library with a versatile API. The library must guard against invalid inputs and support multiple execution paths. The engine will not optimize a typical top-level (public) library function because of its size and complex argument handling.
The author then suggests splitting the top-level public API functions, which must deal with input arguments, from the inner private executor functions, which can make assumptions about their inputs. The JavaScript engine will optimize the executor functions, but will leave the public function unoptimized.
Does splitting functions and handling arguments at the entry point to the library lead to higher performance?
In this blog post I will show how to observe these optimizations as they are decided and executed by the most popular JavaScript engine: V8, which is at the heart of the Chrome browser and Node.js. A note of caution: I am using node v0.11, which has a very recent version of V8. The stable node v0.10.30 has an older version of V8 (from 2012!), so these results might be different for other engines or versions of V8.
Just in time optimizations
Let us start with a simple function that adds two arguments
```js
function add(a, b) {
  return a + b;
}
```
We will execute this function 1 million times with integer arguments and measure the total time

```js
var k = 1e6;
var start = Date.now();
for (var i = 0; i < k; i += 1) {
  add(i, i + 1);
}
console.log('1m adds took ' + (Date.now() - start) + 'ms');
```
We can display notifications when V8 decides to mark a function for optimization and when it actually optimizes it. We just need to pass the --trace_opt flag to node. I will also grep for add to filter out messages about other functions.
$ node --trace_opt add.js | grep add
1m adds took 297ms
[marking 0x3366180057a1 <JS Function add> for recompilation,
reason: small function, ICs with typeinfo: 1/1 (100%)]
[optimizing 0x3366180057a1 <JS Function add> - took 0.004, 0.027, 0.007 ms]
Our function add was indeed optimized by the engine after it proved to be a good candidate: it is small and invoked with the same argument types.
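To see the flip side, we can break the collected type feedback by calling add with a new argument type after the hot loop. This is my own sketch, not from the talk; the exact deopt message wording varies by V8 version.

```js
function add(a, b) {
  return a + b;
}

// warm up with monomorphic number arguments
for (var i = 0; i < 1e6; i += 1) {
  add(i, i + 1);
}

// a new argument type invalidates the collected type feedback;
// with --trace_deopt you may see add being deoptimized
console.log(add('foo', 'bar'));
```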
Showing function optimization status
We can even ask the V8 engine whether a given function is optimized, never optimized, etc. We need the command line option --allow-natives-syntax and the special function %GetOptimizationStatus (I suggest you use the v8-natives module in anything but a simple example).
```js
// same add code as above
var status = %GetOptimizationStatus(add);
// 1 - optimized, 2 - not optimized, 3 - always optimized,
// 4 - never optimized
console.log('add Function is ' + (status === 1 ? 'optimized' : 'not optimized'));
```
Let us now look at adding a variable number of arguments, and compare a single function to the factored-out approach recommended by the lodash team.
Using constant raw arguments
First, let us simply sum the arguments without making a new array

```js
function sumArguments() {
  var sum = 0;
  for (var i = 0; i < arguments.length; i += 1) {
    sum += arguments[i];
  }
  return sum;
}
```
We will run it and measure the time using the external time command
$ time node --allow-natives-syntax --trace_opt --trace_deopt index.js
[marking 0x7475c004149 <JS Function sumArguments> for recompilation,
reason: small function, ICs with typeinfo: 5/5 (100%)]
[optimizing 0x7475c004149 <JS Function sumArguments> - took 0.055, 0.142, 0.037 ms]
sumArguments Function is optimized
real 0m0.069s
user 0m0.054s
sys 0m0.015s
Using the arguments object without modifying it is fast, and such a function can be optimized.
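For contrast, here is a sketch (my own example, not from the talk) of the kind of arguments modification that the engine of this era refused to optimize:

```js
// reads arguments without modifying it: optimizable
function sumReadOnly() {
  var sum = 0;
  for (var i = 0; i < arguments.length; i += 1) {
    sum += arguments[i];
  }
  return sum;
}

// writes to the arguments object: V8 disables
// optimization for functions that modify arguments
function sumAndClobber() {
  arguments[0] = 0;
  var sum = 0;
  for (var i = 0; i < arguments.length; i += 1) {
    sum += arguments[i];
  }
  return sum;
}
```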
Pass arguments into dedicated function
Let us now factor out the addition into a small function, as suggested in the video, in order to validate arguments and select the appropriate implementation. Function sumArguments would be the public API function, directing the call to the appropriate inner function.
```js
function sumArray(arr) {
  var sum = 0;
  for (var i = 0; i < arr.length; i += 1) {
    sum += arr[i];
  }
  return sum;
}

function sumArguments() {
  return sumArray(arguments);
}
```
This version takes 5 times longer!
$ time node --allow-natives-syntax --trace_opt --trace_deopt index.js
[marking 0x410fa005751 <JS Function sumArguments> for recompilation,
reason: small function, ICs with typeinfo: 0/0 (100%)]
[disabled optimization for 0x1b3acb869d71 <SharedFunctionInfo sumArguments>,
reason: bad value context for arguments value]
[marking 0x57cb8926cb1 <JS Function sumArray (SharedFunctionInfo 0x1b3acb869ce1)> for recompilation,
reason: small function, ICs with typeinfo: 5/5 (100%)]
[optimizing 0x57cb8926cb1 <JS Function sumArray> - took 0.016, 0.068, 0.016 ms]
sumArray Function is optimized
sumArguments Function is not optimized
real 0m0.377s
user 0m0.362s
sys 0m0.015s
This is a very strange result. Just passing arguments from a top-level (possibly public interface) function into a small, simple function incurs a huge performance penalty. I think this is because the arguments reference is copied into the arr parameter, which causes the outer function sumArguments to NOT be optimized ("bad value context for arguments value").
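One forwarding form that V8 does special-case as safe (see the Optimization killers article linked at the end) is fn.apply(this, arguments), which does not leak the arguments object. A sketch under that assumption:

```js
function sumMany() {
  var sum = 0;
  for (var i = 0; i < arguments.length; i += 1) {
    sum += arguments[i];
  }
  return sum;
}

// forwarding only via .apply avoids the
// "bad value context for arguments value" bailout
function sumArguments() {
  return sumMany.apply(null, arguments);
}
```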
Converting arguments to array
Let us convert arguments
into array first, something we see in lodash a lot.
```js
function sumArguments() {
  var arr = Array.prototype.slice.call(arguments);
  return sumArray(arr);
}
```
I often use the same conversion in my own code. It leads to even worse performance
$ time node --allow-natives-syntax --trace_opt --trace_deopt index.js
sumArray Function is optimized
sumArguments Function is not optimized
real 0m0.540s
user 0m0.518s
sys 0m0.019s
It turns out that calling the native Array.prototype.slice on arguments is an expensive operation!
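A common workaround is to copy the arguments into a fresh array with a plain loop instead of calling slice; a sketch:

```js
function sumArray(arr) {
  var sum = 0;
  for (var i = 0; i < arr.length; i += 1) {
    sum += arr[i];
  }
  return sum;
}

// manual copy: no native slice call, and the
// arguments object is only read, never leaked
function sumArguments() {
  var arr = new Array(arguments.length);
  for (var i = 0; i < arguments.length; i += 1) {
    arr[i] = arguments[i];
  }
  return sumArray(arr);
}
```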
Conclusion
Before doing any optimization, please measure first. Profiling might show surprising bottlenecks in unexpected places. In most cases, I prefer code that is clear and easy to understand to code that is fast but brittle. If you must write high-performance code, you might benefit from C-like macros that expand in place (a kind of poor man's inlining)
```js
function sumArguments() {
  // a build step could expand a macro into this loop,
  // keeping the logic inlined in a single small function
  var sum = 0;
  for (var i = 0; i < arguments.length; i += 1) {
    sum += arguments[i];
  }
  return sum;
}
```
Related: Optimization killers, Performance Tips for JavaScript in V8