Recently I watched the Unorthodox Performance presentation by lodash author John-David Dalton. Some claims were more unorthodox than others, but one stood out to me. At some level, it sounded too good to be true: refactoring code into smaller, simpler functions allows the engine to optimize some of them, leading to better performance.
Let me restate that claim (as I understood it from the video) in more detail:
When designing functions for performance, we must allow the JavaScript engine to optimize hot functions that execute often. In general, the engine can optimize a function that is small and does not modify its arguments object. This conflicts with designing a robust library with a versatile API. The library must guard against invalid inputs and support multiple execution paths. The engine will not optimize a typical top-level (public) library function because of its size and complex argument handling.
The author then suggests splitting the top-level public API functions, which must deal with input arguments, from the inner private executor functions, which can make assumptions about their inputs. The JavaScript engine will optimize the executor functions, but will leave the public function unoptimized.
Does splitting functions and handling arguments at the entry point to the library lead to higher performance?
In this blog post I will show how to observe these optimizations as they are decided and executed by the most popular JavaScript engine: V8, which is at the heart of the Chrome browser and Node.js. A note of caution: I am using node v0.11, which has a very recent version of V8. The stable node v0.10.30 has an older version of V8 (from 2012!), so these results might be different for other engines or versions of V8.
Just in time optimizations
Let us start with a simple function that adds two arguments
```js
function add(a, b) {
  return a + b;
}
```
We will execute this function 1 million times with integer arguments and measure the total time

```js
var k = 1e6;
var start = Date.now();
for (var i = 0; i < k; i += 1) {
  add(i, i + 1);
}
console.log('1m adds took ' + (Date.now() - start) + 'ms');
```
We can display notifications when V8 decides to mark a function for optimization and when it actually optimizes it. We just need to pass the --trace_opt flag to node. I will also grep for add to filter out messages about other functions.
$ node --trace_opt add.js | grep add
1m adds took 297ms
[marking 0x3366180057a1 <JS Function add> for recompilation,
reason: small function, ICs with typeinfo: 1/1 (100%)]
[optimizing 0x3366180057a1 <JS Function add> - took 0.004, 0.027, 0.007 ms]
Our function add was indeed optimized by the engine after it proved to be a good candidate: it is small and invoked with the same argument types.
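To see the flip side, we can break the collected type feedback by calling add with a new argument type after the hot loop. This is my own sketch, not from the talk; the exact deopt message wording varies by V8 version.

```js
function add(a, b) {
  return a + b;
}

// warm up with monomorphic number arguments
for (var i = 0; i < 1e6; i += 1) {
  add(i, i + 1);
}

// a new argument type invalidates the collected type feedback;
// with --trace_deopt you may see add being deoptimized
console.log(add('foo', 'bar'));
```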
Showing function optimization status
We can even ask the V8 engine whether a given function is optimized, never optimized, etc. We need the command line option --allow-natives-syntax and the special function %GetOptimizationStatus (I suggest you use the v8-natives module in anything but a simple example).
```js
// same add code as above
var status = %GetOptimizationStatus(add);
// 1 - optimized, 2 - not optimized, 3 - always optimized,
// 4 - never optimized
console.log('add Function is ' + (status === 1 ? 'optimized' : 'not optimized'));
```
Let us now look at adding a variable number of arguments, and compare a single function to the factored-out approach recommended by the lodash team.
Using constant raw arguments
First, let us simply sum the arguments without making a new array

```js
function sumArguments() {
  var sum = 0;
  for (var i = 0; i < arguments.length; i += 1) {
    sum += arguments[i];
  }
  return sum;
}
```
We will run it and measure the time using the external time command
$ time node --allow-natives-syntax --trace_opt --trace_deopt index.js
[marking 0x7475c004149 <JS Function sumArguments> for recompilation,
reason: small function, ICs with typeinfo: 5/5 (100%)]
[optimizing 0x7475c004149 <JS Function sumArguments> - took 0.055, 0.142, 0.037 ms]
sumArguments Function is optimized
real 0m0.069s
user 0m0.054s
sys 0m0.015s
Using the arguments object without modifying it is fast, and such a function can be optimized.
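For contrast, here is a sketch (my own example, not from the talk) of the kind of arguments modification that the engine of this era refused to optimize:

```js
// reads arguments without modifying it: optimizable
function sumReadOnly() {
  var sum = 0;
  for (var i = 0; i < arguments.length; i += 1) {
    sum += arguments[i];
  }
  return sum;
}

// writes to the arguments object: V8 disables
// optimization for functions that modify arguments
function sumAndClobber() {
  arguments[0] = 0;
  var sum = 0;
  for (var i = 0; i < arguments.length; i += 1) {
    sum += arguments[i];
  }
  return sum;
}
```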
Pass arguments into dedicated function
Let us now factor out the addition into a small function, as suggested in the video, in order to validate arguments and select the appropriate implementation. Function sumArguments would be the public API function, directing the call to the appropriate inner function.
```js
function sumArray(arr) {
  var sum = 0;
  for (var i = 0; i < arr.length; i += 1) {
    sum += arr[i];
  }
  return sum;
}

function sumArguments() {
  return sumArray(arguments);
}
```
This version takes 5 times longer!
$ time node --allow-natives-syntax --trace_opt --trace_deopt index.js
[marking 0x410fa005751 <JS Function sumArguments> for recompilation,
reason: small function, ICs with typeinfo: 0/0 (100%)]
[disabled optimization for 0x1b3acb869d71 <SharedFunctionInfo sumArguments>,
reason: bad value context for arguments value]
[marking 0x57cb8926cb1 <JS Function sumArray (SharedFunctionInfo 0x1b3acb869ce1)> for recompilation,
reason: small function, ICs with typeinfo: 5/5 (100%)]
[optimizing 0x57cb8926cb1 <JS Function sumArray> - took 0.016, 0.068, 0.016 ms]
sumArray Function is optimized
sumArguments Function is not optimized
real 0m0.377s
user 0m0.362s
sys 0m0.015s
This is a very strange result. Just passing arguments from a top-level (possibly public interface) function into a small, simple function incurs a huge performance penalty. I think this is because the arguments reference is copied into the arr parameter, which causes the outer function sumArguments to NOT be optimized ("bad value context for arguments value").
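One forwarding form that V8 does special-case as safe (see the Optimization killers article linked at the end) is fn.apply(this, arguments), which does not leak the arguments object. A sketch under that assumption:

```js
function sumMany() {
  var sum = 0;
  for (var i = 0; i < arguments.length; i += 1) {
    sum += arguments[i];
  }
  return sum;
}

// forwarding only via .apply avoids the
// "bad value context for arguments value" bailout
function sumArguments() {
  return sumMany.apply(null, arguments);
}
```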
Converting arguments to array
Let us convert arguments
into array first, something we see in lodash a lot.
```js
function sumArguments() {
  var arr = Array.prototype.slice.call(arguments);
  return sumArray(arr);
}
```
I often use the same conversion in my own code. It leads to even worse performance
$ time node --allow-natives-syntax --trace_opt --trace_deopt index.js
sumArray Function is optimized
sumArguments Function is not optimized
real 0m0.540s
user 0m0.518s
sys 0m0.019s
It turns out that calling the native Array.prototype.slice on arguments is an expensive operation!
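A common workaround is to copy the arguments into a fresh array with a plain loop instead of calling slice; a sketch:

```js
function sumArray(arr) {
  var sum = 0;
  for (var i = 0; i < arr.length; i += 1) {
    sum += arr[i];
  }
  return sum;
}

// manual copy: no native slice call, and the
// arguments object is only read, never leaked
function sumArguments() {
  var arr = new Array(arguments.length);
  for (var i = 0; i < arguments.length; i += 1) {
    arr[i] = arguments[i];
  }
  return sumArray(arr);
}
```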
Conclusion
Before doing any optimization, please measure first. Profiling might show surprising bottlenecks in unexpected places. In most cases, I prefer code that is clear and easy to understand to code that is fast but brittle. If you must write high-performance code, you might benefit from C-like macros that expand in place (a kind of poor man's inlining)
```js
function sumArguments() {
  // a build step could expand a macro into this loop,
  // keeping the logic inlined in a single small function
  var sum = 0;
  for (var i = 0; i < arguments.length; i += 1) {
    sum += arguments[i];
  }
  return sum;
}
```
Related: Optimization killers, Performance Tips for JavaScript in V8