While happily coding my ggit project I wrote the following function that returns a list of filenames that have been changed (according to Git version control).
1 | // src/changed-files.js |
The code runs the git command git diff --name-status --diff-filter=ACDM, which returns a list of filenames and their modification letters. For example, if you modified the source file src/foo.js and added the file README.md, the command prints something like this
M\tsrc/foo.js
A\tREADME.md
The function then returns a single object like this
1 | { |
This code essentially ran the following 4 steps
1: parse git stdout output
2: run the debug log command
3: group the files by modification
4: debug print the group
The final group is returned by the function. That is it. Yet this code is very error-prone. We need to keep track of the local variables (files), the input arguments (data), and the return variable (grouped).
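To make the four steps and three variables concrete, here is a minimal sketch of what such an imperative function might look like. It is a hypothetical reconstruction, not the actual ggit source: the function name changedFiles, the shape of the returned object (file names grouped under each modification letter), and the use of console.log in place of a debug logger are all assumptions.

```javascript
// Hypothetical reconstruction: names and the result shape are assumptions.
function changedFiles(data) {
  // step 1: parse git stdout into { modification, name } records
  var files = data.split('\n')
    .filter(function (line) { return line.trim().length > 0; })
    .map(function (line) {
      var parts = line.split('\t');
      return { modification: parts[0], name: parts[1] };
    });

  // step 2: debug log (console.log stands in for the real logger)
  console.log('found changed files', files);

  // step 3: group the file names by modification letter
  var grouped = {};
  files.forEach(function (file) {
    if (!grouped[file.modification]) {
      grouped[file.modification] = [];
    }
    grouped[file.modification].push(file.name);
  });

  // step 4: debug print the group
  console.log('grouped by modification', grouped);
  return grouped;
}

var result = changedFiles('M\tsrc/foo.js\nA\tREADME.md\n');
```

Every step in this sketch can read or overwrite data, files, and grouped, which is exactly the bookkeeping the refactoring tries to remove.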
When there are 4 steps and 3 variables, any step can touch any variable, so there are up to 12 (= 4 * 3) possible interactions to consider. This is too much for the human mind to keep track of; short term memory can only "cache" roughly 4 to 7 things at once.
Can we refactor this code to eliminate the variables? Yes! We are going to use functional composition to eliminate local variables and make the data flow stricter. We are also going to factor out individual pure functions that only work on the input arguments, making reasoning about them much simpler.
step 1
Factor every little data processing into its own function
We have 4 steps in this computation. Let us split it into 4 functions
1 | function parseLine(line) { |
Excellent, each small function is easy to reason about in isolation. Also, we could move them into a separate file and quickly unit test them. The main block of code we are going to refactor is now very clear. I like giving functions names; it makes debugging crashes much simpler.
1 | function outputToGroup(data) { |
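As a rough illustration of this step, the sketch below splits the work into small named functions and a main function that calls them in order. The function names follow the ones used in this post; the bodies, the isNonEmpty helper, the record shape, and console.log as the logger are assumptions made for the example.

```javascript
// Sketch only: function bodies are assumptions, not the original source.
function parseLine(line) {
  var parts = line.split('\t');
  return { modification: parts[0], name: parts[1] };
}

// hypothetical helper, skips blank lines such as the trailing newline
function isNonEmpty(line) {
  return line.trim().length > 0;
}

function parseOutput(data) {
  return data.split('\n').filter(isNonEmpty).map(parseLine);
}

function groupByModification(files) {
  return files.reduce(function (grouped, file) {
    if (!grouped[file.modification]) {
      grouped[file.modification] = [];
    }
    grouped[file.modification].push(file.name);
    return grouped;
  }, {});
}

// the log functions return undefined, like most logging calls
function logFoundFiles(files) {
  console.log('found changed files', files);
}

function logGroupedFiles(grouped) {
  console.log('grouped by modification', grouped);
}

// main block: still imperative, but every step is a named function
function outputToGroup(data) {
  var files = parseOutput(data);
  logFoundFiles(files);
  var grouped = groupByModification(files);
  logGroupedFiles(grouped);
  return grouped;
}
```

Each function depends only on its argument, so something like parseLine('M\tsrc/foo.js') can be checked on its own without running git.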
step 2
Replace imperative steps with composition
The output of each function call in outputToGroup is fed into the next function (except for log messages). Imagine for a second that the functions logFoundFiles and logGroupedFiles returned whatever first argument was passed to them. Then the function outputToGroup could be written like this
1 | function outputToGroup(data) { |
We don't even need an actual function outputToGroup - we could make this "caterpillar" on the fly using a utility from any functional library: lodash.compose or ramda.compose.
1 | var R = require('ramda'); |
That is very cool and literally eliminates any place in the code for an error to hide.
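To show what compose does under the hood, here is a minimal hand-rolled version, used so the snippet runs without installing anything. It is a simplified stand-in, not ramda's implementation: it only handles functions of one argument, and the three tiny step functions (trim, splitLines, count) are made up for the illustration.

```javascript
// Simplified stand-in for a library compose; unary functions only.
function compose() {
  var fns = Array.prototype.slice.call(arguments);
  return function (input) {
    // apply the functions right to left, feeding each result to the next
    return fns.reduceRight(function (value, fn) {
      return fn(value);
    }, input);
  };
}

// made-up steps for the illustration
function trim(s) { return s.trim(); }
function splitLines(s) { return s.split('\n'); }
function count(list) { return list.length; }

// the nested "caterpillar" call ...
var nested = count(splitLines(trim(' a\nb\n')));

// ... and the same computation built on the fly
var countLines = compose(count, splitLines, trim);
var composed = countLines(' a\nb\n');
```

Both nested and composed hold the same value; the composed version simply has no intermediate variables to name or misuse.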
We only have a single problem: the functions logGroupedFiles and logFoundFiles do NOT return their first argument, thus we cannot use them inside compose: they stop the flow of data! Luckily, it is simple to work around this problem. We can adapt a function on the fly by creating a new function that calls the original function but DOES return its argument. This adapter is called tap, and one can either implement tap or use a 3rd party utility, like ramda.tap
1 | logFoundFiles('foo'); // prints "foo", returns undefined |
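If you would rather implement tap yourself, a minimal version could look like the sketch below. It is an assumption-laden simplification (single argument, no currying), and the seen array exists only so the example can show that the wrapped function was really called.

```javascript
// Minimal tap: call fn for its side effect, then pass the value through.
function tap(fn) {
  return function (value) {
    fn(value);
    return value;
  };
}

var seen = []; // only here to observe the side effect in this example

function logFoundFiles(files) {
  seen.push(files);
  console.log('found', files);
  // no return statement, so the result is undefined
}

var direct = logFoundFiles('foo');      // the data flow stops here
var tapped = tap(logFoundFiles)('foo'); // the argument is passed along
```

Here direct is undefined while tapped is the original argument, so the tapped version can sit in the middle of a composition.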
We just need to add taps around any function in our composition that should be "ignored", and whose original arguments should be just passed to the next step
1 | var R = require('ramda'); |
I indented the code a little bit differently for clarity. Even better, in my personal opinion, is to use ramda.pipe, which is equivalent to R.compose except the functions are in reverse order. To me, R.pipe goes from left to right, which is very natural because the imperative code it replaces goes from top to bottom.
1 | // original imperative code from top to bottom |
The functional code lists the functions in the same order as the original imperative code, but the flow of data is managed for us. The output of parseOutput will be fed to both logFoundFiles and groupByModification. The output of logFoundFiles is ignored via R.tap. The output from groupByModification, which runs after logFoundFiles finishes, will be fed to logGroupedFiles. The output of logGroupedFiles is ignored due to R.tap. The pipe will instead return the output of the groupByModification call.
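The data flow described above can be traced with a small self-contained sketch. The pipe and tap helpers here are hand-rolled stand-ins (unary functions only) rather than ramda's own, the step bodies are assumptions, and the calls array plus the 'ignored' return values exist only to make the order of calls and the discarded results visible.

```javascript
// Hand-rolled stand-ins for a library pipe and tap; unary functions only.
function pipe() {
  var fns = Array.prototype.slice.call(arguments);
  return function (input) {
    // apply the functions left to right
    return fns.reduce(function (value, fn) {
      return fn(value);
    }, input);
  };
}

function tap(fn) {
  return function (value) {
    fn(value);
    return value;
  };
}

var calls = []; // records the order in which the steps run

function parseOutput(data) {
  calls.push('parseOutput');
  return data.split('\n')
    .filter(function (line) { return line.trim().length > 0; })
    .map(function (line) {
      var parts = line.split('\t');
      return { modification: parts[0], name: parts[1] };
    });
}

function logFoundFiles(files) {
  calls.push('logFoundFiles');
  return 'ignored'; // tap throws this value away
}

function groupByModification(files) {
  calls.push('groupByModification');
  return files.reduce(function (grouped, file) {
    if (!grouped[file.modification]) {
      grouped[file.modification] = [];
    }
    grouped[file.modification].push(file.name);
    return grouped;
  }, {});
}

function logGroupedFiles(grouped) {
  calls.push('logGroupedFiles');
  return 'also ignored'; // tap throws this value away too
}

var outputToGroup = pipe(
  parseOutput,
  tap(logFoundFiles),
  groupByModification,
  tap(logGroupedFiles)
);

var result = outputToGroup('M\tsrc/foo.js\nA\tREADME.md\n');
```

After the call, calls lists the four steps in the order they appear in the pipe, and result is whatever groupByModification returned, regardless of what the log functions return.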
Bonus - unit testing outputToGroup
We have refactored the individual steps into separate functions, and composed the final logic into a function stored in a private variable (var outputToGroup = ...). Can we unit test the individual functions or the outputToGroup function? Usually one needs to export a function from a module to be able to unit test it. Too much trouble, we can do better. Using describe-it we can get access to pretty much anything inside a module without exporting it.
1 | function parseOutput(stdout) { ... } |
1 | var describeIt = require('describe-it'); |
Pretty cool - we got access to parseOutput in our unit tests without modifying any code inside the changed-files.js source file. To see the complete unit test, including tests that access the outputToGroup function, take a look at changed-files-spec.js.
Additional reading
- My favorite functional adapters - there is more than tap
- Journey from procedural to functional reactive with stops.
- If you ever used promises (like in the above example), you have used functional programming, because Promises are functors.