Docker is great, Node is awesome. But installing the entire world via `npm install` quickly becomes tiresome.
Here is an example: a tiny HTTP server that responds with the string `hello world` to anyone asking.
```js
const http = require('http')
```
You can find this code in the bahmutov/hello-world repo. The server code cannot be simpler. It only uses the built-in module `http`, so it can run without installing any dependencies.
```shell
$ rm -rf node_modules/
```
Great, but a solid development process requires linting, testing, pre-commit git hooks and lots of other steps in order for me to work faster and more safely. So in the hello-world project's own package.json you will find a few development dependencies. There is linting with `standard` and code formatting with `prettier-standard`, there is unit testing with `mocha`, and there are a few utilities like `pre-git`, `git-issues`, `axios` and `start-server-and-test`. In total I have 8 development dependencies. How bad can installing 8 dependencies be? I am using Node 6.8 with NPM 4.6.1.
```shell
$ time npm install
```
Ok, so I had to wait a little. How large is the `node_modules` folder?
```shell
$ du -h node_modules
```
Get out! 171 megabytes for 8 npm modules. If we were NOT installing these dependencies the deployment would be instant, and we could fit a million copies of the app on a regular hard drive.
The problem is more apparent when we build a Docker image for running this application. A simple Dockerfile would look like this:
```dockerfile
FROM mhart/alpine-node:6
```
I am starting this image from the (almost) smallest Node image mhart/alpine-node, but I still must install development dependencies to perform linting and unit testing before I can be sure the code works as expected.
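Fleshed out, such a single-stage Dockerfile might look like this sketch (the working directory, port and start command are my assumptions):

```dockerfile
FROM mhart/alpine-node:6

WORKDIR /app

# installing EVERYTHING - including all development dependencies
COPY package.json .
RUN npm install

# copy the source files and run linting + unit tests
COPY . .
RUN npm test

EXPOSE 3000
CMD ["node", "index.js"]
```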
```shell
$ docker build -t gleb/hello-world:simple -f Dockerfile-simple .
```
Each command in the Dockerfile creates a separate layer in the final image. We can see the size "contribution" of each command by looking at the size of the corresponding layer.
```shell
$ docker history gleb/hello-world:simple
```
The `<missing>` rows are the layers of the base Docker image mhart/alpine-node:6 our simple Dockerfile starts from - we only know the final image hash ab6c449798d3. Above that we see a layer for each line of the Dockerfile, and the `npm install` step is by far the largest and heaviest disk grabber, clocking in at 152MB. The total image size is above 200MB.
```shell
$ docker images gleb/hello-world:simple
```
How do we make the Docker image smaller? We could go through each NPM dependency and make sure it only includes what is really needed; see the Smaller published NPM modules blog post for how to measure and control the published module size. But there is a better way.
Multi-stage builds for the win
Docker v17 introduced the ability to build multiple Docker images in the same Dockerfile, and to copy specific folders from one image into another. The official docs give a pretty good introduction to this feature. Let us see how we can take advantage of multi-stage builds to avoid including development dependencies in the final output image while still doing testing.
We are going to have 3 named images inside the same Dockerfile.

```
BASE ---> TEST (dev dependencies, runs tests)
BASE ---> PROD (exposed port and run command)
```
The `BASE` image will have our source files, including `package.json`. We are going to install development dependencies and run tests inside the `TEST` image derived from `BASE`. But we are going to discard `TEST` (unless the tests fail). Instead we are going to produce an image named `PROD` that just adds the exposed port and run command to the bare `BASE` image.
Here is the Dockerfile:
```dockerfile
# base image with just our source files
```
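A multi-stage Dockerfile following the BASE / TEST / PROD plan might look like this sketch (the stage layout follows the description above; the port and start command are my assumptions):

```dockerfile
# base image with just our source files
FROM mhart/alpine-node:6 as BASE
WORKDIR /app
COPY . .

# TEST image: install dev dependencies and run linting + tests
FROM BASE as TEST
RUN npm install
RUN npm test

# PROD image: just adds port and run command to the bare BASE image
FROM BASE as PROD
EXPOSE 3000
CMD ["node", "index.js"]
```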
Building looks exactly like a regular build.
```shell
$ docker build -t gleb/hello-world:multi-stage .
```
We can confirm that the output image does NOT have the disk-destroying layers with the `node_modules` folder:
```shell
$ docker history gleb/hello-world:multi-stage
```
The best way to see how much space we saved is by comparing the two images side by side
```shell
$ docker images gleb/hello-world
```
Great, the new image is 1/4 the size of the previous one.
But can we do better? Yes we can.
Bare Node image
When we are running our `hello-world` server, we are never going to execute `npm` commands. So why should we include `npm` in the Docker image? Luckily for us, `mhart/alpine-node` has Docker images with "base" Node, without any other tools.
So our Dockerfile has to be a little bit different. Our `TEST` image will be based on the same "full" Node image which includes NPM. Our `BASE` and `PROD` images are going to be based on the "bare" Node image with just the runtime, without the NPM tool.
```dockerfile
# "bare" base image with just our source files
```
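Such a Dockerfile might be sketched like this (the `base-6` tag, folder paths, port and start command are my assumptions; check the mhart/alpine-node README for the exact "base" tags):

```dockerfile
# "bare" base image with just our source files
FROM mhart/alpine-node:base-6 as BASE
WORKDIR /app
COPY . .

# TEST image uses the "full" Node image that includes npm
FROM mhart/alpine-node:6 as TEST
WORKDIR /app
# copy the source from BASE to test exactly what we are going to ship
COPY --from=BASE /app /app
RUN npm install
RUN npm test

# PROD image derives from the bare BASE - no npm inside
FROM BASE as PROD
EXPOSE 3000
CMD ["node", "index.js"]
```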
Notice that after building `BASE` we copy the source files into `TEST` using the `COPY --from=BASE` syntax. We have to copy the folder because we no longer derive the `TEST` image from the `BASE` image. I prefer to copy the source folder from the `BASE` image rather than from the local current folder to avoid accidentally diverging images - we really must test the code we are going to run, and not build the app twice.
The build command is the same:
```shell
$ docker build -t gleb/hello-world:bare -f Dockerfile-bare .
```
The tests pass, and the image is smaller still, by another 11MB!
```shell
$ docker images gleb/hello-world
```
Final thoughts
We can shave 80% off the Node application Docker image by NOT keeping the development tools around after the tests pass. Of course, in a "normal" application there would be production dependencies, which means the space savings are not going to be as impressive. But still, I expect that a significant chunk of a typical image is dev dependencies, passively taking up space.
When building an application that does include production dependencies, you will need to install them in the "full" image, then copy the `node_modules` folder into the "bare" image; the mhart/alpine-node README shows how to do this.
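A hedged sketch of that approach (the stage name, tags and paths are my assumptions):

```dockerfile
# install production dependencies using the "full" image with npm
FROM mhart/alpine-node:6 as BUILD
WORKDIR /app
COPY package.json .
RUN npm install --production

# runtime image is "bare" Node - copy the installed modules over
FROM mhart/alpine-node:base-6
WORKDIR /app
COPY --from=BUILD /app/node_modules ./node_modules
COPY . .
CMD ["node", "index.js"]
```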
Keep shrinking!