Making a small Docker image

How to make a very small Docker image with a Node application.

Docker is great, Node is awesome. But installing the entire world via npm install quickly gets tiring.

Here is an example: a tiny HTTP server that responds with the string hello world to anyone asking.

const http = require('http')
const message = 'Hello World from Node\n'
const server = http.createServer(function (request, response) {
  console.log('responding with hello')
  response.writeHead(200, { 'Content-Type': 'text/plain' })
  response.end(message)
})
const port = process.env.PORT || 1337
server.listen(port)
console.log('Server running at port', port)
console.log(message)

You can find this code in the bahmutov/hello-world repo. The server code cannot be simpler. It only uses the built-in http module, so it can run without installing any dependencies.

$ rm -rf node_modules/
$ npm start

> [email protected] start /Users/irinakous/git/hello-world
> node index.js

Server running at port 1337
Hello World from Node
$ curl localhost:1337
Hello World from Node
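The `curl` check above can also be scripted with nothing but Node's built-in `http` module. The sketch below is an illustration, not code from the hello-world repo: since index.js starts listening as soon as it is loaded and does not export its server, the script recreates the same handler, listens on port 0 so the operating system picks a free port, makes one request, and shuts the server down. The names `createServer` and `smokeTest` are hypothetical.

```javascript
const http = require('http')

const MESSAGE = 'Hello World from Node\n'

// Same handler as index.js, minus the logging
function createServer () {
  return http.createServer(function (request, response) {
    response.writeHead(200, { 'Content-Type': 'text/plain' })
    response.end(MESSAGE)
  })
}

// Start a server on a free port, GET "/" once, then close the server.
// Resolves with the status code and the response body.
function smokeTest () {
  const server = createServer()
  return new Promise(function (resolve, reject) {
    server.listen(0, '127.0.0.1', function () {
      const port = server.address().port
      // agent: false gives a one-off, non-keep-alive connection
      const options = { host: '127.0.0.1', port: port, path: '/', agent: false }
      const request = http.get(options, function (response) {
        let body = ''
        response.setEncoding('utf8')
        response.on('data', function (chunk) { body += chunk })
        response.on('end', function () {
          server.close()
          resolve({ status: response.statusCode, body: body })
        })
      })
      request.on('error', function (err) {
        server.close()
        reject(err)
      })
    })
  })
}

module.exports = { createServer, smokeTest, MESSAGE }
```

A test runner such as mocha, already among the dev dependencies, could call `smokeTest()` and assert on the resolved status and body.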

Great, but a solid development process requires linting, testing, pre-commit git hooks and lots of other steps that let me work faster and safer. So in hello-world's own package.json you will find a few development dependencies: linting with standard, code formatting with prettier-standard, unit testing with mocha, and a few utilities like pre-git, git-issues, axios and start-server-and-test. In total I have 8 development dependencies. How bad can installing 8 dependencies be? I am using Node 6.8 with NPM 4.6.1.

$ time npm install
real 0m53.399s

Ok, so I had to wait a little. How large is the node_modules folder?

$ du -sh node_modules/
171M node_modules/

Get out! 171 megabytes for 8 npm modules (and all of their transitive dependencies). If we were NOT installing these dependencies, the deployment would be instant, and we could fit a million copies of the app on a regular hard drive.
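To see where a number like this comes from, here is a rough Node equivalent of `du`, sketched for illustration (it is not part of the original project). It walks a folder and adds up the sizes of the regular files it finds. It reports apparent file size in bytes, while `du` counts allocated disk blocks, so the two numbers will differ somewhat; symlinks such as the entries in node_modules/.bin are skipped rather than followed. It assumes Node 10.10 or newer for the `withFileTypes` option, which is newer than the Node 6.8 used in this post.

```javascript
const fs = require('fs')
const path = require('path')

// Recursively sum the apparent size, in bytes, of all regular files under dir.
// Symbolic links are neither followed nor counted.
function dirSize (dir) {
  let total = 0
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name)
    if (entry.isDirectory()) {
      total += dirSize(fullPath)
    } else if (entry.isFile()) {
      total += fs.lstatSync(fullPath).size
    }
  }
  return total
}

module.exports = dirSize
```

For example, `(dirSize('node_modules') / 1e6).toFixed(1) + 'MB'` gives a figure comparable to the `du` output above.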

The problem is more apparent when we build a Docker image for running this application. A simple Dockerfile would look like this:

FROM mhart/alpine-node:6

WORKDIR /app
COPY . .

RUN npm install
RUN npm run lint
RUN npm run ci

EXPOSE 1337
CMD ["node", "index.js"]

I am starting this image from the (almost) smallest Node image mhart/alpine-node, but I still must install development dependencies to perform linting and unit testing before I can be sure the code works as expected.

$ docker build -t gleb/hello-world:simple -f Dockerfile-simple .
Sending build context to Docker daemon 14.85kB
Step 1/8 : FROM mhart/alpine-node:6
6: Pulling from mhart/alpine-node
...
Successfully built aed28e2bde45
Successfully tagged gleb/hello-world:simple

Each command in the Dockerfile creates a separate layer in the final image. We can see the size "contribution" of each command by looking at the size of the corresponding layer.

$ docker history gleb/hello-world:simple
IMAGE          CREATED             CREATED BY                          SIZE
aed28e2bde45   15 seconds ago      #(nop) CMD ["node" "index.js"]      0B
e79e1f4c23b9   16 seconds ago      #(nop) EXPOSE 1337                  0B
785237a55314   16 seconds ago      npm run ci                          0B
06f2ea0fd8fc   24 seconds ago      npm run lint                        319B
c668b02912e8   33 seconds ago      npm install                         152MB
4c2a151942b4   About a minute ago  #(nop) COPY dir:18037759bbd287408…  5.08kB
9efcee73a9de   About a minute ago  #(nop) WORKDIR /app                 0B
ab6c449798d3   10 days ago         apk add --no-cache curl make gcc …  44.6MB
<missing>      10 days ago         #(nop) ENV VERSION=v6.12.0 NPM_V…   0B
<missing>      2 weeks ago         #(nop) CMD ["/bin/sh"]              0B
<missing>      2 weeks ago         #(nop) ADD file:90342832e4e7931e4…  4.81MB

The <missing> rows are layers of the mhart/alpine-node:6 image our simple Dockerfile is based on; we only know its final image hash, ab6c449798d3. Above that we see a layer for each instruction of the Dockerfile, and npm install is by far the largest disk grabber, clocking in at 152MB. The total image size is above 200MB:

$ docker images gleb/hello-world:simple
REPOSITORY         TAG           IMAGE ID       CREATED          SIZE
gleb/hello-world   simple        aed28e2bde45   7 minutes ago    202MB
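To check how the individual layers add up to that total, the SIZE column of `docker history` can be parsed and summed. A minimal sketch follows. It assumes the Docker CLI prints sizes in decimal units (1kB = 1000 bytes, 1MB = 1,000,000 bytes), and since each layer size is already rounded to about three significant digits, the sum only approximates the 202MB that `docker images` reports. `parseSize` is a hypothetical helper, not a Docker API.

```javascript
// Decimal multipliers for the unit suffixes the Docker CLI prints
const UNITS = { B: 1, kB: 1e3, MB: 1e6, GB: 1e9, TB: 1e12 }

// Convert a size such as "5.08kB" or "152MB" into a number of bytes
function parseSize (text) {
  const match = /^(\d+(?:\.\d+)?)\s*(B|kB|MB|GB|TB)$/.exec(text.trim())
  if (!match) {
    throw new Error('Unrecognized size: ' + text)
  }
  return parseFloat(match[1]) * UNITS[match[2]]
}

// SIZE column of `docker history gleb/hello-world:simple`, top to bottom
const layerSizes = ['0B', '0B', '0B', '319B', '152MB', '5.08kB',
  '0B', '44.6MB', '0B', '0B', '4.81MB']

const totalBytes = layerSizes
  .map(parseSize)
  .reduce((sum, bytes) => sum + bytes, 0)
const npmInstallShare = parseSize('152MB') / totalBytes

console.log('total', (totalBytes / 1e6).toFixed(1) + 'MB')
// prints "total 201.4MB"
console.log('npm install share', (npmInstallShare * 100).toFixed(0) + '%')
// prints "npm install share 75%"
```

So the npm install layer alone accounts for roughly three quarters of the image.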

How do we make the Docker image smaller? We could go through each NPM dependency and make sure it only includes what is really needed; see the Smaller published NPM modules blog post for how to measure and control the published module size. But there is a better way.

Multi-stage builds for the win

Docker 17.05 introduced the ability to build multiple Docker images in the same Dockerfile and copy specific folders from one image into another. The official docs give a pretty good introduction to this feature. Let us see how we can take advantage of multi-stage builds to avoid including development dependencies in the final output image while still running the tests.

We are going to have 3 named images inside the same Dockerfile.

BASE ---> TEST (dev dependencies, runs tests)
    \
     ----> PROD (expose port, run command)

The BASE image will have our source files, including package.json. We are going to install development dependencies and run tests inside the TEST image derived from BASE. But we are going to ignore TEST (unless the tests fail). Instead we are going to produce an image named PROD that just adds the exposed port and the run command to the bare BASE image.
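In this layout PROD hangs off BASE, not off TEST, and that detail decides which stages actually get built. The sketch below writes the graph out by hand (the stage names are from the diagram; the function `requiredStages` is a hypothetical illustration, not Docker's implementation) and lists the stages a target needs. With the classic builder of the Docker 17 era, every stage leading up to the target is processed, so TEST runs and a failing test fails the build. BuildKit, the default builder in Docker Engine 23.0 and later, only builds the stages the target depends on, so there the TEST stage would be skipped unless it is requested explicitly, for example with `docker build --target TEST .`.

```javascript
// Each stage maps to the stages it depends on: its FROM parent
// and any COPY --from source. Hand-written to match the diagram.
const stages = {
  BASE: [],
  TEST: ['BASE'],
  PROD: ['BASE']
}

// Depth-first walk: returns the stages needed to build `target`,
// with every dependency listed before the stages that use it.
function requiredStages (graph, target, seen = new Set(), order = []) {
  if (seen.has(target)) return order
  if (!(target in graph)) throw new Error('Unknown stage: ' + target)
  seen.add(target)
  for (const dependency of graph[target]) {
    requiredStages(graph, dependency, seen, order)
  }
  order.push(target)
  return order
}

console.log(requiredStages(stages, 'PROD')) // [ 'BASE', 'PROD' ], no TEST
console.log(requiredStages(stages, 'TEST')) // [ 'BASE', 'TEST' ]
```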

Here is the Dockerfile:

# base image with just our source files
FROM mhart/alpine-node:6 as BASE
WORKDIR /app
COPY . .

# test image installs development dependencies
# and runs testing commands
FROM BASE as TEST
RUN npm install
RUN npm run lint
RUN npm run ci

# final production image
FROM BASE as PROD
EXPOSE 1337
CMD ["node", "index.js"]

Building looks exactly like a regular build.

$ docker build -t gleb/hello-world:multi-stage .

We can confirm that the output image does NOT have the disk-destroying layer with the node_modules folder:

$ docker history gleb/hello-world:multi-stage
IMAGE          CREATED             CREATED BY                          SIZE
44dcd42b11f4   About a minute ago  #(nop) CMD ["node" "index.js"]      0B
fcb01074c9e3   About a minute ago  #(nop) EXPOSE 1337                  0B
2d61e482fc59   2 minutes ago       #(nop) COPY dir:dacd77af96552c3f3…  5.4kB
9efcee73a9de   16 minutes ago      #(nop) WORKDIR /app                 0B
ab6c449798d3   10 days ago         apk add --no-cache curl make gcc …  44.6MB
<missing>      10 days ago         #(nop) ENV VERSION=v6.12.0 NPM_V…   0B
<missing>      2 weeks ago         #(nop) CMD ["/bin/sh"]              0B
<missing>      2 weeks ago         #(nop) ADD file:90342832e4e7931e4…  4.81MB

The best way to see how much space we saved is to compare the two images side by side:

$ docker images gleb/hello-world
REPOSITORY         TAG           IMAGE ID       CREATED          SIZE
gleb/hello-world   multi-stage   44dcd42b11f4   7 minutes ago    49.4MB
gleb/hello-world   simple        aed28e2bde45   21 minutes ago   202MB

Great, the new image is about 1/4 the size of the previous one.

But can we do better? Yes we can.

Bare Node image

When we are running our hello-world server, we are never going to execute npm commands. So why should we include npm in the Docker image? Luckily for us, mhart/alpine-node has Docker images with "base" Node, without any other tools.

So our Dockerfile has to be a little bit different. Our TEST image will be based on the same "full" Node image, which includes NPM. Our BASE and PROD images are going to be based on the "bare" Node image with just the runtime, without the NPM tool.

# "bare" base image with just our source files
# which only has Node runtime - not even NPM!
FROM mhart/alpine-node:base-6 as BASE
WORKDIR /app
COPY . .

# test image installs development dependencies
# and runs testing commands
# derived from Node image that _includes_ NPM
FROM mhart/alpine-node:6 as TEST
WORKDIR /app
# Copy files _from_ BASE
# To avoid accidentally creating different
# testing environment from production
COPY --from=BASE /app .
RUN npm install
RUN npm run lint
RUN npm run ci

# final production image
FROM BASE as PROD
EXPOSE 1337
CMD ["node", "index.js"]

Notice that after building BASE we copy the source files into TEST using the COPY --from=BASE syntax. We have to copy the folder because we no longer derive the TEST image from the BASE image. I prefer to copy the source folder from the BASE image rather than from the current local folder to avoid accidentally diverging images: we really must test the code we are going to run, and not build the app twice.
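The difference between deriving from a stage (FROM BASE) and copying from it (COPY --from=BASE) can be made concrete by reading each stage's dependencies out of the Dockerfile text. A rough sketch, assuming one instruction per line, the `FROM image as NAME` form, and no line continuations, build arguments or FROM flags (a real Dockerfile parser handles much more). `stageDependencies` is a hypothetical helper, and the Dockerfile string is an abbreviated copy of the bare Dockerfile above.

```javascript
// List each stage of a multi-stage Dockerfile with its FROM parent
// and the stages it copies files from via COPY --from=...
function stageDependencies (dockerfile) {
  const stages = []
  for (const rawLine of dockerfile.split('\n')) {
    const line = rawLine.trim()
    const from = /^FROM\s+(\S+)(?:\s+as\s+(\S+))?$/i.exec(line)
    if (from) {
      stages.push({ name: from[2] || null, from: from[1], copiesFrom: [] })
      continue
    }
    const copy = /^COPY\s+--from=(\S+)/i.exec(line)
    if (copy && stages.length > 0) {
      stages[stages.length - 1].copiesFrom.push(copy[1])
    }
  }
  return stages
}

const bareDockerfile = [
  'FROM mhart/alpine-node:base-6 as BASE',
  'WORKDIR /app',
  'COPY . .',
  'FROM mhart/alpine-node:6 as TEST',
  'WORKDIR /app',
  'COPY --from=BASE /app .',
  'RUN npm install',
  'FROM BASE as PROD',
  'CMD ["node", "index.js"]'
].join('\n')

for (const stage of stageDependencies(bareDockerfile)) {
  console.log(stage.name, 'FROM', stage.from,
    'COPY --from', stage.copiesFrom.join(',') || '-')
}
// BASE FROM mhart/alpine-node:base-6 COPY --from -
// TEST FROM mhart/alpine-node:6 COPY --from BASE
// PROD FROM BASE COPY --from -
```

TEST's parent is the full Node image, yet it still depends on BASE through COPY --from, so it tests exactly the files that PROD ships.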

The build command is the same, apart from the Dockerfile name:

$ docker build -t gleb/hello-world:bare -f Dockerfile-bare .

The tests pass, and the image is smaller yet again, by 11.5MB!

$ docker images gleb/hello-world
REPOSITORY         TAG           IMAGE ID       CREATED          SIZE
gleb/hello-world   bare          78cdddcd77ac   23 seconds ago   37.9MB
gleb/hello-world   multi-stage   44dcd42b11f4   17 minutes ago   49.4MB
gleb/hello-world   simple        aed28e2bde45   31 minutes ago   202MB
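The size claims in this post follow directly from the SIZE column. A quick check with the three reported sizes, in the decimal megabytes that `docker images` prints (the `sizes` object and `percent` helper are just for illustration):

```javascript
// Image sizes in MB, copied from the `docker images gleb/hello-world` output
const sizes = { simple: 202, multiStage: 49.4, bare: 37.9 }

const percent = (fraction) => (fraction * 100).toFixed(1) + '%'

// multi-stage relative to simple: roughly a quarter
console.log('multi-stage / simple:', percent(sizes.multiStage / sizes.simple))
// prints "multi-stage / simple: 24.5%"

// dropping npm: multi-stage minus bare
console.log('bare saves vs multi-stage:', (sizes.multiStage - sizes.bare).toFixed(1) + 'MB')
// prints "bare saves vs multi-stage: 11.5MB"

// overall reduction from simple to bare
console.log('bare saves vs simple:', percent(1 - sizes.bare / sizes.simple))
// prints "bare saves vs simple: 81.2%"
```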

Final thoughts

We can shave about 80% off the Node application's Docker image by NOT keeping the development tools (or npm itself) after the tests pass. Of course, a typical application would have production dependencies, which means the space savings are not going to be as impressive. But still, I expect that a significant chunk of such an image is dev dependencies, passively taking up space.

When building an application that does include production dependencies, you will need to install them in an image that has npm, then copy the node_modules folder into the "bare" image; the mhart/alpine-node README shows how to do this.
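Whether an app can run on the bare image without any node_modules folder comes down to whether it requires anything besides Node's built-in modules. A rough check is sketched below under stated assumptions: it scans source text with a regular expression for `require('...')` calls with a string literal, so it misses dynamic requires and ES module imports, and it relies on `require('module').builtinModules`, which only exists in Node versions newer than the 6.8 used in this post (it was added in v6.13.0). `nonBuiltinRequires` is a hypothetical helper.

```javascript
const { builtinModules } = require('module')

// Matches require('name') or require("name") with a literal string argument
const REQUIRE_PATTERN = /require\(\s*['"]([^'"]+)['"]\s*\)/g

// Return the required module names that are neither Node built-ins
// nor relative/absolute paths to the app's own files
function nonBuiltinRequires (source) {
  const found = new Set()
  for (const match of source.matchAll(REQUIRE_PATTERN)) {
    // newer Node accepts a "node:" prefix for built-ins, e.g. "node:fs"
    const name = match[1].replace(/^node:/, '')
    if (name.startsWith('.') || name.startsWith('/')) continue
    if (!builtinModules.includes(name)) found.add(name)
  }
  return [...found]
}
```

For the hello-world index.js, which only requires http, `nonBuiltinRequires(fs.readFileSync('index.js', 'utf8'))` should return an empty list; any package it does list has to be installed in a stage with npm and copied into the bare image.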

Keep shrinking!