Oct 31 2020

Do Not Let NPM Cache Snowball on CI

Do not use lax restore cache keys or your cache will roll over with unused dependencies

TL;DR: when restoring the NPM cache on your continuous integration service, use the exact lock file hash; do not use lax partial restore cache keys.

The NPM caching on CI

Imagine we have a Node project that we test on a continuous integration server. Take the bahmutov/snowball-npm-cache-example repository, for example. It has only a single production dependency - my favorite debug module; I use it to log all the things the right way.

$ npm i -S debug
+ [email protected]

Let's say we want to test our project on continuous integration service, like GitHub Actions. We need to check out the source code and install dependencies before we can run any tests. Here is our initial CI workflow file (copied almost verbatim from Example CI configs).

.github/workflows/ci.yml

name: ci
on: push
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout 🛎
        uses: actions/checkout@v2

      - name: Cache node_modules
        uses: actions/cache@v2
        env:
          cache-name: cache-node-modules
        with:
          path: '~/.npm'
          key: ${{ runner.os }}-build-${{ env.cache-name }}-${{ hashFiles('**/package-lock.json') }}
          restore-keys: |
            ${{ runner.os }}-build-${{ env.cache-name }}-
            ${{ runner.os }}-build-
            ${{ runner.os }}-
      - name: Install
        run: npm ci

We are using the official actions/cache action, and the above syntax comes straight from the Caching dependencies to speed up workflows documentation page. Let me explain it.

After the code is checked out, the actions/cache step takes over. It uses the name cache-node-modules that we have picked. While not necessary right now, it will come in handy later.

The action will cache the folder ~/.npm. This is the folder where NPM caches downloaded NPM modules. We can see this folder's name locally by asking NPM:

$ npm config get cache
/Users/gleb/.npm
$ echo ~
/Users/gleb

The action first tries to restore the ~/.npm folder on CI by looking up caches stored for this project. Every cache has a key, which is the name of the cache. This is where the key and the restore-keys come into play. First, the action computes them. The key is the most precise cache name; it uses the OS name, the cache-name string, and the hash of the lock file.

key: ${{ runner.os }}-build-${{ env.cache-name }}-${{ hashFiles('**/package-lock.json') }}

Every time we install or remove a dependency locally, package-lock.json is rewritten, and thus the hash of this file changes too.
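To see why a dependency change produces a brand new key, here is a small local sketch that builds the same style of key before and after rewriting a lock file. It assumes sha256sum (GNU coreutils; on macOS use shasum -a 256) as a stand-in for hashFiles. hashFiles also uses SHA-256 but combines the digests of all matched files, so the value will not equal the one computed on CI. The lock file contents are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: the cache key suffix changes whenever package-lock.json changes.
# Assumption: sha256sum stands in for GitHub's hashFiles(); the CI digest differs.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical lock file with the debug dependency
printf '{"lockfileVersion":1,"dependencies":{"debug":{}}}\n' > package-lock.json
key_before="Linux-build-cache-node-modules-$(sha256sum package-lock.json | cut -d' ' -f1)"

# Simulate replacing debug with morgan, which rewrites the lock file
printf '{"lockfileVersion":1,"dependencies":{"morgan":{}}}\n' > package-lock.json
key_after="Linux-build-cache-node-modules-$(sha256sum package-lock.json | cut -d' ' -f1)"

echo "before: $key_before"
echo "after:  $key_after"
```

The prefix stays the same while the 64-character hash at the end differs, which is exactly why the exact key misses after any dependency change.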

The restore-keys are fallbacks. If a cache with the exact key is not found, actions/cache tries each restore key in order, looking for a cache whose name starts with that prefix. First, maybe there is already a cache that starts with ${{ runner.os }}-build-${{ env.cache-name }}-? This would be the case if there was a previously saved cache with a different package-lock.json file. If not found, actions/cache tries to find a saved cache that starts with ${{ runner.os }}-build-, so even the specified cache-name does not matter. Finally, if that fails too, it tries to restore a cache with the ${{ runner.os }}- prefix. At this point you might be asking yourself "wait, it will restore some random cache and go from there?"
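The lookup order can be modeled with a small shell function. This is a simplified local sketch, not the actions/cache implementation: the saved caches are a hypothetical list of keys, and the hashes aaa111 and bbb222 are made-up stand-ins for lock file hashes. The actions/cache documentation states that when several caches match a prefix, the most recently created one is restored; the sketch models that by keeping the list ordered newest first.

```shell
#!/usr/bin/env bash
# Simplified model of how actions/cache chooses a cache to restore.
# Assumptions: SAVED_KEYS lists saved caches newest first; hashes are hypothetical.
set -euo pipefail

SAVED_KEYS=("Linux-build-cache-node-modules-aaa111")

# find_cache PRIMARY_KEY [RESTORE_KEY_PREFIX...]
# Prints the key of the cache that would be restored; prints nothing on a miss.
find_cache() {
  local primary=$1 key prefix
  shift
  # 1. Look for an exact match on the primary key
  for key in "${SAVED_KEYS[@]}"; do
    if [[ $key == "$primary" ]]; then echo "$key"; return 0; fi
  done
  # 2. Try each restore key as a prefix, in order; the newest match wins
  for prefix in "$@"; do
    for key in "${SAVED_KEYS[@]}"; do
      if [[ $key == "$prefix"* ]]; then echo "$key"; return 0; fi
    done
  done
  return 0
}

# The lock file changed, so the primary key ends with a new hash
lax=$(find_cache "Linux-build-cache-node-modules-bbb222" \
  "Linux-build-cache-node-modules-" "Linux-build-" "Linux-")
# A renamed cache still falls through to the "Linux-build-" prefix
renamed=$(find_cache "Linux-build-cache-node-modules-v2-bbb222" \
  "Linux-build-cache-node-modules-v2-" "Linux-build-" "Linux-")
# No restore keys at all: only an exact match counts
exact=$(find_cache "Linux-build-cache-node-modules-bbb222")

echo "lax restore keys: ${lax:-<miss>}"
echo "renamed cache:    ${renamed:-<miss>}"
echo "exact key only:   ${exact:-<miss>}"
```

With lax restore keys, both the changed lock file and the renamed cache pick up the old cache; only the exact-key lookup misses and starts clean.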

Yes. The restore keys are very lax, and a pretty random cache can be restored, giving you some random ~/.npm folder before npm ci runs. That is not a problem. Yet. Let's see the GitHub Actions messages the first time our project runs.

Actions cache messages printed during the very first CI run

There were no previously saved caches (this was the very first CI run for this repository). Thus there was nothing to restore, and the ~/.npm folder was empty when npm ci ran. Two dependencies were downloaded and stored in the ~/.npm folder: the debug NPM dependency and its single transitive dependency ms.

$ npm ls
[email protected] /Users/gleb/git/snowball-npm-cache-example
└─┬ [email protected]
  └── [email protected]

After installation, the cache is saved under the precise key Linux-build-cache-node-modules-e9940409f0500326b7e54199eda4e7eefb0b839256d569cdb4979c7fff132c2c, which includes the OS name, the cache name we specified in the YAML file, and the hash of the package-lock.json file, as defined by key: ${{ runner.os }}-build-${{ env.cache-name }}-${{ hashFiles('**/package-lock.json') }}.

The cache size

Before we continue, let's NOT change any dependencies, and just print the size of the cached modules after the restore. We can run the command du -d 0 -h ~/.npm on CI to print the cache folder size in human-readable format. We will do it after restoring the cache and again after the NPM install.

- name: Cache node_modules
  uses: actions/cache@v2
    ...
- name: Print cached NPM modules size
  run: du -d 0 -h ~/.npm

- name: Install
  run: npm ci

- name: Print NPM modules after install
  run: du -d 0 -h ~/.npm

I tried listing all cached NPM modules using the npm-cache-list module, but failed to see any real results; it never printed debug and ms in its output. It seems npm cache is really missing a list command.

The cached folder is 84 kilobytes.

Printed NPM cache folder

Tip: you can also see the compressed cache size in the Cache node_modules step messages when it downloads the found cache:

▶ Run actions/cache@v2
Received 18429 of 18429 (100.0%), 0.7 MBs/sec
Cache Size: ~0 MB (18429 B)
...
Cache restored from key: Linux-build-cache-node-modules-e9940409f0500326b7e54199eda4e7eefb0b839256d569cdb4979c7fff132c2c
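The gap between the 84K folder and the 18429 byte download mostly comes from compression. To get a rough local feel for the two numbers, one can compare du with the byte count of a compressed tarball. This is only an approximation: actions/cache builds its own archive (using zstd when it is available, gzip otherwise), so the numbers will not match the CI log. The demo folder and its repetitive file are hypothetical stand-ins for ~/.npm.

```shell
#!/usr/bin/env bash
# Rough comparison of a folder's on-disk size vs its compressed archive size.
# Assumption: a gzip-compressed tar approximates the archive actions/cache uploads.
set -euo pipefail

demo=$(mktemp -d)   # hypothetical stand-in for ~/.npm
# Write 5000 identical lines (120000 bytes); repetitive data compresses well
printf 'cached npm tarball data\n%.0s' $(seq 1 5000) > "$demo/blob.txt"

disk_kb=$(du -sk "$demo" | cut -f1)
gz_bytes=$(tar -czf - -C "$demo" . | wc -c | tr -d ' ')

echo "on disk: ${disk_kb} KB, compressed: ${gz_bytes} bytes"
```

Pointing the same two commands at a real ~/.npm folder gives a ballpark for how much data each CI run would download.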

The snowball

Now let's change the dependencies in our project. For example, we can replace the debug module with morgan for some reason.

$ npm uninstall debug
removed 2 packages in 0.723s
found 0 vulnerabilities
$ npm i -D morgan
+ [email protected]
added 9 packages from 6 contributors and audited 9 packages in 1.493s
found 0 vulnerabilities

By changing the dependencies, we have modified both the package.json and package-lock.json files:

$ git status
On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   package-lock.json
	modified:   package.json

no changes added to commit (use "git add" and/or "git commit -a")

What happens when we push the code? Well, the CI has the following cache right now:

Linux-build-cache-node-modules-e9940409f0500326b7e54199eda4e7eefb0b839256d569cdb4979c7fff132c2c

We will push the updated package-lock.json file with a different hash. Thus that cache will NOT match the exact key: ${{ runner.os }}-build-${{ env.cache-name }}-${{ hashFiles('**/package-lock.json') }} when restoring the cache. But it will match the prefix restore keys, because actions/cache will find that cache when looking for the ${{ runner.os }}-build-${{ env.cache-name }}- prefix. This is what the CI output shows:

Replaced debug with morgan dependency

The Cache node_modules step shows that it found the previous cache and restored it, then npm ci ran, and finally the new ~/.npm folder was saved under the new full key, which includes the new lock file's hash. By the way, the new cache folder has a size of 384 kilobytes, and it includes both the morgan and debug modules (and their dependencies)!

Print more information

An even better view of what is going on is available if you enable debugging output from the actions/cache steps by setting a secret in the GitHub repo:

ACTIONS_STEP_DEBUG: true

Enable debug output from actions by setting project secret value

Let's remove all NPM dependencies from our project and run the CI again

$ npm uninstall morgan
removed 9 packages in 0.37s
found 0 vulnerabilities
$ npm ls
[email protected] /Users/gleb/git/snowball-npm-cache-example
└── (empty)

We now have no dependencies at all in our project

package.json

{
  "dependencies": {},
  "devDependencies": {}
}

Let's see how the cache behaves. Here is the partial output from the action run - it is pretty verbose!

Verbose output from the restore cache step

Again, it shows that the previous cache was restored because it matched the lax restore key using prefix ${{ runner.os }}-build-${{ env.cache-name }}-. We never compact the cache, thus even removing the project's dependencies has no effect on the ~/.npm folder. We just roll over the previous cache under the new hash key. Every new dependency only grows the cache.
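The snowball can be reproduced with a toy model: each CI run either restores the previous run's folder (lax restore keys) or starts empty (exact key, which misses because the lock file changes on every run here), then "downloads" the packages for the current lock file and saves the result. Packages are empty marker files; the run_ci helper, the folder layout, and the morgan-deps placeholder (standing in for morgan's transitive dependencies) are hypothetical. The three runs mirror this post: debug, then morgan, then no dependencies.

```shell
#!/usr/bin/env bash
# Toy model of the ~/.npm cache across three CI runs.
# Assumption: each package is an empty marker file; names are placeholders.
set -euo pipefail

store=$(mktemp -d)   # hypothetical stand-in for the remote cache storage

# run_ci MODE [PACKAGE...]
# MODE=lax restores the previous run's cache; MODE=exact always starts empty.
# Prints the number of packages in the saved cache.
run_ci() {
  local mode=$1 npm_dir pkg
  shift
  npm_dir=$(mktemp -d)
  if [[ $mode == lax && -d $store/$mode-last ]]; then
    cp -R "$store/$mode-last/." "$npm_dir/"    # restore via a prefix match
  fi
  for pkg in "$@"; do
    touch "$npm_dir/$pkg"                      # "download" into the cache
  done
  rm -rf "$store/$mode-last"
  mv "$npm_dir" "$store/$mode-last"            # "save" under the new key
  ls "$store/$mode-last" | wc -l | tr -d ' '
}

lax=("$(run_ci lax debug ms)" "$(run_ci lax morgan morgan-deps)" "$(run_ci lax)")
exact=("$(run_ci exact debug ms)" "$(run_ci exact morgan morgan-deps)" "$(run_ci exact)")

echo "cache entries with lax restore keys: ${lax[*]}"
echo "cache entries with exact key only:   ${exact[*]}"
```

With lax keys the count never goes down, even after all dependencies are removed; with the exact key the cache always reflects only the current lock file.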

Prevent the cache snowball

You might think a few kilobytes of extra stuff carried over in the cache folder is no big deal. But remember: that cache keeps growing and growing, since you never delete anything there. After a while it can reach a magnificent size, just like the one I have locally:

$ du -d 0 -h ~/.npm
8.4G	/Users/gleb/.npm

This is a realistic problem for larger repositories, especially with automated dependency updates configured: the NPM cache will keep all those versions around forever, snowballing the cache size to hundreds of megabytes and even gigabytes.

So what can we do to "reset" the cache and stop it from growing? Well, you might think that changing the cache name would do the trick. For example, you could change the cache-name environment variable in the workflow by appending -v2 to it.

- name: Cache node_modules
  uses: actions/cache@v2
  env:
    cache-name: cache-node-modules-v2
  ...

Let's see how it works out.

Changed cache name still restored the previous cache

Wow, we again got an old cache restored, because it matched another restore key: ${{ runner.os }}-build-.

So the only solution I found for preventing ever-growing caches is ... 🥁 ... using NO restore keys. When doing this, we should also change the cache name once more to drop the previous cache (otherwise the exact key might match an already saved cache, since the hash of the package lock file has not changed).

- name: Cache node_modules
  uses: actions/cache@v2
  env:
    cache-name: cache-node-modules-v3
  with:
    path: '~/.npm'
    key: ${{ runner.os }}-build-${{ env.cache-name }}-${{ hashFiles('**/package-lock.json') }}

Tip: the disk usage utility du we have used exits with 1 if the folder is not found. Since from now on we can end up with NO ~/.npm folder, we should make the step more robust by using the du -d 0 -h ~/.npm || true command.

The CI runs and has nothing to cache, since there is no ~/.npm folder when there are no project dependencies to download, cache, and install.
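The difference is easy to check locally: du on a missing folder fails, which on CI would fail the whole step, while appending || true keeps the exit status at zero. The path below is a hypothetical folder that does not exist.

```shell
#!/usr/bin/env bash
# Print a folder size without failing when the folder does not exist.
missing_dir=$(mktemp -d)/npm-cache-that-does-not-exist   # hypothetical path

# Plain du reports an error on stderr and exits with a non-zero status
du -d 0 -h "$missing_dir"
plain_status=$?

# Appending "|| true" turns the failure into a zero exit status
du -d 0 -h "$missing_dir" || true
robust_status=$?

echo "plain du exit status: $plain_status, with || true: $robust_status"
```

The error message is still printed, so the log shows that the folder is missing, but the step no longer fails the job.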

Caching using exact key only

Let's verify the cache is acting as expected. First, let's install the morgan dependency.

$ npm i -S morgan
+ [email protected]

The CI runs and the cache is 288K, because it only has the morgan dependency (and not morgan + debug like before).

Morgan dependency

Now let's remove morgan and go back to just having our debug dependency.


$ npm uninstall morgan
$ npm i -S debug
+ [email protected]

Removed morgan and added debug dependency

Since we use the exact hash, the previous cache was discarded and we have recreated it from scratch, having only the minimal 84K ~/.npm folder. From now on, every CI job that does not modify package-lock.json will restore this minimal, up-to-date cache folder.

The best solution

Remembering the cache key format is tiresome. You just want to install NPM dependencies and cache the ~/.npm folder, right? So the simplest way is to use the bahmutov/npm-install action I wrote. It does precisely what this blog post describes: it uses the exact key to restore and save the NPM cache, and it runs npm ci or yarn for you.

The CI file is now

.github/workflows/ci.yml

name: ci
on: push
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout 🛎
        uses: actions/checkout@v2

      - uses: bahmutov/npm-install@v1

      - name: Print NPM modules after install
        run: du -d 0 -h ~/.npm || true

The CI runs and saves the new cache, since the action sets the exact key to be yarn|npm-${platformAndArch}-${lockHash}, which is slightly different from the key we have used before in this example.

Using bahmutov/npm-install action to install NPM dependencies

Let's push another commit to verify the action works.


$ git commit --allow-empty -m "trigger ci"
[main 51bc8d1] trigger ci
$ git push

The CI runs, gets a cache hit, and quickly installs using JUST the right dependencies.

NPM cache hit before install

You might say that discarding the entire cache on every package lock file change is extreme. I say no: you will have a clean reinstall whenever the lock file changes, but how often do you modify the dependencies? I believe that you probably have a lot more commits that change the source files but leave the dependencies intact. The commits that touch the dependencies will run longer, but all other commits will benefit from a smaller cache restore.

Using bahmutov/npm-install abstracts all this away; I use it myself all the time. It is truly a simple solution.

More info

Blog post Do Not Let Cypress Cache Snowball on CI talks about Cypress-specific caching.

Read Cleaning Up Space on Development Machine post.

In the next blog post I will show how Cypress binaries snowball the same way during CI builds and how to solve it.


Gleb Bahmutov PhD
