TL;DR: when restoring the NPM cache on your continuous integration service, use the exact lock file hash; do not use lax partial restore cache keys.
NPM caching on CI
Imagine we have a Node project that we test on a continuous integration server. Take the bahmutov/snowball-npm-cache-example repository, for example. It has only a single production dependency: my favorite debug module, which I use to log all the things the right way.
```shell
$ npm i -S debug
```
Let's say we want to test our project on a continuous integration service, like GitHub Actions. We need to check out the source code and install dependencies before we can run any tests. Here is our initial CI workflow file (copied almost verbatim from Example CI configs):
```yaml
name: ci
```
We are using the official actions/cache action, and the above syntax comes straight from the Caching dependencies to speed up workflows documentation page. Let me explain it.
After the code is checked out, the actions/cache step takes over. It uses the name cache-node-modules that we have picked. We do not need it right now, but it will come in handy later.
The action will cache the folder ~/.npm. This is the folder where NPM caches downloaded NPM modules. We can see this folder's name locally by asking NPM:
```shell
$ npm config get cache
```
The action first tries to restore the ~/.npm folder on CI by looking up the caches stored for this project. Every cache has a key: a name for the cache. This is where the key and the restore-keys parameters come into play. First, the action computes them. The key is the most precise cache name; it combines the OS name, the cache-name string, and the hash of the lock file:
```yaml
key: ${{ runner.os }}-build-${{ env.cache-name }}-${{ hashFiles('**/package-lock.json') }}
```
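To make the key format concrete, here is a minimal Python sketch that assembles a key of the same shape. It is an approximation under stated assumptions: the function name `cache_key` is hypothetical, and the lock file hash is a plain SHA-256 of the file bytes, whereas GitHub's `hashFiles()` computes its digest in its own way, so the hex string will not match the one printed in the CI logs. The point is only that the key changes whenever the lock file contents change.

```python
import hashlib


def cache_key(runner_os: str, cache_name: str, lock_file: bytes) -> str:
    """Build a key shaped like
    ${{ runner.os }}-build-${{ env.cache-name }}-${{ hashFiles('**/package-lock.json') }}

    Assumption: hashFiles() is approximated by a plain SHA-256 of the lock
    file bytes, so the digest differs from the real GitHub Actions value.
    """
    lock_hash = hashlib.sha256(lock_file).hexdigest()
    return f"{runner_os}-build-{cache_name}-{lock_hash}"


# Toy lock file contents: any change produces a different key.
before = cache_key("Linux", "cache-node-modules", b"lock file with debug")
after = cache_key("Linux", "cache-node-modules", b"lock file with morgan")
print(before)
print(after)
```

On a real project you would pass `Path("package-lock.json").read_bytes()` (from `pathlib`) instead of the toy bytes.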
Every time we install a dependency locally, package-lock.json is updated, and the hash of the file changes with it.
The restore-keys are fallbacks. If no cache with the exact key is found, actions/cache tries each restore key in order, looking for a saved cache whose name starts with that prefix. Maybe there is already a cache that starts with ${{ runner.os }}-build-${{ env.cache-name }}-? This would be the case if a cache was previously saved for a different package-lock.json file. If not, actions/cache looks for a saved cache that starts with ${{ runner.os }}-build-, so even the specified cache-name no longer matters. Finally, if that fails too, it tries to restore any cache with the ${{ runner.os }}- prefix. At this point you might be asking yourself, "wait, it will restore some random cache and go from there?"
Yes. The restore keys are very lax, and a pretty random cache can be restored, giving you some random ~/.npm folder before npm ci runs. That is not a problem. Yet. Let's look at the GitHub Actions output from the very first run of our project.
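Before looking at that output, here is a minimal Python model of the lookup order described above. It is a simplified sketch, not the action's code: `SavedCache` and `find_cache` are hypothetical names, the hashes `aaa` and `bbb` are placeholders, and branch scoping, cache versions and other details of the real actions/cache are ignored. When several saved caches match a prefix, the sketch returns the most recently saved one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SavedCache:
    key: str
    created_at: int  # larger value = saved more recently


def find_cache(saved: List[SavedCache], key: str,
               restore_keys: List[str]) -> Optional[SavedCache]:
    """Simplified model of how actions/cache picks a cache to restore.

    1. A saved cache whose key equals `key` exactly wins.
    2. Otherwise each restore key is tried in order as a prefix, and the
       most recently saved cache with that prefix is returned.
    """
    for entry in saved:
        if entry.key == key:
            return entry
    for prefix in restore_keys:
        matches = [entry for entry in saved if entry.key.startswith(prefix)]
        if matches:
            return max(matches, key=lambda entry: entry.created_at)
    return None


saved = [SavedCache("Linux-build-cache-node-modules-aaa", created_at=1)]
restore_keys = [
    "Linux-build-cache-node-modules-",
    "Linux-build-",
    "Linux-",
]
# The new lock file hash "bbb" has no exact match, but the first prefix does.
hit = find_cache(saved, "Linux-build-cache-node-modules-bbb", restore_keys)
print(hit.key if hit else None)  # Linux-build-cache-node-modules-aaa
```

Passing an empty `restore_keys` list makes the same lookup return `None`, so only an exact lock file match can bring a cache back.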

There were no previously saved caches (this was the very first CI run for this repository). Thus there was nothing to restore, and the ~/.npm folder was empty when npm ci ran. Two dependencies were downloaded and stored in the ~/.npm folder: the debug NPM dependency and its single transitive dependency ms.
```shell
$ npm ls
```
After the installation, the cache is saved under the precise key Linux-build-cache-node-modules-e9940409f0500326b7e54199eda4e7eefb0b839256d569cdb4979c7fff132c2c, which includes the OS name, the cache name we specified in the YAML file, and the hash of the package-lock.json file, as defined by key: ${{ runner.os }}-build-${{ env.cache-name }}-${{ hashFiles('**/package-lock.json') }}.
The cache size
Before we continue, let's NOT change any dependencies and just print the size of the cached folder. We can run the command du -d 0 -h ~/.npm on CI to print the cache folder size in human-readable format. We will do it twice: after restoring the cache and after the NPM install.
```yaml
- name: Cache node_modules
```
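If you want the same measurement outside of a shell, here is a rough Python equivalent of `du -d 0 -h`. It is a sketch with hypothetical function names: it sums apparent file sizes, while `du` reports allocated disk blocks, so the two numbers will usually differ a little. Unlike plain `du`, it returns `None` for a missing folder instead of failing.

```python
import os
from typing import Optional


def folder_size_bytes(path: str) -> Optional[int]:
    """Sum the apparent sizes of all regular files under `path`.

    Returns None when the folder does not exist (plain `du` would exit
    with an error instead). Symlinks are not followed or counted.
    """
    root = os.path.expanduser(path)
    if not os.path.isdir(root):
        return None
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def human_readable(num_bytes: int) -> str:
    """Format a byte count with 1024-based units, similar in spirit to `du -h`."""
    size = float(num_bytes)
    for unit in ("B", "K", "M", "G"):
        if size < 1024:
            return f"{size:.0f}{unit}"
        size /= 1024
    return f"{size:.0f}T"


size = folder_size_bytes("~/.npm")
print(human_readable(size) if size is not None else "no ~/.npm folder")
```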
I tried listing all cached NPM modules using the npm-cache-list module, but failed to see any real results; it never printed debug and ms in its output. It seems npm cache really is missing a list command.
The cached folder is 84 kilobytes.

Tip: you can also see the compressed cache size in the Cache node_modules step output when it downloads the matched cache:
```text
▶ Run actions/cache@v2
```
The snowball
Now let's change the dependencies in our project. For example, we can replace the debug module with morgan for some reason.
```shell
$ npm uninstall debug
```
By changing the dependencies, we have modified both the package.json and package-lock.json files:
```shell
$ git status
```
What happens when we push the code? Well, right now the CI has the following cache:
```text
Linux-build-cache-node-modules-e9940409f0500326b7e54199eda4e7eefb0b839256d569cdb4979c7fff132c2c
```
We will push the updated package-lock.json file, which has a different hash. Thus the existing cache will NOT match the exact key: ${{ runner.os }}-build-${{ env.cache-name }}-${{ hashFiles('**/package-lock.json') }} when restoring. But it will match the prefix restore keys, because actions/cache will find that cache when looking for the ${{ runner.os }}-build-${{ env.cache-name }}- prefix. This is what the CI output shows:

The Cache node_modules step shows that it found the previous cache, restored it, ran npm ci, and then saved the new ~/.npm folder under the new full key, which includes the new lock file's hash. By the way, the new cache folder is 384 kilobytes, and it includes both the morgan and debug modules (and their dependencies)!
Print more information
An even better view of what is going on is available if you enable debugging output from the actions/cache steps by setting the following secret in the GitHub repo:
```text
ACTIONS_STEP_DEBUG: true
```

Let's remove all NPM dependencies from our project and run the CI again:
```shell
$ npm uninstall morgan
```
We now have no dependencies at all in our project:
```json
{
```
Let's see how the cache behaves. Here is partial output from the action run; it is pretty verbose!

Again, it shows that the previous cache was restored because it matched the lax restore key prefix ${{ runner.os }}-build-${{ env.cache-name }}-. We never compact the cache, so even removing the project's dependencies has no effect on the ~/.npm folder. We just roll over the previous cache under the new hash key. Every new dependency only grows the cache.
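A tiny simulation makes the snowball visible. It is a toy model, not how NPM really stores data: ~/.npm is modeled as a set of package names, `npm ci` is assumed to only ever add the packages it downloads, and every cache miss restores the most recently saved cache through the lax prefix restore key. The function name is hypothetical and the package lists are simplified (morgan's own dependencies are left out).

```python
from typing import Dict, FrozenSet, List, Set


def simulate_lax_restore(pushes: List[FrozenSet[str]]) -> List[int]:
    """Return how many packages ~/.npm holds after each CI run.

    Toy model assumptions:
    - ~/.npm is a set of package names;
    - `npm ci` only adds what it downloads and never deletes anything;
    - on a cache miss, the lax prefix restore key brings back the most
      recently saved cache, whatever it contains.
    """
    saved: Dict[FrozenSet[str], Set[str]] = {}  # lock file -> saved cache
    newest: Set[str] = set()                    # most recently saved cache
    sizes: List[int] = []
    for deps in pushes:
        if deps in saved:
            cache = saved[deps]        # exact key hit, nothing new is saved
        else:
            cache = newest | deps      # restored old cache + fresh downloads
            saved[deps] = cache        # saved under the new exact key
            newest = cache
        sizes.append(len(cache))
    return sizes


pushes = [
    frozenset({"debug", "ms"}),  # the initial project
    frozenset({"morgan"}),       # debug replaced by morgan (its deps omitted)
    frozenset(),                 # every dependency removed
]
print(simulate_lax_restore(pushes))  # [2, 3, 3]
```

The sizes never go down: removing dependencies only re-saves the same restored folder under a new key.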
Prevent the cache snowball
You might think a few kilobytes of extra stuff carried over in the cache folder is no big deal. But remember: that cache keeps growing and growing, since nothing is ever deleted from it. After a while it can reach a magnificent size, just like mine locally:
```shell
$ du -d 0 -h ~/.npm
```
This is a realistic problem for larger repositories, especially with Automated dependency updates configured: the NPM cache will keep all those versions around forever, snowballing the cache size to hundreds of megabytes or even gigabytes.
So what can we do to "reset" the cache and stop it from growing? Well, you might think that changing the cache name would do the trick. For example, you could append -v2 to the cache-name environment variable in the workflow.
```yaml
- name: Cache node_modules
```
Let's see how it works out.

Wow, we again got an old cache restored, because it matched another restore key: ${{ runner.os }}-build-.
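Checking the prefixes by hand shows why the rename did not help. This sketch uses a placeholder hash `aaa` for the old lock file and assumes, as the matched ${{ runner.os }}-build- key suggests, that the v2 workflow kept the same three restore keys with only the cache name changed.

```python
# Key saved before the rename ("aaa" stands in for the real lock file hash).
old_cache_key = "Linux-build-cache-node-modules-aaa"

# Hypothetical v2 restore keys: only the cache name gained a -v2 suffix.
restore_keys_v2 = [
    "Linux-build-cache-node-modules-v2-",
    "Linux-build-",
    "Linux-",
]

# Every restore key that is a prefix of the old key can bring it back.
matching = [prefix for prefix in restore_keys_v2 if old_cache_key.startswith(prefix)]
print(matching)  # ['Linux-build-', 'Linux-']
```

The renamed first prefix no longer matches, but the two broader prefixes still do, so the old cache keeps coming back.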
So the only solution I found for preventing ever-growing caches is ... 🥁 ... using NO restore keys. When doing this, we should also change the cache name to drop the previous cache, since otherwise the exact key might match an already saved (snowballed) cache built for the unchanged package lock file hash:
```yaml
- name: Cache node_modules
```
Tip: the disk usage utility du that we have used exits with code 1 if the folder is not found. Since we might now have NO ~/.npm folder at all, we should make the command more robust: du -d 0 -h ~/.npm || true.
The CI runs and has nothing to cache, since there is no ~/.npm folder when there are no project dependencies to download, cache, and install.

Let's verify that the cache behaves as expected. First, let's install the morgan dependency:
```shell
$ npm i -S morgan
```
The CI runs, and the cache is 288K because it only has the morgan dependency (and not morgan + debug like before).

Now let's remove morgan and go back to having just our debug dependency:
```shell
$ npm uninstall morgan
```

Since we use the exact hash, the previous cache was discarded and we recreated it from scratch, ending up with only the minimal 84K ~/.npm folder. From now on, every CI job that does not change package-lock.json will restore this minimal, up-to-date cache folder.
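Here is a toy model of the exact-key setup, to contrast with the lax restore keys. The assumptions: ~/.npm is modeled as a set of package names, there are no restore keys, so a cache miss starts from an empty folder and `npm ci` downloads only what the current lock file lists, and nothing is saved when the folder was never created. The function name is hypothetical and morgan's own dependencies are left out.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def simulate_exact_key(pushes: List[FrozenSet[str]]) -> List[Tuple[bool, int]]:
    """For each CI run return (cache_hit, number of packages in ~/.npm).

    Toy model: with no restore keys, a miss starts from an empty ~/.npm,
    so the saved cache holds exactly the current dependencies.
    """
    saved: Dict[FrozenSet[str], Set[str]] = {}  # lock file -> saved cache
    results: List[Tuple[bool, int]] = []
    for deps in pushes:
        if deps in saved:
            results.append((True, len(saved[deps])))
        else:
            cache = set(deps)   # fresh install into an empty folder
            if cache:           # no ~/.npm folder means nothing to save
                saved[deps] = cache
            results.append((False, len(cache)))
    return results


pushes = [
    frozenset(),                 # no dependencies: no ~/.npm folder at all
    frozenset({"morgan"}),       # install morgan
    frozenset({"debug", "ms"}),  # back to debug only
    frozenset({"debug", "ms"}),  # a commit that leaves the lock file alone
]
print(simulate_exact_key(pushes))
# [(False, 0), (False, 1), (False, 2), (True, 2)]
```

Each saved cache holds exactly what its lock file needs, and only commits that leave the lock file alone get a hit.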
The best solution
Remembering the cache key format is tiresome. You just want to install NPM dependencies and cache the ~/.npm folder, right? So the simplest way is to use the bahmutov/npm-install action I wrote. It does precisely what this blog post describes: it uses the exact key to restore and save the NPM cache, and it runs npm ci or yarn for you.
The CI file is now:
```yaml
name: ci
```
The CI runs and saves a new cache, since the action sets the exact key to yarn|npm-${platformAndArch}-${lockHash}, which is slightly different from the key we used earlier in this example.
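For illustration only, here is a Python sketch of a key with the shape yarn|npm-${platformAndArch}-${lockHash}. It is not the action's code. The function name `install_cache_key` is hypothetical; choosing yarn when a yarn.lock file exists is an assumption about how such a tool would decide; platform and architecture come from Python's `sys.platform` and `platform.machine()`, which do not use the same names as Node; and the hash is a plain SHA-256 of the lock file.

```python
import hashlib
import platform
import sys
from pathlib import Path


def install_cache_key(project_dir: str) -> str:
    """Build a key shaped like yarn|npm-${platformAndArch}-${lockHash}.

    Assumptions (not taken from bahmutov/npm-install): yarn is chosen when
    yarn.lock exists, otherwise npm with package-lock.json; the lock hash is
    a plain SHA-256 of the lock file bytes. Raises FileNotFoundError when
    neither lock file exists.
    """
    root = Path(project_dir)
    yarn_lock = root / "yarn.lock"
    if yarn_lock.exists():
        tool, lock_file = "yarn", yarn_lock
    else:
        tool, lock_file = "npm", root / "package-lock.json"
    platform_and_arch = f"{sys.platform}-{platform.machine()}"
    lock_hash = hashlib.sha256(lock_file.read_bytes()).hexdigest()
    return f"{tool}-{platform_and_arch}-{lock_hash}"
```

Calling `install_cache_key(".")` in an npm project on a typical Linux x86_64 machine gives a key starting with `npm-linux-x86_64-`. Because the key embeds the full lock hash and there are no restore keys, a changed lock file always starts from a clean cache.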

Let's push another commit to verify the action works.
```shell
$ git commit --allow-empty -m "trigger ci"
```
The CI runs, gets a cache hit, and quickly installs using JUST the right dependencies.

You might say that discarding the entire cache whenever the package lock file changes is extreme. I say no: you will get a clean reinstall whenever the lock file changes, but how often do you modify the dependencies? You probably have a lot more commits that change the source files but leave the dependencies intact. The commits that touch the dependencies will run longer, but all other commits benefit from a smaller cache to restore.
Using bahmutov/npm-install abstracts all this away; I use it myself all the time. It is truly a simple solution.
More info
The blog post Do Not Let Cypress Cache Snowball on CI covers Cypress-specific caching.
Read the Cleaning Up Space on Development Machine post.
In the next blog post I will show how Cypress binaries snowball the same way during CI builds and how to solve it.