It Is OK to Get Stuck Sometimes

How we wasted three days due to missing "await"

Everyone gets stuck sometimes. I love and enjoy coding in JavaScript, have been doing it for a while, yet sometimes I commit simple mistakes. Most of the times, the linter or the tests catch them. But sometimes a harder to catch error slips through, stumbling me for a while. Recently, our open source test runner Cypress continuous integration broke due to Mac application code signing.

1
2
3
 electron-osx-sign Error executing file:
> Stdout:
> Stderr: .../node_modules/@ffmpeg-installer/darwin-x64/ffmpeg: No such file or directory

We just merged a very large v5.0 pull request, so I naturally suspected the problem to be due to this large code change. The squashed commit included Electron application upgrade, Node version change, and lots of code and dependency changes.

The funny thing was that it has built and signed the binary from the merge commit, and the CI has started failing on the next build. Even funnier was that I could build and code sign Mac application just fine on my laptop using exactly the same source code. The final piece of the puzzle (that actually should have been my clue) was that the missing file ".../node_modules/@ffmpeg-installer/darwin-x64/ffmpeg" was actually present! Hmm. I started my investigation in issue #8299. I spent probably two full days looking for the source of the problem, trying to update Electron builder, playing with Mac machine versions, etc.

After 2 days I brought in the big guns: the Cypress' author Brian Mann pair-programmed with me on this issue. We looked at each part of the code, ran the individual actions by themselves - it all seemed to work separately, yet fail together. Altogether we spent around 4-6 hours looking at the CI machine, learnt VIM (normally we code using VSCode) to be able to edit files directly on the CI machine, etc.

Nothing.

While looking at some unrelated thing we accidentally looked at the file scripts/after-pack-hook.js and Brian saw something suspicious. This code that copies files from each subfolder used async / await keywords:

1
2
3
4
5
6
7
8
9
10
11
12
console.log('copying node_modules to', outputFolder)

packages.forEach(async (packageNodeModules) => {
console.log('copying', packageNodeModules)

const sourceFolder = join(params.packager.info._appDir, packageNodeModules)
const destinationFolder = join(outputFolder, packageNodeModules)

await fs.copy(sourceFolder, destinationFolder)
})

console.log('all node_modules subfolders copied to', outputFolder)

The debug output on the CI machine looked correct:

1
2
3
4
5
6
7
copying node_modules to ...
copying folderA
copying folderB
copying folderC
copying folderD
...
all node_modules subfolders copied to ...

Do you see it? Exactly - the packages.forEach does NOT wait for the result of the callback function, even if it is marked async. Thus this code did NOT wait for the files to be copied, it just went on to the code signing part! The fix was a one line change in PR #8378.

1
2
- packages.forEach(async (packageNodeModules) => {
+ for await (const packageNodeModules of packages) {

Sigh.

So the code signing used to work accidentally, and once we upgraded the Electron framework and changed the dependencies, the copying slowed down enough for the code signing to not find the ffmpeg file when needed. Yet when we looked at the failed build, the copying has finished and ffmpeg was present, confusing us.

Last note: if we logged the folder copy operations after the fact we would have seen the problem, since we would see the log lines out of order:

1
2
3
4
5
6
7
copying node_modules to ...
all node_modules subfolders copied to ...
copied folderA
copied folderB
copied folderC
copied folderD
...