Jan 26 2021

Split Long GitHub Action Workflow Into Parallel Cypress Jobs

An example of splitting a GitHub Actions testing workflow into several jobs.

This blog post shows how I cut the execution time of a GitHub Actions workflow from 9 minutes to 4 minutes by running E2E test jobs in parallel.

The initial workflow
Split the workflow: first attempt
Split the workflow: second attempt
Parallelization
Dominant spec
The final result

The initial workflow

In my repository bahmutov/cypress-examples the testing workflow has reached a tipping point: 9 minutes of running time.

The workflow takes nine minutes. That's too long

For me, any CI workflow longer than 3 minutes is too long, especially for a simple project like the cypress-examples. After all, it does only a few things serially:

installs dependencies
runs all Markdown specs with E2E tests
converts Markdown specs into static pages via Vuepress
runs the extracted JavaScript specs against the built Vuepress site
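
For reference, a single job that performs these four steps serially could look roughly like the sketch below. This is not the repository's actual workflow file: the job name `test`, the step names, and the `config-file` value are placeholders for illustration, and serving the built site to the JavaScript specs is left out. Only the `npm run docs:build` command appears later in this post.

```yaml
jobs:
  test:
    runs-on: ubuntu-20.04
    steps:
      - name: Checkout 🛎️
        uses: actions/checkout@v2

      # installs dependencies, then runs the Markdown specs
      # (assumes the default Cypress config points at them)
      - name: Markdown tests 🧪
        uses: cypress-io/github-action@v2

      # converts the Markdown specs into static pages
      - name: Build site 🏗
        run: npm run docs:build

      # dependencies are already installed, so skip the install;
      # "cypress-js.json" is a hypothetical config file name
      - name: JavaScript tests 🧪
        uses: cypress-io/github-action@v2
        with:
          install: false
          config-file: cypress-js.json
```

Every step waits for the previous one, so the total time is the sum of all four.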

The steps that take the longest are highlighted in the workflow timings below.

The longest parts of the workflow

Can we optimize the workflow to speed it up? Can we run the Markdown tests and the JavaScript specs in parallel?

Note: the Markdown specs in this repository are running E2E Cypress tests from the Markdown files.

Split the workflow: first attempt

We should split the long workflow into several jobs. The first job should install the dependencies, then we can run tests in parallel. We can pass the installed dependencies using GitHub artifacts actions.

jobs:
  install:
    runs-on: ubuntu-20.04
    steps:
      # https://github.com/actions/checkout
      - name: Checkout 🛎️
        uses: actions/checkout@v2

      # only install dependencies
      # https://github.com/cypress-io/github-action
      - name: Install 📦
        uses: cypress-io/github-action@v2
        with:
          runTests: false

      ...
      - name: Save built folders 🆙
        uses: actions/upload-artifact@v2
        with:
          name: install-and-build
          path: |
            ~/.cache/Cypress
            node_modules
            docs
            public

The test job can download the artifacts:

test-markdown:
  runs-on: ubuntu-20.04
  needs: install
  steps:
    # https://github.com/actions/checkout
    - name: Checkout 🛎️
      uses: actions/checkout@v2

    - name: Download built folders ⏬
      uses: actions/download-artifact@v2
      with:
        name: install-and-build

The workflow runs ... and is super slow. Uploading node_modules, the Cypress binary folder, and the built folders takes forever.

Slow upload

It turns out that uploading and downloading folders with lots of files, like node_modules or ~/.npm, is super slow, especially compared to actions/cache, which restores the cached NPM and Cypress binary dependencies much faster.

Split the workflow: second attempt

Let's rethink our strategy. We have 3 types of files and folders in our test jobs.

the source files from the repository, like folders ./docs, ./src, and files like package.json. These files can be quickly downloaded using the actions/checkout action.
the dependency folders like node_modules and ~/.cache/Cypress can be quickly installed and cached between runs using the cypress-io/github-action action. We can install these dependencies and skip running tests using an argument:

- name: Install 📦
  uses: cypress-io/github-action@v2
  with:
    runTests: false

folders modified or created by the test job itself. In our case the test code creates a new folder ./public and modifies a few files in the ./docs folder. These two folders are small and can be efficiently uploaded and downloaded using the actions/upload-artifact and actions/download-artifact actions.

Thus our workflow uses different actions for different folders. The install job, for example, can do the following:

install:
  steps:
    - name: Checkout 🛎️
      uses: actions/checkout@v2

    # only install dependencies using
    # https://github.com/cypress-io/github-action
    # will restore / create folders
    #   ~/.npm
    #   ~/.cache/Cypress
    - name: Install 📦
      uses: cypress-io/github-action@v2
      with:
        runTests: false

    # will create folder ./public
    - name: Build site 🏗
      run: npm run docs:build

    # only pass the local folders we built / updated
    #   ./docs and ./public
    - name: Save built folders 🆙
      uses: actions/upload-artifact@v2
      with:
        name: built
        path: |
          docs
          public

The install job shows the time savings. Restoring the dependencies is fast because most builds run with the same package lock file, which avoids a full re-install. Next, the link check takes half a minute (we can optimize it later). Finally, saving the built folders takes only 5 seconds (the green arrow).

The install job

Let's take the job that runs the Markdown tests. It needs to check out files, download the built folders, install dependencies, and run tests.

test-markdown:
  needs: install
  steps:
    - name: Checkout 🛎️
      uses: actions/checkout@v2

    - name: Download built folders ⏬
      uses: actions/download-artifact@v2
      with:
        name: built

    # download cached ~/.npm and ~/.cache/Cypress
    # and install node_modules and run tests
    - name: Cypress tests 🧪
      uses: cypress-io/github-action@v2

The job takes only a few seconds to download the built folders; the rest of the time is spent running the Markdown spec files.

The test Markdown files job

There are 31 Markdown files that are executed serially; we will split them up later.

The test job that runs the exported JavaScript spec files similarly runs after the quick setup.

The test JavaScript files job
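
The JavaScript test job is not listed in this post. A minimal sketch, assuming it mirrors the Markdown job and serves the built site locally, might look like this. The job name `test-js` is used later in the post; the `config-file`, `start`, and `wait-on` values are hypothetical and depend on the project's scripts and port.

```yaml
test-js:
  runs-on: ubuntu-20.04
  needs: install
  steps:
    - name: Checkout 🛎️
      uses: actions/checkout@v2

    - name: Download built folders ⏬
      uses: actions/download-artifact@v2
      with:
        name: built

    # restores cached ~/.npm and ~/.cache/Cypress, installs node_modules,
    # starts a local server for the built site, then runs the specs
    - name: Cypress tests 🧪
      uses: cypress-io/github-action@v2
      with:
        # hypothetical values: adjust to the project's scripts and port
        config-file: cypress-js.json
        start: npm run serve
        wait-on: 'http://localhost:5000'
```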

Note: technically, we could have passed the installed dependency folders node_modules and ~/.cache/Cypress using actions/cache instead of caching the ~/.npm and ~/.cache/Cypress folders and re-installing with cypress-io/github-action. But the time savings would be minimal and not worth the complexity of remembering the syntax. Using cypress-io/github-action always wins by its simplicity.
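
To show what that alternative involves, here is a rough sketch of the two steps each job would need. It assumes the project uses a package-lock.json file; the step id `deps` and the `deps-` key prefix are made up for this example.

```yaml
# restore node_modules and the Cypress binary if the lock file is unchanged
- name: Cache installed dependencies 📦
  id: deps
  uses: actions/cache@v2
  with:
    path: |
      node_modules
      ~/.cache/Cypress
    # a changed lock file produces a new cache key
    key: deps-${{ runner.os }}-${{ hashFiles('package-lock.json') }}

# install only when the cache above was not restored
- name: Install dependencies
  if: steps.deps.outputs.cache-hit != 'true'
  run: npm ci
```

Compared to a single cypress-io/github-action step, that is a cache key, a step id, and a condition to get right in every job.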

Now that we know how to split workflows and pass folders from job to job, we can make the entire run faster using Cypress parallelization.

Parallelization

Our current run executes the Markdown and JavaScript specs in parallel, using the common install job.

The workflow with jobs

First, let's determine if running tests on several machines is worth the trouble. The Cypress Dashboard (parallelization requires the Dashboard subscription) shows how adding more machines would affect the total test run duration.

Parallelization calculator

Wow, running the tests in parallel would be much faster!

Let's change the Markdown test job to run tests on four machines. We need to add a strategy: matrix section to the job's definition and the parallel: true parameter to the cypress-io/github-action step:

test-markdown:
  runs-on: ubuntu-20.04
  needs: install
  strategy:
    # when one test fails, DO NOT cancel the other
    # containers, because this will kill Cypress processes
    # leaving the Dashboard hanging ...
    # https://github.com/cypress-io/github-action/issues/48
    fail-fast: false
    matrix:
      # run 4 copies of the current job in parallel
      containers: [1, 2, 3, 4]
  steps:
    # ...
    - name: Cypress tests 🧪
      uses: cypress-io/github-action@v2
      with:
        record: true
        parallel: true
        group: 1. Markdown
      env:
        CYPRESS_RECORD_KEY: ${{ secrets.recordKey }}
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

Tip: you can also check out the cypress-io/github-action parallel example.

We can add the same parameters to the test-js job. The actions UI shows the changed timing:

The parallelized workflow

Our total time went from 7 minutes to under 4 minutes. Not bad! Let's look at the individual testing jobs. There are 4 Markdown testing jobs, and each one takes around 90 seconds.

One of the four parallel test jobs

If we look at the Cypress Dashboard view of the specs by machine, we can see that the specs were distributed pretty well - every machine was utilized approximately the same.

Machine utilization by Markdown specs

Note: you can ignore the warning "3 specs errors". There are specs that do not run directly from the Markdown file and can only run after being converted into "plain" JavaScript specs.

Note 2: the Cypress Dashboard reports the run duration between the start of the first spec and the end of the last spec. The GitHub Actions UI shows the total job duration, which includes the overhead of spinning up the container, installing dependencies, passing the built folders, and finally starting the Cypress process. Thus the 50 seconds on the Dashboard become 1 minute and 30 seconds on GitHub.

Dominant spec

There is no additional benefit in adding more machines to our workflow matrices. The Markdown specs job runs in parallel with the JavaScript specs job, which itself runs across four machines. We can see from the machine utilization reported by the Cypress Dashboard that the running time is already at its minimum. A single concatenated spec file runs on one of the machines and takes the entire duration. This concatenated spec is equivalent to clicking the "Run All Specs" button in the Test Runner. Because it takes longer than all the other specs split across the other three machines, there is no sense in adding more machines. The additional machines would just finish sooner, while the total job duration would still equal the concatenated spec's duration.

Machine utilization across the JavaScript specs

The final result

The final workflow has the common install job and 3 different test jobs, all passing the built folders and the dependencies using the approach I have described above. The last job in the workflow deploys the static site; it waits for all other jobs and runs only on the default branch.

The final workflow
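
The deploy job itself is not listed in this post. A minimal sketch of such a job, under a few assumptions, could look like the following. The default branch is assumed to be `master`, the `npm run deploy` command is hypothetical, and only the two test jobs named in this post appear under `needs`; any other test job would have to be added to that list.

```yaml
deploy:
  runs-on: ubuntu-20.04
  # wait for the test jobs; list every test job here
  needs: [test-markdown, test-js]
  # run only on the default branch (assumed to be "master")
  if: github.ref == 'refs/heads/master'
  steps:
    - name: Checkout 🛎️
      uses: actions/checkout@v2

    - name: Download built folders ⏬
      uses: actions/download-artifact@v2
      with:
        name: built

    # hypothetical deploy command
    - name: Deploy 🚀
      run: npm run deploy
```

Because `needs` lists the test jobs, the deploy is skipped when any of them fails.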

Better world by better software

Gleb Bahmutov PhD
