Test Tag Suggestions Using AI

Suggest the appropriate test tag based on the pull request title and text.

In my previous blog post Pick E2E Tests To Run Using AI Summaries I picked specs to run using intermediate text summaries. In this blog post I will show a simpler and cheaper way of suggesting the end-to-end test tag for each pull request.

🎁 You can find the example application source code in the repo bahmutov/pick-test-tag-ai.

The application and tests

We have a simple TodoMVC app with multiple end-to-end Cypress tests. I can see all tests and how they are tagged using the find-cypress-specs utility.

$ npx find-cypress-specs --names
cypress/e2e/app-spec.js (15 tests)
└─ TodoMVC - React
  ├─ adds 4 todos [@smoke, @add]
  ├─ When page is initially opened
  │ └─ should focus on the todo input field
  ├─ No Todos
  │ └─ should hide #main and #footer [@smoke]
  ├─ New Todo [@add]
  │ ├─ should allow me to add todo items
  │ ├─ adds items
  │ ├─ should clear text input field when an item is added
  │ ├─ should append new items to the bottom of the list
  │ ├─ should trim text input
  │ └─ should show #main and #footer when items added
  ├─ Item
  │ ├─ should allow me to mark items as complete
  │ ├─ should allow me to un-mark items as complete
  │ └─ should allow me to edit an item
  └─ Clear completed button
    ├─ should display the correct text
    ├─ should remove completed items when clicked [@smoke]
    └─ should be hidden when there are no items that are completed

cypress/e2e/completed-spec.js (3 tests)
└─ TodoMVC - React [@complete]
  └─ Mark all as completed
    ├─ should allow me to mark all items as completed
    ├─ should allow me to clear the complete state of all items
    └─ complete all checkbox should update state when items are completed / cleared [@smoke]

cypress/e2e/counter-spec.js (2 tests)
└─ TodoMVC - React [@add, @smoke]
  └─ Counter
    ├─ should not exist without items
    └─ should display the current number of todo items

cypress/e2e/editing-spec.js (5 tests)
└─ TodoMVC - React [@edit]
  └─ Editing
    ├─ should hide other controls when editing
    ├─ should save edits on blur [@smoke]
    ├─ should trim entered text
    ├─ should remove the item if an empty text string was entered
    └─ should cancel edits on escape

cypress/e2e/persistence-spec.js (1 test)
└─ TodoMVC - React [@persistence]
  └─ Persistence
    └─ should persist its data [@smoke]

cypress/e2e/routing-spec.js (5 tests)
└─ TodoMVC - React [@routing]
  └─ Routing
    ├─ should allow me to display active items
    ├─ should respect the back button [@smoke]
    ├─ should allow me to display completed items
    ├─ should allow me to display all items @smoke
    └─ should highlight the currently applied filter

found 6 specs (31 tests)

We have 6 specs with 31 tests grouped into tags.

Test tags

There are 6 different tags, implemented using the @bahmutov/cy-grep plugin.

$ npx find-cypress-specs --tags
Tag          Tests
------------ -----
@add         9
@complete    3
@edit        5
@persistence 1
@routing     5
@smoke       10

The test tags are type-checked; see the blog post Type Check Your Test Tags for details. The allowed tags are declared in the file cypress/support/index.d.ts:

cypress/support/index.d.ts
/**
 * The only allowed test tags in this project
 */
type AllowedTag =
  | '@smoke'
  | '@add'
  | '@complete'
  | '@edit'
  | '@routing'
  | '@persistence'
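
For context, here is how a tag typically attaches to a suite or a test with @bahmutov/cy-grep; a minimal sketch for illustration only, the real specs live in the bahmutov/pick-test-tag-ai repo:

// minimal sketch of tagging with @bahmutov/cy-grep
// (illustrative; see the actual specs in the repo)
describe('Persistence', { tags: '@persistence' }, () => {
  it('should persist its data', { tags: '@smoke' }, () => {
    // ... Cypress commands
  })
})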

Picking tests to run

Imagine someone unfamiliar with the project's tests opens a pull request. Which tests should we run? Ideally, we would run all tests, but that might be slow. We could run a few tests across all features: the tests tagged @smoke. Or the author of the pull request could ask someone familiar with the tests to advise. If someone asked me "which test tag is appropriate for this pull request?", I would look at the pull request title and description to figure out which user-facing features the change touches.

Here is an empty commit and pull request:

bahmutov at QPW7RRQDVW  ~/git/pick-test-tag-ai on main
$ gb demo-pr
Switched to a new branch 'demo-pr'
bahmutov at QPW7RRQDVW ~/git/pick-test-tag-ai on demo-pr
$ gempty "add localStorage wrapper"
[demo-pr cca7b87] add localStorage wrapper
bahmutov at QPW7RRQDVW ~/git/pick-test-tag-ai on demo-pr
$ gh pr create
? Where should we push the 'demo-pr' branch? bahmutov/pick-test-tag-ai

Creating pull request for demo-pr into main in bahmutov/pick-test-tag-ai

? Title (required) add localStorage wrapper
? Body <Received>
? What's next? Submit
remote:
remote:
To github.com:bahmutov/pick-test-tag-ai.git
* [new branch] HEAD -> demo-pr
branch 'demo-pr' set up to track 'origin/demo-pr'.
https://github.com/bahmutov/pick-test-tag-ai/pull/7

I wrote the following pull request body:

Refactor loading todos on page load.

Let's look at pull request #7. Notice the automatic comment "I recommend running tests tagged @persistence". That is the AI model suggesting the tag @persistence based on the PR title plus the pull request body text.

AI suggests running @persistence test tag

The recommendation makes sense: given the PR title, we do want to run these tests:

cypress/e2e/persistence-spec.js (1 test)
└─ TodoMVC - React [@persistence]
  └─ Persistence
    └─ should persist its data [@smoke]
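
To act on the recommendation, a developer (or a follow-up CI job) can run only the tagged tests via the cy-grep environment variable, for example:

$ npx cypress run --env grepTags=@persistence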

Here is how it works.

The GitHub Actions workflow

Each time a new pull request is opened, the following workflow .github/workflows/pr-opened.yml grabs its title and the first 1000 characters of its body text.

.github/workflows/pr-opened.yml
name: PR opened
on:
  pull_request:
    types: [opened]
permissions:
  contents: write
  pull-requests: write
jobs:
  find-test-tag:
    name: Find the applicable test tag
    runs-on: ubuntu-24.04
    steps:
      - name: Checkout 🛎
        uses: actions/checkout@v4

      - name: Install dependencies 📦
        # https://github.com/cypress-io/github-action
        uses: cypress-io/github-action@v6
        with:
          runTests: false

      - name: Determine the test tag 🏷️
        id: find_test_tag
        run: |
          TAG=$(node ask.mjs)
          echo "TAG=$TAG" >> $GITHUB_OUTPUT
        env:
          OPEN_AI_API_KEY: ${{ secrets.OPEN_AI_API_KEY }}
          OPEN_AI_BASE_URL: ${{ secrets.OPEN_AI_BASE_URL }}
          CODE_CHANGES: "${{ github.event.pull_request.title }}\n\n${{ github.event.pull_request.body }}"

      - name: Print the determined tag 🏷️
        run: |
          echo "The determined test tag is: ${{ steps.find_test_tag.outputs.TAG }}" >> $GITHUB_STEP_SUMMARY

      - name: Comment on PR with the found tag 💬
        # https://github.com/peter-evans/create-or-update-comment
        uses: peter-evans/create-or-update-comment@v4
        with:
          token: ${{ secrets.GH_PERSONAL_TOKEN }}
          repository: ${{ github.event.repository.full_name }}
          issue-number: ${{ github.event.pull_request.number }}
          body: |
            I recommend running tests tagged **${{ steps.find_test_tag.outputs.TAG }}**
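
The workflow stops at posting a comment. If you wanted to go further, a follow-up step (my own sketch, not in the repo) could feed the tag straight into a Cypress run through the cy-grep grepTags environment variable; here I assume the app can be started locally with "npm start":

      # hypothetical extra step: run only the suggested tests
      - name: Run the suggested tests 🧪
        uses: cypress-io/github-action@v6
        with:
          start: npm start
          env: grepTags=${{ steps.find_test_tag.outputs.TAG }}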

The most important part is executing the ask.mjs script:

- name: Determine the test tag 🏷️
  id: find_test_tag
  run: |
    TAG=$(node ask.mjs)
    echo "TAG=$TAG" >> $GITHUB_OUTPUT
  env:
    OPEN_AI_API_KEY: ${{ secrets.OPEN_AI_API_KEY }}
    OPEN_AI_BASE_URL: ${{ secrets.OPEN_AI_BASE_URL }}
    CODE_CHANGES: "${{ github.event.pull_request.title }}\n\n${{ github.event.pull_request.body }}"
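
The script reads everything from environment variables, so you can try it locally before wiring it into CI. A hypothetical dry run, assuming OPEN_AI_API_KEY and OPEN_AI_BASE_URL are already exported:

$ CODE_CHANGES="add localStorage wrapper" node ask.mjs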

Here is the script that uses the OpenAI SDK to get a one-word answer.

ask.mjs
import OpenAI from 'openai'

const client = new OpenAI({
  apiKey: process.env['OPEN_AI_API_KEY'],
  baseURL: process.env['OPEN_AI_BASE_URL'],
})

async function ask(instructions, input) {
  const response = await client.responses.create({
    // https://platform.openai.com/docs/models
    model: 'gpt-4.1',
    instructions,
    input,
  })
  return response.output_text
}

const instructions = `
Given the following end-to-end test tags:
- @smoke a few tests that go through various features of the application
- @add tests go through creating new todos
- @complete tests are creating todos and then marking them complete and incomplete
- @edit tests edit text for existing todos
- @routing tests check if the app can show screens of completed and active todos
- @persistence tests check how todos are saved in the browser and loaded

Determine which test tag is applicable to the following code changes.

Respond with the test tag by itself and nothing else.
If no test tag is applicable, return "@smoke".
`

const input = process.env['CODE_CHANGES']
if (!input) {
  throw new Error('CODE_CHANGES environment variable is required')
}

// log to the error stream, so stdout only carries the answer
console.error('Asking OpenAI for test tags...')
console.error(input)

const answer = await ask(instructions, input)

console.log(answer)
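
Note that the model could in principle answer with text outside our tag list. A defensive variant (my own sketch, not what the repo does) would validate the answer against the same allowed tags and fall back to "@smoke":

// hypothetical guard: accept only known tags, otherwise default to @smoke
const allowedTags = ['@smoke', '@add', '@complete', '@edit', '@routing', '@persistence']
const tag = answer.trim()
console.log(allowedTags.includes(tag) ? tag : '@smoke')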

The test tag descriptions are written by hand and can be expanded if needed to better describe the tests under each tag; see the sketch after the block below.

const instructions = `
Given the following end-to-end test tags:
- @smoke a few tests that go through various features of the application
- @add tests go through creating new todos
- @complete tests are creating todos and then marking them complete and incomplete
- @edit tests edit text for existing todos
- @routing tests check if the app can show screens of completed and active todos
- @persistence tests check how todos are saved in the browser and loaded

Determine which test tag is applicable to the following code changes.

Respond with the test tag by itself and nothing else.
If no test tag is applicable, return "@smoke".
`
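
For example, the @persistence line could spell out the user-visible behavior in more detail (a hypothetical expansion, not the text in the repo):

// hypothetical richer description for one tag
const persistenceLine = `
- @persistence tests reload the page and confirm that previously added todos
  are still shown, i.e. the app saves todos to localStorage and loads them back
`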

I tried other OpenAI models; they all worked pretty much the same. In a sense, our problem is very simple: pick the best-matching entry from a very limited list of available tags. LLMs seem to match synonyms and word forms pretty well. Let's see if a pull request with the title "Changed the input element" matches the "@add tests go through creating new todos" text. Here is pull request #8:

AI suggests running @add test tag

Nice, that is the tag for the tests that most likely cover changes to the "Todo" item input implementation.

Test tag on demand

We can determine the test tag when the user asks for it, instead of computing it automatically when the pull request is opened. I use the peter-evans/slash-command-dispatch action to trigger the "find the test tag" workflow when a user posts an /ai comment.

.github/workflows/dispatch.yml
name: dispatch
on:
  issue_comment:
    types: [created]
permissions:
  contents: write
  pull-requests: write
jobs:
  dispatch:
    runs-on: ubuntu-24.04
    steps:
      - name: Slash Command Dispatch
        # https://github.com/peter-evans/slash-command-dispatch
        uses: peter-evans/slash-command-dispatch@v4
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
          reaction-token: ${{ secrets.GITHUB_TOKEN }}
          permission: none
          issue-type: pull-request
          commands: |
            ai

The only command, "ai", in the dispatch above triggers the following workflow .github/workflows/ai-command.yml:

.github/workflows/ai-command.yml
name: ai-command

on:
  repository_dispatch:
    types: [ai-command]

jobs:
  find-test-tag:
    runs-on: ubuntu-24.04
    steps:
      - name: Dump the client payload body
        env:
          PAYLOAD_CONTEXT: ${{ toJson(github.event.client_payload.pull_request.body) }}
        run: echo "$PAYLOAD_CONTEXT"

      - name: Print repo and comment
        run: |
          echo "Title of the pull request: ${{ github.event.client_payload.pull_request.title }}"
          echo "Repository: ${{ github.event.client_payload.github.payload.repository.full_name }}"
          echo "Issue number: ${{ github.event.client_payload.github.payload.issue.number }}"

      - name: Checkout 🛎
        uses: actions/checkout@v4

      - name: Install dependencies 📦
        # https://github.com/cypress-io/github-action
        uses: cypress-io/github-action@v6
        with:
          runTests: false

      - name: Determine the test tag 🏷️
        id: find_test_tag
        run: |
          TAG=$(node ask.mjs)
          echo "TAG=$TAG" >> $GITHUB_OUTPUT
        env:
          OPEN_AI_API_KEY: ${{ secrets.OPEN_AI_API_KEY }}
          OPEN_AI_BASE_URL: ${{ secrets.OPEN_AI_BASE_URL }}
          CODE_CHANGES: "${{ github.event.client_payload.pull_request.title }}\n\n${{ github.event.client_payload.pull_request.body }}"

      - name: Print the determined tag 🏷️
        run: |
          echo "The recommended test tag is: ${{ steps.find_test_tag.outputs.TAG }}" >> $GITHUB_STEP_SUMMARY

      - name: Write tag back into the comment 💬
        # https://github.com/peter-evans/create-or-update-comment
        uses: peter-evans/create-or-update-comment@v4
        with:
          token: ${{ secrets.GH_PERSONAL_TOKEN }}
          repository: ${{ github.event.client_payload.github.payload.repository.full_name }}
          issue-number: ${{ github.event.client_payload.github.payload.issue.number }}
          body: |
            The recommended test tag is: **${{ steps.find_test_tag.outputs.TAG }}**

AI posts its answer as another comment

Nice.

Cost

The list of test tags with their summaries is small and does not change often. The pull request title plus body text are limited to 1000 characters, so they are very small too. Matching the PR text to the test tags should be a quick and cheap AI operation. Just to see how many tokens we used, I logged the response usage:

async function ask(instructions, input) {
  const response = await client.responses.create({
    // https://platform.openai.com/docs/models
    model: 'gpt-4.1',
    instructions,
    input,
  })

  console.error('response usage:')
  console.error(response.usage)

  return response.output_text
}

Calling this code with the "changed the input element" code change produces:

Asking OpenAI for test tags...
changed the input element
{
  input_tokens: 154,
  input_tokens_details: { audio_tokens: null, cached_tokens: 0, text_tokens: null },
  output_tokens: 3,
  output_tokens_details: { reasoning_tokens: 0, text_tokens: null },
  total_tokens: 157
}
@add

So we used 154 input and 3 output tokens. For the full picture, https://openai.com/api/pricing/ lists Input: $2.00 / 1M tokens and Output: $8.00 / 1M tokens, bringing our query cost to about $0.0003.
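
As a quick sanity check on that number:

// cost of one query at the listed gpt-4.1 prices
const inputCost = (154 / 1_000_000) * 2.0 // $0.000308
const outputCost = (3 / 1_000_000) * 8.0 // $0.000024
console.log(`$${(inputCost + outputCost).toFixed(6)}`) // $0.000332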

Of course, the low cost of our queries is only a part of the environmental cost incurred by the AI data centers.