Recently, my project bahmutov/cypress-grep-example showed two flaky tests.
Are the tests really showing a problem with the application? Or are the tests themselves unreliable? Would these tests show failures if we ran them 100 times in a row?
This is where the cypress-grep plugin comes in very handy. Just install it and add it to the support file:
```shell
npm i -D cypress-grep
```

```js
// add to cypress/support/index.js
```
The project has multiple spec files. The first flaky test, "should cancel edits on escape", is in the spec file editing-spec.js. Let's run this test by itself 10 times.
```shell
npx cypress run --spec cypress/integration/editing-spec.js \
```
The spec runs and the test we are grepping by title text "should cancel edits on escape" is repeated 10 times. The other tests are all pending.
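To make the grep + burn combination concrete, here is a small plain JavaScript model of the behavior just described. It is a hypothetical sketch for illustration, not cypress-grep's actual source: the function name `planRun` and the extra test titles are made up, and it assumes a test matches when its title contains the grep text.

```js
// Hypothetical model of `--env grep=...,burn=N` (NOT cypress-grep's code):
// every test whose title contains the grep text is scheduled N times,
// every other test is only marked as pending.
function planRun(titles, grep, burn = 1) {
  const plan = []
  for (const title of titles) {
    if (title.includes(grep)) {
      // repeat the matching test `burn` times
      for (let k = 1; k <= burn; k += 1) {
        plan.push({ title, run: k, pending: false })
      }
    } else {
      // non-matching tests do not run at all
      plan.push({ title, run: 0, pending: true })
    }
  }
  return plan
}

// hypothetical titles, except for the flaky one from this post
const titles = [
  'should hide other controls when editing',
  'should cancel edits on escape',
  'should save edits on blur',
]
const plan = planRun(titles, 'should cancel edits on escape', 10)
console.log(plan.filter((t) => !t.pending).length) // 10
console.log(plan.filter((t) => t.pending).length) // 2
```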
So there is definitely something wrong with this test or the application. Any screenshot will do, since they all show the same failure: the first letter is missing from the title.
Often the application is not ready to receive the cy.type command while it is still loading. In our case, this seems unlikely: after all, the failure happens in the 3rd todo item, not the very first one. Maybe something is wrong with typing the characters? Or with editing them? Let's make sure the 3 todo items created before each test are typed correctly.
```js
beforeEach(function () {
```

We are using a custom command cy.createDefaultTodos to enter the 3 todo items. We can add assertions checking the input field values right there.

```js
Cypress.Commands.add('createDefaultTodos', function () {
```
And let's burn the test again.
The problem seems to be in typing the initial text, not in editing it afterwards. Cypress types pretty quickly, much faster than a normal human being. Maybe the application cannot keep up for some reason? Let's increase the delay after each typed character from the default 10ms to 20ms.
```js
const opts = { log: false, delay: 20 }
```
Time to burn it to find out if we have fixed it.
```shell
... --env grep="should cancel edits on escape",burn=100
```
The test seems to be stable: all 100 runs pass.
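Why are 100 passing runs more convincing than 10? A back-of-the-envelope sketch, assuming every run fails independently with the same probability p (real flake is rarely that tidy): the chance of seeing at least one failure in n runs is 1 - (1 - p)^n. The helper name and the 2% failure rate below are hypothetical.

```js
// Probability of at least one failure in n independent runs, assuming
// each run fails with the same probability p (a simplifying assumption).
function chanceToSeeFailure(p, n) {
  return 1 - Math.pow(1 - p, n)
}

// a hypothetical test that fails 2% of the time
console.log(chanceToSeeFailure(0.02, 10).toFixed(2)) // 0.18
console.log(chanceToSeeFailure(0.02, 100).toFixed(2)) // 0.87
```

Under this assumption, a test that fails 2% of the time would slip through 10 runs more than 80% of the time, but would most likely show up at least once in 100 runs.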
Now we can decide whether to move on or to keep digging into the application's code to find out why the first letter is sometimes lost. Keep the cost in mind, though: we have slowed down every test that creates the default todo items by 400-500ms. Is this a good trade-off for making 1 or 2 tests stable?
There is one other way: would test retries be a better answer? In this instance, I would prefer to get to the bottom of the problem rather than use test retries.
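Where does the 400-500ms come from? A rough sketch, assuming the cost is only the extra per-keystroke delay: Cypress documents a default `delay` of 10ms for `cy.type()`, so raising it to 20ms adds about 10ms per typed character. The todo titles below are hypothetical placeholders, not necessarily the ones used by the example project, and special keys such as `{enter}` are ignored.

```js
// Cypress documents 10ms as the default per-keystroke delay of cy.type().
const DEFAULT_TYPE_DELAY_MS = 10

// Rough extra time spent typing when a larger per-key delay is used.
// Counts one keystroke per character and ignores special keys like
// '{enter}' as well as any time the application itself needs.
function extraTypingMs(texts, delayMs) {
  const keystrokes = texts.reduce((sum, text) => sum + text.length, 0)
  return keystrokes * (delayMs - DEFAULT_TYPE_DELAY_MS)
}

// hypothetical todo titles (53 characters in total)
const todos = ['buy some cheese', 'feed the cat', 'book a doctors appointment']
console.log(extraTypingMs(todos, 20)) // 530
```

With these made-up titles the estimate is 530ms, in the same ballpark as the figure above, and it is paid by every test that creates the default todos.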
Bonus: video
See how I am burning a test in this short video below
Bonus 2: burn tests on CircleCI
Read the blog post Burn Cypress Tests on CircleCI
Bonus 3: burning new or changed specs
If you watch my presentation about slicing and dicing E2E tests, you will see that at Mercari US we run changed and new Cypress specs before running the rest of the tests. We now also burn the changed and new specs, running them 5 times in a row. This prevents flaky tests from sneaking in.
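A minimal sketch of the "burn only changed specs" idea, assuming the list of changed files is already available (for example from `git diff --name-only`) and that specs live in `cypress/integration/`. The helper name is hypothetical and this is not the actual Mercari US setup; it also assumes that passing `burn` without `grep` makes cypress-grep repeat every test in the selected specs.

```js
// Hypothetical helper: turn a list of changed files into `cypress run`
// arguments that burn only the changed or new spec files.
function burnChangedSpecsArgs(changedFiles, burn = 5) {
  // keep only JavaScript files under the (assumed) integration folder
  const specs = changedFiles.filter(
    (file) => file.startsWith('cypress/integration/') && file.endsWith('.js'),
  )
  if (specs.length === 0) {
    return null // no changed specs, nothing to burn
  }
  return ['cypress', 'run', '--spec', specs.join(','), '--env', `burn=${burn}`]
}

const args = burnChangedSpecsArgs([
  'src/app.js',
  'cypress/integration/editing-spec.js',
  'cypress/integration/new-feature-spec.js',
])
console.log(args.join(' '))
// cypress run --spec cypress/integration/editing-spec.js,cypress/integration/new-feature-spec.js --env burn=5
```

The returned arguments could then be handed to `npx`, for example with Node's `child_process.spawnSync('npx', args, { stdio: 'inherit' })`.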