Fixing Flaky Automation Tests


Intermittent failures in automated tests, such as those written with Cypress or Playwright, are a persistent problem in CI/CD pipelines. Their unpredictability slows development workflows and undermines trust in the testing process itself.


Identifying the Challenge

When reviewing pull request pipeline logs, it is hard to tell an intermittent test failure from a genuine code issue. Over time, teams come to treat these failures as normal and simply retry the pipeline instead of addressing root causes, which significantly affects productivity and quality assurance.

Without a way to reliably detect flaky tests, the problem only gets worse over time as more flaky tests are added to the codebase.
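
One possible way to make detection concrete, offered as a minimal sketch rather than a prescribed method, is to treat a test as flaky when it has both passed and failed on the same commit. The `RunResult` record and the `find_flaky_tests` function are hypothetical names, and the sample data is made up; real results would have to be parsed from whatever reports your CI system produces.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class RunResult(NamedTuple):
    """One test outcome from one pipeline run (hypothetical record shape)."""
    commit: str     # commit SHA the run was built from
    test_name: str  # test title as reported by the test runner
    passed: bool


def find_flaky_tests(results: Iterable[RunResult]) -> set[str]:
    """Return tests that both passed and failed on the same commit."""
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for result in results:
        outcomes[(result.commit, result.test_name)].add(result.passed)
    # Same code, different outcomes: the test itself is unstable.
    return {test for (_, test), seen in outcomes.items() if seen == {True, False}}


# Made-up history: two runs of the same commit.
history = [
    RunResult("abc123", "login redirects to dashboard", True),
    RunResult("abc123", "login redirects to dashboard", False),
    RunResult("abc123", "cart total is correct", False),
    RunResult("abc123", "cart total is correct", False),
]

print(find_flaky_tests(history))  # {'login redirects to dashboard'}
```

A test that fails on every run of a commit, like the second one here, is not reported: it is consistently broken, which suggests a genuine code issue rather than flakiness.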


Understanding the Complexity

Flaky tests are hard to prevent and resolve because they are unpredictable, especially those prone to timing issues in real browser environments or when interacting with databases. Their behavior is non-deterministic and influenced by external factors, so a failure may not reproduce on demand, which makes these tests particularly difficult to identify and fix.

Strategic Solution

One effective strategy to mitigate the impact of flaky tests is to run the pipeline automatically at regular intervals from a clean state of the main branch. Because the code on main does not change between merges, a run that fails and then passes on the same commit points to an unstable test rather than a code change. Consistently monitoring these builds allows early detection of newly introduced unstable tests, and it confirms that previously identified issues have actually been resolved.
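
In practice, most CI systems provide a scheduled trigger for this, and that is usually the better choice. Purely to illustrate the idea, the following sketch clones the tip of main into a fresh temporary directory, runs a list of test steps, and repeats after a pause. The function names, the `steps` parameter, and the repository URL in the usage note are all assumptions made for illustration.

```python
import subprocess
import tempfile
import time
from pathlib import Path


def clone_command(repo_url: str, checkout_dir: Path, branch: str = "main") -> list[str]:
    """Build a shallow `git clone` of the branch tip into the given directory."""
    return ["git", "clone", "--depth", "1", "--branch", branch, repo_url, str(checkout_dir)]


def run_from_clean_state(repo_url: str, steps: list[list[str]]) -> bool:
    """Clone main into a fresh temp directory and run each step; True if all succeed."""
    with tempfile.TemporaryDirectory() as tmp:
        checkout = Path(tmp) / "repo"
        subprocess.run(clone_command(repo_url, checkout), check=True)
        for step in steps:
            if subprocess.run(step, cwd=checkout).returncode != 0:
                return False
    return True


def run_at_intervals(repo_url: str, steps: list[list[str]], interval_seconds: float) -> None:
    """Repeat the clean run forever, sleeping between the end of one run and the next."""
    while True:
        status = "green" if run_from_clean_state(repo_url, steps) else "red"
        print(f"{time.strftime('%Y-%m-%d %H:%M:%S')} main: {status}")
        time.sleep(interval_seconds)
```

For a Playwright project, a call might look like `run_at_intervals("https://example.com/org/repo.git", [["npm", "ci"], ["npx", "playwright", "test"]], 3 * 60 * 60)`, where the URL is a placeholder. Cloning into a new directory each time is what keeps the state clean: no leftover dependencies, caches, or build output from a previous run can influence the result.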

Implementing such a strategy helps pinpoint the problematic tests and fosters a culture of continuous improvement in the testing process. By systematically addressing flaky tests, development teams can make their CI/CD pipelines more stable, which leads to more reliable and efficient development cycles.

High-Level Steps

  • Set Up Pipeline - Establish a dedicated build pipeline that runs all tests from a clean state of the main branch at three-hour intervals. This frequency allows timely detection of flaky tests and helps maintain the health of your codebase.
  • Fix Tests - Regularly review the build pipeline logs to identify and systematically address flaky tests. Measure progress by tracking the increase in successful (green) test runs over time. This metric serves as an indicator of improving test stability.
  • Monitor - Upon achieving test stability, continue using build logs to promptly identify any newly introduced flaky tests. Collaborate with the responsible developer to ensure immediate resolution, maintaining the integrity of your test suite.
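
The progress metric and the log review in the steps above could be approximated as follows. This is a sketch under stated assumptions: the `BuildRun` record, its ISO date strings, and the sample runs are invented for illustration, and in a real setup the data would come from your CI system's build history.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class BuildRun:
    """One scheduled run of the main-branch pipeline (hypothetical record shape)."""
    day: str                                        # e.g. "2024-01-01"
    failed_tests: list[str] = field(default_factory=list)

    @property
    def green(self) -> bool:
        return not self.failed_tests


def green_rate_by_day(runs: list[BuildRun]) -> dict[str, float]:
    """Fraction of green runs per day, in date order; should rise as tests are fixed."""
    totals: Counter = Counter()
    greens: Counter = Counter()
    for run in runs:
        totals[run.day] += 1
        if run.green:
            greens[run.day] += 1
    return {day: greens[day] / totals[day] for day in sorted(totals)}


def failure_counts(runs: list[BuildRun]) -> Counter:
    """How often each test failed across all runs; the top entries are the ones to fix first."""
    return Counter(test for run in runs for test in run.failed_tests)


# Made-up sample: four runs per day.
runs = [
    BuildRun("2024-01-01", ["checkout flow"]),
    BuildRun("2024-01-01"),
    BuildRun("2024-01-01", ["checkout flow", "search suggestions"]),
    BuildRun("2024-01-01"),
    BuildRun("2024-01-02"),
    BuildRun("2024-01-02", ["checkout flow"]),
    BuildRun("2024-01-02"),
    BuildRun("2024-01-02"),
]

print(green_rate_by_day(runs))             # {'2024-01-01': 0.5, '2024-01-02': 0.75}
print(failure_counts(runs).most_common())  # [('checkout flow', 3), ('search suggestions', 1)]
```

Once the suite is stable, the same failure count doubles as a monitoring signal: a test name that appears there for the first time is a likely candidate for a newly introduced flaky test.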

Test Failure Report

Failed Build Runs