Failure Triage

AI-powered root-cause clustering of test failures with severity ratings and fix recommendations.

When a test run completes with failures, AegisRunner automatically analyzes them using AI to cluster failures by root cause, assign severity ratings, detect flaky tests, and provide actionable fix recommendations.

Automatic: Triage starts as soon as a test run finishes with failures, with no manual action required. Results appear within seconds.

How It Works

  1. Test run completes with failures — triage triggers automatically in the background
  2. AI receives failure data — up to 100 failed test cases with names, page URLs, error messages, step details, and screenshots
  3. Clustering — AI groups failures by root cause (e.g., "6 cart tests failed because cart is empty", "3 login tests failed because selectors changed")
  4. Report generated — stored on the test run, visible in the dashboard immediately

Triage Report

Each report contains:

Summary

A one-paragraph overview of all failures (e.g., "All 12 failures are caused by elements not being found in the DOM, primarily affecting cart interactions (6), login functionality (3), and category pages (3)").

Failure Clusters

Failures are grouped into clusters. Each cluster contains:

| Field | Description |
| --- | --- |
| Cluster Name | Short label (e.g., "Cart page elements not found") |
| Severity | high, medium, or low |
| Root Cause | Why these tests failed (e.g., "Tests assume cart items exist but cart is empty when tests run") |
| Recommendation | How to fix (e.g., "Add prerequisite step to add items to cart before testing cart interactions") |
| Flaky Pattern | Whether this looks like a flaky test (intermittent, timing-dependent) vs a genuine regression |
| Affected Cases | List of test case IDs in this cluster; click to filter the results table |

Example Report

{
  "summary": "All 12 failures caused by elements not found in DOM...",
  "clusters": [
    {
      "cluster_name": "Cart page elements not found",
      "severity": "high",
      "root_cause": "Tests assume cart items exist but cart is empty",
      "recommendation": "Add prerequisite step to add items to cart",
      "is_flaky_pattern": false,
      "affected_case_ids": ["019d...", "019d...", ...]
    },
    {
      "cluster_name": "Login form elements missing",
      "severity": "high",
      "root_cause": "Login page selectors changed or form did not render",
      "recommendation": "Verify login page selectors match current implementation",
      "is_flaky_pattern": false,
      "affected_case_ids": ["019d...", ...]
    }
  ]
}
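If you consume a report programmatically, it can help to load it into typed objects first. The following is a minimal sketch, assuming the report arrives as a JSON string with exactly the fields shown in the example. The class names, `parse_report`, and the placeholder case IDs are illustrative and not part of AegisRunner.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    cluster_name: str
    severity: str  # "high", "medium", or "low"
    root_cause: str
    recommendation: str
    is_flaky_pattern: bool
    affected_case_ids: List[str]


@dataclass
class TriageReport:
    summary: str
    clusters: List[Cluster]


def parse_report(raw: str) -> TriageReport:
    """Parse a triage report JSON string, ignoring any unknown fields."""
    data = json.loads(raw)
    clusters = [
        Cluster(
            cluster_name=c["cluster_name"],
            severity=c["severity"],
            root_cause=c["root_cause"],
            recommendation=c["recommendation"],
            is_flaky_pattern=bool(c["is_flaky_pattern"]),
            affected_case_ids=list(c["affected_case_ids"]),
        )
        for c in data.get("clusters", [])
    ]
    return TriageReport(summary=data.get("summary", ""), clusters=clusters)


# Sample input with placeholder case IDs (real reports carry full IDs).
SAMPLE = """
{
  "summary": "Example summary",
  "clusters": [
    {
      "cluster_name": "Cart page elements not found",
      "severity": "high",
      "root_cause": "Tests assume cart items exist but cart is empty",
      "recommendation": "Add prerequisite step to add items to cart",
      "is_flaky_pattern": false,
      "affected_case_ids": ["case-1", "case-2"]
    }
  ]
}
"""

report = parse_report(SAMPLE)
for cluster in report.clusters:
    count = len(cluster.affected_case_ids)
    print(f"[{cluster.severity}] {cluster.cluster_name}: {count} cases")
```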

Auto-Heal & Fix Suggestions

Beyond identifying root causes, AegisRunner can generate fix suggestions for failed tests:

  • Selector fixes — when an element is not found, suggests updated selectors based on the current DOM
  • Prerequisite steps — when a test assumes prior state (e.g., items in cart), suggests adding setup steps
  • Wait adjustments — when failures are timing-related, suggests adding explicit waits
  • Test data updates — when tests use outdated data, suggests updated values

Fix suggestions are generated from the triage report and the detailed failure data (including DOM snapshots and screenshots).

Flaky Test Detection

The triage system identifies flaky tests: tests that fail intermittently because of timing, network, or rendering issues rather than actual regressions. Flaky clusters are marked with is_flaky_pattern: true. Patterns that indicate flakiness include:

  • Timeout errors that occur inconsistently
  • Element visibility assertions that depend on animation timing
  • Network-dependent assertions that fail under load
  • Tests that passed in the previous run but fail now with no code change
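One way to act on the flag is to separate flaky clusters from likely regressions before deciding what to rerun and what to fix. This is a minimal sketch, assuming clusters are plain dictionaries shaped like the example report. The helper name and the sample clusters are made up for illustration.

```python
from typing import Dict, List, Tuple


def split_by_flakiness(clusters: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Return (flaky, regressions) based on the is_flaky_pattern flag."""
    flaky = [c for c in clusters if c.get("is_flaky_pattern") is True]
    regressions = [c for c in clusters if c.get("is_flaky_pattern") is not True]
    return flaky, regressions


# Made-up sample clusters; only the fields used here are included.
sample_clusters = [
    {
        "cluster_name": "Intermittent checkout timeouts",
        "is_flaky_pattern": True,
        "affected_case_ids": ["case-7", "case-9"],
    },
    {
        "cluster_name": "Login form elements missing",
        "is_flaky_pattern": False,
        "affected_case_ids": ["case-3"],
    },
]

flaky, regressions = split_by_flakiness(sample_clusters)

# Flaky cases are candidates for a rerun; regressions need a real fix.
rerun_ids = sorted({cid for c in flaky for cid in c["affected_case_ids"]})
print("rerun:", rerun_ids)
print("investigate:", [c["cluster_name"] for c in regressions])
```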

Viewing Triage Results

  1. Open any completed test run with failures
  2. The Failure Triage panel appears above the results table
  3. Clusters are sorted by severity (high → medium → low)
  4. Click a cluster to filter the results table to just the affected test cases
  5. Each cluster shows the count of affected tests, root cause, and recommendation
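The same ordering and filtering can be reproduced outside the dashboard. The sketch below assumes clusters shaped like the example report. The result-row shape (`case_id`, `name`) is a hypothetical stand-in for whatever your own results export looks like.

```python
from typing import Dict, List

# Lower rank sorts first: high, then medium, then low.
SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def sort_clusters(clusters: List[Dict]) -> List[Dict]:
    """Sort clusters by severity; unknown severities go last."""
    return sorted(
        clusters,
        key=lambda c: SEVERITY_ORDER.get(c["severity"], len(SEVERITY_ORDER)),
    )


def filter_results(results: List[Dict], cluster: Dict) -> List[Dict]:
    """Keep only the result rows whose case ID belongs to the cluster."""
    wanted = set(cluster["affected_case_ids"])
    return [row for row in results if row["case_id"] in wanted]


# Made-up sample data for illustration.
clusters = [
    {"cluster_name": "Slow banner", "severity": "low", "affected_case_ids": ["c4"]},
    {"cluster_name": "Cart empty", "severity": "high", "affected_case_ids": ["c1", "c2"]},
    {"cluster_name": "Stale data", "severity": "medium", "affected_case_ids": ["c3"]},
]
results = [
    {"case_id": "c1", "name": "Remove item from cart"},
    {"case_id": "c2", "name": "Update cart quantity"},
    {"case_id": "c3", "name": "Search by SKU"},
    {"case_id": "c4", "name": "Homepage banner"},
]

ordered = sort_clusters(clusters)
top_cluster_rows = filter_results(results, ordered[0])
print([c["cluster_name"] for c in ordered])
print([row["name"] for row in top_cluster_rows])
```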

Manual Triage

You can also trigger triage manually through the API:

POST /api/v1/runs/:runId/triage
Authorization: Bearer YOUR_TOKEN

This re-runs the AI analysis, which is useful if the initial triage timed out or if you want a fresh analysis after updating test data.
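As a rough illustration, the request can be built with Python's standard library. This sketch only constructs the request and does not send it. The base URL `https://aegisrunner.example` is a placeholder, since the API host is not given here, and sending no request body is an assumption.

```python
import urllib.parse
import urllib.request


def build_triage_request(base_url: str, run_id: str, token: str) -> urllib.request.Request:
    """Build the POST request that re-triggers triage for one run."""
    safe_run_id = urllib.parse.quote(run_id, safe="")
    url = f"{base_url.rstrip('/')}/api/v1/runs/{safe_run_id}/triage"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


# Placeholder host, run ID, and token; substitute your own values.
req = build_triage_request("https://aegisrunner.example", "RUN_ID", "YOUR_TOKEN")
print(req.get_method(), req.full_url)

# To actually send it:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(resp.status)
```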

Triage Status

| Status | Meaning |
| --- | --- |
| none | No failures to triage (all tests passed) |
| pending | Triage is running |
| completed | Triage report is ready |
| failed | AI analysis failed (timeout, provider error) |
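A client that waits for a report would typically poll until the status leaves pending. How the status is retrieved is not described here, so this sketch takes a hypothetical `fetch_status` callable that you would implement yourself. The interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable

# Statuses after which polling should stop.
TERMINAL_STATUSES = {"none", "completed", "failed"}


def wait_for_triage(
    fetch_status: Callable[[], str],
    interval_s: float = 2.0,
    timeout_s: float = 60.0,
) -> str:
    """Poll fetch_status until triage reaches a terminal status."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if status != "pending":
            raise ValueError(f"unexpected triage status: {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("triage still pending after timeout")
        time.sleep(interval_s)


# Simulated status source standing in for a real lookup.
_fake_statuses = iter(["pending", "pending", "completed"])
final_status = wait_for_triage(lambda: next(_fake_statuses), interval_s=0.0)
print(final_status)
```

A failed result is a reasonable point to re-trigger triage manually rather than keep polling.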

Plan Availability

Failure triage is available on Starter plans and above. The free plan shows basic failure lists without AI clustering.
