Failure Triage
AI-powered root-cause clustering of test failures with severity ratings and fix recommendations.
When a test run completes with failures, AegisRunner automatically analyzes them using AI to cluster failures by root cause, assign severity ratings, detect flaky tests, and provide actionable fix recommendations.
How It Works
- Test run completes with failures — triage triggers automatically in the background
- AI receives failure data — up to 100 failed test cases with names, page URLs, error messages, step details, and screenshots
- Clustering — AI groups failures by root cause (e.g., "6 cart tests failed because cart is empty", "3 login tests failed because selectors changed")
- Report generated — stored on the test run, visible in the dashboard immediately
Triage Report
Each report contains:
Summary
A one-paragraph overview of all failures (e.g., "All 12 failures are caused by elements not being found in the DOM, primarily affecting cart interactions (6), login functionality (3), and category pages (3)").
Failure Clusters
Failures are grouped into clusters. Each cluster contains:
| Field | Description |
|---|---|
| Cluster Name | Short label (e.g., "Cart page elements not found") |
| Severity | high, medium, or low |
| Root Cause | Why these tests failed (e.g., "Tests assume cart items exist but cart is empty when tests run") |
| Recommendation | How to fix (e.g., "Add prerequisite step to add items to cart before testing cart interactions") |
| Flaky Pattern | Whether the failures look flaky (intermittent, timing-dependent) rather than like a genuine regression |
| Affected Cases | List of test case IDs in this cluster — click to filter the results table |
Example Report
{
  "summary": "All 12 failures caused by elements not found in DOM...",
  "clusters": [
    {
      "cluster_name": "Cart page elements not found",
      "severity": "high",
      "root_cause": "Tests assume cart items exist but cart is empty",
      "recommendation": "Add prerequisite step to add items to cart",
      "is_flaky_pattern": false,
      "affected_case_ids": ["019d...", "019d...", ...]
    },
    {
      "cluster_name": "Login form elements missing",
      "severity": "high",
      "root_cause": "Login page selectors changed or form did not render",
      "recommendation": "Verify login page selectors match current implementation",
      "is_flaky_pattern": false,
      "affected_case_ids": ["019d...", ...]
    }
  ]
}
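If you consume the report programmatically, it can help to load it into typed objects first. The following is a minimal sketch, assuming the report arrives as a JSON string with exactly the fields shown in the example above. The `Cluster` and `TriageReport` classes and the `parse_triage_report` helper are illustrative names, not part of AegisRunner.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    cluster_name: str
    severity: str  # "high", "medium", or "low"
    root_cause: str
    recommendation: str
    is_flaky_pattern: bool
    affected_case_ids: List[str]


@dataclass
class TriageReport:
    summary: str
    clusters: List[Cluster]


def parse_triage_report(raw: str) -> TriageReport:
    """Parse a triage report JSON string into typed objects.

    Fields are read one by one so that unknown extra keys are ignored
    and missing keys fall back to neutral defaults.
    """
    data = json.loads(raw)
    clusters = [
        Cluster(
            cluster_name=c.get("cluster_name", ""),
            severity=c.get("severity", ""),
            root_cause=c.get("root_cause", ""),
            recommendation=c.get("recommendation", ""),
            is_flaky_pattern=bool(c.get("is_flaky_pattern", False)),
            affected_case_ids=list(c.get("affected_case_ids", [])),
        )
        for c in data.get("clusters", [])
    ]
    return TriageReport(summary=data.get("summary", ""), clusters=clusters)
```

Note that the example report above is abbreviated (the `...` entries are not valid JSON), so a real report body is needed as input.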
Auto-Heal & Fix Suggestions
Beyond identifying root causes, AegisRunner can generate fix suggestions for failed tests:
- Selector fixes — when an element is not found, suggests updated selectors based on the current DOM
- Prerequisite steps — when a test assumes prior state (e.g., items in cart), suggests adding setup steps
- Wait adjustments — when failures are timing-related, suggests adding explicit waits
- Test data updates — when tests use outdated data, suggests updated values
Fix suggestions are generated from the triage report and the detailed failure data (including DOM snapshots and screenshots).
Flaky Test Detection
The triage system identifies flaky tests: tests that fail intermittently due to timing, network, or rendering issues rather than actual regressions. Flaky clusters are marked with is_flaky_pattern: true. Patterns that indicate flakiness include:
- Timeout errors that occur inconsistently
- Element visibility assertions that depend on animation timing
- Network-dependent assertions that fail under load
- Tests that passed in the previous run but fail now with no code change
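One way to act on the flaky flag is to separate flaky clusters from likely regressions and collect the flaky test case IDs as retry candidates. This is a sketch only, assuming clusters are plain dictionaries shaped like the example report. `split_flaky` and `retry_candidates` are hypothetical helper names, and treating a missing flag as "not flaky" is an assumption.

```python
from typing import Dict, List, Tuple


def split_flaky(clusters: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Return (flaky_clusters, regression_clusters).

    A cluster counts as flaky only when is_flaky_pattern is exactly True;
    a missing or false flag is treated as a likely regression (assumption).
    """
    flaky = [c for c in clusters if c.get("is_flaky_pattern") is True]
    regressions = [c for c in clusters if c.get("is_flaky_pattern") is not True]
    return flaky, regressions


def retry_candidates(clusters: List[Dict]) -> List[str]:
    """Collect test case IDs from flaky clusters, de-duplicated, in order."""
    flaky, _ = split_flaky(clusters)
    seen = set()
    ids: List[str] = []
    for cluster in flaky:
        for case_id in cluster.get("affected_case_ids", []):
            if case_id not in seen:
                seen.add(case_id)
                ids.append(case_id)
    return ids
```

Keeping the two groups apart lets a pipeline retry flaky cases while still treating regression clusters as blocking.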
Viewing Triage Results
- Open any completed test run with failures
- The Failure Triage panel appears above the results table
- Clusters are sorted by severity (high → medium → low)
- Click a cluster to filter the results table to just the affected test cases
- Each cluster shows the count of affected tests, root cause, and recommendation
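The same ordering and filtering can be reproduced outside the dashboard. The sketch below assumes clusters are dictionaries shaped like the example report and that each results row carries its test case ID under an `id` key. That row shape, and the helper names `sort_clusters` and `filter_results`, are assumptions made for illustration.

```python
from typing import Dict, List

# Lower rank sorts first: high, then medium, then low.
SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def sort_clusters(clusters: List[Dict]) -> List[Dict]:
    """Sort clusters by severity; unknown severities go last.

    sorted() is stable, so clusters with equal severity keep their
    original relative order.
    """
    return sorted(
        clusters,
        key=lambda c: SEVERITY_ORDER.get(c.get("severity"), len(SEVERITY_ORDER)),
    )


def filter_results(results: List[Dict], cluster: Dict) -> List[Dict]:
    """Keep only the result rows whose ID belongs to the given cluster."""
    affected = set(cluster.get("affected_case_ids", []))
    return [row for row in results if row.get("id") in affected]
```

For example, `filter_results(rows, sort_clusters(clusters)[0])` would return the rows for the most severe cluster.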
Manual Triage
You can also trigger triage manually from the API:
POST /api/v1/runs/:runId/triage
Authorization: Bearer YOUR_TOKEN
This re-runs the AI analysis, which is useful if the initial triage timed out or if you want a fresh analysis after updating test data.
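The request above can be built with Python's standard library, for instance. This is a minimal sketch: the endpoint path and the Bearer header come from the request shown above, while the base URL, the run ID, and the `build_triage_request` helper name are placeholders you would replace with your own values.

```python
import urllib.parse
import urllib.request


def build_triage_request(base_url: str, run_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that re-runs triage."""
    # Quote the run ID so unexpected characters cannot alter the path.
    safe_run_id = urllib.parse.quote(run_id, safe="")
    url = f"{base_url.rstrip('/')}/api/v1/runs/{safe_run_id}/triage"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


if __name__ == "__main__":
    # Placeholder values; the host below is not a real AegisRunner URL.
    req = build_triage_request("https://aegisrunner.example", "RUN_ID", "YOUR_TOKEN")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

The response format is not described here, so the sketch stops at building the request rather than parsing a reply.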
Triage Status
| Status | Meaning |
|---|---|
| none | No failures to triage (all tests passed) |
| pending | Triage is running |
| completed | Triage report is ready |
| failed | AI analysis failed (timeout, provider error) |
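A client that waits for triage to finish could poll until the status leaves pending. The sketch below assumes you supply your own `get_status` callable that returns one of the four status strings in the table. How the status is fetched is not specified here, so that part is left to the caller, and the `wait_for_triage` name, interval, and attempt limit are illustrative choices.

```python
import time
from typing import Callable

# Statuses after which no further change is expected.
TERMINAL_STATUSES = {"none", "completed", "failed"}


def wait_for_triage(
    get_status: Callable[[], str],
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until triage reaches a terminal status.

    Returns "none", "completed", or "failed". Raises TimeoutError if the
    status is still "pending" after max_attempts checks, and ValueError
    if an undocumented status string is returned.
    """
    for attempt in range(max_attempts):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if status != "pending":
            raise ValueError(f"unexpected triage status: {status!r}")
        # Do not sleep after the final check.
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError("triage still pending after max_attempts checks")
```

A result of failed is a natural point to trigger triage again manually.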
Plan Availability
Failure triage is available on the Starter plan and above. The free plan shows basic failure lists without AI clustering.
Related Documentation
- Debugging Failed Tests — Manual debugging techniques
- Smart Test Selection — Run only recently failed tests
- Running Tests — Test execution overview
- Issue Tracking — Auto-create Jira/GitHub issues from failures