Failure Triage
AI-powered root-cause clustering of test failures with severity ratings and fix recommendations.
After a test run completes, AegisRunner can automatically analyze failures using AI to cluster them by root cause, assign severity ratings, and provide actionable fix recommendations. Instead of reading through a flat list of errors, you get an organized triage report.
Triage runs automatically when a test run finishes with failures. No manual action is required.
How It Works
1. When a test run finishes with at least one failure, triage is automatically triggered in the background.
2. The AI model receives all failed test cases (up to 100), including case name, suite name, page URL, and error messages. It identifies patterns and groups failures by root cause.
3. The result is a set of clusters, each with a name, a root-cause explanation, a severity rating, and a recommendation for how to fix the issue.
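The shape of the data sent to the model can be sketched as follows. This is an illustrative assumption: the `build_triage_payload` helper and its field names are not part of the product; only the 100-case cap and the listed fields (case name, suite name, page URL, error message) come from the documentation.

```python
# Illustrative sketch of the failure payload sent for triage.
# Field names here are assumptions, not the actual wire format.
MAX_CASES = 100  # triage includes at most 100 failed cases per run


def build_triage_payload(failed_cases: list[dict]) -> list[dict]:
    """Select the documented fields for each failure, capped at MAX_CASES."""
    return [
        {
            "case_name": c["case_name"],
            "suite_name": c["suite_name"],
            "page_url": c["page_url"],
            "error_message": c["error_message"],
        }
        for c in failed_cases[:MAX_CASES]
    ]
```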
Open a test run with failures to see the triage panel. Click any cluster to filter the results table to just the affected test cases.
Triage Report Fields
| Field | Description |
|---|---|
| Cluster Name | A short label describing the failure group (e.g., “Missing Navigation Links”, “Timeout on Dynamic Content”). |
| Root Cause | AI-generated explanation of why these tests are failing. |
| Severity | One of: critical, high, medium, or low. |
| Affected Cases | Number of test cases in this cluster. Click to filter the results table. |
| Recommendation | Suggested action to resolve the failures (e.g., “Add wait for element visibility before assertion”). |
| Flaky Pattern | Flag indicating whether the cluster appears to be caused by flaky/intermittent behavior rather than a real bug. |
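Putting the fields above together, a single cluster might look like the dict below. The key names and the `validate_cluster` helper are illustrative assumptions based on the table, not the actual API schema.

```python
# Hypothetical shape of one triage cluster, mirroring the fields above.
cluster = {
    "name": "Timeout on Dynamic Content",
    "root_cause": "Assertions run before lazily loaded elements appear.",
    "severity": "high",            # one of: critical, high, medium, low
    "affected_cases": 7,
    "recommendation": "Add wait for element visibility before assertion",
    "flaky_pattern": False,        # True if failures look intermittent
}

REQUIRED_FIELDS = {
    "name", "root_cause", "severity",
    "affected_cases", "recommendation", "flaky_pattern",
}
SEVERITIES = {"critical", "high", "medium", "low"}


def validate_cluster(c: dict) -> bool:
    """Check that a cluster dict has every field and a known severity."""
    return REQUIRED_FIELDS <= set(c) and c["severity"] in SEVERITIES
```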
Viewing the Triage Report
When you open a test run that has failures, the Failure Triage panel appears automatically above the results tab. It shows:
- A summary of all failures across the run
- Each cluster as a collapsible card with severity badge, affected count, root cause, and recommendation
- A filter button on each cluster to show only the affected test cases in the results table below
Use the cluster filter to focus on one group at a time. Fix the highest-severity cluster first, then re-run to verify the fix before moving to the next cluster.
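The fix-highest-severity-first workflow can be expressed as a simple sort. The severity ordering comes from the docs; the helper itself and its tie-breaking rule (more affected cases first) are an illustrative sketch.

```python
# Rank used to decide which cluster to fix first (lower = more urgent).
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def by_priority(clusters: list[dict]) -> list[dict]:
    """Sort clusters highest-severity first; within a severity,
    larger affected-case counts come first."""
    return sorted(
        clusters,
        key=lambda c: (SEVERITY_ORDER[c["severity"]], -c["affected_cases"]),
    )
```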
Triage Status
Each test run has a triage status:
| Status | Meaning |
|---|---|
| none | Run had no failures, or triage hasn't been triggered yet. |
| pending | Triage is currently running. Usually completes in 3–10 seconds. |
| completed | Triage report is ready and available. |
| failed | Triage could not be generated (e.g., AI service unavailable). You can retry. |
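Since triage usually completes within seconds, a client can poll the status until it leaves `pending`. A minimal sketch, assuming you supply your own `fetch_status` callable (for example, a thin wrapper around the GET endpoint below); injecting it keeps the sketch runnable without a live server.

```python
import time


def wait_for_triage(fetch_status, timeout_s: float = 30.0, poll_s: float = 1.0) -> str:
    """Poll until triage is no longer pending, or raise on timeout.

    fetch_status: callable returning one of "none", "pending",
    "completed", or "failed" (the documented statuses).
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = fetch_status()
        if status in ("completed", "failed", "none"):
            return status
        time.sleep(poll_s)
    raise TimeoutError(f"triage still pending after {timeout_s:.0f}s")
```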
Regenerating Triage
If you want a fresh analysis (for example after fixing some issues and wanting updated clusters), click the Regenerate button in the triage panel. This re-analyzes all failures in the run with a new AI call.
API Usage
You can also access triage programmatically:
Get Triage Report
```
GET /api/v1/test-runs/:runId/triage
```
Response:
```json
{
  "triage_status": "completed",
  "triage_report": {
    "clusters": [...],
    "summary": "...",
    "generated_at": "2026-03-03T12:00:00Z",
    "model": "openrouter/model-name"
  }
}
```
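A client can parse this response with the standard library. The one-line summary format below is an assumption for illustration; only `triage_status`, `triage_report`, `clusters`, `summary`, `generated_at`, and `model` appear in the documented response.

```python
import json


def summarize_triage(response_body: str) -> str:
    """Turn a GET /triage response body into a short status line."""
    data = json.loads(response_body)
    if data["triage_status"] != "completed":
        return f"triage not ready (status: {data['triage_status']})"
    report = data["triage_report"]
    return f"{len(report['clusters'])} cluster(s); generated {report['generated_at']}"
```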
Trigger Triage
```
POST /api/v1/test-runs/:runId/triage
```
Response:
```json
{
  "message": "Triage triggered",
  "triage_status": "pending"
}
```
Triage uses one AI call per request. The prompt includes up to 100 failed test cases. Runs with zero failures will not trigger triage.