AI-Powered Test Generation
How AegisRunner writes tests for every scanned page: the three-pass pipeline, scenario families, plan-aware coverage, custom prompts, and BYOK.
After every scan, AegisRunner writes a Playwright test suite for each page it found. This guide explains how the generator works, what you can tune, and what to expect from the output on different plans and on different kinds of sites.
The pipeline
Test generation runs in three passes for each page:
| Pass | What it does | Code or AI? |
|---|---|---|
| 1. Scenario planner | Looks at the page's elements (forms, buttons, links, captured states) and decides what deserves a test. Outputs a scenario list. | Deterministic code. |
| 2. Materializer | Turns each scenario into concrete Playwright steps with real selectors from the page. | LLM, with a strict system prompt and tool-use grounding. |
| 3. Quality gate | Verifies every selector exists in the captured page snapshot. Drops or repairs steps that reference selectors the AI invented. Enforces minimum test count for thin pages. | Deterministic code. |
The third pass is the reason AegisRunner-generated tests rarely fail with "selector not found" on the first run. If a selector isn't grounded in the actual page DOM we captured, it doesn't make it to the suite.
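The grounding check in pass 3 can be pictured roughly as follows. This is a minimal sketch, not AegisRunner's internal code: the `Step` shape, the `groundSteps` function, and the idea of a flat set of captured selectors are all assumptions made for illustration, and the repair path the real gate has is left out.

```typescript
// Hypothetical shapes for illustration; not AegisRunner's internal types.
interface Step {
  action: "click" | "fill" | "assertVisible";
  selector: string;
}

interface GateResult {
  kept: Step[];
  dropped: Step[];
}

// Keep only steps whose selector was captured in the page snapshot.
function groundSteps(steps: Step[], capturedSelectors: Set<string>): GateResult {
  const kept: Step[] = [];
  const dropped: Step[] = [];
  for (const step of steps) {
    if (capturedSelectors.has(step.selector)) {
      kept.push(step);
    } else {
      dropped.push(step);
    }
  }
  return { kept, dropped };
}

const capturedSelectors = new Set<string>([
  'role=button[name="Add to cart"]',
  "#cart-badge",
]);

const gateResult = groundSteps(
  [
    { action: "click", selector: 'role=button[name="Add to cart"]' },
    { action: "assertVisible", selector: "#cart-badge" },
    // Not in the snapshot, so the gate drops it.
    { action: "assertVisible", selector: ".toast-success" },
  ],
  capturedSelectors,
);
```

Here `gateResult.kept` holds the two grounded steps and `gateResult.dropped` holds the step with the invented selector.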
Scenario families
The planner picks from seven scenario types, prioritized so the most valuable tests are written first within whatever budget the page has:
| Family | Priority | Examples |
|---|---|---|
| Form tests | Highest | Happy-path submit, empty submit (negative), field-limit boundaries. |
| Interactive buttons | High | Buttons that open modals or dropdowns, add-to-cart, multi-step flows. |
| Multi-step end-to-end | High | Chains together state transitions the scanner observed (page A → click → state B). |
| Form validation | High | Built from validation errors the scanner triggered during the scan. |
| Navigation | Medium | Header links, content links — asserts the destination page actually loads. |
| API endpoints | Medium | Smoke tests for any HTTP endpoints discovered through XHR/fetch during the scan. |
| Smoke fallback | Low | Every page gets at least one test — page loads, key element visible, mobile viewport renders, top link doesn't 404. Used as filler when richer scenarios aren't available. |
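As a rough illustration of "most valuable first, within a budget", the sketch below sorts candidate scenarios by the priority tiers in the table and cuts the list at the page's budget. The family identifiers, numeric weights, and `planScenarios` function are hypothetical; the actual planner's inputs and tie-breaking rules are not documented here.

```typescript
// Hypothetical family ids and weights, mirroring the priority column above.
type Family =
  | "form"
  | "button"
  | "multi-step"
  | "validation"
  | "navigation"
  | "api"
  | "smoke";

const PRIORITY: Record<Family, number> = {
  form: 0, // Highest
  button: 1, // High
  "multi-step": 1,
  validation: 1,
  navigation: 2, // Medium
  api: 2,
  smoke: 3, // Low
};

interface Scenario {
  family: Family;
  title: string;
}

// Sort a copy by priority tier, then keep only as many as the budget allows.
function planScenarios(candidates: Scenario[], budget: number): Scenario[] {
  return [...candidates]
    .sort((a, b) => PRIORITY[a.family] - PRIORITY[b.family])
    .slice(0, budget);
}

const plannedScenarios = planScenarios(
  [
    { family: "smoke", title: "page loads" },
    { family: "navigation", title: "header link loads" },
    { family: "form", title: "happy-path submit" },
    { family: "validation", title: "email error shown" },
  ],
  3,
);
```

With a budget of three, the smoke scenario is the one that falls off the end of the list.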
Test types
Each generated test is tagged with one of these types so you can filter, schedule, or quarantine by category:
| Type | What it asserts |
|---|---|
| e2e | End-to-end happy path — fill, submit, assert outcome on a different element than the one clicked. |
| negative | The other side of e2e — submit invalid input, expect validation errors. |
| boundary | Edge cases for inputs with constraints — too short, too long, special characters. |
| regression | Click → assert side-effect (cart badge updates, modal opens, list re-renders). |
| smoke | Lightweight: page loads, top-level links work, mobile viewport renders. |
Coverage on different plans
| Plan | Pages covered | Floor (min tests per scan) |
|---|---|---|
| Free (free scan, no signup) | 1 | 3 |
| Free (signed in) | Up to 10 | 3 |
| Starter | Up to 75 | — |
| Pro | Up to 500 | — |
| Business / Enterprise | Unlimited | — |
The "floor" is why a thin one-page site on the free tier still gets three tests rather than zero — the quality gate pads the suite with smoke tests so the generated email is never empty. On paid plans there's no floor; if a page genuinely has nothing actionable on it, it gets a single smoke test rather than padded filler.
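The floor behaviour can be sketched as below for the simple case of a one-page scan, where "per scan" and "per page" coincide. The `applyFloor` function, the `PlannedTest` shape, and the filler list (taken from the smoke fallback examples) are assumptions for illustration only.

```typescript
interface PlannedTest {
  title: string;
  type: "e2e" | "negative" | "boundary" | "regression" | "smoke";
}

// Hypothetical filler pool, based on the smoke fallback examples.
const SMOKE_FILLERS: PlannedTest[] = [
  { title: "page loads", type: "smoke" },
  { title: "key element visible", type: "smoke" },
  { title: "mobile viewport renders", type: "smoke" },
  { title: "top link does not 404", type: "smoke" },
];

// floor is 3 on the free tier and null on paid plans (no floor).
function applyFloor(tests: PlannedTest[], floor: number | null): PlannedTest[] {
  // With no floor, an otherwise empty page still gets one smoke test.
  const target = floor ?? 1;
  if (tests.length >= target) return tests;

  // Pad with smoke tests that are not already in the suite.
  const existingTitles = new Set(tests.map((t) => t.title));
  const fillers = SMOKE_FILLERS.filter((f) => !existingTitles.has(f.title));
  return [...tests, ...fillers.slice(0, target - tests.length)];
}

const freeTierSuite = applyFloor([], 3); // padded up to the floor
const paidTierSuite = applyFloor([], null); // single smoke test, no padding
```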
The AI coverage banner
On the scan result page you'll see "AI tests cover X of Y discovered pages." A few reasons coverage might be partial:
- Free plan — only the entry page gets full AI tests. Coverage banner explains the limit and links to upgrade.
- Pages with nothing testable — pure decorative pages get the smoke fallback, not the full pipeline.
- Generation in flight — for large scans, AI runs after the scan finishes. Coverage rises over the next minute or two.
- AI error — rare, but if the LLM repeatedly fails the quality gate for one page, that page is skipped and logged. Re-trigger generation from the page detail view.
Quality rules the AI follows
Every prompt enforces a strict system policy:
- Never assert the element you just clicked. Confirm the outcome on a different element.
- Use only the selectors we provide. The page's actual elements are passed in as a tool-use payload; the model can't invent selectors.
- Prefer accessibility-based locators (role + visible name) over CSS class selectors. This makes tests more resilient to refactors and lets auto-heal recover from drift.
- Auth-gate signal handling. If a flow redirects to `/login`, the test injects auto-login by default, unless the test is tagged `negative`, `unauthenticated`, or `*-negative`, in which case it's expected to land on the login page.
- Search and form patterns. Search flows append a press-Enter step when no Search button is present (most modern SPAs submit on Enter). Login form intent emission filters submit candidates to actual form controls, not styled-link "Sign In" buttons.
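Two of the rules above lend themselves to a small sketch: the click-then-assert rule and the auth-gate tag rule. The function names and the `RuleStep` shape are hypothetical, and the tag matching is only one plausible reading of the `*-negative` pattern (any tag ending in `-negative`).

```typescript
interface RuleStep {
  action: "click" | "fill" | "assertVisible";
  selector: string;
}

// True when an assertion targets the element clicked in the step right before it.
function assertsClickedElement(steps: RuleStep[]): boolean {
  return steps.some((step, i) => {
    const prev = i > 0 ? steps[i - 1] : undefined;
    return (
      step.action === "assertVisible" &&
      prev !== undefined &&
      prev.action === "click" &&
      prev.selector === step.selector
    );
  });
}

// Tags that mark a test as expecting to land on the login page.
function expectsLoginPage(tags: string[]): boolean {
  return tags.some(
    (tag) =>
      tag === "negative" ||
      tag === "unauthenticated" ||
      tag.endsWith("-negative"),
  );
}

// Hypothetical helper: inject auto-login only for /login redirects on
// tests that are not meant to end up on the login page.
function shouldInjectAutoLogin(redirectPath: string, tags: string[]): boolean {
  return redirectPath.startsWith("/login") && !expectsLoginPage(tags);
}
```

For example, `shouldInjectAutoLogin("/login", ["e2e"])` is true, while the same redirect on a test tagged `auth-negative` leaves the test on the login page.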
Custom prompts
Add project-specific instructions in Project Config → AI Config → Custom prompt. Examples that work well:
- "Always assert visibility of the cart badge after add-to-cart."
- "Never assert on user-generated content — IDs and timestamps drift between runs."
- "For forms with phone inputs, use `+1-555-555-1234` format."
- "Tests on /checkout must include a Stripe test card: 4242 4242 4242 4242."
Custom prompts are appended to the system prompt for every test generated for that project.
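Conceptually, "appended to the system prompt" means the base policy always comes first and project instructions follow it. A minimal sketch, assuming a hypothetical `buildSystemPrompt` helper and an invented section label; the real prompt layout is not documented here:

```typescript
// Hypothetical prompt builder: base policy first, project instructions appended.
// The "Project-specific instructions:" label is an assumption for illustration.
function buildSystemPrompt(basePolicy: string, customPrompt?: string): string {
  const extra = customPrompt?.trim();
  if (!extra) return basePolicy; // no custom prompt configured
  return `${basePolicy}\n\nProject-specific instructions:\n${extra}`;
}

const examplePrompt = buildSystemPrompt(
  "Use only the selectors provided.",
  "Always assert visibility of the cart badge after add-to-cart.",
);
```

Because the custom text comes after the base policy in this sketch, it adds to the built-in quality rules rather than replacing them.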
Bring Your Own Key (BYOK)
Customers on Starter and above can plug in their own LLM API key so generation runs on their own provider account, not ours.
- Supported providers: OpenAI, Anthropic, OpenRouter, DeepSeek, Z.AI, MiniMax.
- Keys are AES-256 encrypted at rest, never logged, never sent anywhere except the chosen provider.
- Set under Project Config → AI Config → BYOK.
- If your key fails (rate limit, billing error), the system falls back to AegisRunner's pool so generation never silently stalls.
Provider pool and failover
By default, AegisRunner spreads requests across multiple LLM providers in parallel (Z.AI GLM, DeepSeek, MiniMax, MiMo Flash, OpenRouter Devstral, Chutes). Each provider gets its own circuit breaker. If one provider goes down or starts returning low-quality output, traffic shifts to the others without any visible interruption.
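The per-provider circuit breaker pattern can be sketched as below. This is a generic illustration of the technique, not AegisRunner's implementation: the thresholds, the `CircuitBreaker` class, and the provider ids are made up, and for simplicity the sketch picks the first available provider instead of spreading requests in parallel.

```typescript
// Minimal circuit breaker: opens after N consecutive failures,
// then allows traffic again once a cooldown has elapsed.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;

  constructor(
    private readonly maxFailures: number,
    private readonly cooldownMs: number,
  ) {}

  // Available when the breaker is closed or the cooldown has passed.
  isAvailable(now: number): boolean {
    return this.openedAt === null || now - this.openedAt >= this.cooldownMs;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(now: number): void {
    this.failures += 1;
    if (this.failures >= this.maxFailures) this.openedAt = now;
  }
}

interface PoolProvider {
  id: string; // hypothetical provider id
  breaker: CircuitBreaker;
}

// Route to the first provider whose breaker currently allows traffic.
function pickProvider(pool: PoolProvider[], now: number): PoolProvider | undefined {
  return pool.find((p) => p.breaker.isAvailable(now));
}

const breakerA = new CircuitBreaker(2, 1000);
const breakerB = new CircuitBreaker(2, 1000);
const providerPool: PoolProvider[] = [
  { id: "provider-a", breaker: breakerA },
  { id: "provider-b", breaker: breakerB },
];

// Two consecutive failures open provider-a's breaker at t = 10 ms.
breakerA.recordFailure(0);
breakerA.recordFailure(10);
const duringOutage = pickProvider(providerPool, 20); // provider-b
const afterCooldown = pickProvider(providerPool, 1010); // provider-a again
```

Passing `now` in explicitly keeps the breaker deterministic, which makes the failover logic straightforward to unit test.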
Re-generating tests for a page
If a page changed substantially and you want fresh tests:
- Open the scan result and drill into the page.
- Click Generate AI tests on the page detail panel.
- The new run replaces the old suite for that page.
Or re-scan the project — every scan re-generates tests, but only for the pages that come back in that scan's results.
Reviewing AI tests
Don't trust AI-generated tests blindly. The Test Baseline Review page lets you mark each test as approved, needs-edit, or rejected before you start running them in CI. See Test Baseline Review.
Limitations
- JS-heavy single-page apps with hydration timing issues can still produce flaky tests on the first run. Use Baseline Replays for deterministic CI runs.
- Pages behind a login need a working login script under Project Config; otherwise the scanner only sees the login screen.
- OAuth flows generate tests up to the redirect, then can't continue past the third-party page. Use mock SSO in staging or use pre-auth cookies.
- CAPTCHA-protected pages won't generate meaningful tests. Add a bypass token to your test environment.
Related
- Test Baseline Review — sanity-check before running.
- Baseline Replays — deterministic CI runs.
- Managing Projects — login scripts, tokens, custom prompts.
- Debugging Failed Tests — what to do when a generated test fails.