AI-Powered Test Generation

How AegisRunner writes tests for every scanned page: the three-pass pipeline, scenario families, plan-aware coverage, custom prompts, and BYOK.

After every scan, AegisRunner writes a Playwright test suite for each page it found. This guide explains how the generator works, what you can tune, and what to expect from the output on different plans and on different kinds of sites.

The pipeline

Test generation runs in three passes for each page:

| Pass | What it does | Code or AI? |
| --- | --- | --- |
| 1. Scenario planner | Looks at the page's elements (forms, buttons, links, captured states) and decides what deserves a test. Outputs a scenario list. | Deterministic code. |
| 2. Materializer | Turns each scenario into concrete Playwright steps with real selectors from the page. | LLM, with a strict system prompt and tool-use grounding. |
| 3. Quality gate | Verifies every selector exists in the captured page snapshot. Drops or repairs steps that reference selectors the AI invented. Enforces minimum test count for thin pages. | Deterministic code. |

The third pass is the reason AegisRunner-generated tests rarely fail with "selector not found" on the first run. If a selector isn't grounded in the actual page DOM we captured, it doesn't make it to the suite.
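
As a rough illustration of the selector check in the third pass, the sketch below keeps only steps whose selector appears in a captured snapshot. The `Step` type, the `quality_gate` function, and the selector strings are hypothetical names chosen for this example, not AegisRunner's internals, and the sketch only drops ungrounded steps rather than repairing them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str    # e.g. "click", "fill", "expect_visible"
    selector: str  # locator string proposed by the model


def quality_gate(steps: list[Step], snapshot_selectors: set[str]) -> list[Step]:
    """Keep only the steps whose selector exists in the captured snapshot."""
    return [step for step in steps if step.selector in snapshot_selectors]


# Hypothetical snapshot and model output, for illustration only.
snapshot = {"role=button[name='Add to cart']", "role=status[name='Cart']"}
proposed = [
    Step("click", "role=button[name='Add to cart']"),
    Step("expect_visible", "role=status[name='Cart']"),
    Step("click", "#checkout-now"),  # not in the snapshot, so it is dropped
]

grounded = quality_gate(proposed, snapshot)
print([step.selector for step in grounded])
```

Because the check is a plain set lookup, it is deterministic: the same snapshot and the same proposed steps always yield the same surviving steps.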

Scenario families

The planner picks from seven scenario families, prioritized so the most valuable tests are written first within whatever budget the page has:

| Family | Priority | Examples |
| --- | --- | --- |
| Form tests | Highest | Happy-path submit, empty submit (negative), field-limit boundaries. |
| Interactive buttons | High | Buttons that open modals or dropdowns, add-to-cart, multi-step flows. |
| Multi-step end-to-end | High | Chains together state transitions the scanner observed (page A → click → state B). |
| Form validation | High | Built from validation errors the scanner triggered during the scan. |
| Navigation | Medium | Header links, content links. Asserts the destination page actually loads. |
| API endpoints | Medium | Smoke tests for any HTTP endpoints discovered through XHR/fetch during the scan. |
| Smoke fallback | Low | Every page gets at least one test: page loads, key element visible, mobile viewport renders, top link doesn't 404. Used as filler when richer scenarios aren't available. |
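
The prioritization can be pictured as a sort followed by a cut at the page's budget. This is a minimal sketch under assumptions: the family keys, the numeric priority values, and the `plan` function are illustrative names, and only the relative ordering comes from the table above.

```python
from dataclasses import dataclass

# Lower number = written first. The ordering mirrors the priority column;
# the numeric values themselves are an assumption made for this sketch.
PRIORITY = {
    "form": 0,
    "button": 1,
    "multi_step": 1,
    "form_validation": 1,
    "navigation": 2,
    "api": 2,
    "smoke": 3,
}


@dataclass(frozen=True)
class Scenario:
    family: str
    title: str


def plan(candidates: list[Scenario], budget: int) -> list[Scenario]:
    """Order candidates by family priority and keep the first `budget` items."""
    # sorted() is stable, so scenarios with equal priority keep their input order.
    ordered = sorted(candidates, key=lambda s: PRIORITY[s.family])
    return ordered[:budget]


candidates = [
    Scenario("smoke", "page loads"),
    Scenario("navigation", "header links load"),
    Scenario("form", "happy-path submit"),
    Scenario("button", "open modal"),
]
chosen = plan(candidates, budget=3)
print([s.family for s in chosen])
```

With a budget of three, the smoke scenario is the one left out, which matches its role as filler when richer scenarios are available.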

Test types

Each generated test is tagged with one of these types so you can filter, schedule, or quarantine by category:

| Type | What it asserts |
| --- | --- |
| e2e | End-to-end happy path: fill, submit, assert outcome on a different element than the one clicked. |
| negative | The counterpart to e2e: submit invalid input, expect validation errors. |
| boundary | Edge cases for inputs with constraints: too short, too long, special characters. |
| regression | Click → assert side-effect (cart badge updates, modal opens, list re-renders). |
| smoke | Lightweight: page loads, top-level links work, mobile viewport renders. |

Coverage on different plans

| Plan | Pages covered | Floor (min tests per scan) |
| --- | --- | --- |
| Free (free scan, no signup) | 1 | 3 |
| Free (signed in) | Up to 10 | 3 |
| Starter | Up to 75 | None |
| Pro | Up to 500 | None |
| Business / Enterprise | Unlimited | None |

The "floor" is why a thin one-page site on the free tier still gets three tests rather than zero — the quality gate pads the suite with smoke tests so the generated email is never empty. On paid plans there's no floor; if a page genuinely has nothing actionable on it, it gets a single smoke test rather than padded filler.
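
One way to picture the floor is as a padding step applied after generation. The sketch below is an assumption-laden illustration: `apply_floor`, the plan strings, and the filler titles are hypothetical, and the real quality gate may choose fillers differently.

```python
FREE_PLAN_FLOOR = 3  # free-tier floor from the coverage table

# Filler titles taken from the smoke fallback family; the order is an assumption.
SMOKE_FILLERS = [
    "page loads",
    "key element visible",
    "mobile viewport renders",
    "top link does not 404",
]


def apply_floor(tests: list[str], plan: str) -> list[str]:
    """Pad a suite with smoke tests until it reaches the plan's minimum."""
    # Free plans are padded up to the floor. Paid plans have no floor, but a
    # page with nothing actionable still gets a single smoke test.
    minimum = FREE_PLAN_FLOOR if plan == "free" else 1
    padded = list(tests)
    for filler in SMOKE_FILLERS:
        if len(padded) >= minimum:
            break
        if filler not in padded:
            padded.append(filler)
    return padded


print(apply_floor([], "free"))
print(apply_floor([], "pro"))
print(apply_floor(["happy-path submit"], "free"))
```

A suite that already meets the minimum passes through unchanged, so padding only affects thin pages.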

The AI coverage banner

On the scan result page you'll see "AI tests cover X of Y discovered pages." A few reasons coverage might be partial:

  • Free plan: only the entry page gets full AI tests. The coverage banner explains the limit and links to upgrade.
  • Pages with nothing testable: purely decorative pages get the smoke fallback, not the full pipeline.
  • Generation in flight: for large scans, AI generation runs after the scan finishes. Coverage rises over the next minute or two.
  • AI error: rare, but if the LLM repeatedly fails the quality gate for one page, that page is skipped and logged. Re-trigger generation from the page detail view.

Quality rules the AI follows

Every prompt enforces a strict system policy:

  • Never assert on the element you just clicked. Outcomes must be confirmed on a different element.
  • Use only the selectors we provide. The page's actual elements are passed in as a tool-use payload; the model can't invent selectors.
  • Prefer accessibility-based locators (role + visible name) over CSS class selectors. This makes tests more resilient to refactors and lets auto-heal recover from drift.
  • Auth-gate signal handling. If a flow redirects to /login, the test injects auto-login by default — unless the test is tagged negative, unauthenticated, or *-negative, in which case it's expected to land on the login page.
  • Search and form patterns. Search flows append a press-Enter step when no Search button is present (most modern SPAs submit on Enter). For login forms, submit candidates are limited to actual form controls and exclude links styled as "Sign In" buttons.
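
The auth-gate rule above amounts to a small decision on the redirect target and the test's tags. The following sketch assumes that `*-negative` is a glob-style pattern and that tags are plain strings; `should_inject_auto_login` is a hypothetical name used only for illustration.

```python
from fnmatch import fnmatch

# Tags that mean "landing on the login page is the expected outcome".
# Treating "*-negative" as a glob pattern is an assumption of this sketch.
UNAUTH_TAG_PATTERNS = ("negative", "unauthenticated", "*-negative")


def should_inject_auto_login(redirect_path: str, tags: list[str]) -> bool:
    """Return True when a redirect to /login should trigger auto-login."""
    if not redirect_path.startswith("/login"):
        return False  # no auth gate was hit, nothing to inject
    expects_login_page = any(
        fnmatch(tag, pattern) for tag in tags for pattern in UNAUTH_TAG_PATTERNS
    )
    return not expects_login_page


print(should_inject_auto_login("/login", ["e2e"]))
print(should_inject_auto_login("/login", ["auth-negative"]))
print(should_inject_auto_login("/dashboard", ["e2e"]))
```

Keeping the decision in one function makes the default (inject auto-login) and the exceptions (tests that expect the login page) easy to audit.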

Custom prompts

Add project-specific instructions in Project Config → AI Config → Custom prompt. Examples that work well:

  • "Always assert visibility of the cart badge after add-to-cart."
  • "Never assert on user-generated content — IDs and timestamps drift between runs."
  • "For forms with phone inputs, use +1-555-555-1234 format."
  • "Tests on /checkout must include a Stripe test card: 4242 4242 4242 4242."

Custom prompts are appended to the system prompt for every test generated for that project.
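
A minimal sketch of that append step, assuming the custom prompt is stored as free text per project. The `build_system_prompt` function and the "Project-specific instructions:" label are illustrative assumptions, not the product's actual prompt format.

```python
from typing import Optional


def build_system_prompt(base_prompt: str, custom_prompt: Optional[str]) -> str:
    """Append a project's custom prompt to the base system prompt, if one is set."""
    if not custom_prompt or not custom_prompt.strip():
        return base_prompt  # nothing configured for this project
    return (
        f"{base_prompt}\n\n"
        "Project-specific instructions:\n"  # hypothetical section label
        f"{custom_prompt.strip()}"
    )


base = "Use only the selectors provided."
custom = "Always assert visibility of the cart badge after add-to-cart."
print(build_system_prompt(base, custom))
```

Because the custom text comes after the base policy, it adds to the built-in rules instead of replacing them.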

Bring Your Own Key (BYOK)

Starter and above can plug in their own LLM API key so generation runs on their account, not ours.

  • Supported providers: OpenAI, Anthropic, OpenRouter, DeepSeek, Z.AI, MiniMax.
  • Keys are AES-256 encrypted at rest, never logged, never sent anywhere except the chosen provider.
  • Set under Project Config → AI Config → BYOK.
  • If your key fails (rate limit, billing error), the system falls back to AegisRunner's pool so generation never silently stalls.
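
The fallback behaviour in the last bullet can be sketched as a try-then-fall-back wrapper. Everything here is hypothetical: `ProviderError`, `generate_with_fallback`, and the stub callables stand in for real provider clients, whose error types will differ.

```python
from typing import Callable, Optional


class ProviderError(Exception):
    """Hypothetical error for rate-limit or billing failures from a provider."""


def generate_with_fallback(
    prompt: str,
    byok_call: Optional[Callable[[str], str]],
    pool_call: Callable[[str], str],
) -> tuple[str, str]:
    """Try the customer's key first, then fall back to the shared pool.

    Returns the generated text and which route produced it.
    """
    if byok_call is not None:
        try:
            return byok_call(prompt), "byok"
        except ProviderError:
            pass  # a real system would log this before falling back
    return pool_call(prompt), "pool"


def failing_key(prompt: str) -> str:
    raise ProviderError("rate limited")  # simulated BYOK failure


def shared_pool(prompt: str) -> str:
    return f"test for: {prompt}"  # stub standing in for the default pool


print(generate_with_fallback("checkout page", failing_key, shared_pool))
```

Returning the route alongside the output makes it possible to surface which account a given generation ran on.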

Provider pool and failover

By default, AegisRunner spreads requests across multiple LLM providers in parallel (Z.AI GLM, DeepSeek, MiniMax, MiMo Flash, OpenRouter Devstral, Chutes). Each provider gets its own circuit breaker. If one provider goes down or starts returning low-quality output, traffic shifts to the others without any visible interruption.
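
A per-provider circuit breaker is a standard pattern, sketched here in its simplest form. The thresholds, the cooldown, the provider names, and the `pick_provider` helper are assumptions for illustration; the source does not describe how AegisRunner's breakers are configured.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Stops sending traffic to a provider after repeated failures."""

    def __init__(
        self,
        failure_threshold: int = 3,
        cooldown_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self._opened_at is None:
            return True  # closed: traffic flows normally
        # Open: block until the cooldown has elapsed, then allow traffic again.
        # One more failure after that re-opens the breaker with a fresh timer.
        return self._clock() - self._opened_at >= self.cooldown_seconds

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()


def pick_provider(breakers: dict[str, CircuitBreaker]) -> Optional[str]:
    """Return the first provider whose breaker currently allows traffic."""
    for name, breaker in breakers.items():
        if breaker.allow():
            return name
    return None


now = [0.0]  # fake clock so the example is deterministic
breakers = {
    "provider_a": CircuitBreaker(failure_threshold=2, clock=lambda: now[0]),
    "provider_b": CircuitBreaker(failure_threshold=2, clock=lambda: now[0]),
}
breakers["provider_a"].record_failure()
breakers["provider_a"].record_failure()  # threshold reached: breaker opens
print(pick_provider(breakers))
```

Injecting the clock keeps the breaker testable without sleeping; production code would use the default monotonic clock.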

Re-generating tests for a page

If a page changed substantially and you want fresh tests:

  1. Open the scan result and drill into the page.
  2. Click Generate AI tests on the page detail panel.
  3. The new run replaces the old suite for that page.

Or re-scan the project — every scan re-generates tests, but only for the pages that come back in that scan's results.

Reviewing AI tests

Don't trust AI-generated tests blindly. The Test Baseline Review page lets you mark each test as approved, needs-edit, or rejected before you start running them in CI. See Test Baseline Review.

Limitations

  • JS-heavy single-page apps with hydration timing issues can still produce flaky tests on the first run. Use Baseline Replays for deterministic CI runs.
  • Pages behind a login need a working login script under Project Config; otherwise the scanner only sees the login screen.
  • Tests for OAuth flows run up to the redirect but can't continue past the third-party page. Use mock SSO in staging or pre-authenticated cookies.
  • CAPTCHA-protected pages won't generate meaningful tests. Add a bypass token to your test environment.
