Starting a New Crawl

Configure and start a new scan. Modes, page scope, device emulation, fill/submit forms, audits, plan limits, baselines, and CI triggering.

Starting a New Scan

Scanning is how AegisRunner learns your site. It opens the URL in a real browser, follows links, fills forms, and records what it finds. Everything else — test generation, accessibility checks, screenshots, the AI coverage report — flows from this step.

Heads up on terminology: newer screens say Scan, while some older screens still say Crawl. They're the same thing.

Where to start a scan

  • From the sidebar — click Scan. Pick the project you want to scan, click New Scan, and the configuration modal opens.
  • From a project — open the project, click New Scan on the project header.
  • Automatically on project creation — when you create a new project, the "Scan immediately" option is checked by default and the first scan starts as soon as the project is saved. See Managing Projects.
  • From CI — POST to /api/v1/ci/trigger with a CI trigger token. See CI/CD Integration.

Scan modes

  • Full Site. Starts at your project's base URL and follows links until it runs out of pages or hits your plan cap. Best for first-time setup.
  • Single Page. Scans exactly one URL. Useful when you've changed one screen and want to refresh just its tests.
  • Baseline Replay. Replays a saved baseline: same pages, same interactions, every time. Appears once you've set a baseline. The deterministic option for CI.

Baseline Replay only shows up after you've marked a previous Full Site scan as the baseline. Run a Full Site scan, open the result, and click Set as Baseline. See Baseline Replays.

What's in the scan modal

Page scope

Option | What it does
Max pages | Caps how many pages the scan walks. Defaults to a conservative 10, raised to your plan cap on first paid scan. See plan limits below.
Link depth | How many clicks deep the scanner follows links. 1 = home page only. 2 = home → linked page. Default 5.
Page path (single page mode) | The path to scan, relative to your base URL, e.g. /checkout.
Include / Exclude patterns | URL patterns to focus on or skip. Useful for excluding /admin, /logout, or anything destructive. Open the URL Patterns drawer to set these.
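
To make the interaction between Max pages and Link depth concrete, here is a minimal sketch of a breadth-first walk over an in-memory link graph. It is not AegisRunner's crawler: the function name `plan_crawl` and the `site` dictionary are hypothetical, and the only behavior taken from the table is that depth 1 means the start page only and that the walk stops at the page cap.

```python
from collections import deque

def plan_crawl(links, start, max_pages=10, link_depth=5):
    """Return pages in visit order, honoring both caps.

    links: dict mapping a path to the paths it links to (a stand-in for a site).
    Depth is counted as in the table: depth 1 is the start page only.
    """
    visited = [start]
    seen = {start}
    queue = deque([(start, 1)])
    while queue:
        page, depth = queue.popleft()
        if depth >= link_depth:
            continue  # links from this page would exceed the depth limit
        for target in links.get(page, []):
            if target in seen:
                continue
            if len(visited) >= max_pages:
                return visited  # page cap reached, stop the walk
            seen.add(target)
            visited.append(target)
            queue.append((target, depth + 1))
    return visited

# Hypothetical site used only for illustration.
site = {
    "/": ["/pricing", "/docs"],
    "/pricing": ["/checkout"],
    "/docs": ["/docs/api"],
}
print(plan_crawl(site, "/", link_depth=1))  # ['/']
print(plan_crawl(site, "/", link_depth=2))  # ['/', '/pricing', '/docs']
```

With `max_pages=4` and the default depth, the same graph yields four pages and never reaches /docs/api, which is the kind of truncation a low page cap produces.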

Device

Pick a device profile to run the scan as a phone, tablet, or different desktop size. Tests generated from a mobile scan automatically use the same mobile viewport when they run.

Profile | Viewport
Desktop (default) | 1920 × 1080
iPhone 15 Pro Max | 430 × 739
iPhone 15 Pro | 393 × 659
Galaxy S24 | 360 × 780
Pixel 7 | 412 × 839
iPad Pro 11" | 834 × 1194
iPad Mini | 768 × 1024
Device emulation is a paid-plan feature. Free-plan scans always use desktop.
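
As an illustration of how the profile table and the free-plan rule combine, the following sketch resolves a viewport from a profile name and a plan. The names `VIEWPORTS` and `resolve_viewport` are hypothetical and the plan strings are assumed labels; only the viewport sizes and the "free plan always uses desktop" rule come from this page.

```python
# Viewport sizes (width, height) copied from the profile table.
VIEWPORTS = {
    "Desktop": (1920, 1080),
    "iPhone 15 Pro Max": (430, 739),
    "iPhone 15 Pro": (393, 659),
    "Galaxy S24": (360, 780),
    "Pixel 7": (412, 839),
    'iPad Pro 11"': (834, 1194),
    "iPad Mini": (768, 1024),
}

def resolve_viewport(profile, plan):
    """Return (width, height) for a scan.

    Free-plan scans always use desktop; paid plans use the chosen profile.
    An unknown profile raises KeyError rather than guessing a size.
    """
    if plan == "Free":
        return VIEWPORTS["Desktop"]
    return VIEWPORTS[profile]

print(resolve_viewport("Pixel 7", "Starter"))  # (412, 839)
print(resolve_viewport("Pixel 7", "Free"))     # (1920, 1080)
```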

Basic features

  • Fill forms — the scanner fills inputs with safe test data so it can see what's behind them. By default it skips login/register forms; check Fill login/register forms to override.
  • Submit forms — actually submits filled forms. Use this when you want to discover post-submission pages (success screens, error states).
  • Accessibility — runs axe-core WCAG checks on every discovered page.

Pro features

  • Accessibility snapshots — saves how each page reads to a screen reader (the ARIA tree). Used by auto-heal and the Playwright agent for resilient locators.
  • Capture JS errors — grabs a screenshot every time the browser logs a JavaScript error during the scan.

Business features

  • Reduced motion — scans with prefers-reduced-motion on, catching motion-sensitivity bugs.
  • High contrast mode — scans with forced-colors on (Windows high-contrast).
  • Memory leaks — runs a longer session and watches for unbounded memory growth.
  • Live connections — monitors WebSocket traffic during the scan.

Plan limits

Plan | Max pages per scan | Scans per month | Concurrent scans | Mobile / Pro / Business features
Free | 10 | 15 | 1 | -
Starter | 75 | 150 | 2 | Mobile devices
Pro | 500 | 500 | 3 | + Accessibility snapshots, JS errors
Business | Unlimited | 2,000 | 5 | + Reduced motion, high contrast, memory, WebSockets
Enterprise | Unlimited | Unlimited | Unlimited | Everything + SSO/SAML

If a scan would exceed your cap, it's truncated to your plan's limit without raising an error; a banner on the result page notes the truncation. Upgrade to raise the cap.
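
The truncation rule can be sketched as a simple clamp. This is an illustration only: `PLAN_MAX_PAGES` and `effective_max_pages` are hypothetical names, the caps are the per-scan page limits as read from the plan table, and `None` is used here to stand for "Unlimited".

```python
# Per-scan page caps as read from the plan table; None means unlimited.
PLAN_MAX_PAGES = {
    "Free": 10,
    "Starter": 75,
    "Pro": 500,
    "Business": None,
    "Enterprise": None,
}

def effective_max_pages(plan, requested):
    """Return (pages_to_scan, truncated).

    truncated is True when the request was cut down to the plan cap,
    which is the case where the result page would show a banner.
    """
    cap = PLAN_MAX_PAGES[plan]
    if cap is None or requested <= cap:
        return requested, False
    return cap, True

print(effective_max_pages("Free", 100))      # (10, True)
print(effective_max_pages("Business", 100))  # (100, False)
```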

What happens during the scan

  1. Sitemap + link discovery — the scanner reads sitemap.xml if present, then walks links from your start URL.
  2. Interaction discovery — on each page, it clicks buttons, opens dropdowns, fills forms (if enabled) to surface hidden states.
  3. Element capture — for each interactive element, it records a stable identifier (accessibility role + visible text where possible, falling back to attribute-based selectors).
  4. Audits — accessibility (axe-core), SEO basics, security headers, and performance are checked per page.
  5. Test generation — once the scan finishes, the AI writes a test suite per page in the background. The AI coverage banner on the result page shows how many of the discovered pages got tests.
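
Step 3 describes a preference order for identifiers: accessibility role plus visible text first, attribute-based selectors as a fallback. A minimal sketch of that ordering follows. The element is modeled as a plain dictionary, the function name `stable_identifier` is hypothetical, and both the attribute priority (`data-testid`, `id`, `name`) and the output string format are assumptions chosen for illustration, not AegisRunner's actual scheme.

```python
def stable_identifier(element):
    """Pick an identifier for an interactive element.

    element: dict with optional "role", "text", and "attrs" keys.
    Prefers role + visible text; falls back to the first usable attribute.
    Returns None when nothing stable is available.
    """
    role = element.get("role")
    text = (element.get("text") or "").strip()
    if role and text:
        return f'role={role}[name="{text}"]'
    attrs = element.get("attrs", {})
    # Assumed priority order for the attribute fallback.
    for attr in ("data-testid", "id", "name"):
        value = attrs.get(attr)
        if value:
            return f'[{attr}="{value}"]'
    return None

print(stable_identifier({"role": "button", "text": " Submit "}))
# role=button[name="Submit"]
print(stable_identifier({"role": "textbox", "attrs": {"id": "email"}}))
# [id="email"]
```

Preferring role and text tends to survive markup refactors that rename classes or ids, which is why it is tried first in this sketch.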

Behind the scenes

Larger scans use multiple browser contexts running in parallel, typically two to four. With four contexts, a 200-page scan finishes in roughly the time of a 50-page sequential scan. Pages that change too quickly between visits are pinned to a single worker for consistency.
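
The arithmetic behind that estimate, and one possible way to pin a page to a worker, can be sketched as follows. The page does not say how pinning is implemented; hashing the URL is an assumption used here because it gives the same worker for the same URL on every run. Both function names are hypothetical.

```python
import hashlib
import math

def estimated_rounds(pages, workers):
    """Sequential page-visits each worker performs when pages are split evenly."""
    return math.ceil(pages / workers)

def worker_for(url, workers):
    """Map a URL to a fixed worker index using a stable hash (assumed scheme)."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % workers

print(estimated_rounds(200, 4))  # 50
print(estimated_rounds(200, 2))  # 100
# The same URL always lands on the same worker:
print(worker_for("/dashboard", 4) == worker_for("/dashboard", 4))  # True
```

With two workers the same 200-page scan takes about as long as a 100-page sequential scan, so the fourfold speedup is the upper end of the typical range.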

Setting a baseline

  1. Run a Full Site scan you're happy with — finished pages, fair coverage, no obvious flakes.
  2. On the scan result page, click Set as Baseline.
  3. AegisRunner compiles a baseline manifest — the exact list of pages, the exact interactions, and the expected screenshots.
  4. Baseline Replay mode now appears in the scan modes. It will replay this manifest on every CI run.

See Baseline Replays for how the replay handles drift, what counts as a regression, and how to update the baseline.
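
For orientation, a baseline manifest with the three ingredients listed in step 3 (pages, interactions, expected screenshots) might be modeled like this. The real manifest format is not documented on this page, so every class name, field name, and file path below is hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class BaselinePage:
    path: str                                          # page to revisit
    interactions: list = field(default_factory=list)   # ordered actions to repeat
    screenshot: str = ""                               # expected screenshot file

@dataclass
class BaselineManifest:
    scan_id: str
    pages: list = field(default_factory=list)

    def to_json(self):
        # sort_keys keeps the serialized form identical across runs.
        return json.dumps(asdict(self), sort_keys=True, indent=2)

manifest = BaselineManifest(
    scan_id="example-scan",
    pages=[
        BaselinePage("/", ["click:Sign in"], "home.png"),
        BaselinePage("/checkout", ["fill:email", "click:Pay"], "checkout.png"),
    ],
)
print(manifest.to_json())
```

Keeping the interaction list ordered is what makes a replay repeatable: the same pages are visited and the same actions are performed in the same sequence each time.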

Triggering scans from CI

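# Trigger a scan from CI; fields in the JSON body override inherited settings.
curl -X POST https://aegisrunner.com/api/v1/ci/trigger \
  -H "Authorization: Bearer aegis_..." \
  -H "Content-Type: application/json" \
  -d '{"crawl": true, "maxPages": 100}'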

CI scans inherit the last UI scan's settings unless you override them in the request body. See CI/CD Integration for the full payload.
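
The inheritance rule can be pictured as a dictionary merge in which request-body fields win. The sketch below also builds the equivalent request with Python's standard library without sending it. Assumptions: the `Content-Type: application/json` header, the `linkDepth` key, and both function names are illustrative; only the URL, the bearer-token header, and the `crawl` and `maxPages` fields appear on this page.

```python
import json
import urllib.request

TRIGGER_URL = "https://aegisrunner.com/api/v1/ci/trigger"

def effective_settings(last_ui_settings, overrides):
    """Settings a CI scan would run with: inherited values, then overrides."""
    return {**last_ui_settings, **overrides}

def build_trigger_request(token, overrides):
    """Build (but do not send) the POST request for the CI trigger."""
    return urllib.request.Request(
        TRIGGER_URL,
        data=json.dumps(overrides).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumed; JSON body
        },
    )

last_ui = {"maxPages": 10, "linkDepth": 5}  # linkDepth is a hypothetical key
body = {"crawl": True, "maxPages": 100}
print(effective_settings(last_ui, body))
# {'maxPages': 100, 'linkDepth': 5, 'crawl': True}

request = build_trigger_request("YOUR_CI_TOKEN", body)
# urllib.request.urlopen(request) would send it; omitted here on purpose.
```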

Best practices

  • Start small. First scan with the default 10 pages, see the result, then raise Max pages on the next one.
  • Exclude destructive paths. Add /admin, /logout, anything with delete or destroy to the exclude list.
  • Don't submit forms on production unless you've pointed the scanner at safe test data — submitting can create real records.
  • Use Baseline Replay in CI. Full-site scans aren't deterministic; baseline replays are. Run full scans weekly to refresh the baseline, replay on every PR.
  • Pre-load auth. If your site sits behind login, set up a login script in project settings so the scan can reach pages that need a session.
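
To sanity-check an exclude list before a scan, a small filter like the one below can help. It assumes shell-style glob patterns, which may differ from the syntax the URL Patterns drawer accepts; `DEFAULT_EXCLUDES` and `should_scan` are hypothetical names.

```python
from fnmatch import fnmatchcase

# Patterns mirroring the advice above, written as shell-style globs (assumed syntax).
DEFAULT_EXCLUDES = ["/admin*", "/logout*", "*delete*", "*destroy*"]

def should_scan(path, include=None, exclude=DEFAULT_EXCLUDES):
    """Return True if a path passes the exclude list and, if given, the include list."""
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False  # exclusions always win
    if include:
        return any(fnmatchcase(path, pattern) for pattern in include)
    return True

print(should_scan("/checkout"))                        # True
print(should_scan("/admin/users"))                     # False
print(should_scan("/posts/42/delete"))                 # False
print(should_scan("/pricing", include=["/docs/*"]))    # False
```

Checking exclusions before inclusions means a destructive path stays out of the scan even if a broad include pattern would otherwise match it.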
