Starting a New Crawl

Learn how to configure and start a website crawl, including all available options for different subscription tiers.

Crawling discovers pages, forms, and interactive elements, and captures data for AI test generation.

Crawl Modes

  • Full Site: Discovers all pages starting from your base URL and explores the interactions on each page. Best for comprehensive testing.
  • Single Page: Analyzes one URL and discovers all of its interactive elements and states. Ideal for focused testing.
  • Regression: Replays a baseline manifest, visiting the same pages and performing the same interactions every time, so runs are deterministic and repeatable. Requires a baseline to be set first.

Regression mode appears only when your project has a baseline. To create one, complete a Full Site crawl and click Set as Baseline on the results page.

Crawl Settings

Basic

| Setting | Default | Description |
|---|---|---|
| Max Pages | 200 | Maximum number of pages to discover |
| Max Depth | 5 | Maximum link depth from the start URL |
| Device | Desktop HD | Device profile for viewport, user agent, and touch emulation. See Device Profiles below. |
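
To illustrate how Max Pages and Max Depth bound a crawl, here is a minimal sketch over an in-memory link graph. It is not the crawler's actual implementation: the breadth-first order, the `discover_pages` function, the sample site, and the convention that the start URL sits at depth 0 are all assumptions made for illustration.

```python
from collections import deque


def discover_pages(links, start_url, max_pages=200, max_depth=5):
    """Return discovered URLs in visit order, bounded by page count and link depth.

    `links` maps each URL to the URLs it links to (a stand-in for real fetching).
    Assumption: the start URL is depth 0 and each followed link adds 1.
    """
    discovered = [start_url]
    seen = {start_url}
    queue = deque([(start_url, 0)])

    while queue and len(discovered) < max_pages:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(url, []):
            if target in seen:
                continue
            seen.add(target)
            discovered.append(target)
            queue.append((target, depth + 1))
            if len(discovered) >= max_pages:
                break
    return discovered


# Hypothetical site: "/" links to two pages, one of which links deeper.
site = {
    "/": ["/pricing", "/docs"],
    "/docs": ["/docs/crawling"],
    "/docs/crawling": ["/docs/crawling/modes"],
}

print(discover_pages(site, "/", max_pages=200, max_depth=2))
print(discover_pages(site, "/", max_pages=2, max_depth=5))
```

In this sketch the depth limit stops the crawl from following links found on pages at the maximum depth, while the page limit stops discovery as soon as the count is reached, whichever comes first.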

Advanced

| Setting | Default | Description |
|---|---|---|
| Respect Robots.txt | Off | Skip disallowed URLs, honor Crawl-Delay |
| Fill Forms | Off | Auto-fill forms with test data during crawl |
| Skip Auth Forms | On | Avoid submitting login/register forms |
| Include/Exclude Patterns | None | Regex patterns to focus or skip URL paths |
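
As an illustration of Include/Exclude Patterns, the sketch below filters URLs by path using regular expressions. How the crawler actually combines the two lists is not documented here, so the rules in `should_crawl` (match against the URL path, exclude takes precedence, an empty include list allows everything) and the sample patterns are assumptions.

```python
import re
from urllib.parse import urlparse


def should_crawl(url, include=(), exclude=()):
    """Decide whether a URL path passes include/exclude regex patterns.

    Assumptions (not confirmed by the product docs): patterns are matched
    against the URL path with re.search, exclude wins over include, and an
    empty include list means "everything is allowed".
    """
    path = urlparse(url).path or "/"
    if any(re.search(p, path) for p in exclude):
        return False
    if include:
        return any(re.search(p, path) for p in include)
    return True


# Hypothetical patterns: focus on the shop, skip admin, logout, and delete paths.
include = [r"^/shop/"]
exclude = [r"^/admin", r"/logout$", r"/delete"]

for url in [
    "https://example.com/shop/cart",
    "https://example.com/shop/item/42/delete",
    "https://example.com/admin/users",
    "https://example.com/blog/post-1",
]:
    print(url, should_crawl(url, include, exclude))
```

Testing patterns against a handful of known URLs like this before starting a large crawl helps catch a pattern that is too broad or too narrow.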

Device Profiles

Choose a device profile to crawl your site as it appears on different devices. The crawler emulates the viewport, user agent, device scale factor, and touch capabilities.

| Profile | Viewport | Type |
|---|---|---|
| Desktop HD (default) | 1920 x 1080 | Desktop |
| Desktop 4K | 3840 x 2160 | Desktop |
| iPhone 12 | 390 x 844 | Mobile + Touch |
| iPhone 14 | 390 x 844 | Mobile + Touch |
| iPhone 14 Pro Max | 430 x 932 | Mobile + Touch |
| Pixel 7 | 412 x 915 | Mobile + Touch |
| Pixel 7 Pro | 412 x 892 | Mobile + Touch |
| Samsung Galaxy S23 | 360 x 780 | Mobile + Touch |
| iPad Pro 12.9" | 1024 x 1366 | Tablet + Touch |
| iPad Mini | 768 x 1024 | Tablet + Touch |
| Galaxy Tab S8 | 800 x 1280 | Tablet + Touch |

Mobile crawls discover mobile-specific elements (hamburger menus, bottom navigation, mobile modals) and generate mobile-specific tests. AI-generated tests from mobile crawls automatically use the mobile viewport.
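
One way to think about a device profile is as a small record that expands into emulation settings. The sketch below models a few of the profiles listed above; the `DeviceProfile` class, the `emulation_settings` function, and the output keys are hypothetical, and user agent and device scale factor are left out because their values are not listed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceProfile:
    name: str
    width: int
    height: int
    kind: str  # "Desktop", "Mobile", or "Tablet"

    @property
    def has_touch(self) -> bool:
        # In the profile list, every non-desktop profile emulates touch.
        return self.kind != "Desktop"


# A subset of the documented profiles, with their documented viewports.
PROFILES = {
    p.name: p
    for p in [
        DeviceProfile("Desktop HD", 1920, 1080, "Desktop"),
        DeviceProfile("iPhone 14", 390, 844, "Mobile"),
        DeviceProfile("Pixel 7", 412, 915, "Mobile"),
        DeviceProfile("iPad Mini", 768, 1024, "Tablet"),
    ]
}


def emulation_settings(profile_name="Desktop HD"):
    """Build a hypothetical emulation config for the chosen profile."""
    p = PROFILES[profile_name]
    return {
        "viewport": {"width": p.width, "height": p.height},
        "hasTouch": p.has_touch,
        "isMobile": p.kind == "Mobile",
    }


print(emulation_settings("iPhone 14"))
```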

How Crawling Works

Exploration (Full Site)

  1. Page Discovery — Sitemap parsing + link extraction from start URL
  2. State Discovery — DFS interaction: clicks buttons, opens dropdowns, fills forms to discover UI states
  3. DOM Extraction — Stable CSS selectors (ID, name, aria-label, data-testid) for forms, buttons, links
  4. Audits — Accessibility (axe-core WCAG), SEO, security headers, performance per page
  5. AI Test Generation — After crawl completes, AI generates test suites per page
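
The State Discovery step can be pictured as a depth-first search over UI states. The sketch below runs that search on a toy model in which each interaction leads to a named state; the `discover_states` function, the state names, and the selectors are hypothetical, and a real crawler would drive a browser instead of reading a dictionary.

```python
def discover_states(transitions, state, path=(), found=None):
    """Depth-first exploration of UI states reachable through interactions.

    `transitions` maps a state to {interaction: resulting_state}.
    Returns {state: list of interactions that reach it from the first state}.
    """
    if found is None:
        found = {}
    found[state] = list(path)
    for interaction, next_state in transitions.get(state, {}).items():
        if next_state not in found:  # skip states already seen, which avoids loops
            discover_states(transitions, next_state, path + (interaction,), found)
    return found


# Toy page model: a menu, a dropdown inside it, and a signup modal.
page = {
    "page loaded": {
        "click #menu-button": "menu open",
        "click #newsletter": "signup modal open",
    },
    "menu open": {"click #language": "language dropdown open"},
    "signup modal open": {
        "fill input[name=email]": "signup form filled",
        "click .close": "page loaded",
    },
}

states = discover_states(page, "page loaded")
for name, steps in states.items():
    print(name, "<-", steps)
```

Recording the interaction path for each state is what makes a state reproducible later: replaying the path from a fresh page load should reach the same state again.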

Regression

  1. Visits manifest pages in recorded order
  2. Executes recorded interactions (click, select, fill)
  3. Captures screenshots for comparison
  4. Reports matched vs missing pages/interactions

Regression crawls run on a single worker with no parallelism, for maximum determinism.
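
The matched vs missing report can be sketched as a comparison between what the manifest recorded and what a replay observed. The manifest shape used here (page URL mapped to a list of interaction strings) and the `compare_to_manifest` function are hypothetical, chosen only to illustrate the idea; see Regression Manifests for the real format.

```python
def compare_to_manifest(manifest, observed):
    """Report which manifest pages and interactions were matched or missing.

    Both arguments map a page URL to the list of interactions on that page.
    """
    report = {
        "matched_pages": [],
        "missing_pages": [],
        "matched_interactions": [],
        "missing_interactions": [],
    }
    for page, interactions in manifest.items():  # dicts keep recorded order
        if page not in observed:
            report["missing_pages"].append(page)
            # Interactions on a missing page cannot be replayed either.
            report["missing_interactions"] += [(page, i) for i in interactions]
            continue
        report["matched_pages"].append(page)
        for interaction in interactions:
            found = interaction in observed[page]
            key = "matched_interactions" if found else "missing_interactions"
            report[key].append((page, interaction))
    return report


# Hypothetical baseline and replay result.
manifest = {
    "/": ["click #menu-button"],
    "/pricing": ["click #plan-pro", "select #billing-period"],
    "/legacy": ["click #old-banner"],
}
observed = {
    "/": ["click #menu-button"],
    "/pricing": ["click #plan-pro"],
}

report = compare_to_manifest(manifest, observed)
for key, values in report.items():
    print(key, values)
```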

Setting a Baseline

  1. Run a Full Site crawl
  2. Click Set as Baseline on the results page
  3. The manifest is compiled (pages, interactions, expected states)
  4. Regression mode becomes available

See Regression Manifests for details.

CI/CD Integration

curl -X POST /api/v1/ci/trigger -H "Authorization: Bearer aegis_..." -d '{"crawl": true, "maxPages": 100}'

CI crawls inherit their settings from the most recent crawl started in the UI. See CI/CD Integration.
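
For pipelines that prefer a script over curl, the same request can be built with Python's standard library. This is a sketch under stated assumptions: the host `aegis.example.com` and the token value are placeholders, the `build_trigger_request` helper is hypothetical, and the `Content-Type` header is an assumption that does not appear in the curl command above.

```python
import json
import urllib.request


def build_trigger_request(base_url, token, max_pages=100):
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"crawl": True, "maxPages": max_pages}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/ci/trigger",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            # Assumption: the API expects a JSON content type for this body.
            "Content-Type": "application/json",
        },
    )


# Placeholder values; in CI these would usually come from secrets.
request = build_trigger_request("https://aegis.example.com", "aegis_placeholder_token")
print(request.get_method(), request.full_url)

# To actually send it:
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(response.status)
```

Keeping the request construction separate from sending makes it easy to check the payload in a dry run before the pipeline triggers a real crawl.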

Best Practices

  • Start small, then increase Max Pages
  • Exclude admin areas, logout links, and delete actions
  • Enable Respect Robots.txt for production sites
  • Enable Fill Forms for comprehensive form test coverage
  • Set a baseline after a good crawl, then use Regression mode in CI
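
To preview what Respect Robots.txt would skip on a site, Python's standard `urllib.robotparser` can evaluate a robots.txt file locally. This only illustrates the robots.txt rules themselves, not the crawler's internals; the sample file, the paths, and the `example-crawler` user agent name are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# A sample robots.txt, parsed from memory so the example needs no network access.
robots_txt = """\
User-agent: *
Disallow: /admin/
Disallow: /cart/checkout
Crawl-delay: 2
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

USER_AGENT = "example-crawler"  # hypothetical user agent name

for path in ["/", "/pricing", "/admin/users", "/cart/checkout"]:
    url = "https://example.com" + path
    print(path, "allowed" if parser.can_fetch(USER_AGENT, url) else "skipped")

print("delay between requests (s):", parser.crawl_delay(USER_AGENT))
```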
