
Architecture

QuickCrawl uses a layered architecture that combines HTTP fetching, headless browser automation, and LLM-powered structured extraction.

Layers

Client (HTTP / MCP / CLI)
            │
            ▼
       Gin Router
            │
            ▼
      API Handlers
            │
   ┌────────┴────────┐
   │    Renderer     │
   │  (HTTP / CDP)   │
   └────────┬────────┘
            ▼
   ┌─────────────────┐
   │    Extractor    │
   │ (Markdown, HTML,│
   │  Links, JSON)   │
   └─────────────────┘
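
The hand-off between the handler, renderer, and extractor layers can be pictured as two small interfaces joined by one function. The following is a minimal sketch, not QuickCrawl's actual code: `Page`, `Renderer`, `Extractor`, `Scrape`, and the stand-in implementations are hypothetical names chosen for illustration, and the Gin routing layer is left out so the example needs only the standard library.

```go
package main

import (
	"context"
	"errors"
	"strings"
)

// Page is the raw result a renderer hands to the extractor (hypothetical type).
type Page struct {
	URL  string
	HTML string
}

// Renderer hides whether a page was fetched over plain HTTP or through CDP.
type Renderer interface {
	Render(ctx context.Context, url string) (Page, error)
}

// Extractor converts a rendered page into one output format.
type Extractor interface {
	Extract(p Page) (string, error)
}

// Scrape is the call an API handler might make: render first, then extract.
func Scrape(ctx context.Context, r Renderer, e Extractor, url string) (string, error) {
	if url == "" {
		return "", errors.New("empty url")
	}
	page, err := r.Render(ctx, url)
	if err != nil {
		return "", err
	}
	return e.Extract(page)
}

// staticRenderer is a stand-in renderer that returns fixed HTML.
type staticRenderer struct{ html string }

func (s staticRenderer) Render(_ context.Context, url string) (Page, error) {
	return Page{URL: url, HTML: s.html}, nil
}

// titleExtractor is a toy extractor that returns the text inside <title>.
type titleExtractor struct{}

func (titleExtractor) Extract(p Page) (string, error) {
	start := strings.Index(p.HTML, "<title>")
	end := strings.Index(p.HTML, "</title>")
	if start < 0 || end < start {
		return "", errors.New("no title found")
	}
	return p.HTML[start+len("<title>") : end], nil
}

// demo wires the stand-ins together the way a handler might.
func demo() (string, error) {
	r := staticRenderer{html: "<html><title>Hi</title></html>"}
	return Scrape(context.Background(), r, titleExtractor{}, "https://example.com")
}
```

Because `Scrape` depends only on the interfaces, an HTTP-backed or CDP-backed renderer could be swapped in without touching the handler or the extractor.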

Components

Component       File                        Responsibility
HTTP Fetcher    internal/core/http.go       Plain HTTP GET with stealth headers, retries
CDP Browser     internal/core/renderer.go   Headless Chrome via chromedp
Extractor       internal/extractor/         HTML → Markdown, Plain Text, Links
Crawler         internal/crawler/crawl.go   BFS site crawling with robots.txt
Sitemap         internal/crawler/map.go     URL discovery via sitemap.xml
Search          internal/search/            SearXNG + optional result scraping
LLM Extraction  internal/core/llm.go        JSON schema-based extraction
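
To make the HTTP Fetcher row more concrete, here is a rough sketch of a GET with browser-like headers and retries. It is not the contents of internal/core/http.go. The function name `fetchWithRetry`, the header values, the doubling backoff, and the rule of retrying only transport errors and 5xx responses are all assumptions. The `roundTripFunc` adapter exists only so the demo can run without network access.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// fetchWithRetry issues a GET with browser-like headers and retries on
// transport errors and 5xx responses (hypothetical helper, assumed policy).
func fetchWithRetry(ctx context.Context, client *http.Client, url string, attempts int, backoff time.Duration) ([]byte, error) {
	if attempts < 1 {
		attempts = 1
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		if i > 0 {
			// Wait before retrying, doubling the delay on each attempt.
			select {
			case <-time.After(backoff << (i - 1)):
			case <-ctx.Done():
				return nil, ctx.Err()
			}
		}
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
		if err != nil {
			return nil, err // a malformed URL will not improve on retry
		}
		// Assumed header set; the real stealth headers are not documented here.
		req.Header.Set("User-Agent", "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36")
		req.Header.Set("Accept", "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8")
		req.Header.Set("Accept-Language", "en-US,en;q=0.9")

		resp, err := client.Do(req)
		if err != nil {
			lastErr = err
			continue
		}
		body, readErr := io.ReadAll(resp.Body)
		resp.Body.Close()
		switch {
		case readErr != nil:
			lastErr = readErr
		case resp.StatusCode >= 500:
			lastErr = fmt.Errorf("server error: %s", resp.Status)
		case resp.StatusCode >= 400:
			return nil, fmt.Errorf("client error: %s", resp.Status)
		default:
			return body, nil
		}
	}
	return nil, fmt.Errorf("giving up after %d attempts: %w", attempts, lastErr)
}

// roundTripFunc lets a plain function act as an http.RoundTripper.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// demo runs the fetcher against an in-memory transport that fails twice.
func demo() (string, int, error) {
	calls := 0
	client := &http.Client{Transport: roundTripFunc(func(r *http.Request) (*http.Response, error) {
		calls++
		status, body := http.StatusOK, "ok"
		if calls < 3 {
			status, body = http.StatusServiceUnavailable, "try again"
		}
		return &http.Response{
			StatusCode: status,
			Status:     fmt.Sprintf("%d %s", status, http.StatusText(status)),
			Header:     make(http.Header),
			Body:       io.NopCloser(strings.NewReader(body)),
			Request:    r,
		}, nil
	})}
	b, err := fetchWithRetry(context.Background(), client, "http://example.invalid/", 5, time.Millisecond)
	return string(b), calls, err
}
```

In this sketch 4xx responses are returned immediately rather than retried, on the assumption that a client error will not change between attempts.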

Render Modes

Every fetch goes through one of three render strategies:

Mode     Description
http     Plain HTTP GET, no JavaScript
browser  Headless Chrome via CDP (full JS rendering)
auto     HTTP first; escalate to browser when needed
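
The auto mode's fallback can be sketched as a small dispatcher. The conditions that actually trigger escalation are not specified here, so `needsBrowser` below is an assumed heuristic (an "enable JavaScript" notice or a very short body). `Mode`, `FetchFunc`, `Render`, and `minStaticBytes` are likewise hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Mode names mirror the three render modes listed above.
type Mode string

const (
	ModeHTTP    Mode = "http"
	ModeBrowser Mode = "browser"
	ModeAuto    Mode = "auto"
)

// FetchFunc returns the HTML for a URL; one per backend.
type FetchFunc func(url string) (string, error)

// minStaticBytes is an assumed threshold below which a page is treated as a JS shell.
const minStaticBytes = 256

// needsBrowser is an assumed heuristic, not QuickCrawl's documented rule.
func needsBrowser(html string) bool {
	if strings.Contains(strings.ToLower(html), "enable javascript") {
		return true
	}
	return len(strings.TrimSpace(html)) < minStaticBytes
}

// Render dispatches on mode and reports which backend produced the HTML.
func Render(mode Mode, url string, viaHTTP, viaBrowser FetchFunc) (html string, used Mode, err error) {
	switch mode {
	case ModeHTTP:
		html, err = viaHTTP(url)
		return html, ModeHTTP, err
	case ModeBrowser:
		html, err = viaBrowser(url)
		return html, ModeBrowser, err
	case ModeAuto:
		html, err = viaHTTP(url)
		if err == nil && !needsBrowser(html) {
			return html, ModeHTTP, nil
		}
		// HTTP failed or returned what looks like a JS shell: escalate.
		html, err = viaBrowser(url)
		return html, ModeBrowser, err
	default:
		return "", "", fmt.Errorf("unknown render mode %q", mode)
	}
}

// demo shows auto mode staying on HTTP for a static page and escalating for a shell.
func demo() (Mode, Mode) {
	static := func(string) (string, error) {
		return "<html><body>" + strings.Repeat("content ", 50) + "</body></html>", nil
	}
	shell := func(string) (string, error) {
		return `<div id="root"></div><noscript>Please enable JavaScript</noscript>`, nil
	}
	browser := func(string) (string, error) {
		return "<html><body>rendered</body></html>", nil
	}
	_, forStatic, _ := Render(ModeAuto, "https://example.com", static, browser)
	_, forShell, _ := Render(ModeAuto, "https://example.com", shell, browser)
	return forStatic, forShell
}
```

Returning the mode that was actually used lets a caller log or report when a request fell back to the slower browser path.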

Sections

  • Scrape — Single URL fetching with HTTP/browser/auto modes
  • Crawl — Async BFS website crawling
  • Map — URL discovery without content extraction
  • Search — SearXNG search with optional result scraping
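
For orientation before the Crawl section, breadth-first crawling with a robots.txt-style filter can be sketched as follows. This version is deliberately synchronous and in-memory. `Crawl`, `LinkFunc`, `AllowFunc`, and the depth and page limits are hypothetical, and the real crawler is described as asynchronous.

```go
package main

// LinkFunc returns the outgoing links of a page (hypothetical signature).
type LinkFunc func(url string) ([]string, error)

// AllowFunc reports whether a URL may be fetched, for example a robots.txt check.
type AllowFunc func(url string) bool

// Crawl visits pages breadth-first from seed, following at most maxDepth
// link hops and returning at most maxPages URLs in visit order.
func Crawl(seed string, maxDepth, maxPages int, links LinkFunc, allowed AllowFunc) []string {
	type item struct {
		url   string
		depth int
	}
	seen := map[string]bool{seed: true}
	queue := []item{{seed, 0}}
	var visited []string

	for len(queue) > 0 && len(visited) < maxPages {
		cur := queue[0]
		queue = queue[1:]
		if !allowed(cur.url) {
			continue // disallowed pages are neither fetched nor expanded
		}
		out, err := links(cur.url)
		if err != nil {
			continue // skip pages that fail to load
		}
		visited = append(visited, cur.url)
		if cur.depth >= maxDepth {
			continue // record the page but do not follow its links
		}
		for _, next := range out {
			if !seen[next] {
				seen[next] = true
				queue = append(queue, item{next, cur.depth + 1})
			}
		}
	}
	return visited
}

// demo crawls a tiny in-memory site with one disallowed path.
func demo() []string {
	site := map[string][]string{
		"/":        {"/a", "/b"},
		"/a":       {"/c", "/"},
		"/b":       {"/c", "/private"},
		"/c":       {"/d"},
		"/private": {"/secret"},
	}
	links := func(u string) ([]string, error) { return site[u], nil }
	allowed := func(u string) bool { return u != "/private" }
	return Crawl("/", 2, 10, links, allowed)
}
```

With a depth limit of 2, the demo visits "/", "/a", "/b", and "/c" in that order; "/d" is beyond the limit and "/private" is filtered out before it is fetched.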