SDK Reference

QuickCrawlClient

QuickCrawlClient(api_url: str | None = None, api_key: str | None = None)

Creates a client. Set api_url to use HTTP mode; leave it unset to use CLI mode.


client.scrape(url, **kwargs)

Scrape a single URL and return its content in various formats.

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| url | str | required | URL to scrape |
| formats | list[str] | ["markdown"] | Output formats: markdown, html, rawHtml, plainText, links, imageLinks, json |
| render_mode | str | "auto" | Render mode: auto, http, browser |
| wait_for | int | None | Milliseconds to wait after page load (0-120000) |
| include_tags | list[str] | None | CSS selectors to include |
| exclude_tags | list[str] | None | CSS selectors to exclude |
| css_selector | str | None | Extract content from a specific CSS selector |
| ttl | int | None | Cache TTL in seconds (0 = bypass cache) |

Returns: dict with keys such as markdown, html, metadata, and links.


client.crawl(url, max_depth=2, max_pages=10, **kwargs)

Crawl a website breadth-first (BFS). In HTTP mode the crawl runs as a job on the server: the client starts the job, then polls every poll_interval seconds until the job completes or timeout elapses.

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| url | str | required | Starting URL |
| max_depth | int | 2 | Maximum link depth (0-100) |
| max_pages | int | 10 | Maximum pages to scrape (1-100) |
| poll_interval | float | 2.0 | Seconds between status checks (HTTP mode only) |
| timeout | float | 300.0 | Maximum seconds to wait (HTTP mode only) |
| render_mode | str | "auto" | Render mode: auto, http, browser |
| wait_for | int | None | Milliseconds to wait after each page load |
| formats | list[str] | None | Output formats per page (json is not supported) |

Returns: list[dict] — each dict contains scraped page data.
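To make the roles of poll_interval and timeout concrete, here is a generic polling loop of the kind the HTTP mode description implies. It is an illustration only, not the SDK's implementation: `poll_until_done`, the `check_status` callback, and the `"completed"` status string are all assumptions made for this sketch.

```python
import time
from typing import Callable


def poll_until_done(
    check_status: Callable[[], str],
    poll_interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call check_status() until it returns "completed"; return the number of checks."""
    deadline = time.monotonic() + timeout
    checks = 0
    while True:
        checks += 1
        if check_status() == "completed":
            return checks
        # Stop if the next check would fall past the deadline.
        if time.monotonic() + poll_interval > deadline:
            raise TimeoutError(f"job not finished after {timeout} seconds")
        sleep(poll_interval)


# Simulated job that completes on the third status check.
statuses = iter(["running", "running", "completed"])
checks_made = poll_until_done(lambda: next(statuses), poll_interval=0.01, timeout=1.0)
```

With the documented defaults (2.0 seconds between checks, 300.0 seconds total), a loop like this would make on the order of 150 status checks before giving up.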


client.map(url, max_depth=2, use_sitemap=True, timeout=30000)

Discover URLs on a website without scraping content.

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| url | str | required | Starting URL |
| max_depth | int | 2 | Maximum link depth (0-50) |
| use_sitemap | bool | True | Use sitemap.xml as seed URLs |
| timeout | int | 30000 | Timeout in milliseconds (1-600000) |

Returns: list[str] — discovered URLs.
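Since map returns a plain list of URL strings, it can be post-processed with the standard library before deciding what to scrape. The sketch below filters a sample list down to one host. `same_host_urls` is a hypothetical helper, and the sample list merely stands in for a real return value; this reference does not say whether map already restricts results to the starting host.

```python
from urllib.parse import urlparse


def same_host_urls(urls: list[str], start_url: str) -> list[str]:
    """Keep URLs on the same host as start_url, de-duplicated in first-seen order."""
    host = urlparse(start_url).netloc.lower()
    seen: set[str] = set()
    kept: list[str] = []
    for url in urls:
        if urlparse(url).netloc.lower() == host and url not in seen:
            seen.add(url)
            kept.append(url)
    return kept


# Sample data standing in for the return value of client.map("https://example.com").
discovered = [
    "https://example.com/",
    "https://example.com/docs",
    "https://cdn.example.net/logo.png",
    "https://example.com/docs",
]
pages = same_host_urls(discovered, "https://example.com")
```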


client.search(query, **kwargs)

Search SearXNG and optionally scrape content from each result.

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | required | Search query |
| scrape | bool | False | Scrape each result URL |
| formats | list[str] | ["markdown"] | Output formats when scraping |
| render_mode | str | "auto" | Render mode for scraping results |
| region | str | "us-en" | Region code (e.g., us-en, gb-en) |
| page | int | 1 | 1-based page number (1-1000) |
| time_range | str | None | Time filter: day, week, month, year |
| use_bm25 | bool | False | Re-rank results with BM25 scoring |

Returns:

{
    "query": str,
    "results": [
        {
            "position": int,
            "score": float,
            "bm25_score": float | None,  # only if use_bm25=True
            "title": str,
            "url": str,
            "site_name": str,
            "snippet": str,
            "markdown": str | None,  # only if scrape=True
            "html": str | None,  # only if scrape=True
        }
    ],
    "total_results": int,
    "page": int,
}
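The return structure above can be consumed with ordinary dict access. As a minimal sketch, the hypothetical helper `collect_markdown` below gathers the scraped markdown per URL and skips results that carry none (for example when scrape=False). The sample response uses invented values purely to match the documented shape.

```python
from typing import Any


def collect_markdown(response: dict[str, Any]) -> dict[str, str]:
    """Map each result URL to its scraped markdown, skipping results without any."""
    return {
        result["url"]: result["markdown"]
        for result in response["results"]
        if result.get("markdown")
    }


# Illustrative response in the documented shape; all values are made up.
sample_response = {
    "query": "web crawling",
    "results": [
        {"position": 1, "score": 0.9, "bm25_score": None, "title": "A",
         "url": "https://example.com/a", "site_name": "example.com",
         "snippet": "...", "markdown": "# A", "html": None},
        {"position": 2, "score": 0.7, "bm25_score": None, "title": "B",
         "url": "https://example.com/b", "site_name": "example.com",
         "snippet": "...", "markdown": None, "html": None},
    ],
    "total_results": 2,
    "page": 1,
}
markdown_by_url = collect_markdown(sample_response)
```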

Context Manager

with QuickCrawlClient() as client:
    result = client.scrape("https://example.com")
# close() is called automatically

Environment Variables

| Variable | Description |
| --- | --- |
| QUICKCRAWL_BINARY | Path to the quickcrawl CLI binary |
| QUICKCRAWL_API_URL | API URL for HTTP mode |
| QUICKCRAWL_API_KEY | API key for HTTP mode |

Exceptions

| Exception | Description |
| --- | --- |
| QuickCrawlError | Base exception for all SDK errors |
| QuickCrawlBinaryNotFoundError | Binary could not be found or downloaded |
| QuickCrawlTimeoutError | Operation timed out |
| QuickCrawlApiError | API returned an error (includes status_code) |
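Because every SDK error derives from QuickCrawlError, callers can retry only the timeout case and let everything else propagate. The sketch below shows one way to do that. The two exception classes are local stand-ins that mirror the documented names so the example runs without the SDK, and `retry_on_timeout` and `flaky_scrape` are hypothetical names introduced here.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


# Local stand-ins that mirror the documented names so the sketch runs without
# the SDK; with the SDK installed, import the real classes instead.
class QuickCrawlError(Exception):
    pass


class QuickCrawlTimeoutError(QuickCrawlError):
    pass


def retry_on_timeout(call: Callable[[], T], attempts: int = 3, delay: float = 1.0) -> T:
    """Run call(), retrying only when it raises QuickCrawlTimeoutError."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except QuickCrawlTimeoutError:
            if attempt == attempts:
                raise  # out of attempts: surface the last timeout
            time.sleep(delay)
    raise ValueError("attempts must be at least 1")


# Simulated call that times out twice, then succeeds.
calls = {"count": 0}


def flaky_scrape() -> dict:
    calls["count"] += 1
    if calls["count"] < 3:
        raise QuickCrawlTimeoutError("simulated timeout")
    return {"markdown": "# Example"}


result = retry_on_timeout(flaky_scrape, attempts=3, delay=0.0)
```

In real use, `call` would be something like `lambda: client.scrape("https://example.com")`, and other QuickCrawlError subclasses would pass through untouched.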