Crawl API

Start and manage asynchronous breadth-first (BFS) crawls of entire websites.

| Method | Path | Description |
|--------|------|-------------|
| POST | `/v1/crawl` | Start an asynchronous BFS crawl |
| GET | `/v1/crawl/:id` | Check crawl status and retrieve results |
| DELETE | `/v1/crawl/:id` | Cancel a running crawl |

Start a Crawl — POST /v1/crawl

Request Body

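| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `url` | string | yes | Starting URL |
| `maxDepth` | int | no | Maximum link depth, 0 to 100. If omitted, the configured default is used. |
| `maxPages` | int | no | Maximum number of pages to scrape, 1 to 100. If omitted, the configured default is used. |
| `formats` | string[] | no | Output formats for each page. `"json"` is rejected; use `/v1/scrape` instead. |
| `renderMode` | string | no | One of `"auto"`, `"browser"`, or `"http"` |
| `waitFor` | int | no | Milliseconds to wait after each page navigation |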
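Because the server rejects out-of-range values and `"json"` in `formats`, a client may want to catch those mistakes before sending anything. The following is a minimal client-side sketch that mirrors only the constraints listed in the table; the function name `validate_crawl_request` is hypothetical, and the server remains the authority on what it accepts (for example, the table does not say whether `waitFor` must be non-negative, so this sketch does not check that).

```python
from typing import Any, Dict, List

# Allowed renderMode values, as listed in the request body table.
RENDER_MODES = ("auto", "browser", "http")


def validate_crawl_request(body: Dict[str, Any]) -> List[str]:
    """Hypothetical pre-check: return a list of problems (empty means it looks valid)."""
    problems: List[str] = []

    url = body.get("url")
    if not isinstance(url, str) or not url:
        problems.append("url is required and must be a non-empty string")

    # Optional integer fields with documented inclusive ranges.
    for field, low, high in (("maxDepth", 0, 100), ("maxPages", 1, 100)):
        if field in body:
            value = body[field]
            # bool is a subclass of int in Python, so exclude it explicitly.
            if not isinstance(value, int) or isinstance(value, bool):
                problems.append(f"{field} must be an integer")
            elif not low <= value <= high:
                problems.append(f"{field} must be between {low} and {high}")

    formats = body.get("formats", [])
    if not isinstance(formats, list) or not all(isinstance(f, str) for f in formats):
        problems.append("formats must be a list of strings")
    elif "json" in formats:
        problems.append('formats must not contain "json" (use /v1/scrape)')

    if "renderMode" in body and body["renderMode"] not in RENDER_MODES:
        problems.append('renderMode must be "auto", "browser", or "http"')

    if "waitFor" in body:
        wait = body["waitFor"]
        if not isinstance(wait, int) or isinstance(wait, bool):
            problems.append("waitFor must be an integer number of milliseconds")

    return problems
```

An empty list means the body passed these local checks; a non-empty list can be shown to the user instead of waiting for a `400 invalid_request` from the server.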

Example Request

```json
{
  "url": "https://example.com",
  "maxDepth": 2,
  "maxPages": 50,
  "formats": ["markdown", "links"],
  "renderMode": "http"
}
```

Response

```json
{
  "success": true,
  "id": "crawl-1748899200000000000"
}
```
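The request and response above can be exercised from Python's standard library. This is a sketch, not an official client: `BASE_URL` is a hypothetical placeholder for wherever the API is served, and the helper names are illustrative. Building the request separately from sending it keeps the HTTP details easy to inspect and test without a live server.

```python
import json
import urllib.request
from typing import Any, Dict

# Hypothetical base URL; replace with the address the API is actually served on.
BASE_URL = "http://localhost:8080"


def build_start_request(body: Dict[str, Any], base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/crawl request with a JSON body."""
    return urllib.request.Request(
        url=f"{base_url}/v1/crawl",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_start_response(raw: bytes) -> str:
    """Return the crawl ID from a POST /v1/crawl response body."""
    payload = json.loads(raw)
    if not payload.get("success"):
        raise RuntimeError(f"crawl was not started: {payload}")
    return payload["id"]


def start_crawl(body: Dict[str, Any], base_url: str = BASE_URL) -> str:
    """Send the request and return the new crawl ID (requires a running server)."""
    with urllib.request.urlopen(build_start_request(body, base_url)) as resp:
        return parse_start_response(resp.read())
```

Calling `start_crawl({"url": "https://example.com", "maxDepth": 2})` would return an ID such as `crawl-1748899200000000000`, which is then used in the `:id` position of the other two endpoints. Note that `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses rather than returning them.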

Check Status — GET /v1/crawl/:id

Response (running)

```json
{
  "id": "crawl-1748899200000000000",
  "success": true,
  "status": "scraping",
  "total": 47,
  "completed": 12,
  "data": []
}
```
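Since the crawl runs asynchronously, a client typically polls this endpoint until the crawl finishes. Below is a minimal polling sketch, assuming that `completed` and `failed` are the only terminal statuses (with `pending` and `scraping` meaning "keep waiting"). The fetch function is injected so the loop can be tested without a server; `http_status_fetcher`, the default base URL, the 2-second interval, and the 10-minute timeout are all hypothetical choices, not documented values.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict

# Assumption: these are the only statuses after which polling should stop.
TERMINAL_STATUSES = ("completed", "failed")


def http_status_fetcher(
    crawl_id: str, base_url: str = "http://localhost:8080"
) -> Callable[[], Dict[str, Any]]:
    """Return a zero-argument function that GETs /v1/crawl/:id and decodes the JSON."""
    url = f"{base_url}/v1/crawl/{urllib.parse.quote(crawl_id)}"

    def fetch() -> Dict[str, Any]:
        with urllib.request.urlopen(url) as resp:
            return json.loads(resp.read())

    return fetch


def wait_for_crawl(
    fetch_status: Callable[[], Dict[str, Any]],
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the crawl reaches a terminal status and return that response body."""
    deadline = time.monotonic() + timeout
    while True:
        body = fetch_status()
        status = body.get("status")
        if status in TERMINAL_STATUSES:
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError(f"crawl still {status!r} after {timeout} seconds")
        # While scraping, body["completed"] / body["total"] gives rough progress.
        sleep(interval)
```

A caller would run `wait_for_crawl(http_status_fetcher(crawl_id))` and then check whether the returned `status` is `failed` before reading `data`, since a failed crawl also ends the loop.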

Response (completed)

```json
{
  "id": "crawl-1748899200000000000",
  "success": true,
  "status": "completed",
  "total": 47,
  "completed": 47,
  "data": [
    {
      "markdown": "# Example Domain\n\n...",
      "links": ["https://www.iana.org/domains/example"],
      "metadata": {
        "sourceURL": "https://example.com",
        "statusCode": 200,
        "renderedMode": "http",
        "timeTaken": 281
      }
    }
  ]
}
```

`status` is one of `pending`, `scraping`, `completed`, or `failed`.
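Once a crawl is `completed`, each entry in `data` carries its page's `metadata.sourceURL`, so results can be looked up by URL. The sketch below indexes a completed status response that way; the function name `pages_by_url` is hypothetical, and entries without a `sourceURL` are simply skipped. Which content fields (`markdown`, `links`, and so on) each page includes presumably depends on the `formats` requested.

```python
import json
from typing import Any, Dict


def pages_by_url(status_body: str) -> Dict[str, Dict[str, Any]]:
    """Index the pages of a completed crawl by metadata.sourceURL."""
    payload = json.loads(status_body)
    if payload.get("status") != "completed":
        raise ValueError(f"crawl is not completed: {payload.get('status')!r}")

    pages: Dict[str, Dict[str, Any]] = {}
    for page in payload.get("data", []):
        source = page.get("metadata", {}).get("sourceURL")
        if source is not None:  # skip entries that lack a source URL
            pages[source] = page
    return pages
```

With the completed response shown above, `pages_by_url(body)["https://example.com"]["markdown"]` would return that page's Markdown.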

Cancel a Crawl — DELETE /v1/crawl/:id

Returns 204 No Content on success, 404 Not Found if the job ID is unknown.
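Cancellation maps naturally onto a boolean result: `204` means the crawl was cancelled and `404` means the ID is unknown. The sketch below uses the standard library's `urllib`; `BASE_URL` and the helper names are hypothetical, and any other error status is re-raised rather than guessed at.

```python
import urllib.error
import urllib.parse
import urllib.request

# Hypothetical base URL; replace with the address the API is actually served on.
BASE_URL = "http://localhost:8080"


def build_cancel_request(crawl_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a DELETE /v1/crawl/:id request."""
    return urllib.request.Request(
        f"{base_url}/v1/crawl/{urllib.parse.quote(crawl_id)}", method="DELETE"
    )


def cancel_crawl(crawl_id: str, base_url: str = BASE_URL) -> bool:
    """Return True if the crawl was cancelled (204), False if the ID is unknown (404)."""
    try:
        with urllib.request.urlopen(build_cancel_request(crawl_id, base_url)) as resp:
            return resp.status == 204
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx; only 404 has a documented meaning here.
        if err.code == 404:
            return False
        raise
```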

Error Responses

| Status | Code | Cause |
|--------|------|-------|
| 400 | `invalid_request` | Missing `url`, invalid parameters, or `formats` contains `"json"` |
| 500 | `internal_error` | The scraper is not initialized |
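A client can tag HTTP failures with the codes from this table. The table does not show the shape of the error response body, so this sketch keys on the HTTP status alone and keeps the body as raw text; `CrawlAPIError` and `from_http_error` are hypothetical names, and statuses outside the table get a `code` of `None`.

```python
import urllib.error
from typing import Optional

# Status-to-code mapping taken from the error table.
ERROR_CODES = {400: "invalid_request", 500: "internal_error"}


class CrawlAPIError(Exception):
    """An HTTP error from the crawl endpoints, tagged with its documented code."""

    def __init__(self, status: int, body: str = ""):
        self.status = status
        # None when the status is not one of the documented error statuses.
        self.code: Optional[str] = ERROR_CODES.get(status)
        self.body = body
        super().__init__(f"HTTP {status} ({self.code or 'undocumented'}): {body}")


def from_http_error(err: urllib.error.HTTPError) -> CrawlAPIError:
    """Convert a urllib HTTPError into a CrawlAPIError, keeping the raw body text."""
    raw = err.read() if err.fp is not None else b""
    return CrawlAPIError(err.code, raw.decode("utf-8", errors="replace"))
```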