API Documentation
Complete reference for the Data From URL REST API
POST
/api/extract
Extract structured data from a single URL
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Required | The URL to extract data from |
| options.contentType | string | Optional | Expected content type: 'article', 'product', 'general', or 'auto' |
| options.preferBrowser | boolean | Optional | Use browser rendering for JavaScript-heavy sites |
| options.includeMetadata | boolean | Optional | Include additional metadata in response |
| options.timeout | number | Optional | Request timeout in milliseconds (default: 8000) |
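The parameter table above can be mirrored in a small client-side helper that assembles the request body before it is sent. The following is a minimal sketch, not part of the API itself: the function name `build_extract_payload` and its keyword arguments are hypothetical, and the check that the timeout is positive is a client-side assumption rather than a documented server rule.

```python
from typing import Any, Dict, Optional

# Values listed for options.contentType in the parameter table.
ALLOWED_CONTENT_TYPES = ("article", "product", "general", "auto")


def build_extract_payload(
    url: str,
    content_type: Optional[str] = None,
    prefer_browser: Optional[bool] = None,
    include_metadata: Optional[bool] = None,
    timeout_ms: Optional[float] = None,
) -> Dict[str, Any]:
    """Return a JSON-serializable body for POST /api/extract (hypothetical helper)."""
    if not isinstance(url, str) or not url.strip():
        raise ValueError("url is required and must be a non-empty string")

    options: Dict[str, Any] = {}
    if content_type is not None:
        if content_type not in ALLOWED_CONTENT_TYPES:
            raise ValueError(f"contentType must be one of {ALLOWED_CONTENT_TYPES}")
        options["contentType"] = content_type
    if prefer_browser is not None:
        options["preferBrowser"] = bool(prefer_browser)
    if include_metadata is not None:
        options["includeMetadata"] = bool(include_metadata)
    if timeout_ms is not None:
        # bool is a subclass of int, so reject it explicitly.
        # Requiring a positive value is an assumption made for this sketch.
        if (
            isinstance(timeout_ms, bool)
            or not isinstance(timeout_ms, (int, float))
            or timeout_ms <= 0
        ):
            raise ValueError("timeout must be a positive number of milliseconds")
        options["timeout"] = timeout_ms

    # Omit "options" entirely when no optional parameter was given.
    payload: Dict[str, Any] = {"url": url}
    if options:
        payload["options"] = options
    return payload
```

Optional parameters that are left unset are simply omitted, so the server-side defaults (such as the 8000 ms timeout) stay in effect.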
Example Request
curl -X POST http://localhost:4000/api/extract \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/article",
    "options": {
      "contentType": "article",
      "includeMetadata": true
    }
  }'
Example Response
{
  "success": true,
  "data": {
    "title": "Example Article Title",
    "description": "Article description...",
    "content": "Full article text...",
    "author": "John Doe",
    "publishedDate": "2025-01-15",
    "url": "https://example.com/article"
  },
  "metadata": {
    "contentType": "article",
    "confidence": 0.95,
    "extractionTime": 1234
  }
}
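A client will usually want to turn a response like the one above into a typed value. The sketch below shows one way to do that with the standard library. The `ExtractResult` class and `parse_extract_response` function are hypothetical names. The shape of a failed response is not documented here, so the sketch only checks the `success` flag and treats every field as optional.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractResult:
    """Subset of the example response fields (hypothetical container)."""
    title: Optional[str]
    description: Optional[str]
    content: Optional[str]
    author: Optional[str]
    published_date: Optional[str]
    url: Optional[str]
    content_type: Optional[str]
    confidence: Optional[float]


def parse_extract_response(body: str) -> ExtractResult:
    """Parse the JSON text returned by POST /api/extract."""
    doc = json.loads(body)
    if not doc.get("success"):
        # The error format is an unknown here, so surface the raw body.
        raise RuntimeError(f"extraction failed: {body[:200]}")

    data = doc.get("data") or {}
    # "metadata" may be absent, for example when includeMetadata is not set.
    metadata = doc.get("metadata") or {}
    return ExtractResult(
        title=data.get("title"),
        description=data.get("description"),
        content=data.get("content"),
        author=data.get("author"),
        published_date=data.get("publishedDate"),
        url=data.get("url"),
        content_type=metadata.get("contentType"),
        confidence=metadata.get("confidence"),
    )
```

Reading fields with `.get` keeps the parser tolerant of pages where, say, no author could be extracted.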
POST
/api/extract/batch
Extract data from multiple URLs at once (max 10)
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| urls | string[] | Required | Array of URLs to extract (max 10) |
| options | object | Optional | Same options as single extraction |
Example Request
curl -X POST http://localhost:4000/api/extract/batch \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example.com/page1",
      "https://example.com/page2"
    ],
    "options": {
      "includeMetadata": true
    }
  }'
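Because a batch request accepts at most 10 URLs, a longer list has to be split across several requests. The generator below is one possible sketch of that chunking step. The names `MAX_BATCH_SIZE` and `batch_payloads` are hypothetical, and how the server reacts to an oversized batch is not covered in this reference.

```python
from typing import Any, Dict, Iterator, Mapping, Optional, Sequence

# Documented limit for /api/extract/batch.
MAX_BATCH_SIZE = 10


def batch_payloads(
    urls: Sequence[str],
    options: Optional[Mapping[str, Any]] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield one request body per group of at most MAX_BATCH_SIZE URLs."""
    for start in range(0, len(urls), MAX_BATCH_SIZE):
        payload: Dict[str, Any] = {"urls": list(urls[start:start + MAX_BATCH_SIZE])}
        if options:
            # Copy so each payload can be modified independently.
            payload["options"] = dict(options)
        yield payload
```

Each yielded dictionary can be serialized with `json.dumps` and posted to `/api/extract/batch` in the same way as the curl example.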
POST
/api/extract/async
Submit an async extraction job for long-running extractions
Example Request
curl -X POST http://localhost:4000/api/extract/async \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/slow-page"
  }'
Example Response
{
  "success": true,
  "jobId": "job_abc123",
  "status": "pending"
}
GET
/api/jobs/{jobId}
Check the status of an async job
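A typical client polls this endpoint until the job leaves its in-progress state. The sketch below illustrates such a loop under several assumptions: the response format of this endpoint is not shown in this reference, so the code assumes a JSON object with a `status` field like the one returned by `/api/extract/async`, and `"pending"` is the only in-progress value known from this document, so the set of in-progress statuses is a parameter. The names `fetch_job` and `wait_for_job` are hypothetical, and the base URL is taken from the curl examples.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Any, Callable, Collection, Dict


def fetch_job(job_id: str, base_url: str = "http://localhost:4000") -> Dict[str, Any]:
    """GET /api/jobs/{jobId} and decode the body as JSON (assumed format)."""
    quoted = urllib.parse.quote(job_id, safe="")
    with urllib.request.urlopen(f"{base_url}/api/jobs/{quoted}") as resp:
        return json.load(resp)


def wait_for_job(
    job_id: str,
    fetch: Callable[[str], Dict[str, Any]] = fetch_job,
    in_progress: Collection[str] = ("pending",),
    interval_s: float = 1.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the job status is no longer in `in_progress`.

    A response without a "status" field is returned as-is, since this
    sketch cannot interpret it.
    """
    for attempt in range(max_attempts):
        job = fetch(job_id)
        if job.get("status") not in in_progress:
            return job
        # Skip the pause after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"job {job_id} still in progress after {max_attempts} checks")
```

Passing `fetch` and `sleep` as arguments keeps the loop testable without a running server. If the service reports additional in-progress states, they can be supplied through `in_progress`.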
GET
/api/health
Check service health and available capabilities
Example Response
{
  "status": "healthy",
  "capabilities": {
    "aiParsing": true,
    "browserRendering": true,
    "proxySupport": false
  }
}
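One use for the health response is to decide which extraction options are worth sending. The sketch below assumes, without confirmation from this reference, that the `browserRendering` capability indicates whether `options.preferBrowser` can be honored, and that any status other than `"healthy"` means requests should be held back. The function name `options_from_health` is hypothetical.

```python
from typing import Any, Dict


def options_from_health(health: Dict[str, Any], wants_browser: bool) -> Dict[str, Any]:
    """Derive extraction options from a parsed /api/health response."""
    if health.get("status") != "healthy":
        # Treating non-"healthy" as a hard stop is an assumption of this sketch.
        raise RuntimeError(f"service not healthy: {health.get('status')!r}")

    capabilities = health.get("capabilities") or {}
    options: Dict[str, Any] = {}
    # Only request browser rendering when the service advertises it.
    if wants_browser and capabilities.get("browserRendering"):
        options["preferBrowser"] = True
    return options
```

The returned dictionary can be passed as the `options` object of an extraction request, and an empty result means no optional settings are needed.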