[ PROMPT_NODE_27843 ]
Bright Data Best Practices
[ SKILL_DOCUMENTATION ]
# Bright Data APIs
Bright Data provides infrastructure for web data extraction at scale. Four primary APIs cover different use cases — always pick the most specific tool for the job.
## Choosing the Right API
| Use Case | API | Why |
|----------|-----|-----|
| Scrape any webpage by URL (no interaction) | Web Unlocker | HTTP-based, auto-bypasses bot detection, cheapest |
| Google / Bing / Yandex search results | SERP API | Specialized for SERP extraction, returns structured data |
| Structured data from Amazon, LinkedIn, Instagram, TikTok, etc. | Web Scraper API | Pre-built scrapers, no parsing needed |
| Click, scroll, fill forms, run JS, intercept XHR | Browser API | Full browser automation |
| Puppeteer / Playwright / Selenium automation | Browser API | Connects via CDP/WebDriver |
## Authentication Pattern (All APIs)
All APIs share the same authentication model:
```bash
export BRIGHTDATA_API_KEY="your-api-key" # From Control Panel > Account Settings
export BRIGHTDATA_UNLOCKER_ZONE="zone-name" # Web Unlocker zone name
export BRIGHTDATA_SERP_ZONE="serp-zone-name" # SERP API zone name
export BROWSER_AUTH="brd-customer-ID-zone-NAME:PASSWORD" # Browser API credentials
```
REST API authentication header for Web Unlocker and SERP API:
```
Authorization: Bearer YOUR_API_KEY
```
---
## Web Unlocker API
HTTP-based scraping proxy. Best for simple page fetches without browser interaction.
**Endpoint:** `POST https://api.brightdata.com/request`
```python
import requests
response = requests.post(
"https://api.brightdata.com/request",
headers={"Authorization": f"Bearer {API_KEY}"},
json={
"zone": "YOUR_ZONE_NAME",
"url": "https://example.com/product/123",
"format": "raw"
}
)
html = response.text
```
### Key Parameters
| Parameter | Type | Description |
|-----------|------|-------------|
| `zone` | string | Zone name (required) |
| `url` | string | Target URL with `http://` or `https://` (required) |
| `format` | string | `"raw"` (HTML) or `"json"` (structured wrapper) (required) |
| `method` | string | HTTP verb, default `"GET"` |
| `country` | string | 2-letter ISO for geo-targeting (e.g., `"us"`, `"de"`) |
| `data_format` | string | Transform: `"markdown"` or `"screenshot"` |
| `async` | boolean | `true` for async mode |
### Quick Patterns
```python
# Get markdown (best for LLM input)
response = requests.post(
"https://api.brightdata.com/request",
headers={"Authorization": f"Bearer {API_KEY}"},
json={"zone": ZONE, "url": url, "format": "raw", "data_format": "markdown"}
)
# Geo-targeted request
json={"zone": ZONE, "url": url, "format": "raw", "country": "de"}
# Screenshot for debugging
json={"zone": ZONE, "url": url, "format": "raw", "data_format": "screenshot"}
# Async for bulk processing
json={"zone": ZONE, "url": url, "format": "raw", "async": True}
```
**Critical rule:** Never use Web Unlocker with Puppeteer, Playwright, Selenium, or anti-detect browsers. Use Browser API instead.
See **[references/web-unlocker.md](references/web-unlocker.md)** for complete reference including proxy interface, special headers, async flow, features, and billing.
---
## SERP API
Structured search engine result extraction for Google, Bing, Yandex, DuckDuckGo.
**Endpoint:** `POST https://api.brightdata.com/request` (same as Web Unlocker)
```python
response = requests.post(
"https://api.brightdata.com/request",
headers={"Authorization": f"Bearer {API_KEY}"},
json={
"zone": "YOUR_SERP_ZONE",
"url": "https://www.google.com/search?q=python+web+scraping&brd_json=1&gl=us&hl=en",
"format": "raw"
}
)
data = response.json()
for result in data.get("organic", []):
print(result["rank"], result["title"], result["link"])
```
### Essential Google URL Parameters
| Parameter | Description | Example |
|-----------|-------------|---------|
| `q` | Search query | `q=python+web+scraping` |
| `brd_json` | Parsed JSON output | `brd_json=1` (always use for data pipelines) |
| `gl` | Country for search | `gl=us` |
| `hl` | Language | `hl=en` |
| `start` | Pagination offset | `start=10` (page 2), `start=20` (page 3) |
| `tbm` | Search type | `tbm=nws` (news), `tbm=isch` (images), `tbm=vid` (videos) |
| `brd_mobile` | Device | `brd_mobile=1` (mobile), `brd_mobile=ios` |
| `brd_browser` | Browser | `brd_browser=chrome` |
| `brd_ai_overview` | Trigger AI Overview | `brd_ai_overview=2` |
| `uule` | Encoded geo location | for precise location targeting |
**Note:** `num` parameter is **deprecated** as of September 2025. Use `start` for pagination.
### Parsed JSON Response Structure
```json
{
"organic": [{"rank": 1, "global_rank": 1, "title": "...", "link": "...", "description": "..."}],
"paid": [],
"people_also_ask": [],
"knowledge_graph": {},
"related_searches": [],
"general": {"results_cnt": 1240000000, "query": "..."}
}
```
### Bing Key Parameters
| Parameter | Description |
|-----------|-------------|
| `q` | Search query |
| `setLang` | Language (prefer 4-letter: `en-US`) |
| `cc` | Country code |
| `first` | Pagination (increment by 10: 1, 11, 21...) |
| `safesearch` | `off`, `moderate`, `strict` |
| `brd_mobile` | Device type |
### Async for Bulk SERP
```python
# Submit
response = requests.post(
"https://api.brightdata.com/request",
params={"async": "1"},
headers={"Authorization": f"Bearer {API_KEY}"},
json={"zone": SERP_ZONE, "url": "https://www.google.com/search?q=test&brd_json=1", "format": "raw"}
)
response_id = response.headers.get("x-response-id")
# Retrieve (retrieve calls are NOT billed)
result = requests.get(
"https://api.brightdata.com/serp/get_result",
params={"response_id": response_id},
headers={"Authorization": f"Bearer {API_KEY}"}
)
```
**Billing:** Pay per 1,000 successful requests only. Async retrieve calls are not billed.
See **[references/serp-api.md](references/serp-api.md)** for complete reference including Maps, Trends, Reviews, Lens, Hotels, Flights parameters.
---
## Web Scraper API
Pre-built scrapers for structured data extraction from 100+ platforms. No parsing logic needed.
**Sync Endpoint:** `POST https://api.brightdata.com/datasets/v3/scrape`
**Async Endpoint:** `POST https://api.brightdata.com/datasets/v3/trigger`
```python
# Sync (up to 20 URLs, returns immediately)
response = requests.post(
"https://api.brightdata.com/datasets/v3/scrape",
params={"dataset_id": "YOUR_DATASET_ID", "format": "json"},
headers={"Authorization": f"Bearer {API_KEY}"},
json={"input": [{"url": "https://www.amazon.com/dp/B09X7M8TBQ"}]}
)
if response.status_code == 200:
data = response.json() # Results ready
elif response.status_code == 202:
snapshot_id = response.json()["snapshot_id"] # Poll for completion
```
### Parameters
| Parameter | Type | Description |
|-----------|------|-------------|
| `dataset_id` | string | Scraper identifier from the Scraper Library (required) |
| `format` | string | `json` (default), `ndjson`, `jsonl`, `csv` |
| `custom_output_fields` | string | Pipe-separated fields: `url|title|price` |
| `include_errors` | boolean | Include error info in results |
### Request Body
```json
{
"input": [
{ "url": "https://www.amazon.com/dp/B09X7M8TBQ" },
{ "url": "https://www.amazon.com/dp/B0B7CTCPKN" }
]
}
```
### Poll for Async Results
```python
import time
# Trigger
snapshot_id = requests.post(
"https://api.brightdata.com/datasets/v3/trigger",
params={"dataset_id": DATASET_ID, "format": "json"},
headers={"Authorization": f"Bearer {API_KEY}"},
json={"input": [{"url": u} for u in urls]}
).json()["snapshot_id"]
# Poll
while True:
status = requests.get(
f"https://api.brightdata.com/datasets/v3/progress/{snapshot_id}",
headers={"Authorization": f"Bearer {API_KEY}"}
).json()["status"]
if status == "ready": break
if status == "failed": raise Exception("Job failed")
time.sleep(10)
# Download
data = requests.get(
f"https://api.brightdata.com/datasets/v3/snapshot/{snapshot_id}",
params={"format": "json"},
headers={"Authorization": f"Bearer {API_KEY}"}
).json()
```
**Progress status values:** `starting` → `running` → `ready` | `failed`
**Data retention:** 30 days.
**Billing:** Per delivered record. Invalid input URLs that fail are still billable.
See **[references/web-scraper-api.md](references/web-scraper-api.md)** for complete reference including scraper types, output formats, delivery options, and billing details.
---
## Browser API (Scraping Browser)
Full browser automation via CDP/WebDriver. Handles CAPTCHA, fingerprinting, and anti-bot detection automatically.
**Connection:**
- Playwright/Puppeteer: `wss://${AUTH}@brd.superproxy.io:9222`
- Selenium: `https://${AUTH}@brd.superproxy.io:9515`
```javascript
const { chromium } = require("playwright-core");
const AUTH = process.env.BROWSER_AUTH;
const browser = await chromium.connectOverCDP(`wss://${AUTH}@brd.superproxy.io:9222`);
const page = await browser.newPage();
page.setDefaultNavigationTimeout(120000); // Always set to 2 minutes
await page.goto("https://example.com", { waitUntil: "domcontentloaded" });
const html = await page.content();
await browser.close();
```
```python
from playwright.async_api import async_playwright
async with async_playwright() as p:
browser = await p.chromium.connect_over_cdp(f"wss://{AUTH}@brd.superproxy.io:9222")
page = await browser.new_page()
page.set_default_navigation_timeout(120000)
await page.goto("https://example.com", wait_until="domcontentloaded")
html = await page.content()
await browser.close()
```
### Custom CDP Functions
| Function | Purpose |
|----------|---------|
| `Captcha.solve` | Manually trigger CAPTCHA solving |
| `Captcha.setAutoSolve` | Enable/disable auto CAPTCHA solving |
| `Proxy.setLocation` | Set precise geo location (call BEFORE goto) |
| `Proxy.useSession` | Maintain same IP across sessions |
| `Emulation.setDevice` | Apply device profile (iPhone 14, etc.) |
| `Emulation.getSupportedDevices` | List available device profiles |
| `Unblocker.enableAdBlock` | Block ads to save bandwidth |
| `Unblocker.disableAdBlock` | Re-enable ads |
| `Input.type` | Fast text input for bulk form filling |
| `Browser.addCertificate` | Install client SSL cert for session |
| `Page.inspect` | Get DevTools debug URL for live session |
```javascript
// CDP session pattern for custom functions
const client = await page.target().createCDPSession();
// CAPTCHA solve with timeout
const result = await client.send("Captcha.solve", { timeout: 30000 });
// Precise geo location (must be before goto)
await client.send("Proxy.setLocation", {
latitude: 37.7749,
longitude: -122.4194,
distance: 10,
strict: true
});
// Block unnecessary resources
await client.send("Network.setBlockedURLs", { urls: ["*google-analytics*", "*.ads.*"] });
// Device emulation
await client.send("Emulation.setDevice", { deviceName: "iPhone 14" });
```
### Session Rules
- **One initial navigation per session** — new URL = new session
- **Idle timeout:** 5 minutes
- **Max duration:** 30 minutes
### Geolocation
- Country-level: append `-country-us` to credentials username
- EU-wide: append `-country-eu` (routes through 29+ European countries)
- Precise: use `Proxy.setLocation` CDP command (before navigation)
### Error Codes
| Code | Issue | Fix |
|------|-------|-----|
| `407` | Wrong port | Playwright/Puppeteer → `9222`, Selenium → `9515` |
| `403` | Bad auth | Check credentials format and zone type |
| `503` | Service scaling | Wait 1 minute, reconnect |
**Billing:** Traffic-based only. Block images/CSS/fonts to reduce costs.
See **[references/browser-api.md](references/browser-api.md)** for complete reference including all CDP functions, bandwidth optimization, CAPTCHA patterns, and debugging.
---
## Detailed References
- **[references/web-unlocker.md](references/web-unlocker.md)** — Web Unlocker: full parameter list, proxy interface, special headers, async flow, features, billing, anti-patterns
- **[references/serp-api.md](references/serp-api.md)** — SERP API: all Google params (Maps, Trends, Reviews, Lens, Hotels, Flights), Bing params, parsed JSON structure, async, billing
- **[references/web-scraper-api.md](references/web-scraper-api.md)** — Web Scraper API: sync vs async, all parameters, polling, scraper types, output formats, billing
- **[references/browser-api.md](references/browser-api.md)** — Browser API: connection strings, session rules, all CDP functions, geo-targeting, bandwidth optimization, CAPTCHA, debugging, error codes
Source: claude-code-templates (MIT). See About Us for full credits.