# Proxy
## Basic Proxy Setup
Pass the proxy URL to `BrowserConfig` through its `proxy` parameter:
```python
import asyncio

from crawl4ai import AsyncWebCrawler
from crawl4ai.async_configs import BrowserConfig

async def main():
    # Using an HTTP proxy URL
    browser_config = BrowserConfig(proxy="http://proxy.example.com:8080")
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(url="https://example.com")

    # Using a SOCKS proxy
    browser_config = BrowserConfig(proxy="socks5://proxy.example.com:1080")
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(url="https://example.com")

asyncio.run(main())
```
## Authenticated Proxy
For a proxy that requires credentials, pass a `proxy_config` dictionary to `BrowserConfig`:
```python
import asyncio

from crawl4ai import AsyncWebCrawler
from crawl4ai.async_configs import BrowserConfig

# Proxy server plus the credentials it requires
proxy_config = {
    "server": "http://proxy.example.com:8080",
    "username": "user",
    "password": "pass",
}

async def main():
    browser_config = BrowserConfig(proxy_config=proxy_config)
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(url="https://example.com")

asyncio.run(main())
```
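Proxy providers often hand out a single URL with the credentials embedded, such as `http://user:pass@host:port`. The following is a minimal sketch of splitting such a URL into the `server` / `username` / `password` dictionary shown above, using only the standard library. The helper name `proxy_url_to_config` is hypothetical and not part of crawl4ai.

```python
from urllib.parse import unquote, urlsplit

def proxy_url_to_config(proxy_url: str) -> dict:
    """Hypothetical helper: split a proxy URL into server/username/password keys."""
    parts = urlsplit(proxy_url)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"Not a valid proxy URL: {proxy_url!r}")

    # Rebuild the server address without the credentials
    server = f"{parts.scheme}://{parts.hostname}"
    if parts.port is not None:
        server += f":{parts.port}"

    config = {"server": server}
    # Credentials may be percent-encoded inside a URL, so decode them
    if parts.username is not None:
        config["username"] = unquote(parts.username)
    if parts.password is not None:
        config["password"] = unquote(parts.password)
    return config

proxy_config = proxy_url_to_config("http://user:pass@proxy.example.com:8080")
```

The resulting `proxy_config` has the same shape as the dictionary in the example above, so it could be passed to `BrowserConfig(proxy_config=proxy_config)` in the same way. The sketch assumes a plain hostname or IPv4 address; bracketed IPv6 hosts would need extra handling.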
## Rotating Proxies
Example using a proxy rotation service and updating `BrowserConfig` dynamically:
```python
import asyncio

from crawl4ai import AsyncWebCrawler
from crawl4ai.async_configs import BrowserConfig

urls = ["https://example.com"]  # Placeholder list of URLs to crawl

async def get_next_proxy():
    # Your proxy rotation logic here
    return {"server": "http://next.proxy.com:8080"}

async def main():
    browser_config = BrowserConfig()
    async with AsyncWebCrawler(config=browser_config) as crawler:
        # Update proxy for each request
        for url in urls:
            proxy = await get_next_proxy()
            browser_config.proxy_config = proxy
            result = await crawler.arun(url=url, config=browser_config)

asyncio.run(main())
```
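The `get_next_proxy` placeholder above leaves the rotation logic open. One simple option is round-robin over a fixed pool, sketched below with `itertools.cycle`. The `PROXY_SERVERS` list and its addresses are assumptions for illustration; a real setup might instead query a rotation service or skip proxies that have failed.

```python
import asyncio
from itertools import cycle

# Hypothetical proxy pool; replace with your own servers
PROXY_SERVERS = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
    "http://proxy3.example.com:8080",
]

# cycle() yields the servers in order and starts over after the last one
_proxy_pool = cycle(PROXY_SERVERS)

async def get_next_proxy() -> dict:
    """Return the next proxy in round-robin order as a proxy_config-style dict."""
    return {"server": next(_proxy_pool)}

async def demo() -> list:
    # Draw four proxies to show the pool wrapping around
    return [await get_next_proxy() for _ in range(4)]

picked = asyncio.run(demo())
```

Round-robin spreads requests evenly and needs no shared state beyond the iterator, but it does not react to slow or blocked proxies. The function stays `async` so it can later be swapped for one that awaits a remote rotation service without changing the calling loop.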