
Major changes: - Add browser takeover feature using CDP for authentic browsing - Implement Docker support with full API server documentation - Enhance Mockdown with tag preservation system - Improve parallel crawling performance This release focuses on authenticity and scalability, introducing the ability to use users' own browsers while providing containerized deployment options. Breaking changes include modified browser handling and API response structure. See CHANGELOG.md for detailed migration guide.
4.7 KiB
4.7 KiB
Parameter Reference Table
File Name | Parameter Name | Code Usage | Strategy/Class | Description |
---|---|---|---|---|
async_crawler_strategy.py | user_agent | kwargs.get("user_agent") |
AsyncPlaywrightCrawlerStrategy | User agent string for browser identification |
async_crawler_strategy.py | proxy | kwargs.get("proxy") |
AsyncPlaywrightCrawlerStrategy | Proxy server configuration for network requests |
async_crawler_strategy.py | proxy_config | kwargs.get("proxy_config") |
AsyncPlaywrightCrawlerStrategy | Detailed proxy configuration including auth |
async_crawler_strategy.py | headless | kwargs.get("headless", True) |
AsyncPlaywrightCrawlerStrategy | Whether to run browser in headless mode |
async_crawler_strategy.py | browser_type | kwargs.get("browser_type", "chromium") |
AsyncPlaywrightCrawlerStrategy | Type of browser to use (chromium/firefox/webkit) |
async_crawler_strategy.py | headers | kwargs.get("headers", {}) |
AsyncPlaywrightCrawlerStrategy | Custom HTTP headers for requests |
async_crawler_strategy.py | verbose | kwargs.get("verbose", False) |
AsyncPlaywrightCrawlerStrategy | Enable detailed logging output |
async_crawler_strategy.py | sleep_on_close | kwargs.get("sleep_on_close", False) |
AsyncPlaywrightCrawlerStrategy | Add delay before closing browser |
async_crawler_strategy.py | use_managed_browser | kwargs.get("use_managed_browser", False) |
AsyncPlaywrightCrawlerStrategy | Use managed browser instance |
async_crawler_strategy.py | user_data_dir | kwargs.get("user_data_dir", None) |
AsyncPlaywrightCrawlerStrategy | Custom directory for browser profile data |
async_crawler_strategy.py | session_id | kwargs.get("session_id") |
AsyncPlaywrightCrawlerStrategy | Unique identifier for browser session |
async_crawler_strategy.py | override_navigator | kwargs.get("override_navigator", False) |
AsyncPlaywrightCrawlerStrategy | Override browser navigator properties |
async_crawler_strategy.py | simulate_user | kwargs.get("simulate_user", False) |
AsyncPlaywrightCrawlerStrategy | Simulate human-like behavior |
async_crawler_strategy.py | magic | kwargs.get("magic", False) |
AsyncPlaywrightCrawlerStrategy | Enable advanced anti-detection features |
async_crawler_strategy.py | log_console | kwargs.get("log_console", False) |
AsyncPlaywrightCrawlerStrategy | Log browser console messages |
async_crawler_strategy.py | js_only | kwargs.get("js_only", False) |
AsyncPlaywrightCrawlerStrategy | Only execute JavaScript without page load |
async_crawler_strategy.py | page_timeout | kwargs.get("page_timeout", 60000) |
AsyncPlaywrightCrawlerStrategy | Timeout for page load in milliseconds |
async_crawler_strategy.py | ignore_body_visibility | kwargs.get("ignore_body_visibility", True) |
AsyncPlaywrightCrawlerStrategy | Process page even if body is hidden |
async_crawler_strategy.py | js_code | kwargs.get("js_code", kwargs.get("js", self.js_code)) |
AsyncPlaywrightCrawlerStrategy | Custom JavaScript code to execute |
async_crawler_strategy.py | wait_for | kwargs.get("wait_for") |
AsyncPlaywrightCrawlerStrategy | Wait for specific element/condition |
async_crawler_strategy.py | process_iframes | kwargs.get("process_iframes", False) |
AsyncPlaywrightCrawlerStrategy | Extract content from iframes |
async_crawler_strategy.py | delay_before_return_html | kwargs.get("delay_before_return_html") |
AsyncPlaywrightCrawlerStrategy | Additional delay before returning HTML |
async_crawler_strategy.py | remove_overlay_elements | kwargs.get("remove_overlay_elements", False) |
AsyncPlaywrightCrawlerStrategy | Remove pop-ups and overlay elements |
async_crawler_strategy.py | screenshot | kwargs.get("screenshot") |
AsyncPlaywrightCrawlerStrategy | Take page screenshot |
async_crawler_strategy.py | screenshot_wait_for | kwargs.get("screenshot_wait_for") |
AsyncPlaywrightCrawlerStrategy | Wait before taking screenshot |
async_crawler_strategy.py | semaphore_count | kwargs.get("semaphore_count", 5) |
AsyncPlaywrightCrawlerStrategy | Concurrent request limit |
async_webcrawler.py | verbose | kwargs.get("verbose", False) |
AsyncWebCrawler | Enable detailed logging |
async_webcrawler.py | warmup | kwargs.get("warmup", True) |
AsyncWebCrawler | Initialize crawler with warmup request |
async_webcrawler.py | session_id | kwargs.get("session_id", None) |
AsyncWebCrawler | Session identifier for browser reuse |
async_webcrawler.py | only_text | kwargs.get("only_text", False) |
AsyncWebCrawler | Extract only text content |
async_webcrawler.py | bypass_cache | kwargs.get("bypass_cache", False) |
AsyncWebCrawler | Skip cache and force fresh crawl |