UncleCode 67a23c3182 feat(core): Release v0.3.73 with Browser Takeover and Docker Support

Major changes:
- Add browser takeover feature using CDP for authentic browsing
- Implement Docker support with full API server documentation
- Enhance Markdown with tag preservation system
- Improve parallel crawling performance

This release focuses on authenticity and scalability, introducing the ability
to use users' own browsers while providing containerized deployment options.
Breaking changes include modified browser handling and API response structure.

See CHANGELOG.md for detailed migration guide.

2024-11-05 20:04:18 +08:00


Parameter Reference Table

| File Name | Parameter Name | Code Usage | Strategy/Class | Description |
|---|---|---|---|---|
| async_crawler_strategy.py | user_agent | `kwargs.get("user_agent")` | AsyncPlaywrightCrawlerStrategy | User agent string for browser identification |
| async_crawler_strategy.py | proxy | `kwargs.get("proxy")` | AsyncPlaywrightCrawlerStrategy | Proxy server configuration for network requests |
| async_crawler_strategy.py | proxy_config | `kwargs.get("proxy_config")` | AsyncPlaywrightCrawlerStrategy | Detailed proxy configuration including auth |
| async_crawler_strategy.py | headless | `kwargs.get("headless", True)` | AsyncPlaywrightCrawlerStrategy | Whether to run browser in headless mode |
| async_crawler_strategy.py | browser_type | `kwargs.get("browser_type", "chromium")` | AsyncPlaywrightCrawlerStrategy | Type of browser to use (chromium/firefox/webkit) |
| async_crawler_strategy.py | headers | `kwargs.get("headers", {})` | AsyncPlaywrightCrawlerStrategy | Custom HTTP headers for requests |
| async_crawler_strategy.py | verbose | `kwargs.get("verbose", False)` | AsyncPlaywrightCrawlerStrategy | Enable detailed logging output |
| async_crawler_strategy.py | sleep_on_close | `kwargs.get("sleep_on_close", False)` | AsyncPlaywrightCrawlerStrategy | Add delay before closing browser |
| async_crawler_strategy.py | use_managed_browser | `kwargs.get("use_managed_browser", False)` | AsyncPlaywrightCrawlerStrategy | Use managed browser instance |
| async_crawler_strategy.py | user_data_dir | `kwargs.get("user_data_dir", None)` | AsyncPlaywrightCrawlerStrategy | Custom directory for browser profile data |
| async_crawler_strategy.py | session_id | `kwargs.get("session_id")` | AsyncPlaywrightCrawlerStrategy | Unique identifier for browser session |
| async_crawler_strategy.py | override_navigator | `kwargs.get("override_navigator", False)` | AsyncPlaywrightCrawlerStrategy | Override browser navigator properties |
| async_crawler_strategy.py | simulate_user | `kwargs.get("simulate_user", False)` | AsyncPlaywrightCrawlerStrategy | Simulate human-like behavior |
| async_crawler_strategy.py | magic | `kwargs.get("magic", False)` | AsyncPlaywrightCrawlerStrategy | Enable advanced anti-detection features |
| async_crawler_strategy.py | log_console | `kwargs.get("log_console", False)` | AsyncPlaywrightCrawlerStrategy | Log browser console messages |
| async_crawler_strategy.py | js_only | `kwargs.get("js_only", False)` | AsyncPlaywrightCrawlerStrategy | Only execute JavaScript without page load |
| async_crawler_strategy.py | page_timeout | `kwargs.get("page_timeout", 60000)` | AsyncPlaywrightCrawlerStrategy | Timeout for page load in milliseconds |
| async_crawler_strategy.py | ignore_body_visibility | `kwargs.get("ignore_body_visibility", True)` | AsyncPlaywrightCrawlerStrategy | Process page even if body is hidden |
| async_crawler_strategy.py | js_code | `kwargs.get("js_code", kwargs.get("js", self.js_code))` | AsyncPlaywrightCrawlerStrategy | Custom JavaScript code to execute |
| async_crawler_strategy.py | wait_for | `kwargs.get("wait_for")` | AsyncPlaywrightCrawlerStrategy | Wait for specific element/condition |
| async_crawler_strategy.py | process_iframes | `kwargs.get("process_iframes", False)` | AsyncPlaywrightCrawlerStrategy | Extract content from iframes |
| async_crawler_strategy.py | delay_before_return_html | `kwargs.get("delay_before_return_html")` | AsyncPlaywrightCrawlerStrategy | Additional delay before returning HTML |
| async_crawler_strategy.py | remove_overlay_elements | `kwargs.get("remove_overlay_elements", False)` | AsyncPlaywrightCrawlerStrategy | Remove pop-ups and overlay elements |
| async_crawler_strategy.py | screenshot | `kwargs.get("screenshot")` | AsyncPlaywrightCrawlerStrategy | Take page screenshot |
| async_crawler_strategy.py | screenshot_wait_for | `kwargs.get("screenshot_wait_for")` | AsyncPlaywrightCrawlerStrategy | Wait before taking screenshot |
| async_crawler_strategy.py | semaphore_count | `kwargs.get("semaphore_count", 5)` | AsyncPlaywrightCrawlerStrategy | Concurrent request limit |
| async_webcrawler.py | verbose | `kwargs.get("verbose", False)` | AsyncWebCrawler | Enable detailed logging |
| async_webcrawler.py | warmup | `kwargs.get("warmup", True)` | AsyncWebCrawler | Initialize crawler with warmup request |
| async_webcrawler.py | session_id | `kwargs.get("session_id", None)` | AsyncWebCrawler | Session identifier for browser reuse |
| async_webcrawler.py | only_text | `kwargs.get("only_text", False)` | AsyncWebCrawler | Extract only text content |
| async_webcrawler.py | bypass_cache | `kwargs.get("bypass_cache", False)` | AsyncWebCrawler | Skip cache and force fresh crawl |
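
Every entry in the Code Usage column follows the same `kwargs.get(name, default)` lookup, so a parameter that is not passed either takes the listed default or resolves to `None`. The sketch below illustrates that behavior in isolation. `StrategySketch` and `resolve_js_code` are hypothetical names invented for this example, not part of the library; only the parameter names, defaults, and the `js_code` fallback expression are taken from the table.

```python
class StrategySketch:
    """Hypothetical stand-in that mimics the kwargs.get(...) lookups in the table."""

    def __init__(self, **kwargs):
        # Lookups with an explicit default fall back to that value.
        self.headless = kwargs.get("headless", True)
        self.browser_type = kwargs.get("browser_type", "chromium")
        self.page_timeout = kwargs.get("page_timeout", 60000)  # milliseconds
        # Lookups without a default resolve to None when the key is absent.
        self.user_agent = kwargs.get("user_agent")
        # Instance-level value used as the last resort by resolve_js_code().
        self.js_code = kwargs.get("js_code")

    def resolve_js_code(self, **kwargs):
        # Same fallback chain as the js_code row:
        # per-call "js_code", then per-call "js", then the instance value.
        return kwargs.get("js_code", kwargs.get("js", self.js_code))


strategy = StrategySketch(headless=False, js_code="window.scrollTo(0, 0);")

print(strategy.headless)       # overridden by the caller
print(strategy.browser_type)   # default from the table
print(strategy.resolve_js_code(js="document.title"))  # "js" alias wins over the instance value
```

Because the lookups are plain dictionary reads, a misspelled parameter name in this pattern is silently ignored rather than rejected, which is worth keeping in mind when a setting appears to have no effect.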
