crawl4ai/examples at deploy - crawl4ai

yujunjun/crawl4ai

mirror of https://github.com/unclecode/crawl4ai.git synced 2025-11-13 18:27:53 +00:00

UncleCode 9547bada3a feat(content): add target_elements parameter for selective content extraction

Adds new target_elements parameter to CrawlerRunConfig that allows more flexible content selection than css_selector. This enables focusing markdown generation and data extraction on specific elements while still processing the entire page for links and media.
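
The behaviour described above (content taken only from selected elements, links still gathered from the whole page) can be illustrated with a toy, standard-library-only sketch. This is not crawl4ai code: the class name `SelectiveExtractor`, the helper `extract`, and matching by bare tag name instead of CSS selectors are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class SelectiveExtractor(HTMLParser):
    """Toy model: links come from the whole page, text only from target tags."""

    def __init__(self, target_tags):
        super().__init__()
        self.target_tags = set(target_tags)  # hypothetical: tag names, not CSS selectors
        self.depth = 0                       # > 0 while inside a targeted element
        self.text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Links are recorded no matter where they appear on the page.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        if tag in self.target_tags:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.target_tags and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Text is kept only while inside a targeted element.
        if self.depth and data.strip():
            self.text.append(data.strip())


def extract(html, target_tags):
    """Return (text from targeted elements, links from the whole page)."""
    parser = SelectiveExtractor(target_tags)
    parser.feed(html)
    parser.close()
    return " ".join(parser.text), parser.links


PAGE = """
<nav><a href="/home">Home</a></nav>
<article><h1>Title</h1><p>Body text.</p><a href="/ref">Reference</a></article>
<footer><a href="/about">About</a></footer>
"""

text, links = extract(PAGE, ["article"])
print(text)
print(links)
```

In this sketch the navigation and footer text are excluded from `text`, while their links still appear in `links`, which mirrors the split the commit message describes.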

Key changes:
- Added target_elements list parameter to CrawlerRunConfig
- Modified WebScrapingStrategy and LXMLWebScrapingStrategy to handle target_elements
- Updated documentation with examples and comparison between css_selector and target_elements
- Fixed table extraction in content_scraping_strategy.py

BREAKING CHANGE: Table extraction logic has been modified to better handle thead/tbody structures
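
As a rough illustration of what handling thead/tbody structures can involve, the following standard-library sketch separates header cells from data rows. It is not the logic in content_scraping_strategy.py; the class `TableParser`, the helper `parse_table`, and the fallback rule for tables without a thead are assumptions.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Toy parser: rows inside <thead> become headers, all other rows are data."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self.rows = []
        self._section = None   # "thead", "tbody", or None
        self._row = None       # cells of the row being read
        self._cell = None      # text fragments of the cell being read
        self._all_th = True    # whether the current row has only <th> cells

    def handle_starttag(self, tag, attrs):
        if tag in ("thead", "tbody"):
            self._section = tag
        elif tag == "tr":
            self._row, self._all_th = [], True
        elif tag in ("td", "th"):
            self._cell = []
            if tag == "td":
                self._all_th = False

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            if self._row is not None:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            # Assumed fallback: without <thead>, a leading all-<th> row is the header.
            leading_th_row = (
                self._section is None and self._all_th
                and not self.headers and not self.rows
            )
            if self._row and (self._section == "thead" or leading_th_row):
                self.headers = self._row
            elif self._row:
                self.rows.append(self._row)
            self._row = None
        elif tag in ("thead", "tbody"):
            self._section = None


def parse_table(html):
    """Return (headers, rows) for a single HTML table."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.headers, parser.rows


STRUCTURED = (
    "<table><thead><tr><th>Coin</th><th>Price</th></tr></thead>"
    "<tbody><tr><td>BTC</td><td>100</td></tr>"
    "<tr><td>ETH</td><td>10</td></tr></tbody></table>"
)
FLAT = (
    "<table><tr><th>Coin</th><th>Price</th></tr>"
    "<tr><td>BTC</td><td>100</td></tr></table>"
)

print(parse_table(STRUCTURED))
print(parse_table(FLAT))
```

Both sample tables yield the same headers here, so code consuming the result does not need to know whether the source markup used explicit thead/tbody sections.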

2025-03-10 18:54:51 +08:00

- User agent

2024-06-08 17:59:42 +08:00

feat(cli): add command line interface with comprehensive features

2025-02-10 16:58:52 +08:00

amazon_product_extraction_direct_url.py

Apply Ruff Corrections

2025-01-13 19:19:58 +08:00

amazon_product_extraction_using_hooks.py

Apply Ruff Corrections

2025-01-13 19:19:58 +08:00

amazon_product_extraction_using_use_javascript.py

Apply Ruff Corrections

2025-01-13 19:19:58 +08:00

async_webcrawler_multiple_urls_example.py

Apply Ruff Corrections

2025-01-13 19:19:58 +08:00

browser_optimization_example.py

Release prep (#749)

2025-02-28 19:53:35 +08:00

chainlit.md

Add research assistant example using Chainlit

2024-06-04 22:43:09 +08:00

crawlai_vs_firecrawl.py

Apply Ruff Corrections

2025-01-13 19:19:58 +08:00

crypto_analysis_example.py

feat(scraping): add smart table extraction and analysis capabilities

2025-03-09 21:31:33 +08:00

deepcrawl_example.py

fix(docs): correct section numbering in deepcrawl_example.py tutorial

2025-03-04 20:57:33 +08:00

dispatcher_example.py

feat(content): add target_elements parameter for selective content extraction

2025-03-10 18:54:51 +08:00

docker_config_obj.py

refactor(config): enhance serialization and config handling

2025-02-19 17:23:25 +08:00

docker_example.py

Apply Ruff Corrections

2025-01-13 19:19:58 +08:00

docker_python_rest_api.py

refactor(config): enhance serialization and config handling

2025-02-19 17:23:25 +08:00

docker_python_sdk.py

refactor(config): enhance serialization and config handling

2025-02-19 17:23:25 +08:00

extraction_strategies_examples.py

feat(browser): add standalone CDP browser launch and lxml extraction strategy

2025-03-07 20:55:56 +08:00

full_page_screenshot_and_pdf_export.md

Enhance crawler capabilities and documentation

2024-12-25 21:34:31 +08:00

hello_world.py

feat(browser): add BrowserProfiler class for identity-based browsing

2025-03-02 20:32:29 +08:00

hooks_example.py

Apply Ruff Corrections

2025-01-13 19:19:58 +08:00

identity_based_browsing.py

feat(profiles): add CLI command for crawling with browser profiles

2025-03-02 21:33:33 +08:00

language_support_example.py

Apply Ruff Corrections

2025-01-13 19:19:58 +08:00

llm_extraction_openai_pricing.py

feat(browser): add standalone CDP browser launch and lxml extraction strategy

2025-03-07 20:55:56 +08:00

llm_markdown_generator.py

feat(browser): add standalone CDP browser launch and lxml extraction strategy

2025-03-07 20:55:56 +08:00

proxy_rotation_demo.py

feat(proxy): add proxy rotation strategy

2025-02-09 18:49:10 +08:00

quickstart_async.config.py

feat(browser): add standalone CDP browser launch and lxml extraction strategy

2025-03-07 20:55:56 +08:00

quickstart_async.py

feat(browser): add standalone CDP browser launch and lxml extraction strategy

2025-03-07 20:55:56 +08:00

quickstart_sync.py

feat(browser): add standalone CDP browser launch and lxml extraction strategy

2025-03-07 20:55:56 +08:00

quickstart_v0.ipynb

docs(urls): update documentation URLs to new domain

2025-01-09 16:22:41 +08:00

quickstart.ipynb

Release prep (#749)

2025-02-28 19:53:35 +08:00

research_assistant.py

Apply Ruff Corrections

2025-01-13 19:19:58 +08:00

rest_call.py

Apply Ruff Corrections

2025-01-13 19:19:58 +08:00

sample_ecommerce.html

Push async version last changes for merge to main branch

2024-09-24 20:52:08 +08:00

scraping_strategies_performance.py

docs(examples): update demo scripts and fix output formats

2025-01-22 20:40:03 +08:00

serp_api_project_11_feb.py

Release prep (#749)

2025-02-28 19:53:35 +08:00

ssl_example.py

Apply Ruff Corrections

2025-01-13 19:19:58 +08:00

storage_state_tutorial.md

Implement new async crawler features and stability updates

2024-12-10 17:55:29 +08:00

summarize_page.py

Apply Ruff Corrections

2025-01-13 19:19:58 +08:00

tutorial_dynamic_clicks.md

Add PDF & screenshot functionality, new tutorial

2024-12-10 20:10:39 +08:00

tutorial_v0.5.py

refactor(proxy): consolidate proxy configuration handling

2025-03-07 23:14:11 +08:00