Add support for checking and respecting robots.txt rules before crawling websites

- Implement RobotsParser class with SQLite caching
- Add check_robots_txt parameter to CrawlerRunConfig
- Integrate robots.txt checking in AsyncWebCrawler
- Update documentation with robots.txt compliance examples
- Add tests for robot parser functionality

The cache uses WAL mode for better concurrency and has a default TTL of 7 days.
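
For reference, a minimal usage sketch of the new parameter, assuming the standard `AsyncWebCrawler.arun()` call; the URL is a placeholder, and the exact failure reporting when a page is disallowed is an assumption:

```python
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig

async def main():
    # Opt in to robots.txt compliance via the new config flag.
    config = CrawlerRunConfig(check_robots_txt=True)
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url="https://example.com", config=config)
        if not result.success:
            # Presumably includes the case where robots.txt disallows the URL.
            print(f"Crawl skipped or failed: {result.error_message}")
        else:
            print(result.markdown[:200])

asyncio.run(main())
```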
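
The commit does not show the RobotsParser internals; the following is an illustrative sketch of the caching approach it describes (a SQLite table keyed by site origin, WAL journal mode, 7-day TTL), built on the standard library's `urllib.robotparser`. The class, table, and column names here are assumptions, not the actual implementation:

```python
import sqlite3
import time
import urllib.robotparser
from urllib.parse import urlparse
from urllib.request import urlopen

SEVEN_DAYS = 7 * 24 * 60 * 60  # default TTL per the commit message

class RobotsCache:
    """Illustrative robots.txt cache; the real RobotsParser may differ."""

    def __init__(self, db_path="robots_cache.db", ttl=SEVEN_DAYS):
        self.ttl = ttl
        self.conn = sqlite3.connect(db_path)
        # WAL mode lets readers proceed concurrently with a single writer.
        self.conn.execute("PRAGMA journal_mode=WAL")
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS robots "
            "(origin TEXT PRIMARY KEY, body TEXT, fetched_at REAL)"
        )

    def _fetch(self, origin):
        try:
            with urlopen(f"{origin}/robots.txt", timeout=10) as resp:
                return resp.read().decode("utf-8", errors="replace")
        except OSError:
            return ""  # treat an unreachable robots.txt as allow-all

    def can_fetch(self, url, user_agent="*"):
        parsed = urlparse(url)
        origin = f"{parsed.scheme}://{parsed.netloc}"
        row = self.conn.execute(
            "SELECT body, fetched_at FROM robots WHERE origin=?", (origin,)
        ).fetchone()
        if row is None or time.time() - row[1] > self.ttl:
            # Cache miss or expired entry: refetch and upsert.
            body = self._fetch(origin)
            self.conn.execute(
                "INSERT OR REPLACE INTO robots VALUES (?, ?, ?)",
                (origin, body, time.time()),
            )
            self.conn.commit()
        else:
            body = row[0]
        rp = urllib.robotparser.RobotFileParser()
        rp.parse(body.splitlines())
        return rp.can_fetch(user_agent, url)
```

Caching per origin rather than per URL keeps the table small, and the TTL ensures stale rules are refreshed without refetching robots.txt on every request.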