crawl4ai

mirror of https://github.com/unclecode/crawl4ai.git synced 2025-12-30 11:55:18 +00:00

UncleCode d09c611d15 feat(robots): add robots.txt compliance support

Add support for checking and respecting robots.txt rules before crawling websites:
- Implement RobotsParser class with SQLite caching
- Add check_robots_txt parameter to CrawlerRunConfig
- Integrate robots.txt checking in AsyncWebCrawler
- Update documentation with robots.txt compliance examples
- Add tests for robot parser functionality

The SQLite cache uses write-ahead logging (WAL) mode for better concurrency, and cached entries have a default TTL of 7 days.
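
For readers who want a feel for the approach described above, here is a minimal sketch of a robots.txt check backed by a SQLite cache in WAL mode with a 7-day TTL. It is not the crawl4ai `RobotsParser` implementation: the class name `CachedRobotsChecker`, the `robots_cache` table, the `fetch` hook, and the choice to treat a failed download as "no rules" are all assumptions made for illustration, and it relies only on the Python standard library.

```python
import sqlite3
import time
from urllib.parse import urlparse
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser

SEVEN_DAYS = 7 * 24 * 60 * 60  # 7-day default TTL from the commit message, in seconds


def fetch_robots_txt(robots_url):
    """Download robots.txt; any network failure is treated as 'no rules' (assumption)."""
    try:
        with urlopen(robots_url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except OSError:
        return ""


class CachedRobotsChecker:
    """Hypothetical helper: caches robots.txt text per domain in SQLite."""

    def __init__(self, db_path, ttl=SEVEN_DAYS, fetch=fetch_robots_txt):
        self.ttl = ttl
        self.fetch = fetch  # injectable so the cache can be exercised offline
        self.db = sqlite3.connect(db_path)
        # WAL lets readers proceed while a writer is active (file-backed databases only).
        self.db.execute("PRAGMA journal_mode=WAL")
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS robots_cache ("
            "domain TEXT PRIMARY KEY, rules TEXT NOT NULL, fetched_at REAL NOT NULL)"
        )
        self.db.commit()

    def _rules_for(self, scheme, domain):
        row = self.db.execute(
            "SELECT rules, fetched_at FROM robots_cache WHERE domain = ?", (domain,)
        ).fetchone()
        if row is not None and time.time() - row[1] < self.ttl:
            return row[0]  # cached entry is still fresh
        # Missing or expired: fetch again and overwrite the cached row.
        rules = self.fetch(f"{scheme}://{domain}/robots.txt")
        self.db.execute(
            "INSERT OR REPLACE INTO robots_cache (domain, rules, fetched_at) "
            "VALUES (?, ?, ?)",
            (domain, rules, time.time()),
        )
        self.db.commit()
        return rules

    def can_fetch(self, url, user_agent="*"):
        parts = urlparse(url)
        parser = RobotFileParser()
        parser.parse(self._rules_for(parts.scheme, parts.netloc).splitlines())
        return parser.can_fetch(user_agent, url)
```

A crawler built along these lines would call `can_fetch(url, user_agent)` before each request and skip the URL when it returns `False`. Keying the cache by domain means that repeated checks against the same site cost one download per TTL window.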

2025-01-21 17:54:13 +08:00

arun_many.md

docs(api): add streaming mode documentation and examples

2025-01-19 18:21:34 +08:00

arun.md

feat(robots): add robots.txt compliance support

2025-01-21 17:54:13 +08:00

async-webcrawler.md

refactor(dispatcher): migrate to modular dispatcher system with enhanced monitoring

2025-01-11 21:10:27 +08:00

crawl-result.md

refactor(dispatcher): migrate to modular dispatcher system with enhanced monitoring