6 Commits

Author SHA1 Message Date
UncleCode
4bcd4cbda1 refactor(pdf): improve PDF processor dependency handling
Make PyPDF2 an optional dependency and improve import handling in PDF processor.
Move imports inside methods to allow for lazy loading and better error handling.
Add new 'pdf' optional dependency group in pyproject.toml.
Clean up unused imports and remove deprecated files.

BREAKING CHANGE: PyPDF2 is now an optional dependency. Users need to install with 'pip install crawl4ai[pdf]' to use PDF processing features.
2025-02-25 22:27:55 +08:00
UncleCode
c6d48080a4 feat(logger): add abstract logger base class and file logger implementation
Add AsyncLoggerBase abstract class to standardize logger interface and introduce AsyncFileLogger for file-only logging. Remove deprecated always_bypass_cache parameter and clean up AsyncWebCrawler initialization.

BREAKING CHANGE: Removed deprecated 'always_by_pass_cache' parameter. Use BrowserConfig cache settings instead.
2025-02-23 21:23:41 +08:00
UncleCode
8ec12d7d68 Apply Ruff Corrections 2025-01-13 19:19:58 +08:00
UncleCode
24b3da717a refactor():
- Update hello world example
2025-01-02 17:53:30 +08:00
UncleCode
98acc4254d refactor:
- Update hello_world.py example
2025-01-01 19:47:22 +08:00
UncleCode
aa4f92f458 refactor(crawler):
- Update hello_world example with proper content filtering
2025-01-01 19:39:42 +08:00