Implements a responsive hamburger menu for mobile devices with the following changes:
- Add new mobile_menu.js for handling mobile navigation
- Update layout.css with mobile-specific styles and animations
- Enhance README with updated geolocation example
- Register mobile_menu.js in mkdocs.yml
The mobile menu includes:
- Hamburger button animation
- Slide-out sidebar
- Backdrop overlay
- Touch-friendly navigation
- Proper event handling
- Update Docker base image to Python 3.12-slim-bookworm
- Bump version from 0.6.0rc1 to 0.6.0
- Update documentation to reflect release version changes
- Fix license specification in pyproject.toml and setup.py
- Clean up code formatting in demo_docker_api.py
BREAKING CHANGE: Base Python version upgraded from 3.10 to 3.12
Rename LlmConfig to LLMConfig across the codebase to follow consistent naming conventions.
Update all imports and usages to use the new name.
Update documentation and examples to reflect the change.
BREAKING CHANGE: LlmConfig has been renamed to LLMConfig. Users need to update their imports and usage.
* fix: Update export of URLPatternFilter
* chore: Add dependancy for cchardet in requirements
* docs: Update example for deep crawl in release note for v0.5
* Docs: update the example for memory dispatcher
* docs: updated example for crawl strategies
* Refactor: Removed wrapping in if __name__==main block since this is a markdown file.
* chore: removed cchardet from dependancy list, since unclecode is planning to remove it
* docs: updated the example for proxy rotation to a working example
* feat: Introduced ProxyConfig param
* Add tutorial for deep crawl & update contributor list for bug fixes in feb alpha-1
* chore: update and test new dependancies
* feat:Make PyPDF2 a conditional dependancy
* updated tutorial and release note for v0.5
* docs: update docs for deep crawl, and fix a typo in docker-deployment markdown filename
* refactor: 1. Deprecate markdown_v2 2. Make markdown backward compatible to behave as a string when needed. 3. Fix LlmConfig usage in cli 4. Deprecate markdown_v2 in cli 5. Update AsyncWebCrawler for changes in CrawlResult
* fix: Bug in serialisation of markdown in acache_url
* Refactor: Added deprecation errors for fit_html and fit_markdown directly on markdown. Now access them via markdown
* fix: remove deprecated markdown_v2 from docker
* Refactor: remove deprecated fit_markdown and fit_html from result
* refactor: fix cache retrieval for markdown as a string
* chore: update all docs, examples and tests with deprecation announcements for markdown_v2, fit_html, fit_markdown
This major release adds deep crawling capabilities, memory-adaptive dispatcher,
multiple crawling strategies, Docker deployment, and a new CLI. It also includes
significant improvements to proxy handling, PDF processing, and LLM integration.
BREAKING CHANGES:
- Add memory-adaptive dispatcher as default for arun_many()
- Move max_depth to CrawlerRunConfig
- Replace ScrapingMode enum with strategy pattern
- Update BrowserContext API
- Make model fields optional with defaults
- Remove content_filter parameter from CrawlerRunConfig
- Remove synchronous WebCrawler and old CLI
- Update Docker deployment configuration
- Replace FastFilterChain with FilterChain
- Change license to Apache 2.0 with attribution clause
* feature: Add LlmConfig to easily configure and pass LLM configs to different strategies
* pulled in next branch and resolved conflicts
* feat: Add gemini and deepseek providers. Make ignore_cache in llm content filter to true by default to avoid confusions
* Refactor: Update LlmConfig in LLMExtractionStrategy class and deprecate old params
* updated tests, docs and readme
Resolves merge conflict in README.md by removing outdated version 0.4.24x information and keeping current version 0.4.3bx details. Updates release notes description to reflect current features including Memory Dispatcher System, Streaming Support, and other improvements.
No breaking changes.
Update version numbers to v0.4.3bx throughout README.md
Fix contributing guidelines link to point to CONTRIBUTORS.md
Update Aravind's role in CONTRIBUTORS.md to Head of Community and Product
Add pre-release installation instructions
Fix minor formatting in personal story section
No breaking changes
Renames the final_url field to redirected_url across all components to maintain
consistent terminology throughout the codebase. This change affects:
- AsyncCrawlResponse model
- AsyncPlaywrightCrawlerStrategy
- Documentation and examples
No functional changes, purely naming consistency improvement.
Implements dynamic proxy rotation functionality with authentication support and IP verification. Updates include:
- Added proxy rotation demo in features example
- Updated proxy configuration handling in BrowserManager
- Added proxy rotation documentation
- Updated README with new proxy rotation feature
- Bumped version to 0.4.3b2
This change enables users to dynamically switch between proxies and verify IP addresses for each request.
Update README.md to announce version 0.4.3b1 release with new features including:
- Memory Dispatcher System
- Streaming Support
- LLM-Powered Markdown Generation
- Schema Generation
- Robots.txt Compliance
Add detailed version numbering explanation section to help users understand pre-release versions.
Thank you for pointing this out. I am creating a contributing guide, which is why I changed the name to the contributors, but I forgot to update some other places. Thanks again.
Update all documentation URLs from crawl4ai.com/mkdocs to docs.crawl4ai.com across README, examples, and documentation files. This change reflects the new documentation hosting domain.
Also add todo/ directory to .gitignore.
Update all documentation URLs from crawl4ai.com/mkdocs to docs.crawl4ai.com
Improve badges styling and layout in documentation
Increase code font size in documentation CSS
BREAKING CHANGE: Documentation URLs have changed from crawl4ai.com/mkdocs to docs.crawl4ai.com
Revise the README's personal story section to better reflect the project's
origins, motivation, and vision for open-source data accessibility. Add more
detail about the creator's background and the project's mission to
democratize AI through open data access.
Also includes a minor TODO comment addition in async crawler strategy.
Reorganize documentation into core/advanced/extraction sections for better navigation.
Update terminal theme styles and add rich library for better CLI output.
Remove redundant tutorial files and consolidate content into core sections.
Add personal story to index page for project context.
BREAKING CHANGE: Documentation structure has been significantly reorganized
Enhance Async Crawler with storage state handling
- Updated Async Crawler to support storage state management.
- Added error handling for URL validation in Async Web Crawler.
- Modified README logo and improved .gitignore entries.
- Fixed issues in multiple files for better code robustness.
### New Features:
- **Text-Only Mode**: Added support for text-only crawling by disabling images, JavaScript, GPU, and other non-essential features.
- **Light Mode**: Optimized browser settings to reduce resource usage and improve efficiency during crawling.
- **Dynamic Viewport Adjustment**: Automatically adjusts viewport dimensions based on content size, ensuring accurate rendering and scaling.
- **Full Page Scanning**: Introduced a feature to scroll and capture dynamic content for pages with infinite scroll or lazy-loading elements.
- **Session Management**: Added `create_session` method for creating and managing browser sessions with unique IDs.
### Improvements:
- Unified viewport handling across contexts by dynamically setting dimensions using `self.viewport_width` and `self.viewport_height`.
- Enhanced logging and error handling for viewport adjustments, page scanning, and content evaluation.
- Reduced resource usage with additional browser flags for both `light_mode` and `text_only` configurations.
- Improved handling of cookies, headers, and proxies in session creation.
### Refactoring:
- Removed hardcoded viewport dimensions and replaced them with dynamic configurations.
- Cleaned up unused and commented-out code for better readability and maintainability.
- Introduced defaults for frequently used parameters like `delay_before_return_html`.
### Fixes:
- Resolved potential inconsistencies in viewport handling.
- Improved robustness of content loading and dynamic adjustments to avoid failures and timeouts.
### Docs Update:
- Updated schema usage in `quickstart_async.py` example:
- Changed `OpenAIModelFee.schema()` to `OpenAIModelFee.model_json_schema()` for compatibility.
- Enhanced LLM extraction instruction documentation.
This commit introduces significant enhancements to improve efficiency, flexibility, and reliability of the crawler strategy.
- Introduced the PruningContentFilter for better content relevance.
- Implemented comprehensive unit tests for verification of functionality.
- Enhanced existing BM25ContentFilter tests for edge case coverage.
- Updated documentation to include usage examples for new filter.
- Added new Docker commands for platform-specific builds.
- Updated README with comprehensive installation and setup instructions.
- Introduced `post_install` method in setup script for automation.
- Refined migration processes with enhanced error logging.
- Bump version to 0.3.746 and updated dependencies.
- Added a post-installation setup script for initialization.
- Updated README with installation notes for Playwright setup.
- Enhanced migration logging for better error visibility.
- Added 'pydantic' to requirements.
- Bumped version to 0.3.746.