crawl4ai

Author	SHA1	Message	Date
UncleCode	2a82455b3d	feat(crawl): implement direct crawl functionality and introduce CacheMode for improved caching control	2024-11-17 17:17:34 +08:00
UncleCode	3a66aa8a60	feat(cache): introduce CacheMode and CacheContext for enhanced caching behavior chore(requirements): add colorama dependency refactor(config): add SHOW_DEPRECATION_WARNINGS flag and clean up code fix(docs): update example scripts for clarity and consistency	2024-11-17 15:30:56 +08:00
UncleCode	4b45b28f25	feat(docs): enhance deployment documentation with one-click setup, API security details, and Docker Compose examples	2024-11-16 18:44:47 +08:00
UncleCode	6360d0545a	feat(api): add API token authentication and update Dockerfile description	2024-11-16 18:08:56 +08:00
UncleCode	90df6921b7	feat(crawl_sync): add synchronous crawl endpoint and corresponding test	2024-11-16 15:34:30 +08:00
UncleCode	ae7ebc0bd8	chore: update .gitignore and enhance changelog with major feature additions and examples	2024-11-15 20:16:13 +08:00
UncleCode	f9a297e08d	Add Docker example script for testing Crawl4AI functionality	2024-11-08 19:39:05 +08:00
UncleCode	9307c19f35	Update documents, upload new version of quickstart.	2024-10-30 20:39:35 +08:00
UncleCode	3529c2e732	Update new tutorial documents and added to the docs folder.	2024-10-30 00:16:18 +08:00
UncleCode	4239654722	Update Documentation	2024-10-27 19:24:46 +08:00
UncleCode	4e2852d5ff	[v0.3.71] Enhance chunking strategies and improve overall performance - Add OverlappingWindowChunking and improve SlidingWindowChunking - Update CHUNK_TOKEN_THRESHOLD to 2048 tokens - Optimize AsyncPlaywrightCrawlerStrategy close method - Enhance flexibility in CosineStrategy with generic embedding model loading - Improve JSON-based extraction strategies - Add knowledge graph generation example	2024-10-19 18:36:59 +08:00
UncleCode	b309bc34e1	Fix the model name in quick start example	2024-10-18 15:32:25 +08:00
UncleCode	b8147b64e0	chore: Bump version to 0.3.71 and improve error handling - Update version number to 0.3.71 - Add sleep_on_close option to AsyncPlaywrightCrawlerStrategy - Enhance context creation with additional options - Improve error message formatting and visibility - Update quickstart documentation	2024-10-18 13:31:12 +08:00
UncleCode	768aa06ceb	feat(crawler): Enhance stealth and flexibility, improve error handling - Implement playwright_stealth for better bot detection avoidance - Add user simulation and navigator override options - Improve iframe processing and browser selection - Enhance error reporting and debugging capabilities - Optimize image processing and parallel crawling - Add new example for user simulation feature - Add support for including links in Markdown content by defining a new flag `include_links_on_markdown` in the `crawl` method.	2024-10-17 21:37:48 +08:00
unclecode	320afdea64	feat: Enhance crawler flexibility and LLM extraction capabilities - Add browser type selection (Chromium, Firefox, WebKit) - Implement iframe content extraction - Improve image processing and dimension updates - Add custom headers support in AsyncPlaywrightCrawlerStrategy - Enhance delayed content retrieval with new parameter - Optimize HTML sanitization and Markdown conversion - Update examples in quickstart_async.py for new features	2024-10-14 21:03:28 +08:00
unclecode	b9bbd42373	Update Quickstart examples	2024-10-13 14:37:45 +08:00
unclecode	68e9144ce3	feat: Enhance crawling control and LLM extraction flexibility - Add before_retrieve_html hook and delay_before_return_html option - Implement flexible page_timeout for smart_wait function - Support extra_args and custom headers in LLM extraction - Allow arbitrary kwargs in AsyncWebCrawler initialization - Improve perform_completion_with_backoff for custom API calls - Update examples with new features and diverse LLM providers	2024-10-12 14:48:22 +08:00
unclecode	ff3524d9b1	feat(v0.3.6): Add screenshot capture, delayed content, and custom timeouts - Implement screenshot capture functionality - Add delayed content retrieval method - Introduce custom page timeout parameter - Enhance LLM support with multiple providers - Improve database schema auto-updates - Optimize image processing in WebScrappingStrategy - Update error handling and logging - Expand examples in quickstart_async.py	2024-10-12 13:42:42 +08:00
unclecode	4750810a67	Enhance AsyncWebCrawler with smart waiting and screenshot capabilities - Implement smart_wait function in AsyncPlaywrightCrawlerStrategy - Add screenshot support to AsyncCrawlResponse and AsyncWebCrawler - Improve error handling and timeout management in crawling process - Fix typo in CrawlResult model (responser_headers -> response_headers) - Update .gitignore to exclude additional files - Adjust import path in test_basic_crawling.py	2024-10-02 17:34:56 +08:00
unclecode	5d4e92db7d	Update quickstart_async.py to improve performance and add Firecrawl simulation	2024-09-28 00:11:39 +08:00
unclecode	8b6e88c85c	Update .gitignore to ignore temporary and test directories	2024-09-26 15:09:49 +08:00
unclecode	4d48bd31ca	Push async version last changes for merge to main branch	2024-09-24 20:52:08 +08:00
unclecode	eb131bebdf	Create series of quickstart files.	2024-09-04 15:33:24 +08:00
unclecode	5c15837677	chore: Update README, generate new notebook for quickstart	2024-09-04 14:46:22 +08:00
unclecode	2fada16abb	chore: Update crawl4ai package with AsyncWebCrawler and JsonCssExtractionStrategy	2024-09-03 23:32:27 +08:00
unclecode	659c8cd953	refactor: Update image description minimum word threshold in get_content_of_website_optimized	2024-08-02 15:55:32 +08:00
unclecode	4d283ab386	## [v0.2.74] - 2024-07-08 A slew of exciting updates to improve the crawler's stability and robustness! 🎉 - 💻 UTF encoding fix: Resolved the Windows "charmap" error by adding UTF encoding. - 🛡️ Error handling: Implemented MaxRetryError exception handling in LocalSeleniumCrawlerStrategy. - 🧹 Input sanitization: Improved input sanitization and handled encoding issues in LLMExtractionStrategy. - 🚮 Database cleanup: Removed existing database file and initialized a new one.	2024-07-08 16:33:25 +08:00
unclecode	e7705e661a	ADD MKDocs	2024-06-21 17:56:54 +08:00
unclecode	21b110bfd7	Update LLMExtractionStrategy to disable chunking if specified, Add example of summarization for a web page.	2024-06-19 19:03:35 +08:00
unclecode	539263a8ba	chore: Update configuration values for chunk token threshold, overlap rate, and minimum word threshold. Create a new example for LLMExtraction Strategy, update Dockerfile, and README	2024-06-19 18:32:20 +08:00
unclecode	3f0e265baf	Merge branch 'format-inline-tags'	2024-06-19 00:48:38 +08:00
unclecode	21e2538e57	Update quickstart.py	2024-06-19 00:37:53 +08:00
unclecode	77da48050d	chore: Add custom headers to LocalSeleniumCrawlerStrategy	2024-06-17 15:50:03 +08:00
unclecode	9a97aacd85	chore: Add hooks for customizing the LocalSeleniumCrawlerStrategy	2024-06-17 15:37:18 +08:00
unclecode	b3a0edaa6d	- User agent - Extract Links - Extract Metadata - Update Readme - Update REST API document	2024-06-08 17:59:42 +08:00
unclecode	36a5847df5	Add css selector example	2024-06-07 20:47:20 +08:00
unclecode	a19379aa58	Add recipe images, update README, and REST api example	2024-06-07 20:43:50 +08:00
unclecode	768d048e1c	Update rest call how to use	2024-06-07 18:10:45 +08:00
unclecode	aeb2114170	Add example of REST API call	2024-06-07 16:24:40 +08:00
unclecode	226a62a3c0	feat: Add screenshot functionality to crawl_urls	2024-06-07 15:33:15 +08:00
unclecode	8e73a482a2	feat: Add screenshot functionality to crawl_urls The code changes in this commit add the `screenshot` parameter to the `crawl_urls` function in `main.py`. This allows users to specify whether they want to take a screenshot of the page during the crawling process. The default value is `False`. This commit message follows the established convention of starting with a type (feat for feature) and providing a concise and descriptive summary of the changes made.	2024-06-07 15:23:32 +08:00
unclecode	c7553b1280	Update research assistant example with package installation instructions	2024-06-04 23:18:19 +08:00
unclecode	8b8683f22e	Add research assistant example using Chainlit	2024-06-04 22:43:09 +08:00
unclecode	51f26d12fe	Update for v0.2.2 - Support multiple JS scripts - Fixed some bugs - Resolved a few issues related to Colab installation	2024-06-02 15:40:18 +08:00
unclecode	13a3b21d19	- Add ONNX embedding model for CPU devices, update the similarity threshold, improve the embedding speed.	2024-05-19 22:30:10 +08:00
unclecode	eb6423875f	chore: Update Selenium options in crawler_strategy.py and add verbose logging in CosineStrategy	2024-05-18 14:13:06 +08:00
unclecode	b6319c6f6e	chore: Add support for GPU, MPS, and CPU	2024-05-17 21:56:13 +08:00
unclecode	957a2458b1	chore: Update web crawler URLs to use NBC News business section	2024-05-17 18:11:13 +08:00
unclecode	1cc67df301	chore: Update pip installation command and requirements, add new dependencies	2024-05-17 16:53:03 +08:00
unclecode	a5f9d07dbf	Remove dependency on Spacy model.	2024-05-17 15:08:03 +08:00

52 Commits