unclecode
ff3524d9b1
feat(v0.3.6): Add screenshot capture, delayed content, and custom timeouts
...
- Implement screenshot capture functionality
- Add delayed content retrieval method
- Introduce custom page timeout parameter
- Enhance LLM support with multiple providers
- Improve database schema auto-updates
- Optimize image processing in WebScrappingStrategy
- Update error handling and logging
- Expand examples in quickstart_async.py
2024-10-12 13:42:42 +08:00
unclecode
4750810a67
Enhance AsyncWebCrawler with smart waiting and screenshot capabilities
...
- Implement smart_wait function in AsyncPlaywrightCrawlerStrategy
- Add screenshot support to AsyncCrawlResponse and AsyncWebCrawler
- Improve error handling and timeout management in crawling process
- Fix typo in CrawlResult model (responser_headers -> response_headers)
- Update .gitignore to exclude additional files
- Adjust import path in test_basic_crawling.py
2024-10-02 17:34:56 +08:00
unclecode
5d4e92db7d
Update quickstart_async.py to improve performance and add Firecrawl simulation
2024-09-28 00:11:39 +08:00
unclecode
8b6e88c85c
Update .gitignore to ignore temporary and test directories
2024-09-26 15:09:49 +08:00
unclecode
4d48bd31ca
Push latest async version changes for merge into main branch
2024-09-24 20:52:08 +08:00
unclecode
eb131bebdf
Create series of quickstart files.
2024-09-04 15:33:24 +08:00
unclecode
5c15837677
chore: Update README, generate new notebook for quickstart
2024-09-04 14:46:22 +08:00
unclecode
2fada16abb
chore: Update crawl4ai package with AsyncWebCrawler and JsonCssExtractionStrategy
2024-09-03 23:32:27 +08:00
unclecode
659c8cd953
refactor: Update image description minimum word threshold in get_content_of_website_optimized
2024-08-02 15:55:32 +08:00
unclecode
4d283ab386
## [v0.2.74] - 2024-07-08
...
A slew of exciting updates to improve the crawler's stability and robustness! 🎉
- 💻 **UTF encoding fix**: Resolved the Windows "charmap" error by adding UTF encoding.
- 🛡️ **Error handling**: Implemented MaxRetryError exception handling in LocalSeleniumCrawlerStrategy.
- 🧹 **Input sanitization**: Improved input sanitization and handled encoding issues in LLMExtractionStrategy.
- 🚮 **Database cleanup**: Removed existing database file and initialized a new one.
2024-07-08 16:33:25 +08:00
unclecode
e7705e661a
Add MkDocs
2024-06-21 17:56:54 +08:00
unclecode
21b110bfd7
Update LLMExtractionStrategy to disable chunking if specified; add a web page summarization example.
2024-06-19 19:03:35 +08:00
unclecode
539263a8ba
chore: Update configuration values for chunk token threshold, overlap rate, and minimum word threshold. Create a new example for LLMExtractionStrategy; update Dockerfile and README
2024-06-19 18:32:20 +08:00
unclecode
3f0e265baf
Merge branch 'format-inline-tags'
2024-06-19 00:48:38 +08:00
unclecode
21e2538e57
Update quickstart.py
2024-06-19 00:37:53 +08:00
unclecode
77da48050d
chore: Add custom headers to LocalSeleniumCrawlerStrategy
2024-06-17 15:50:03 +08:00
unclecode
9a97aacd85
chore: Add hooks for customizing the LocalSeleniumCrawlerStrategy
2024-06-17 15:37:18 +08:00
unclecode
b3a0edaa6d
- User agent
...
- Extract Links
- Extract Metadata
- Update Readme
- Update REST API document
2024-06-08 17:59:42 +08:00
unclecode
36a5847df5
Add css selector example
2024-06-07 20:47:20 +08:00
unclecode
a19379aa58
Add recipe images, update README and REST API example
2024-06-07 20:43:50 +08:00
unclecode
768d048e1c
Update REST call usage instructions
2024-06-07 18:10:45 +08:00
unclecode
aeb2114170
Add example of REST API call
2024-06-07 16:24:40 +08:00
unclecode
226a62a3c0
feat: Add screenshot functionality to crawl_urls
2024-06-07 15:33:15 +08:00
unclecode
8e73a482a2
feat: Add screenshot functionality to crawl_urls
...
This commit adds the `screenshot` parameter to the `crawl_urls` function in `main.py`, letting users specify whether to take a screenshot of the page during crawling. The default value is `False`.
2024-06-07 15:23:32 +08:00
unclecode
c7553b1280
Update research assistant example with package installation instructions
2024-06-04 23:18:19 +08:00
unclecode
8b8683f22e
Add research assistant example using Chainlit
2024-06-04 22:43:09 +08:00
unclecode
51f26d12fe
Update for v0.2.2
...
- Support multiple JS scripts
- Fixed some bugs
- Resolved a few issues related to Colab installation
2024-06-02 15:40:18 +08:00
unclecode
13a3b21d19
- Add ONNX embedding model for CPU devices, update the similarity threshold, improve the embedding speed.
2024-05-19 22:30:10 +08:00
unclecode
eb6423875f
chore: Update Selenium options in crawler_strategy.py and add verbose logging in CosineStrategy
2024-05-18 14:13:06 +08:00
unclecode
b6319c6f6e
chore: Add support for GPU, MPS, and CPU
2024-05-17 21:56:13 +08:00
unclecode
957a2458b1
chore: Update web crawler URLs to use NBC News business section
2024-05-17 18:11:13 +08:00
unclecode
1cc67df301
chore: Update pip installation command and requirements, add new dependencies
2024-05-17 16:53:03 +08:00
unclecode
a5f9d07dbf
Remove dependency on Spacy model.
2024-05-17 15:08:03 +08:00
UncleCode
6fcaf26b4f
Update quickstart.py: Add counting items
2024-05-16 22:49:12 +08:00
unclecode
c8589f8da3
Update:
...
- Fix Spacy model issue
- Update Readme and requirements.txt
2024-05-16 19:50:20 +08:00