crawl4ai

mirror of https://github.com/unclecode/crawl4ai.git synced 2025-12-30 03:48:54 +00:00

Author	SHA1	Message	Date
UncleCode	ca3e33122e	refactor(docs): reorganize documentation structure and update styles Reorganize documentation into core/advanced/extraction sections for better navigation. Update terminal theme styles and add rich library for better CLI output. Remove redundant tutorial files and consolidate content into core sections. Add personal story to index page for project context. BREAKING CHANGE: Documentation structure has been significantly reorganized	2025-01-07 20:49:50 +08:00
UncleCode	ae376f15fb	docs(extraction): add clarifying comments for CSS selector behavior Add explanatory comments to JsonCssExtractionStrategy._get_elements() method to clarify that it returns all matching elements using select() instead of select_one(). This helps developers understand the method's behavior and its difference from single element selection. Removed trailing whitespace at end of file.	2025-01-05 19:39:15 +08:00
UncleCode	72fbdac467	fix(extraction): JsonCss selector and crawler improvements - Fix JsonCssExtractionStrategy._get_elements to return all matching elements instead of just one - Add robust error handling to page_need_scroll with default fallback - Improve JSON extraction strategies documentation - Refactor content scraping strategy - Update version to 0.4.247	2025-01-05 19:26:46 +08:00
UncleCode	0857c7b448	Merge branch 'main' of https://github.com/unclecode/crawl4ai into next	2025-01-05 17:05:59 +08:00
Guilume	07b4c1c0ed	fix: not working long page screenshot (#403 )	2025-01-05 17:04:34 +08:00
UncleCode	196dc79ec7	fix: prevent memory leaks by ensuring proper closure of Playwright pages - Fixes critical memory leak issue where browser pages remained open - Ensures proper cleanup of Playwright resources after page operations - Improves resource management in browser farm implementation This is an urgent fix to address resource leakage that could impact system stability.	2025-01-03 21:17:23 +08:00
UncleCode	24b3da717a	refactor(): - Update hello world example	2025-01-02 17:53:30 +08:00
UncleCode	98acc4254d	refactor: - Update hello_world.py example	2025-01-01 19:47:22 +08:00
UncleCode	eac78c7993	Merge branch 'vr0.4.246'	2025-01-01 19:43:01 +08:00
UncleCode	da1bc0f7bf	Update version file	2025-01-01 19:42:35 +08:00
UncleCode	aa4f92f458	refactor(crawler): - Update hello_world example with proper content filtering	2025-01-01 19:39:42 +08:00
UncleCode	a96e05d4ae	refactor(crawler): optimize response handling and default settings - Set wait_for_images default to false for better performance - Simplify response attribute copying in AsyncWebCrawler - Update hello_world example with proper content filtering	2025-01-01 19:39:02 +08:00
UncleCode	5c95fd92b4	fix(browser): resolve merge conflicts in browser channel configuration	2025-01-01 19:05:47 +08:00
UncleCode	4cb2a62551	Update README	2025-01-01 18:59:55 +08:00
UncleCode	5b4fad9e25	- Bump version to 0.4.244	2025-01-01 18:58:43 +08:00
UncleCode	ea0ac25f38	refactor(browser): Update browser channel default to 'chromium' in BrowserConfig.from_args method	2025-01-01 18:58:15 +08:00
UncleCode	7688aca7d6	Update Version	2025-01-01 18:44:27 +08:00
UncleCode	a7215ad972	fix(browser): update default browser channel to chromium and simplify channel selection logic	2025-01-01 18:38:33 +08:00
Arno.Edwards	8e2403a7da	fix(browser)!: default to Chromium channel for new headless mode (#387 ) BREAKING CHANGE: Updated `chrome_channel` to "chromium" to fix compatibility with the new Chromium headless implementation. This resolves the error `playwright._impl._errors.Error: BrowserType.launch: Chromium distribution 'chrome' is not found`, caused by the removal of the old headless mode in Chromium. With this change, channels like "chrome" and "msedge" now default to the new headless mode, aligning with upstream updates in Playwright v1.49. The new headless mode uses the real Chrome browser, offering more authenticity, reliability, and feature parity with the full browser. Additionally, simplified fallback logic by directly assigning `chrome_channel` based on `browser_type` or defaulting to "chromium". Refer to: - https://playwright.dev/python/docs/browsers#chromium - https://github.com/microsoft/playwright/issues/33566	2025-01-01 18:37:50 +08:00
UncleCode	318554e6bf	Merge branch 'v0.4.243' v0.4.243	2025-01-01 18:11:15 +08:00
UncleCode	c64979b8dd	docs: update README	2025-01-01 18:10:38 +08:00
UncleCode	bfe21b29d4	build: streamline package discovery and bump to v0.4.243 - Replace explicit package listing with setuptools.find - Include all crawl4ai.* packages automatically - Use `packages = {find = {where = ["."], include = ["crawl4ai*"]}}` syntax - Bump version to 0.4.243 This change simplifies package maintenance by automatically discovering all subpackages under crawl4ai namespace instead of listing them manually.	2025-01-01 17:55:59 +08:00
UncleCode	e9d9a6ffe8	fix: ensure js_snippet files are included in package - Add js_snippet to packages list in pyproject.toml - Verified JS files are properly included in installed package - Bump version to 0.4.242	2025-01-01 17:38:59 +08:00
UncleCode	5313c71a0d	docs: update REAME browser installation command - Remove Chrome from manual installation command - Keep Chromium as the only default browser in docs	2025-01-01 17:24:44 +08:00
UncleCode	d36ef3d424	refactor(install): use chromium as default browser - Remove Chrome installation to reduce setup time - Keep Chromium as default browser for better cross-platform compatibility	2025-01-01 17:19:54 +08:00
UncleCode	4a4f613238	docs: simplify installation instructions - Add crawl4ai-doctor command to verify installation - Update browser installation instructions in README and docs - Move optional features to documentation - Add manual browser installation steps as fallback - Update getting-started guide with verification step	2025-01-01 16:54:03 +08:00
UncleCode	dc6a24618e	feat(install): add doctor command and force browser install - Add --force flag to Playwright browser installation - Add doctor command to test crawling functionality - Install Chrome and Chromium browsers explicitly - Add crawl4ai-doctor entry point in pyproject.toml - Implement simple health check focused on crawling test	2025-01-01 16:33:43 +08:00
UncleCode	74a7c6dbb6	feat(install): specify chrome and chromium for playwright - Install Chrome and Chromium browsers explicitly - Split browser installation into separate commands	2025-01-01 16:10:08 +08:00
UncleCode	67f65f958b	refactor(build): simplify setup.py configuration - Remove dependency management from setup.py - Remove entry points configuration (moved to pyproject.toml) - Keep minimal setup.py for backwards compatibility - Clean up package metadata structure	2025-01-01 15:52:01 +08:00
UncleCode	78b6ba5cef	build: modernize package configuration with pyproject.toml - Add pyproject.toml for PEP 517 build system support - Configure dependencies, scripts, and metadata in pyproject.toml - Set Python requirement to >=3.9 and add support up to 3.13 - Keep setup.py for backwards compatibility - Move package dependencies and entry points to pyproject.toml	2025-01-01 15:45:27 +08:00
UncleCode	3f019d34cc	docs: update project description emojis - Change project description emojis from 🔥🕷️ to 🚀🤖 - Update emojis consistently in both setup.py and pyproject.toml	2025-01-01 15:39:33 +08:00
UncleCode	304260e484	refactor(install): simplify Playwright installation error handling - Remove setup_docs() call from post_install() - Simplify error messages for Playwright installation failures - Use sys.executable for more accurate Python path in error messages - Add --with-deps flag to Playwright install command	2025-01-01 15:33:36 +08:00
UncleCode	704bd66b63	Uphrade plawyright installation command to install dependencies	2025-01-01 15:23:16 +08:00
UncleCode	1acc162c18	Bumb version v0.4.241	2025-01-01 15:16:06 +08:00
UncleCode	553c97a0c1	Fix bug reported in issue https://github.com/unclecode/crawl4ai/issues/396	2025-01-01 15:15:14 +08:00
UncleCode	bd66befcf0	Fix issue in 0.4.24 walkthrough	2024-12-31 21:07:58 +08:00
UncleCode	3e769a9c6c	Fix issue in 0.4.24 walkthrough	2024-12-31 21:07:33 +08:00
UncleCode	19b0a5ae82	Update 0.4.24 walkthrough	2024-12-31 21:01:46 +08:00
UncleCode	bd71f7f4ea	Add 0.4.24 walkthrough	2024-12-31 20:22:33 +08:00
UncleCode	171ce25ba6	Fixe typo in CHANGELOG	2024-12-31 19:49:00 +08:00
UncleCode	6c5a44f774	chore: bump version to 0.4.25	2024-12-31 19:45:48 +08:00
UncleCode	5c3c05bf93	docs: update README badges and Docker section, reorganize documentation structure	2024-12-31 19:45:02 +08:00
UncleCode	67d0999bc3	chore: resolve merge conflicts for v0.4.24 v0.4.24	2024-12-31 19:24:03 +08:00
UncleCode	553a4622bf	chore: prepare for version 0.4.24	2024-12-31 19:18:36 +08:00
UncleCode	6f81ef006d	Remove .local folder from remote repository	2024-12-31 17:37:50 +08:00
UncleCode	a04870a662	Remove .do folder	2024-12-31 17:37:14 +08:00
UncleCode	f7d26390c5	Remove .do folder	2024-12-31 17:36:22 +08:00
UncleCode	141783fb2d	Remove .do folder from remote repository	2024-12-31 17:35:57 +08:00
UncleCode	2fedd4876e	Update gitignore	2024-12-31 17:35:34 +08:00
UncleCode	e187b0aaf0	update gitignore	2024-12-31 17:34:31 +08:00

1 2 3 4 5 ...

533 Commits