# AsyncWebCrawler
The **`AsyncWebCrawler`** is the core class for asynchronous web crawling in Crawl4AI. You typically create it **once**, optionally customize it with a **`BrowserConfig`** (e.g., headless, user agent), then **run** multiple **`arun()`** calls with different **`CrawlerRunConfig`** objects.
**Recommended usage**:
1. **Create** a `BrowserConfig` for global browser settings. 
2. **Instantiate** `AsyncWebCrawler(config=browser_config)`. 
3. **Use** the crawler in an async context manager (`async with`) or manage start/close manually. 
4. **Call** `arun(url, config=crawler_run_config)` for each page you want.
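
A minimal sketch of that flow (the URL and settings are illustrative):

```python
import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig

async def main():
    browser_cfg = BrowserConfig(headless=True)   # 1. Global browser settings
    run_cfg = CrawlerRunConfig()                 # Per-crawl settings (defaults here)

    # 2 & 3. Instantiate and manage the lifecycle via the async context manager
    async with AsyncWebCrawler(config=browser_cfg) as crawler:
        # 4. One arun() call per page
        result = await crawler.arun("https://example.com", config=run_cfg)
        print(result.success)

asyncio.run(main())
```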
---
## 1. Constructor Overview
```python
class AsyncWebCrawler:
    def __init__(
        self,
        crawler_strategy: Optional[AsyncCrawlerStrategy] = None,
        config: Optional[BrowserConfig] = None,
        always_bypass_cache: bool = False,            # deprecated
        always_by_pass_cache: Optional[bool] = None,  # also deprecated
        base_directory: str = ...,
        thread_safe: bool = False,
        **kwargs,
    ):
        """
        Create an AsyncWebCrawler instance.

        Args:
            crawler_strategy:
                (Advanced) Provide a custom crawler strategy if needed.
            config:
                A BrowserConfig object specifying how the browser is set up.
            always_bypass_cache:
                (Deprecated) Use CrawlerRunConfig.cache_mode instead.
            base_directory:
                Folder for storing caches/logs (if relevant).
            thread_safe:
                If True, attempts some concurrency safeguards. Usually False.
            **kwargs:
                Additional legacy or debugging parameters.
        """
```
### Typical Initialization
```python
from crawl4ai import AsyncWebCrawler, BrowserConfig

browser_cfg = BrowserConfig(
    browser_type="chromium",
    headless=True,
    verbose=True
)

crawler = AsyncWebCrawler(config=browser_cfg)
```
**Notes**:
- **Legacy** parameters like `always_bypass_cache` remain for backward compatibility, but prefer to set **caching** in `CrawlerRunConfig`.
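
For example, rather than the deprecated constructor flag, control caching per run (a minimal sketch):

```python
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode

# Deprecated: AsyncWebCrawler(always_bypass_cache=True)
# Preferred: set cache behavior on the run config instead
run_cfg = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)
```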
---
## 2. Lifecycle: Start/Close or Context Manager
### 2.1 Context Manager (Recommended)
```python
async with AsyncWebCrawler(config=browser_cfg) as crawler:
    result = await crawler.arun("https://example.com")
    # The crawler automatically starts/closes resources
```
When the `async with` block ends, the crawler cleans up (closes the browser, etc.).
### 2.2 Manual Start & Close
```python
crawler = AsyncWebCrawler(config=browser_cfg)
await crawler.start()

result1 = await crawler.arun("https://example.com")
result2 = await crawler.arun("https://another.com")

await crawler.close()
```
Use this style if you have a **long-running** application or need full control of the crawler's lifecycle.
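
In a long-running application, a `try`/`finally` guard ensures the browser is released even if a crawl raises (a sketch of the same manual lifecycle):

```python
crawler = AsyncWebCrawler(config=browser_cfg)
await crawler.start()
try:
    result = await crawler.arun("https://example.com")
finally:
    # Close the browser even if arun() raised
    await crawler.close()
```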
---
## 3. Primary Method: `arun()`
```python
async def arun(
    self,
    url: str,
    config: Optional[CrawlerRunConfig] = None,
    # Legacy parameters for backward compatibility...
) -> CrawlResult:
    ...
```
### 3.1 New Approach
You pass a `CrawlerRunConfig` object that sets up everything about a crawl—content filtering, caching, session reuse, JS code, screenshots, etc.
```python
import asyncio
from crawl4ai import CrawlerRunConfig, CacheMode

run_cfg = CrawlerRunConfig(
    cache_mode=CacheMode.BYPASS,
    css_selector="main.article",
    word_count_threshold=10,
    screenshot=True
)

async with AsyncWebCrawler(config=browser_cfg) as crawler:
    result = await crawler.arun("https://example.com/news", config=run_cfg)
    print("Crawled HTML length:", len(result.cleaned_html))
    if result.screenshot:
        print("Screenshot base64 length:", len(result.screenshot))
```
### 3.2 Legacy Parameters Still Accepted
For **backward** compatibility, `arun()` can still accept direct arguments like `css_selector=...`, `word_count_threshold=...`, etc., but we strongly advise migrating them into a **`CrawlerRunConfig`**.
---
## 4. Batch Processing: `arun_many()`
```python
async def arun_many(
    self,
    urls: List[str],
    config: Optional[CrawlerRunConfig] = None,
    # Legacy parameters maintained for backwards compatibility...
) -> List[CrawlResult]:
    """
    Process multiple URLs with intelligent rate limiting and resource monitoring.
    """
```
### 4.1 Resource-Aware Crawling
The `arun_many()` method now uses an intelligent dispatcher that:
- Monitors system memory usage
- Implements adaptive rate limiting
- Provides detailed progress monitoring
- Manages concurrent crawls efficiently
### 4.2 Example Usage
See the [Multi-url Crawling](../advanced/multi-url-crawling.md) page for a detailed example of how to use `arun_many()`.
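
As a minimal sketch (the URLs are illustrative):

```python
import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode

async def main():
    browser_cfg = BrowserConfig(headless=True)
    run_cfg = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)

    urls = [
        "https://example.com/page1",
        "https://example.com/page2",
        "https://example.com/page3",
    ]

    async with AsyncWebCrawler(config=browser_cfg) as crawler:
        results = await crawler.arun_many(urls, config=run_cfg)
        for res in results:
            if res.success:
                print(res.url, "->", len(res.cleaned_html), "chars")
            else:
                print(res.url, "failed:", res.error_message)

asyncio.run(main())
```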
### 4.3 Key Features
1. **Rate Limiting**
- Automatic delay between requests
- Exponential backoff on rate limit detection
- Domain-specific rate limiting
- Configurable retry strategy
2. **Resource Monitoring**
- Memory usage tracking
- Adaptive concurrency based on system load
- Automatic pausing when resources are constrained
3. **Progress Monitoring**
- Detailed or aggregated progress display
- Real-time status updates
- Memory usage statistics
4. **Error Handling**
- Graceful handling of rate limits
- Automatic retries with backoff
- Detailed error reporting
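
These behaviors are configurable through the dispatcher passed to `arun_many()`. A sketch extending the example above, assuming the `MemoryAdaptiveDispatcher` and `RateLimiter` helpers described in the [Multi-url Crawling](../advanced/multi-url-crawling.md) guide (import paths and defaults may vary by version):

```python
# Import location may differ across crawl4ai versions
from crawl4ai.async_dispatcher import MemoryAdaptiveDispatcher, RateLimiter

dispatcher = MemoryAdaptiveDispatcher(
    memory_threshold_percent=90.0,  # pause new tasks above this memory usage
    max_session_permit=10,          # upper bound on concurrent crawls
    rate_limiter=RateLimiter(
        base_delay=(1.0, 3.0),      # random per-domain delay between requests
        max_delay=30.0,             # ceiling for exponential backoff
        max_retries=2               # retries on rate-limit responses
    )
)

results = await crawler.arun_many(urls, config=run_cfg, dispatcher=dispatcher)
```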
---
## 5. `CrawlResult` Output
Each `arun()` returns a **`CrawlResult`** containing:
- `url`: Final URL (if redirected).
- `html`: Original HTML.
- `cleaned_html`: Sanitized HTML.
- `markdown_v2`: Deprecated; use `markdown` instead.
- `extracted_content`: If an extraction strategy was used (JSON for CSS/LLM strategies).
- `screenshot`, `pdf`: If screenshots/PDF requested.
- `media`, `links`: Information about discovered images/links.
- `success`, `error_message`: Status info.
For details, see [CrawlResult doc](./crawl-result.md).
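
A sketch of reading the common fields (this assumes `media` and `links` are keyed dicts, e.g. `media["images"]` and `links["internal"]`, as described in the CrawlResult doc):

```python
# Inside: async with AsyncWebCrawler(config=browser_cfg) as crawler:
result = await crawler.arun("https://example.com", config=run_cfg)
if result.success:
    print("Final URL:", result.url)
    print("Markdown length:", len(result.markdown or ""))
    print("Images found:", len(result.media.get("images", [])))
    print("Internal links:", len(result.links.get("internal", [])))
else:
    print("Crawl failed:", result.error_message)
```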
---
## 6. Quick Example
Below is an example hooking it all together:
```python
import asyncio
import json

from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode
from crawl4ai.extraction_strategy import JsonCssExtractionStrategy

async def main():
    # 1. Browser config
    browser_cfg = BrowserConfig(
        browser_type="firefox",
        headless=False,
        verbose=True
    )

    # 2. Run config
    schema = {
        "name": "Articles",
        "baseSelector": "article.post",
        "fields": [
            {
                "name": "title",
                "selector": "h2",
                "type": "text"
            },
            {
                "name": "url",
                "selector": "a",
                "type": "attribute",
                "attribute": "href"
            }
        ]
    }

    run_cfg = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,
        extraction_strategy=JsonCssExtractionStrategy(schema),
        word_count_threshold=15,
        remove_overlay_elements=True,
        wait_for="css:.post"  # Wait for posts to appear
    )

    async with AsyncWebCrawler(config=browser_cfg) as crawler:
        result = await crawler.arun(
            url="https://example.com/blog",
            config=run_cfg
        )

        if result.success:
            print("Cleaned HTML length:", len(result.cleaned_html))
            if result.extracted_content:
                articles = json.loads(result.extracted_content)
                print("Extracted articles:", articles[:2])
        else:
            print("Error:", result.error_message)

asyncio.run(main())
```
**Explanation**:
- We define a **`BrowserConfig`** with Firefox, headless mode disabled, and `verbose=True`. 
- We define a **`CrawlerRunConfig`** that **bypasses cache**, uses a **CSS** extraction schema, has a `word_count_threshold=15`, etc. 
- We pass them to `AsyncWebCrawler(config=...)` and `arun(url=..., config=...)`.
---
## 7. Best Practices & Migration Notes
1. **Use** `BrowserConfig` for **global** settings about the browser's environment. 
2. **Use** `CrawlerRunConfig` for **per-crawl** logic (caching, content filtering, extraction strategies, wait conditions). 
3. **Avoid** legacy parameters like `css_selector` or `word_count_threshold` directly in `arun()`. Instead:
```python
run_cfg = CrawlerRunConfig(css_selector=".main-content", word_count_threshold=20)
result = await crawler.arun(url="...", config=run_cfg)
```
4. **Context Manager** usage is simplest unless you want a persistent crawler across many calls.
---
## 8. Summary
**AsyncWebCrawler** is your entry point to asynchronous crawling:
- **Constructor** accepts **`BrowserConfig`** (or defaults). 
- **`arun(url, config=CrawlerRunConfig)`** is the main method for single-page crawls. 
- **`arun_many(urls, config=CrawlerRunConfig)`** handles concurrency across multiple URLs. 
- For advanced lifecycle control, use `start()` and `close()` explicitly. 
**Migration**:
- If you used `AsyncWebCrawler(browser_type="chromium", css_selector="...")`, move browser settings to `BrowserConfig(...)` and content/crawl logic to `CrawlerRunConfig(...)`.
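
A sketch of that migration (parameter values are illustrative):

```python
# Before (legacy, everything passed directly):
# crawler = AsyncWebCrawler(browser_type="chromium", headless=True)
# result = await crawler.arun("https://example.com", css_selector="main")

# After (settings split by scope):
browser_cfg = BrowserConfig(browser_type="chromium", headless=True)
run_cfg = CrawlerRunConfig(css_selector="main")

async with AsyncWebCrawler(config=browser_cfg) as crawler:
    result = await crawler.arun("https://example.com", config=run_cfg)
```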
This modular approach ensures your code is **clean**, **scalable**, and **easy to maintain**. For any advanced or rarely used parameters, see the [BrowserConfig docs](../api/parameters.md).