# AsyncWebCrawler Basics
In this tutorial, you’ll learn how to:
1. Create and configure an `AsyncWebCrawler` instance
2. Understand the `CrawlResult` object returned by `arun()`
3. Use basic `BrowserConfig` and `CrawlerRunConfig` options to tailor your crawl
> **Prerequisites**
> - You’ve already completed the [Getting Started](./getting-started.md) tutorial (or have equivalent knowledge).
> - You have **Crawl4AI** installed and configured with Playwright.
---
## 1. What is `AsyncWebCrawler`?
`AsyncWebCrawler` is the central class for running asynchronous crawling operations in Crawl4AI. It manages browser sessions, handles dynamic pages (if needed), and provides you with a structured result object for each crawl. Essentially, it’s your high-level interface for collecting page data.
```python
import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun("https://example.com")
        print(result)

asyncio.run(main())
```
---
## 2. Creating a Basic `AsyncWebCrawler` Instance
Below is a simple code snippet showing how to create and use `AsyncWebCrawler`. This goes one step beyond the minimal example you saw in [Getting Started](./getting-started.md).
```python
import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig

async def main():
    # 1. Set up configuration objects (optional if you want defaults)
    browser_config = BrowserConfig(
        browser_type="chromium",
        headless=True,
        verbose=True
    )
    crawler_config = CrawlerRunConfig(
        page_timeout=30000,  # 30 seconds
        wait_for_images=True,
        verbose=True
    )

    # 2. Initialize AsyncWebCrawler with your chosen browser config
    async with AsyncWebCrawler(config=browser_config) as crawler:
        # 3. Run a single crawl
        url_to_crawl = "https://example.com"
        result = await crawler.arun(url=url_to_crawl, config=crawler_config)

        # 4. Check the outcome
        if result.success:
            print(f"Crawled {result.url} successfully!")
        else:
            print(f"Crawl failed: {result.error_message}")

if __name__ == "__main__":
    asyncio.run(main())
```
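---
## 3. Understanding the `CrawlResult` Object
Every call to `arun()` returns a `CrawlResult` object containing the crawled data. Below is a minimal sketch of inspecting it; the attributes shown (`success`, `status_code`, `markdown`, `links`, `error_message`) are commonly documented `CrawlResult` fields, but the exact set can vary between Crawl4AI versions, so treat this as illustrative rather than exhaustive.

```python
import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun("https://example.com")

        # Commonly used CrawlResult fields (exact set may vary by version):
        print(result.success)              # True if the crawl completed
        print(result.status_code)          # HTTP status code of the response
        print(str(result.markdown)[:200])  # Markdown generated from the page
        print(len(result.links.get("internal", [])))  # internal links found

asyncio.run(main())
```

If `success` is `False`, `error_message` usually tells you why the crawl failed.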
---
## 4. Commonly Used Configuration Parameters
Below are a few `BrowserConfig` and `CrawlerRunConfig` parameters you might tweak early on. We’ll cover more advanced ones (like proxies, PDF, or screenshots) in later tutorials.

| Parameter | Belongs to | What it controls |
| --- | --- | --- |
| `browser_type` | `BrowserConfig` | Which browser engine to launch (e.g., `"chromium"`, `"firefox"`, `"webkit"`) |
| `headless` | `BrowserConfig` | Whether the browser runs without a visible window |
| `verbose` | both | Whether detailed logs are printed during the crawl |
| `page_timeout` | `CrawlerRunConfig` | Maximum time (in milliseconds) to wait for a page to load |
| `wait_for_images` | `CrawlerRunConfig` | Whether to wait for images to finish loading before capturing the page |
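A useful rule of thumb: `BrowserConfig` applies to the crawler for its whole lifetime, while a `CrawlerRunConfig` can be passed per `arun()` call. The sketch below illustrates that split; the variable names (`fast`, `patient`) and URLs are just placeholders for this example.

```python
import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig

async def main():
    # One browser configuration for the whole session...
    async with AsyncWebCrawler(config=BrowserConfig(headless=True)) as crawler:
        # ...but run-level settings can vary between calls.
        fast = CrawlerRunConfig(page_timeout=10000)  # 10 s: skip slow pages
        patient = CrawlerRunConfig(page_timeout=60000, wait_for_images=True)

        quick_result = await crawler.arun("https://example.com", config=fast)
        full_result = await crawler.arun("https://example.com/gallery", config=patient)
        print(quick_result.success, full_result.success)

asyncio.run(main())
```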
---
## 5. A Note for Windows Users
When using `AsyncWebCrawler` on Windows, you might encounter a `NotImplementedError` related to `asyncio.create_subprocess_exec`. This is a known Windows-specific issue that occurs because Windows' default event loop doesn't support subprocess operations.

To resolve this, Crawl4AI provides a utility function that configures Windows to use the `ProactorEventLoop`. Call it before running any async operations:
```python
from crawl4ai.utils import configure_windows_event_loop

# Call this before any async operations if you're on Windows
configure_windows_event_loop()
```
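Putting it together, a sketch of a Windows-safe script would call the helper once at the top of the file, before any event loop is created:

```python
import asyncio
from crawl4ai import AsyncWebCrawler
from crawl4ai.utils import configure_windows_event_loop

configure_windows_event_loop()  # Windows only: enable subprocess support

async def main():
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun("https://example.com")
        print(result.success)

asyncio.run(main())
```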
---
## Next Steps
- **Smart Crawling Techniques**: Learn to handle iframes, advanced caching, and selective extraction in the [next tutorial](./smart-crawling.md).
- **Hooks & Custom Code**: See how to inject custom logic before and after navigation in a dedicated [Hooks Tutorial](./hooks-custom.md).
- **Reference**: For a complete list of every parameter in `BrowserConfig` and `CrawlerRunConfig`, check out the [Reference section](../../reference/configuration.md).
---
## Summary
You now know the basics of **AsyncWebCrawler**:
- How to create it with optional browser/crawler configs
- How `arun()` works for single-page crawls
- Where to find your crawled data in `CrawlResult`
- A handful of frequently used configuration parameters
From here, you can refine your crawler to handle more advanced scenarios, like focusing on specific content or dealing with dynamic elements. Let’s move on to **[Smart Crawling Techniques](./smart-crawling.md)** to learn how to handle iframes, advanced caching, and more.
---
**Last updated**: 2024-XX-XX
Keep exploring! If you get stuck, remember to check out the [How-To Guides](../../how-to/) for targeted solutions or the [Explanations](../../explanations/) for deeper conceptual background.