I want to enhance the `AsyncPlaywrightCrawlerStrategy` to optionally capture network requests and console messages during a crawl, storing them in the final `CrawlResult`.

Here's a breakdown of the proposed changes across the relevant files:

**1. Configuration (`crawl4ai/async_configs.py`)**

*   **Goal:** Add flags to `CrawlerRunConfig` to enable/disable capturing.
*   **Changes:**
    *   Add two new boolean attributes to `CrawlerRunConfig`:
        *   `capture_network_requests: bool = False`
        *   `capture_console_messages: bool = False`
    *   Update `__init__`, `from_kwargs`, and `to_dict` to include the new attributes; `clone`, `dump`, and `load` then pick them up implicitly.
```python
# ==== File: crawl4ai/async_configs.py ====
# ... (imports) ...

class CrawlerRunConfig():
    # ... (existing attributes) ...

    # NEW: Network and Console Capturing Parameters
    capture_network_requests: bool = False
    capture_console_messages: bool = False

    # Experimental Parameters
    experimental: Dict[str, Any] = None

    def __init__(
        self,
        # ... (existing parameters) ...

        # NEW: Network and Console Capturing Parameters
        capture_network_requests: bool = False,
        capture_console_messages: bool = False,

        # Experimental Parameters
        experimental: Dict[str, Any] = None,
    ):
        # ... (existing assignments) ...

        # NEW: Assign new parameters
        self.capture_network_requests = capture_network_requests
        self.capture_console_messages = capture_console_messages

        # Experimental Parameters
        self.experimental = experimental or {}

        # ... (rest of __init__) ...

    @staticmethod
    def from_kwargs(kwargs: dict) -> "CrawlerRunConfig":
        return CrawlerRunConfig(
            # ... (existing kwargs gets) ...

            # NEW: Get new parameters
            capture_network_requests=kwargs.get("capture_network_requests", False),
            capture_console_messages=kwargs.get("capture_console_messages", False),

            # Experimental Parameters
            experimental=kwargs.get("experimental"),
        )

    def to_dict(self):
        return {
            # ... (existing dict entries) ...

            # NEW: Add new parameters to dict
            "capture_network_requests": self.capture_network_requests,
            "capture_console_messages": self.capture_console_messages,

            "experimental": self.experimental,
        }

    # clone(), dump(), and load() should work automatically as long as they rely on
    # to_dict() and from_kwargs(), or the serialization logic otherwise covers all attributes.
```
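
To sanity-check the wiring, here is a minimal usage sketch (assuming the changes above are applied and that `CrawlerRunConfig` remains importable from the package root): the new flags should behave like any other run option and survive a `to_dict()` / `from_kwargs()` round trip.

```python
# Minimal sketch: the new flags default to False and round-trip through
# to_dict()/from_kwargs() like any other CrawlerRunConfig option.
from crawl4ai import CrawlerRunConfig

config = CrawlerRunConfig(
    capture_network_requests=True,
    capture_console_messages=True,
)

data = config.to_dict()
assert data["capture_network_requests"] is True
assert data["capture_console_messages"] is True

restored = CrawlerRunConfig.from_kwargs(data)
assert restored.capture_network_requests is True
assert restored.capture_console_messages is True

# Defaults stay off, so existing callers are unaffected.
assert CrawlerRunConfig().capture_network_requests is False
```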
**2. Data Models (`crawl4ai/models.py`)**

*   **Goal:** Add fields to store the captured data in the response/result objects.
*   **Changes:**
    *   Add `network_requests: Optional[List[Dict[str, Any]]] = None` and `console_messages: Optional[List[Dict[str, Any]]] = None` to `AsyncCrawlResponse`.
    *   Add the same fields to `CrawlResult`.
```python
# ==== File: crawl4ai/models.py ====
# ... (imports) ...

# ... (Existing dataclasses/models) ...

class AsyncCrawlResponse(BaseModel):
    html: str
    response_headers: Dict[str, str]
    js_execution_result: Optional[Dict[str, Any]] = None
    status_code: int
    screenshot: Optional[str] = None
    pdf_data: Optional[bytes] = None
    get_delayed_content: Optional[Callable[[Optional[float]], Awaitable[str]]] = None
    downloaded_files: Optional[List[str]] = None
    ssl_certificate: Optional[SSLCertificate] = None
    redirected_url: Optional[str] = None
    # NEW: Fields for captured data
    network_requests: Optional[List[Dict[str, Any]]] = None
    console_messages: Optional[List[Dict[str, Any]]] = None

    class Config:
        arbitrary_types_allowed = True

# ... (Existing models like MediaItem, Link, etc.) ...

class CrawlResult(BaseModel):
    url: str
    html: str
    success: bool
    cleaned_html: Optional[str] = None
    media: Dict[str, List[Dict]] = {}
    links: Dict[str, List[Dict]] = {}
    downloaded_files: Optional[List[str]] = None
    js_execution_result: Optional[Dict[str, Any]] = None
    screenshot: Optional[str] = None
    pdf: Optional[bytes] = None
    mhtml: Optional[str] = None  # Added mhtml based on the provided models.py
    _markdown: Optional[MarkdownGenerationResult] = PrivateAttr(default=None)
    extracted_content: Optional[str] = None
    metadata: Optional[dict] = None
    error_message: Optional[str] = None
    session_id: Optional[str] = None
    response_headers: Optional[dict] = None
    status_code: Optional[int] = None
    ssl_certificate: Optional[SSLCertificate] = None
    dispatch_result: Optional[DispatchResult] = None
    redirected_url: Optional[str] = None
    # NEW: Fields for captured data
    network_requests: Optional[List[Dict[str, Any]]] = None
    console_messages: Optional[List[Dict[str, Any]]] = None

    class Config:
        arbitrary_types_allowed = True

    # ... (Existing __init__, properties, model_dump for markdown compatibility) ...

# ... (Rest of the models) ...
```
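
For orientation, here is an illustrative (not normative) sketch of what individual entries in these two lists look like once the handlers in section 3 populate them; the field names mirror those handlers, and `timestamp` is a Unix epoch float from `time.time()`.

```python
# Illustrative shapes only; actual values come from the Playwright handlers in section 3.
example_network_entry = {
    "event_type": "request",          # "request", "response", or "request_failed"
    "url": "https://example.com/api/data",
    "method": "GET",
    "headers": {"user-agent": "..."},
    "post_data": None,
    "resource_type": "xhr",
    "is_navigation_request": False,
    "timestamp": 1700000000.123,      # time.time() at capture
}

example_console_entry = {
    "type": "error",                  # console message type, or "error" for page errors
    "text": "Uncaught TypeError: x is undefined",
    "location": "https://example.com/app.js:10:5",
    "timestamp": 1700000000.456,
}
```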
**3. Crawler Strategy (`crawl4ai/async_crawler_strategy.py`)**

*   **Goal:** Implement the actual capturing logic within `AsyncPlaywrightCrawlerStrategy._crawl_web`.
*   **Changes:**
    *   Inside `_crawl_web`, initialize empty lists `captured_requests = []` and `captured_console = []`.
    *   Conditionally attach Playwright event listeners (`page.on(...)`) based on the `config.capture_network_requests` and `config.capture_console_messages` flags.
    *   Define handler functions for these listeners that extract the relevant data, add a timestamp, and append the entry to the corresponding list.
    *   Pass the captured lists to the `AsyncCrawlResponse` constructor at the end of the method.
```python
# ==== File: crawl4ai/async_crawler_strategy.py ====
# ... (imports) ...
import time  # Make sure time is imported

class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy):
    # ... (existing methods like __init__, start, close, etc.) ...

    async def _crawl_web(
        self, url: str, config: CrawlerRunConfig
    ) -> AsyncCrawlResponse:
        """
        Internal method to crawl web URLs with the specified configuration.
        Includes optional network and console capturing.  # MODIFIED DOCSTRING
        """
        config.url = url
        response_headers = {}
        execution_result = None
        status_code = None
        redirected_url = url

        # Reset downloaded files list for new crawl
        self._downloaded_files = []

        # Initialize capture lists - IMPORTANT: reset per crawl
        captured_requests: List[Dict[str, Any]] = []
        captured_console: List[Dict[str, Any]] = []

        # Handle user agent ... (existing code) ...

        # Get page for session
        page, context = await self.browser_manager.get_page(crawlerRunConfig=config)

        # ... (existing code for cookies, navigator overrides, hooks) ...

        # --- Setup Capturing Listeners ---
        # NOTE: These listeners are attached *before* page.goto()

        # Network Request Capturing
        if config.capture_network_requests:
            async def handle_request_capture(request):
                try:
                    post_data_str = None
                    try:
                        # Be cautious with large post data
                        post_data = request.post_data_buffer
                        if post_data:
                            # Attempt to decode, fall back to a size indication
                            try:
                                post_data_str = post_data.decode("utf-8", errors="replace")
                            except UnicodeDecodeError:
                                post_data_str = f"[Binary data: {len(post_data)} bytes]"
                    except Exception:
                        post_data_str = "[Error retrieving post data]"

                    captured_requests.append({
                        "event_type": "request",
                        "url": request.url,
                        "method": request.method,
                        "headers": dict(request.headers),  # Convert Header dict
                        "post_data": post_data_str,
                        "resource_type": request.resource_type,
                        "is_navigation_request": request.is_navigation_request(),
                        "timestamp": time.time(),
                    })
                except Exception as e:
                    self.logger.warning(f"Error capturing request details for {request.url}: {e}", tag="CAPTURE")
                    captured_requests.append({"event_type": "request_capture_error", "url": request.url, "error": str(e), "timestamp": time.time()})

            async def handle_response_capture(response):
                try:
                    # Avoid capturing full response bodies by default (size/security)
                    # security_details = await response.security_details()  # Optional: more SSL info
                    captured_requests.append({
                        "event_type": "response",
                        "url": response.url,
                        "status": response.status,
                        "status_text": response.status_text,
                        "headers": dict(response.headers),  # Convert Header dict
                        "from_service_worker": response.from_service_worker,
                        # "security_details": security_details,  # Uncomment if needed
                        "request_timing": response.request.timing,  # Detailed timing info
                        "timestamp": time.time(),
                    })
                except Exception as e:
                    self.logger.warning(f"Error capturing response details for {response.url}: {e}", tag="CAPTURE")
                    captured_requests.append({"event_type": "response_capture_error", "url": response.url, "error": str(e), "timestamp": time.time()})

            async def handle_request_failed_capture(request):
                try:
                    captured_requests.append({
                        "event_type": "request_failed",
                        "url": request.url,
                        "method": request.method,
                        "resource_type": request.resource_type,
                        # In the Python API, request.failure is an Optional[str] error text
                        "failure_text": request.failure or "Unknown failure",
                        "timestamp": time.time(),
                    })
                except Exception as e:
                    self.logger.warning(f"Error capturing request failed details for {request.url}: {e}", tag="CAPTURE")
                    captured_requests.append({"event_type": "request_failed_capture_error", "url": request.url, "error": str(e), "timestamp": time.time()})

            page.on("request", handle_request_capture)
            page.on("response", handle_response_capture)
            page.on("requestfailed", handle_request_failed_capture)

        # Console Message Capturing
        if config.capture_console_messages:
            async def handle_console_capture(msg):
                try:
                    location = msg.location  # property: dict with url/lineNumber/columnNumber
                    # Attempt to resolve JSHandle args to primitive values
                    resolved_args = []
                    try:
                        for arg in msg.args:
                            resolved_args.append(await arg.json_value())  # May fail for complex objects
                    except Exception:
                        resolved_args.append("[Could not resolve JSHandle args]")

                    captured_console.append({
                        "type": msg.type,  # e.g. 'log', 'error', 'warning'
                        "text": msg.text,
                        "args": resolved_args,  # Captured arguments
                        "location": f"{location['url']}:{location['lineNumber']}:{location['columnNumber']}" if location else "N/A",
                        "timestamp": time.time(),
                    })
                except Exception as e:
                    self.logger.warning(f"Error capturing console message: {e}", tag="CAPTURE")
                    captured_console.append({"type": "console_capture_error", "error": str(e), "timestamp": time.time()})

            def handle_pageerror_capture(err):
                try:
                    captured_console.append({
                        "type": "error",  # Consistent type for page errors
                        "text": err.message,
                        "stack": err.stack,
                        "timestamp": time.time(),
                    })
                except Exception as e:
                    self.logger.warning(f"Error capturing page error: {e}", tag="CAPTURE")
                    captured_console.append({"type": "pageerror_capture_error", "error": str(e), "timestamp": time.time()})

            page.on("console", handle_console_capture)
            page.on("pageerror", handle_pageerror_capture)
        # --- End Setup Capturing Listeners ---

        # Set up console logging if requested (keep the original logging logic separate, or merge carefully)
        if config.log_console:
            # ... (original log_console setup using page.on(...) remains here) ...
            # This allows logging to the screen *and* capturing to the list if both flags are True
            def log_consol(msg, console_log_type="debug"):
                # ... existing implementation ...
                pass  # Placeholder for existing code

            page.on("console", lambda msg: log_consol(msg, "debug"))
            page.on("pageerror", lambda e: log_consol(e, "error"))

        try:
            # ... (existing code for SSL, downloads, goto, waits, JS execution, etc.) ...

            # Get final HTML content
            # ... (existing code for selector logic or page.content()) ...
            if config.css_selector:
                # ... existing selector logic ...
                html = f"<div class='crawl4ai-result'>\n" + "\n".join(html_parts) + "\n</div>"
            else:
                html = await page.content()

            await self.execute_hook(
                "before_return_html", page=page, html=html, context=context, config=config
            )

            # Handle PDF and screenshot generation
            # ... (existing code) ...

            # Define delayed content getter
            # ... (existing code) ...

            # Return complete response - ADD CAPTURED DATA HERE
            return AsyncCrawlResponse(
                html=html,
                response_headers=response_headers,
                js_execution_result=execution_result,
                status_code=status_code,
                screenshot=screenshot_data,
                pdf_data=pdf_data,
                get_delayed_content=get_delayed_content,
                ssl_certificate=ssl_cert,
                downloaded_files=(
                    self._downloaded_files if self._downloaded_files else None
                ),
                redirected_url=redirected_url,
                # NEW: Pass captured data conditionally
                network_requests=captured_requests if config.capture_network_requests else None,
                console_messages=captured_console if config.capture_console_messages else None,
            )

        except Exception as e:
            raise e  # Re-raise the original exception

        finally:
            # If no session_id is given we should close the page
            if not config.session_id:
                # Detach listeners before closing to prevent potential errors during close
                if config.capture_network_requests:
                    page.remove_listener("request", handle_request_capture)
                    page.remove_listener("response", handle_response_capture)
                    page.remove_listener("requestfailed", handle_request_failed_capture)
                if config.capture_console_messages:
                    page.remove_listener("console", handle_console_capture)
                    page.remove_listener("pageerror", handle_pageerror_capture)
                # Also remove logging listeners if they were attached
                if config.log_console:
                    # The logging lambdas are not stored, so they cannot be removed individually;
                    # in practice this is fine because the page is closed right after.
                    pass

                await page.close()

    # ... (rest of AsyncPlaywrightCrawlerStrategy methods) ...
```
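
As a quick illustration of how the captured lists can be consumed downstream (a sketch assuming the fields are populated as above; the dictionary keys match the handlers), one might summarize failed requests, error responses, and console errors after a crawl:

```python
from typing import Any, Dict, List, Optional


def summarize_capture(
    network_requests: Optional[List[Dict[str, Any]]],
    console_messages: Optional[List[Dict[str, Any]]],
) -> Dict[str, Any]:
    """Summarize captured traffic: failed requests, error responses, console errors."""
    network_requests = network_requests or []
    console_messages = console_messages or []

    failed = [e for e in network_requests if e.get("event_type") == "request_failed"]
    error_responses = [
        e for e in network_requests
        if e.get("event_type") == "response" and e.get("status", 0) >= 400
    ]
    console_errors = [m for m in console_messages if m.get("type") == "error"]

    return {
        "total_events": len(network_requests),
        "failed_requests": [e["url"] for e in failed],
        "error_responses": [(e["url"], e["status"]) for e in error_responses],
        "console_errors": [m.get("text") for m in console_errors],
    }
```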
**4. Core Crawler (`crawl4ai/async_webcrawler.py`)**

*   **Goal:** Ensure the captured data from `AsyncCrawlResponse` is transferred to the final `CrawlResult`.
*   **Changes:**
    *   In `arun`, when processing a non-cached result (inside the `if not cached_result or not html:` block), after receiving `async_response` and calling `aprocess_html` to get `crawl_result`, copy `network_requests` and `console_messages` from `async_response` to `crawl_result`.
```python
# ==== File: crawl4ai/async_webcrawler.py ====
# ... (imports) ...

class AsyncWebCrawler:
    # ... (existing methods) ...

    async def arun(
        self,
        url: str,
        config: CrawlerRunConfig = None,
        **kwargs,
    ) -> RunManyReturn:
        # ... (existing setup, cache check) ...

        async with self._lock or self.nullcontext():
            try:
                # ... (existing logging, cache context setup) ...

                if cached_result:
                    # ... (existing cache handling logic) ...
                    # Note: captured network/console data is usually not useful from cache.
                    # Ensure the fields are None unless they were stored explicitly.
                    cached_result.network_requests = cached_result.network_requests or None
                    cached_result.console_messages = cached_result.console_messages or None
                    # ... (rest of cache logic) ...

                # Fetch fresh content if needed
                if not cached_result or not html:
                    t1 = time.perf_counter()

                    # ... (existing user agent update, robots.txt check) ...

                    ##############################
                    # Call CrawlerStrategy.crawl #
                    ##############################
                    async_response = await self.crawler_strategy.crawl(
                        url,
                        config=config,
                    )

                    # ... (existing assignment of html, screenshot, pdf, js_result from async_response) ...

                    t2 = time.perf_counter()
                    # ... (existing logging) ...

                    ###############################################################
                    # Process the HTML content, Call CrawlerStrategy.process_html #
                    ###############################################################
                    crawl_result: CrawlResult = await self.aprocess_html(
                        # ... (existing args) ...
                    )

                    # --- Transfer data from AsyncCrawlResponse to CrawlResult ---
                    crawl_result.status_code = async_response.status_code
                    crawl_result.redirected_url = async_response.redirected_url or url
                    crawl_result.response_headers = async_response.response_headers
                    crawl_result.downloaded_files = async_response.downloaded_files
                    crawl_result.js_execution_result = async_response.js_execution_result
                    crawl_result.ssl_certificate = async_response.ssl_certificate
                    # NEW: Copy captured data
                    crawl_result.network_requests = async_response.network_requests
                    crawl_result.console_messages = async_response.console_messages
                    # ------------------------------------------------------------

                    crawl_result.success = bool(html)
                    crawl_result.session_id = getattr(config, "session_id", None)

                    # ... (existing logging) ...

                    # Update cache if appropriate
                    if cache_context.should_write() and not bool(cached_result):
                        # crawl_result now includes network/console data if captured
                        await async_db_manager.acache_url(crawl_result)

                    return CrawlResultContainer(crawl_result)

                else:  # Cached result was used
                    # ... (existing logging for cache hit) ...
                    cached_result.success = bool(html)
                    cached_result.session_id = getattr(config, "session_id", None)
                    cached_result.redirected_url = cached_result.redirected_url or url
                    return CrawlResultContainer(cached_result)

            except Exception as e:
                # ... (existing error handling) ...
                return CrawlResultContainer(
                    CrawlResult(
                        url=url, html="", success=False, error_message=error_message
                    )
                )

    # ... (aprocess_html remains unchanged regarding capture) ...

    # ... (arun_many remains unchanged regarding capture) ...
```
**Summary of Changes:**

1.  **Configuration:** Added `capture_network_requests` and `capture_console_messages` flags to `CrawlerRunConfig`.
2.  **Models:** Added corresponding `network_requests` and `console_messages` fields (lists of dicts) to `AsyncCrawlResponse` and `CrawlResult`.
3.  **Strategy:** Implemented conditional event listeners in `AsyncPlaywrightCrawlerStrategy._crawl_web` that capture data into lists when the flags are true, populate the new fields on the returned `AsyncCrawlResponse`, include timestamps, and handle capture errors defensively.
4.  **Crawler:** Modified `AsyncWebCrawler.arun` to copy the captured data from `AsyncCrawlResponse` into the final `CrawlResult` for non-cached fetches.

This approach keeps the capturing logic contained within the Playwright strategy, uses clear configuration flags, and integrates the results into the existing data flow. The list-of-dictionaries format is flexible enough to hold the varied information coming from requests, responses, and console messages.
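
Finally, an end-to-end usage sketch (assuming the proposal is implemented as described; `AsyncWebCrawler` and `CrawlerRunConfig` are the existing public entry points):

```python
import asyncio

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig


async def main():
    config = CrawlerRunConfig(
        capture_network_requests=True,
        capture_console_messages=True,
    )

    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun("https://example.com", config=config)

    # The new fields are plain lists of dicts (or None when capturing is disabled).
    for entry in result.network_requests or []:
        if entry["event_type"] == "request_failed":
            print("FAILED:", entry["url"], entry.get("failure_text"))

    for msg in result.console_messages or []:
        if msg["type"] == "error":
            print("CONSOLE ERROR:", msg.get("text"))


if __name__ == "__main__":
    asyncio.run(main())
```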