feat(docker): update Docker deployment for v0.6.0
Major updates to Docker deployment infrastructure:

- Switch default port to 11235 for all services
- Add MCP (Model Context Protocol) support with WebSocket/SSE endpoints
- Simplify docker-compose.yml with auto-platform detection
- Update documentation with new features and examples
- Consolidate configuration and improve resource management

BREAKING CHANGE: Default port changed from 8020 to 11235. Update your configurations and deployment scripts accordingly.
Parent: f3ebb38edf
Commit: 4812f08a73

CHANGELOG.md (47 lines changed)
@@ -5,6 +5,53 @@ All notable changes to Crawl4AI will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.6.0rc1-r1] - 2025-04-22

### Added

- Browser pooling with page pre-warming and fine-grained **geolocation, locale, and timezone** controls
- Crawler pool manager (SDK + Docker API) for smarter resource allocation
- Network & console log capture plus MHTML snapshot export
- **Table extractor**: turn HTML `<table>`s into DataFrames or CSV with one flag
- High-volume stress-test framework in `tests/memory` and API load scripts
- MCP protocol endpoints with socket & SSE support; playground UI scaffold
- Docs v2 revamp: TOC, GitHub badge, copy-code buttons, Docker API demo
- "Ask AI" helper button *(work in progress, shipping soon)*
- New examples: geolocation usage, network/console capture, Docker API, markdown source selection, crypto analysis
- Expanded automated test suites for browser, Docker, MCP, and memory benchmarks

### Changed

- Consolidated and renamed browser strategies; legacy Docker strategy modules removed
- `ProxyConfig` moved to `async_configs`
- Server migrated to pool-based crawler management
- FastAPI validators replace custom query validation
- Docker build now uses a Chromium base image
- Large-scale repo tidy-up (≈36k insertions, ≈5k deletions)

### Fixed

- Async crawler session leak, duplicate-visit handling, and URL normalisation
- Target-element regressions in scraping strategies
- Logged-URL readability, encoded-URL decoding, and middle truncation for long URLs
- Closed issues: #701, #733, #756, #774, #804, #822, #839, #841, #842, #843, #867, #902, #911

### Removed

- Obsolete modules under `crawl4ai/browser/*`, superseded by the new pooled browser layer

### Deprecated

- Old markdown generator names now alias `DefaultMarkdownGenerator` and emit deprecation warnings

---

#### Upgrade notes

1. Update any direct imports from `crawl4ai/browser/*` to the new pooled browser modules.
2. If you override `AsyncPlaywrightCrawlerStrategy.get_page`, adopt the new signature.
3. Rebuild Docker images to pull the new Chromium layer.
4. Switch to `DefaultMarkdownGenerator` (or silence the deprecation warning).

---

`121 files changed, ≈36,223 insertions, ≈4,975 deletions`

### [Feature] 2025-04-21

- Implemented MCP protocol for machine-to-machine communication
- Added WebSocket and SSE transport for MCP server

@@ -1,5 +1,10 @@
 FROM python:3.10-slim

+# C4ai version
+ARG C4AI_VER=0.6.0
+ENV C4AI_VERSION=$C4AI_VER
+LABEL c4ai.version=$C4AI_VER
+
 # Set build arguments
 ARG APP_HOME=/app
 ARG GITHUB_REPO=https://github.com/unclecode/crawl4ai.git

@@ -1,2 +1,3 @@
 # crawl4ai/_version.py
-__version__ = "0.5.0.post8"
+__version__ = "0.6.0rc1"

@@ -1,644 +0,0 @@

# Crawl4AI Docker Guide 🐳

## Table of Contents

- [Prerequisites](#prerequisites)
- [Installation](#installation)
  - [Option 1: Using Docker Compose (Recommended)](#option-1-using-docker-compose-recommended)
  - [Option 2: Manual Local Build & Run](#option-2-manual-local-build--run)
  - [Option 3: Using Pre-built Docker Hub Images](#option-3-using-pre-built-docker-hub-images)
- [Dockerfile Parameters](#dockerfile-parameters)
- [Using the API](#using-the-api)
  - [Understanding Request Schema](#understanding-request-schema)
  - [REST API Examples](#rest-api-examples)
  - [Python SDK](#python-sdk)
- [Metrics & Monitoring](#metrics--monitoring)
- [Deployment Scenarios](#deployment-scenarios)
- [Complete Examples](#complete-examples)
- [Server Configuration](#server-configuration)
  - [Understanding config.yml](#understanding-configyml)
  - [JWT Authentication](#jwt-authentication)
  - [Configuration Tips and Best Practices](#configuration-tips-and-best-practices)
  - [Customizing Your Configuration](#customizing-your-configuration)
  - [Configuration Recommendations](#configuration-recommendations)
- [Getting Help](#getting-help)

## Prerequisites

Before we dive in, make sure you have:

- Docker installed and running (version 20.10.0 or higher), including `docker compose` (usually bundled with Docker Desktop).
- `git` for cloning the repository.
- At least 4GB of RAM available for the container (more recommended for heavy use).
- Python 3.10+ (if using the Python SDK).
- Node.js 16+ (if using the Node.js examples).

> 💡 **Pro tip**: Run `docker info` to check your Docker installation and available resources.

## Installation

We offer several ways to get the Crawl4AI server running. Docker Compose is the easiest way to manage local builds and runs.

### Option 1: Using Docker Compose (Recommended)

Docker Compose simplifies building and running the service, especially for local development and testing across different platforms.

#### 1. Clone Repository

```bash
git clone https://github.com/unclecode/crawl4ai.git
cd crawl4ai
```

#### 2. Environment Setup (API Keys)

If you plan to use LLMs, copy the example environment file and add your API keys. This file should be in the **project root directory**.

```bash
# Make sure you are in the 'crawl4ai' root directory
cp deploy/docker/.llm.env.example .llm.env

# Now edit .llm.env and add your API keys
# Example content:
# OPENAI_API_KEY=sk-your-key
# ANTHROPIC_API_KEY=your-anthropic-key
# ...
```

> 🔑 **Note**: Keep your API keys secure! Never commit `.llm.env` to version control.

#### 3. Build and Run with Compose

The `docker-compose.yml` file in the project root defines services for different scenarios using **profiles**.

* **Build and Run Locally (AMD64):**

  ```bash
  # Builds the image locally using Dockerfile and runs it
  docker compose --profile local-amd64 up --build -d
  ```

* **Build and Run Locally (ARM64):**

  ```bash
  # Builds the image locally using Dockerfile and runs it
  docker compose --profile local-arm64 up --build -d
  ```

* **Run Pre-built Image from Docker Hub (AMD64):**

  ```bash
  # Pulls and runs the specified AMD64 image from Docker Hub
  # (Set VERSION env var for specific tags, e.g., VERSION=0.5.1-d1)
  docker compose --profile hub-amd64 up -d
  ```

* **Run Pre-built Image from Docker Hub (ARM64):**

  ```bash
  # Pulls and runs the specified ARM64 image from Docker Hub
  docker compose --profile hub-arm64 up -d
  ```

> The server will be available at `http://localhost:11235`.

#### 4. Stopping Compose Services

```bash
# Stop the service(s) associated with a profile (e.g., local-amd64)
docker compose --profile local-amd64 down
```

### Option 2: Manual Local Build & Run

Use this approach if you prefer not to use Docker Compose for local builds.

#### 1. Clone Repository & Setup Environment

Follow steps 1 and 2 from the Docker Compose section above (clone the repo, `cd crawl4ai`, create `.llm.env` in the root).

#### 2. Build the Image (Multi-Arch)

Use `docker buildx` to build the image. This example builds for multiple platforms and loads the image matching your host architecture into the local Docker daemon.

```bash
# Make sure you are in the 'crawl4ai' root directory
docker buildx build --platform linux/amd64,linux/arm64 -t crawl4ai-local:latest --load .
```

#### 3. Run the Container

* **Basic run (no LLM support):**

  ```bash
  # Replace --platform if your host is ARM64
  docker run -d \
    -p 11235:11235 \
    --name crawl4ai-standalone \
    --shm-size=1g \
    --platform linux/amd64 \
    crawl4ai-local:latest
  ```

* **With LLM support:**

  ```bash
  # Make sure .llm.env is in the current directory (project root)
  # Replace --platform if your host is ARM64
  docker run -d \
    -p 11235:11235 \
    --name crawl4ai-standalone \
    --env-file .llm.env \
    --shm-size=1g \
    --platform linux/amd64 \
    crawl4ai-local:latest
  ```

> The server will be available at `http://localhost:11235`.

#### 4. Stopping the Manual Container

```bash
docker stop crawl4ai-standalone && docker rm crawl4ai-standalone
```

### Option 3: Using Pre-built Docker Hub Images

Pull and run images directly from Docker Hub without building locally.

#### 1. Pull the Image

We use a versioning scheme like `LIBRARY_VERSION-dREVISION` (e.g., `0.5.1-d1`). The `latest` tag points to the most recent stable release. Images are built with multi-arch manifests, so Docker usually pulls the correct version for your system automatically.

```bash
# Pull a specific version (recommended for stability)
docker pull unclecode/crawl4ai:0.5.1-d1

# Or pull the latest stable version
docker pull unclecode/crawl4ai:latest
```

#### 2. Setup Environment (API Keys)

If using LLMs, create the `.llm.env` file in a directory of your choice, similar to Step 2 in the Compose section.

#### 3. Run the Container

* **Basic run:**

  ```bash
  docker run -d \
    -p 11235:11235 \
    --name crawl4ai-hub \
    --shm-size=1g \
    unclecode/crawl4ai:0.5.1-d1 # Or use :latest
  ```

* **With LLM support:**

  ```bash
  # Make sure .llm.env is in the directory you are running docker from
  docker run -d \
    -p 11235:11235 \
    --name crawl4ai-hub \
    --env-file .llm.env \
    --shm-size=1g \
    unclecode/crawl4ai:0.5.1-d1 # Or use :latest
  ```

> The server will be available at `http://localhost:11235`.

#### 4. Stopping the Hub Container

```bash
docker stop crawl4ai-hub && docker rm crawl4ai-hub
```

#### Docker Hub Versioning Explained

* **Image Name:** `unclecode/crawl4ai`
* **Tag Format:** `LIBRARY_VERSION-dREVISION`
  * `LIBRARY_VERSION`: The semantic version of the core `crawl4ai` Python library included (e.g., `0.5.1`).
  * `dREVISION`: An incrementing number (starting at `d1`) for Docker build changes made *without* changing the library version (e.g., base image updates, dependency fixes). Resets to `d1` for each new `LIBRARY_VERSION`.
* **Example:** `unclecode/crawl4ai:0.5.1-d1`
* **`latest` Tag:** Points to the most recent stable `LIBRARY_VERSION-dREVISION`.
* **Multi-Arch:** Images support `linux/amd64` and `linux/arm64`; Docker automatically selects the correct architecture.

---

*(The rest of the document remains largely the same, but with key updates below.)*

---

## Dockerfile Parameters

You can customize the image build process using build arguments (`--build-arg`). These are typically used via `docker buildx build` or within the `docker-compose.yml` file.

```bash
# Example: Build with 'all' features using buildx
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --build-arg INSTALL_TYPE=all \
  -t yourname/crawl4ai-all:latest \
  --load \
  . # Build from root context
```

### Build Arguments Explained

| Argument | Description | Default | Options |
| :----------- | :--------------------------------------- | :-------- | :--------------------------------- |
| INSTALL_TYPE | Feature set | `default` | `default`, `all`, `torch`, `transformer` |
| ENABLE_GPU | GPU support (CUDA for AMD64) | `false` | `true`, `false` |
| APP_HOME | Install path inside container (advanced) | `/app` | any valid path |
| USE_LOCAL | Install library from local source | `true` | `true`, `false` |
| GITHUB_REPO | Git repo to clone if USE_LOCAL=false | *(see Dockerfile)* | any git URL |
| GITHUB_BRANCH | Git branch to clone if USE_LOCAL=false | `main` | any branch name |

*(Note: PYTHON_VERSION is fixed by the `FROM` instruction in the Dockerfile.)*

### Build Best Practices

1. **Choose the Right Install Type**
   * `default`: Basic installation, smallest image size. Suitable for most standard web scraping and markdown generation.
   * `all`: Full features including `torch` and `transformers` for advanced extraction strategies (e.g., CosineStrategy, certain LLM filters). Significantly larger image; ensure you need these extras.
2. **Platform Considerations**
   * Use `buildx` for building multi-architecture images, especially for pushing to registries.
   * Use `docker compose` profiles (`local-amd64`, `local-arm64`) for easy platform-specific local builds.
3. **Performance Optimization**
   * The image automatically includes platform-specific optimizations (OpenMP for AMD64, OpenBLAS for ARM64).

---

## Using the API

Communicate with the running Docker server via its REST API (defaulting to `http://localhost:11235`). You can use the Python SDK or make direct HTTP requests.

### Python SDK

Install the SDK: `pip install crawl4ai`

```python
import asyncio
from crawl4ai.docker_client import Crawl4aiDockerClient
from crawl4ai import BrowserConfig, CrawlerRunConfig, CacheMode  # Assuming you have crawl4ai installed

async def main():
    # Point to the correct server port
    async with Crawl4aiDockerClient(base_url="http://localhost:11235", verbose=True) as client:
        # If JWT is enabled on the server, authenticate first:
        # await client.authenticate("user@example.com")  # See Server Configuration section

        # Example Non-streaming crawl
        print("--- Running Non-Streaming Crawl ---")
        results = await client.crawl(
            ["https://httpbin.org/html"],
            browser_config=BrowserConfig(headless=True),  # Use library classes for config aid
            crawler_config=CrawlerRunConfig(cache_mode=CacheMode.BYPASS)
        )
        if results:  # client.crawl returns None on failure
            print(f"Non-streaming results success: {results.success}")
            if results.success:
                for result in results:  # Iterate through the CrawlResultContainer
                    print(f"URL: {result.url}, Success: {result.success}")
        else:
            print("Non-streaming crawl failed.")

        # Example Streaming crawl
        print("\n--- Running Streaming Crawl ---")
        stream_config = CrawlerRunConfig(stream=True, cache_mode=CacheMode.BYPASS)
        try:
            async for result in await client.crawl(  # client.crawl returns an async generator for streaming
                ["https://httpbin.org/html", "https://httpbin.org/links/5/0"],
                browser_config=BrowserConfig(headless=True),
                crawler_config=stream_config
            ):
                print(f"Streamed result: URL: {result.url}, Success: {result.success}")
        except Exception as e:
            print(f"Streaming crawl failed: {e}")

        # Example Get schema
        print("\n--- Getting Schema ---")
        schema = await client.get_schema()
        print(f"Schema received: {bool(schema)}")  # Print whether schema was received

if __name__ == "__main__":
    asyncio.run(main())
```

*(SDK parameters like timeout, verify_ssl etc. remain the same.)*
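
For reference, these client-level options are passed when constructing the client. A minimal sketch, assuming the `timeout` and `verify_ssl` parameter names mentioned in the note above (check your installed SDK version for the exact signature and defaults):

```python
from crawl4ai.docker_client import Crawl4aiDockerClient

# Sketch only: `timeout` / `verify_ssl` are the SDK parameters referenced above;
# names and defaults may differ slightly between crawl4ai versions.
client = Crawl4aiDockerClient(
    base_url="http://localhost:11235",
    timeout=300.0,     # generous timeout for long crawls (assumed parameter)
    verify_ssl=False,  # e.g., self-signed certificates behind a proxy (assumed parameter)
    verbose=True,
)
```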

### Second Approach: Direct API Calls

Crucially, when sending configurations directly via JSON, they **must** follow the `{"type": "ClassName", "params": {...}}` structure for any non-primitive value (like config objects or strategies). Dictionaries must be wrapped as `{"type": "dict", "value": {...}}`.

*(Keep the detailed explanation of Configuration Structure, Basic Pattern, Simple vs Complex, Strategy Pattern, Complex Nested Example, Quick Grammar Overview, Important Rules, Pro Tip.)*
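
As a quick illustration of the basic pattern (a sketch built from the wrapping rules above; parameter values mirror the REST examples later in this guide):

```python
# Minimal payload sketch showing the two wrapping rules:
#  - non-primitive values use {"type": "ClassName", "params": {...}}
#  - plain dictionaries use {"type": "dict", "value": {...}}
payload = {
    "urls": ["https://httpbin.org/html"],
    "browser_config": {
        "type": "BrowserConfig",
        "params": {
            "headless": True,
            # a plain dict value must be wrapped as type/dict + value
            "viewport": {"type": "dict", "value": {"width": 1200, "height": 800}},
        },
    },
    "crawler_config": {
        "type": "CrawlerRunConfig",
        "params": {"cache_mode": "bypass"},  # enum values are sent as their string form
    },
}
```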

#### More Examples *(Ensure the schema example uses the type/value wrapper.)*

**Advanced Crawler Configuration**

*(Keep example; ensure `cache_mode` uses a valid enum value like `"bypass"`.)*

**Extraction Strategy**

```json
{
    "crawler_config": {
        "type": "CrawlerRunConfig",
        "params": {
            "extraction_strategy": {
                "type": "JsonCssExtractionStrategy",
                "params": {
                    "schema": {
                        "type": "dict",
                        "value": {
                            "baseSelector": "article.post",
                            "fields": [
                                {"name": "title", "selector": "h1", "type": "text"},
                                {"name": "content", "selector": ".content", "type": "html"}
                            ]
                        }
                    }
                }
            }
        }
    }
}
```

**LLM Extraction Strategy** *(Keep example; ensure the schema uses the type/value wrapper.)*

*(Keep Deep Crawler Example.)*

### REST API Examples

Update URLs to use port `11235`.

#### Simple Crawl

```python
import requests

# Configuration objects converted to the required JSON structure
browser_config_payload = {
    "type": "BrowserConfig",
    "params": {"headless": True}
}
crawler_config_payload = {
    "type": "CrawlerRunConfig",
    "params": {"stream": False, "cache_mode": "bypass"}  # Use string value of enum
}

crawl_payload = {
    "urls": ["https://httpbin.org/html"],
    "browser_config": browser_config_payload,
    "crawler_config": crawler_config_payload
}
response = requests.post(
    "http://localhost:11235/crawl",  # Updated port
    # headers={"Authorization": f"Bearer {token}"},  # If JWT is enabled
    json=crawl_payload
)
print(f"Status Code: {response.status_code}")
if response.ok:
    print(response.json())
else:
    print(f"Error: {response.text}")
```

#### Streaming Results

```python
import json
import httpx  # Use httpx for the async streaming example

async def test_stream_crawl(token: str = None):  # Made token optional
    """Test the /crawl/stream endpoint with multiple URLs."""
    url = "http://localhost:11235/crawl/stream"  # Updated port
    payload = {
        "urls": [
            "https://httpbin.org/html",
            "https://httpbin.org/links/5/0",
        ],
        "browser_config": {
            "type": "BrowserConfig",
            "params": {"headless": True, "viewport": {"type": "dict", "value": {"width": 1200, "height": 800}}}  # Viewport needs type:dict
        },
        "crawler_config": {
            "type": "CrawlerRunConfig",
            "params": {"stream": True, "cache_mode": "bypass"}
        }
    }

    headers = {}
    # if token:
    #     headers = {"Authorization": f"Bearer {token}"}  # If JWT is enabled

    try:
        async with httpx.AsyncClient() as client:
            async with client.stream("POST", url, json=payload, headers=headers, timeout=120.0) as response:
                print(f"Status: {response.status_code} (Expected: 200)")
                response.raise_for_status()  # Raise exception for bad status codes

                # Read streaming response line-by-line (NDJSON)
                async for line in response.aiter_lines():
                    if line:
                        try:
                            data = json.loads(line)
                            # Check for completion marker
                            if data.get("status") == "completed":
                                print("Stream completed.")
                                break
                            print(f"Streamed Result: {json.dumps(data, indent=2)}")
                        except json.JSONDecodeError:
                            print(f"Warning: Could not decode JSON line: {line}")

    except httpx.HTTPStatusError as e:
        print(f"HTTP error occurred: {e.response.status_code} - {e.response.text}")
    except Exception as e:
        print(f"Error in streaming crawl test: {str(e)}")

# To run this example:
# import asyncio
# asyncio.run(test_stream_crawl())
```

---

## Metrics & Monitoring

Keep an eye on your crawler with these endpoints:

- `/health` - Quick health check
- `/metrics` - Detailed Prometheus metrics
- `/schema` - Full API schema

Example health check:

```bash
curl http://localhost:11235/health
```
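
The same checks can be scripted from Python, for example as a readiness gate before running crawls. A small sketch, assuming the server is reachable on the default port:

```python
import time
import requests

BASE = "http://localhost:11235"

# Wait for /health to come up (the compose healthcheck allows ~40s start time)
for _ in range(30):
    try:
        if requests.get(f"{BASE}/health", timeout=2).ok:
            print("Server is healthy")
            break
    except requests.ConnectionError:
        pass
    time.sleep(2)

# /metrics (Prometheus text) and /schema (full API schema) are plain GET endpoints
print("metrics:", requests.get(f"{BASE}/metrics", timeout=5).status_code)
print("schema:", requests.get(f"{BASE}/schema", timeout=5).status_code)
```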

---

*(The Deployment Scenarios and Complete Examples sections remain the same; update links if examples moved.)*

---

## Server Configuration

The server's behavior can be customized through the `config.yml` file.

### Understanding config.yml

The configuration file is loaded from `/app/config.yml` inside the container. By default, the file from `deploy/docker/config.yml` in the repository is copied there during the build.

Here's a detailed breakdown of the configuration options (using defaults from `deploy/docker/config.yml`):

```yaml
# Application Configuration
app:
  title: "Crawl4AI API"
  version: "1.0.0"  # Consider setting this to match the library version, e.g., "0.5.1"
  host: "0.0.0.0"
  port: 8020  # NOTE: This port is used ONLY when running server.py directly. Gunicorn overrides this (see supervisord.conf).
  reload: False  # Default set to False - suitable for production
  timeout_keep_alive: 300

# Default LLM Configuration
llm:
  provider: "openai/gpt-4o-mini"
  api_key_env: "OPENAI_API_KEY"
  # api_key: sk-...  # If you pass the API key directly, api_key_env will be ignored

# Redis Configuration (used by the internal Redis server managed by supervisord)
redis:
  host: "localhost"
  port: 6379
  db: 0
  password: ""
  # ... other redis options ...

# Rate Limiting Configuration
rate_limiting:
  enabled: True
  default_limit: "1000/minute"
  trusted_proxies: []
  storage_uri: "memory://"  # Use "redis://localhost:6379" if you need persistent/shared limits

# Security Configuration
security:
  enabled: false  # Master toggle for security features
  jwt_enabled: false  # Enable JWT authentication (requires security.enabled=true)
  https_redirect: false  # Force HTTPS (requires security.enabled=true)
  trusted_hosts: ["*"]  # Allowed hosts (use specific domains in production)
  headers:  # Security headers (applied if security.enabled=true)
    x_content_type_options: "nosniff"
    x_frame_options: "DENY"
    content_security_policy: "default-src 'self'"
    strict_transport_security: "max-age=63072000; includeSubDomains"

# Crawler Configuration
crawler:
  memory_threshold_percent: 95.0
  rate_limiter:
    base_delay: [1.0, 2.0]  # Min/max delay between requests in seconds for the dispatcher
  timeouts:
    stream_init: 30.0  # Timeout for stream initialization
    batch_process: 300.0  # Timeout for non-streaming /crawl processing

# Logging Configuration
logging:
  level: "INFO"
  format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"

# Observability Configuration
observability:
  prometheus:
    enabled: True
    endpoint: "/metrics"
  health_check:
    endpoint: "/health"
```

*(The JWT Authentication section remains the same; just note the default port is now 11235 for requests.)*
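
If you enable `security.jwt_enabled`, the SDK flow shown earlier applies. A rough sketch (the email value is illustrative; for raw HTTP requests the token goes in the `Authorization: Bearer` header, as hinted in the commented-out headers above):

```python
import asyncio
from crawl4ai.docker_client import Crawl4aiDockerClient
from crawl4ai import BrowserConfig, CrawlerRunConfig

async def main():
    async with Crawl4aiDockerClient(base_url="http://localhost:11235") as client:
        # With security.enabled and jwt_enabled set to true, authenticate first
        # (same call shown commented out in the SDK example above).
        await client.authenticate("user@example.com")
        results = await client.crawl(
            ["https://httpbin.org/html"],
            browser_config=BrowserConfig(headless=True),
            crawler_config=CrawlerRunConfig(),
        )
        print("crawl succeeded:", bool(results and results.success))

asyncio.run(main())
```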

*(Configuration Tips and Best Practices remain the same.)*

### Customizing Your Configuration

You can override the default `config.yml`.

#### Method 1: Modify Before Build

1. Edit the `deploy/docker/config.yml` file in your local repository clone.
2. Build the image using `docker buildx` or `docker compose --profile local-... up --build`. The modified file will be copied into the image.

#### Method 2: Runtime Mount (Recommended for Custom Deploys)

1. Create your custom configuration file, e.g., `my-custom-config.yml`, locally. Ensure it contains all necessary sections.
2. Mount it when running the container:

   * **Using `docker run`:**

     ```bash
     # Assumes my-custom-config.yml is in the current directory
     docker run -d -p 11235:11235 \
       --name crawl4ai-custom-config \
       --env-file .llm.env \
       --shm-size=1g \
       -v $(pwd)/my-custom-config.yml:/app/config.yml \
       unclecode/crawl4ai:latest # Or your specific tag
     ```

   * **Using `docker-compose.yml`:** Add a `volumes` section to the service definition:

     ```yaml
     services:
       crawl4ai-hub-amd64: # Or your chosen service
         image: unclecode/crawl4ai:latest
         profiles: ["hub-amd64"]
         <<: *base-config
         volumes:
           # Mount local custom config over the default one in the container
           - ./my-custom-config.yml:/app/config.yml
           # Keep the shared memory volume from base-config
           - /dev/shm:/dev/shm
     ```

   *(Note: Ensure `my-custom-config.yml` is in the same directory as `docker-compose.yml`.)*

> 💡 When mounting, your custom file *completely replaces* the default one. Ensure it's a valid and complete configuration.
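
Because the mounted file fully replaces the default, it can help to check that your custom file at least parses and keeps the top-level sections before mounting it. A small local sketch (run on your machine, not in the container; requires PyYAML, and the section names come from the reference config above):

```python
# Quick local sanity check before mounting my-custom-config.yml over /app/config.yml.
# This only validates YAML syntax and the presence of the top-level sections
# shown in the reference config; it does not validate individual values.
import sys
import yaml  # pip install pyyaml

REQUIRED_SECTIONS = {
    "app", "llm", "redis", "rate_limiting",
    "security", "crawler", "logging", "observability",
}

with open("my-custom-config.yml") as f:
    cfg = yaml.safe_load(f)

missing = REQUIRED_SECTIONS - set(cfg or {})
if missing:
    sys.exit(f"Missing sections: {sorted(missing)}")
print("Config looks structurally complete.")
```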

### Configuration Recommendations

1. **Security First** 🔒
   - Always enable security in production
   - Use specific trusted_hosts instead of wildcards
   - Set up proper rate limiting to protect your server
   - Consider your environment before enabling HTTPS redirect

2. **Resource Management** 💻
   - Adjust memory_threshold_percent based on available RAM
   - Set timeouts according to your content size and network conditions
   - Use Redis for rate limiting in multi-container setups

3. **Monitoring** 📊
   - Enable Prometheus if you need metrics
   - Use DEBUG logging in development and INFO in production
   - Regular health check monitoring is crucial

4. **Performance Tuning** ⚡
   - Start with conservative rate limiter delays
   - Increase the batch_process timeout for large content
   - Adjust the stream_init timeout based on initial response times

## Getting Help

We're here to help you succeed with Crawl4AI! Here's how to get support:

- 📖 Check our [full documentation](https://docs.crawl4ai.com)
- 🐛 Found a bug? [Open an issue](https://github.com/unclecode/crawl4ai/issues)
- 💬 Join our [Discord community](https://discord.gg/crawl4ai)
- ⭐ Star us on GitHub to show support!

## Summary

In this guide, we've covered everything you need to get started with Crawl4AI's Docker deployment:

- Building and running the Docker container
- Configuring the environment
- Making API requests with proper typing
- Using the Python SDK
- Monitoring your deployment

Remember, the examples in the `examples` folder are your friends - they show real-world usage patterns that you can adapt for your needs.

Keep exploring, and don't hesitate to reach out if you need help! We're building something amazing together. 🚀

Happy crawling! 🕷️

*(One file's diff is suppressed because it is too large.)*

@@ -3,9 +3,9 @@ app:
   title: "Crawl4AI API"
   version: "1.0.0"
   host: "0.0.0.0"
-  port: 8020
+  port: 11235
   reload: False
-  workers: 4
+  workers: 1
   timeout_keep_alive: 300

 # Default LLM Configuration

@@ -1,5 +1,5 @@
-fastapi==0.115.12
+fastapi>=0.115.12
-uvicorn==0.34.2
+uvicorn>=0.34.2
 gunicorn>=23.0.0
 slowapi==0.1.9
 prometheus-fastapi-instrumentator>=7.1.0
@@ -8,8 +8,9 @@ jwt>=1.3.1
 dnspython>=2.7.0
 email-validator==2.2.0
 sse-starlette==2.2.1
-pydantic==2.11
+pydantic>=2.11
 rank-bm25==0.2.2
 anyio==4.9.0
 PyJWT==2.10.1
+mcp>=1.6.0
+websockets>=15.0.1
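
If you maintain a custom image or environment, a quick way to confirm the new MCP-related dependencies resolved after a rebuild (a sketch; the distribution names are taken from the requirement lines above):

```python
# Verify the new server dependencies are installed and report their versions.
from importlib.metadata import version, PackageNotFoundError

for dist, minimum in [("mcp", "1.6.0"), ("websockets", "15.0.1")]:
    try:
        print(f"{dist} {version(dist)} (want >= {minimum})")
    except PackageNotFoundError:
        print(f"{dist} is NOT installed")
```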

@@ -629,6 +629,7 @@ async def get_context(

 # attach MCP layer (adds /mcp/ws, /mcp/sse, /mcp/schema)
+print(f"MCP server running on {config['app']['host']}:{config['app']['port']}")
 attach_mcp(
     app,
     base_url=f"http://{config['app']['host']}:{config['app']['port']}"
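
A quick way to confirm the MCP layer is attached once the server is up (a sketch; it assumes the default port and that `/mcp/schema`, listed in the comment above, answers a plain GET):

```python
import requests

# attach_mcp() registers /mcp/ws, /mcp/sse and /mcp/schema on the FastAPI app.
resp = requests.get("http://localhost:11235/mcp/schema", timeout=5)
print(resp.status_code)   # expect 200 once the MCP layer is mounted
print(resp.text[:200])    # first part of the advertised tool schema
```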
@@ -536,10 +536,14 @@
 const endpointMap = {
     crawl: '/crawl',
+};
+
+/*const endpointMap = {
+    crawl: '/crawl',
     crawl_stream: '/crawl/stream',
     md: '/md',
     llm: '/llm'
-};
+};*/

 const api = endpointMap[endpoint];
 const payload = {

@@ -14,7 +14,7 @@ stderr_logfile=/dev/stderr ; Redirect redis stderr to container stderr
 stderr_logfile_maxbytes=0

 [program:gunicorn]
-command=/usr/local/bin/gunicorn --bind 0.0.0.0:11235 --workers 2 --threads 2 --timeout 120 --graceful-timeout 30 --keep-alive 60 --log-level info --worker-class uvicorn.workers.UvicornWorker server:app
+command=/usr/local/bin/gunicorn --bind 0.0.0.0:11235 --workers 1 --threads 4 --timeout 1800 --graceful-timeout 30 --keep-alive 300 --log-level info --worker-class uvicorn.workers.UvicornWorker server:app
 directory=/app ; Working directory for the app
 user=appuser ; Run gunicorn as our non-root user
 autorestart=true

@@ -1,19 +1,11 @@
-# docker-compose.yml
+version: '3.8'

-# Base configuration anchor for reusability
+# Shared configuration for all environments
 x-base-config: &base-config
   ports:
-    # Map host port 11235 to container port 11235 (where Gunicorn will listen)
-    - "11235:11235"
-    # - "8080:8080" # Uncomment if needed
+    - "11235:11235" # Gunicorn port

-  # Load API keys primarily from .llm.env file
-  # Create .llm.env in the root directory .llm.env.example
   env_file:
-    - .llm.env
+    - .llm.env # API keys (create from .llm.env.example)

-  # Define environment variables, allowing overrides from host environment
-  # Syntax ${VAR:-} uses host env var 'VAR' if set, otherwise uses value from .llm.env
   environment:
     - OPENAI_API_KEY=${OPENAI_API_KEY:-}
     - DEEPSEEK_API_KEY=${DEEPSEEK_API_KEY:-}
@@ -22,10 +14,8 @@ x-base-config: &base-config
     - TOGETHER_API_KEY=${TOGETHER_API_KEY:-}
     - MISTRAL_API_KEY=${MISTRAL_API_KEY:-}
     - GEMINI_API_TOKEN=${GEMINI_API_TOKEN:-}

   volumes:
-    # Mount /dev/shm for Chromium/Playwright performance
-    - /dev/shm:/dev/shm
+    - /dev/shm:/dev/shm # Chromium performance

   deploy:
     resources:
       limits:
@@ -34,47 +24,26 @@ x-base-config: &base-config
         memory: 1G
   restart: unless-stopped
   healthcheck:
-    # IMPORTANT: Ensure Gunicorn binds to 11235 in supervisord.conf
     test: ["CMD", "curl", "-f", "http://localhost:11235/health"]
     interval: 30s
     timeout: 10s
     retries: 3
-    start_period: 40s # Give the server time to start
+    start_period: 40s
-  # Run the container as the non-root user defined in the Dockerfile
   user: "appuser"

 services:
-  # --- Local Build Services ---
-  crawl4ai-local-amd64:
+  crawl4ai:
+    # 1. Default: Pull multi-platform test image from Docker Hub
+    # 2. Override with local image via: IMAGE=local-test docker compose up
+    image: ${IMAGE:-unclecode/crawl4ai:${TAG:-latest}}
+
+    # Local build config (used with --build)
     build:
-      context: . # Build context is the root directory
-      dockerfile: Dockerfile # Dockerfile is in the root directory
+      context: .
+      dockerfile: Dockerfile
       args:
         INSTALL_TYPE: ${INSTALL_TYPE:-default}
         ENABLE_GPU: ${ENABLE_GPU:-false}
-        # PYTHON_VERSION arg is omitted as it's fixed by 'FROM python:3.10-slim' in Dockerfile
-    platform: linux/amd64
-    profiles: ["local-amd64"]
-    <<: *base-config # Inherit base configuration
-
-  crawl4ai-local-arm64:
-    build:
-      context: . # Build context is the root directory
-      dockerfile: Dockerfile # Dockerfile is in the root directory
-      args:
-        INSTALL_TYPE: ${INSTALL_TYPE:-default}
-        ENABLE_GPU: ${ENABLE_GPU:-false}
-    platform: linux/arm64
-    profiles: ["local-arm64"]
-    <<: *base-config
-
-  # --- Docker Hub Image Services ---
-  crawl4ai-hub-amd64:
-    image: unclecode/crawl4ai:${VERSION:-latest}-amd64
-    profiles: ["hub-amd64"]
-    <<: *base-config
-
-  crawl4ai-hub-arm64:
-    image: unclecode/crawl4ai:${VERSION:-latest}-arm64
-    profiles: ["hub-arm64"]
+    # Inherit shared config
     <<: *base-config

docs/md_v2/blog/releases/0.6.0.md (new file, 51 lines)

@@ -0,0 +1,51 @@

# Crawl4AI 0.6.0

*Release date: 2025-04-22*

0.6.0 is the **biggest jump** since the 0.5 series, packing a smarter browser core, pool-based crawlers, and a ton of DX candy. Expect faster runs, lower RAM burn, and richer diagnostics.

---

## 🚀 Key upgrades

| Area | What changed |
|------|--------------|
| **Browser** | New browser management with pooling, page pre-warming, and geolocation + locale + timezone switches |
| **Crawler** | Console and network log capture, MHTML snapshots, safer `get_page` API |
| **Server & API** | **Crawler Pool Manager** endpoint, MCP socket + SSE support |
| **Docs** | v2 layout, floating Ask-AI helper, GitHub stats badge, copy-code buttons, Docker API demo |
| **Tests** | Memory + load benchmarks, 90+ new cases covering MCP and Docker |

---

## ⚠️ Breaking changes

1. **`get_page` signature** – now returns `(html, metadata)` instead of plain HTML (see the sketch below).
2. **Docker** – new Chromium base layer; rebuild your images.
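
A rough sketch of the call-site change for code that consumed the old single-value return. Only the new `(html, metadata)` return shape is taken from this note; the argument list of `get_page` is left unspecified:

```python
# Sketch: adapting a caller of an overridden AsyncPlaywrightCrawlerStrategy.get_page.
# The exact parameters are not shown here; only the new tuple return is assumed.

async def fetch_html(strategy, *args, **kwargs) -> str:
    html, metadata = await strategy.get_page(*args, **kwargs)  # 0.6.0: returns a tuple
    # metadata now carries the extra page info; old callers only need html
    return html
```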

---

## How to upgrade

```bash
pip install -U crawl4ai==0.6.0
```

---

## Full changelog

The diff between `main` and `next` spans **36k insertions and 4.9k deletions** over 121 files. Read the [compare view](https://github.com/unclecode/crawl4ai/compare/0.5.0.post8...0.6.0) or see `CHANGELOG.md` for the granular list.

---

## Upgrade tips

* Using the Docker API? Pull `unclecode/crawl4ai:0.6.0`; the new args are documented in `/deploy/docker/README.md`.
* Stress-test your stack with `tests/memory/run_benchmark.py` before a production rollout.
* Markdown generators are renamed but aliased; update when convenient, and warnings will remind you.

---

Happy crawling, ping `@unclecode` on X for questions or memes.

@@ -8,7 +8,7 @@ dynamic = ["version"]
 description = "🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & scraper"
 readme = "README.md"
 requires-python = ">=3.9"
-license = {text = "MIT"}
+license = {text = "Apache-2.0"}
 authors = [
     {name = "Unclecode", email = "unclecode@kidocode.com"}
 ]

@@ -101,19 +101,19 @@ async def test_context(s: ClientSession):


 async def main() -> None:
-    async with websocket_client("ws://localhost:8020/mcp/ws") as (r, w):
+    async with websocket_client("ws://localhost:11235/mcp/ws") as (r, w):
         async with ClientSession(r, w) as s:
             await s.initialize()  # handshake
             tools = (await s.list_tools()).tools
             print("tools:", [t.name for t in tools])

             # await test_list()
-            # await test_crawl(s)
+            await test_crawl(s)
-            # await test_md(s)
+            await test_md(s)
-            # await test_screenshot(s)
+            await test_screenshot(s)
-            # await test_pdf(s)
+            await test_pdf(s)
-            # await test_execute_js(s)
+            await test_execute_js(s)
-            # await test_html(s)
+            await test_html(s)
             await test_context(s)

 anyio.run(main)
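
For a self-contained version of this smoke test against the new port, something along these lines should work. A sketch: the import paths for the MCP client helpers are assumptions based on the `mcp>=1.6.0` dependency added above and may differ between SDK versions:

```python
# Minimal standalone sketch: list the tools exposed at /mcp/ws on the new port.
# Import paths are assumptions; adjust them if your installed mcp SDK version
# organises its client helpers differently.
import anyio
from mcp import ClientSession
from mcp.client.websocket import websocket_client

async def main() -> None:
    async with websocket_client("ws://localhost:11235/mcp/ws") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()                  # handshake
            tools = (await session.list_tools()).tools  # tools advertised by the server
            print("tools:", [t.name for t in tools])

anyio.run(main)
```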