Enhance Docker support and improve installation process
- Added new Docker commands for platform-specific builds.
- Updated README with comprehensive installation and setup instructions.
- Introduced `post_install` method in setup script for automation.
- Refined migration processes with enhanced error logging.
- Bumped version to 0.3.746 and updated dependencies.
This commit is contained in:
parent 93bf3e8a1f
commit f9c98a377d

CHANGELOG.md (59 lines changed)
@@ -1,5 +1,64 @@
# Changelog

## [0.3.746] November 29, 2024

### Major Features

1. Enhanced Docker Support (Nov 29, 2024)
   - Improved GPU support in Docker images.
   - Dockerfile refactored for better platform-specific installations.
   - Introduced new Docker commands for different platforms:
     - `basic-amd64`, `all-amd64`, `gpu-amd64` for AMD64.
     - `basic-arm64`, `all-arm64`, `gpu-arm64` for ARM64.

### Infrastructure & Documentation

- Enhanced README.md to improve user guidance and installation instructions.
- Added installation instructions for Playwright setup to the README.
- Created and updated examples in `docs/examples/quickstart_async.py` to be more useful and user-friendly.
- Updated `requirements.txt` with a new `pydantic` dependency.
- Bumped version number in `crawl4ai/__version__.py` to 0.3.746.

### Breaking Changes

- Streamlined application structure:
  - Removed static pages and related code from `main.py`, which might affect existing deployments relying on static content.

### Development Updates

- Developed `post_install` method in `crawl4ai/install.py` to streamline post-installation setup tasks.
- Refined migration processes in `crawl4ai/migrations.py` with enhanced logging for better error visibility.
- Updated `docker-compose.yml` to support local and hub services for different architectures, enhancing build and deploy capabilities.
- Refactored example test cases in `docs/examples/docker_example.py` to facilitate comprehensive testing.

### README.md

Updated README with new Docker commands and setup instructions; enhanced installation instructions and guidance.

### crawl4ai/install.py

Added post-install script functionality: introduced a `post_install` method to automate post-installation setup tasks.
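The implementation itself is not shown in this diff; purely as an illustration, a post-install hook of this kind typically shells out to Playwright's browser installer (the browser choice and logging setup below are assumptions, not the project's actual code):

```python
# Hypothetical sketch of a post-install hook; crawl4ai's actual
# implementation may differ.
import logging
import subprocess
import sys

logger = logging.getLogger(__name__)

def post_install() -> None:
    """Run one-time setup after installation, e.g. fetch browser binaries."""
    try:
        # Download the Chromium build that Playwright drives.
        subprocess.run(
            [sys.executable, "-m", "playwright", "install", "chromium"],
            check=True,
        )
        logger.info("Playwright browser setup completed.")
    except subprocess.CalledProcessError as exc:
        logger.error("Post-install setup failed: %s", exc)
        raise

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    post_install()
```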

### crawl4ai/migrations.py

Improved migration logging: refined the migration process and added better error logging.
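The migration code is likewise not part of this diff; as a sketch of what "enhanced error logging" around a migration step can look like (the SQLite database and column name here are placeholders, not crawl4ai's real schema):

```python
# Hypothetical sketch of a migration step with explicit error logging;
# the real crawl4ai/migrations.py may be organized differently.
import logging
import sqlite3
from pathlib import Path

logger = logging.getLogger("crawl4ai.migrations")

def run_migration(db_path: Path) -> None:
    """Apply a schema change and log failures with full context."""
    logger.info("Starting migration for %s", db_path)
    try:
        with sqlite3.connect(db_path) as conn:
            # Placeholder statement; the actual migration SQL is not shown
            # in this commit.
            conn.execute("ALTER TABLE crawled_data ADD COLUMN media TEXT")
        logger.info("Migration completed successfully.")
    except sqlite3.OperationalError as exc:
        # Record which database and statement failed instead of letting
        # the error surface as a bare traceback.
        logger.error("Migration failed for %s: %s", db_path, exc)
        raise
```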

### docker-compose.yml

Refactored docker-compose for better service management; it now defines services for different platforms and versions.

### requirements.txt

Updated dependencies: added `pydantic` to the requirements file.

### crawl4ai/__version__.py

Bumped the version number to 0.3.746.

### docs/examples/quickstart_async.py

Enhanced example scripts; uncommented example usage in the async guide so the examples work for users out of the box.

### main.py

Refactored code to improve maintainability; streamlined the app structure by removing static-pages code.

## [0.3.743] November 27, 2024

Enhance features and documentation
README.md (177 lines changed)
@@ -220,48 +220,173 @@ Crawl4AI is available as Docker images for easy deployment. You can either pull

---

<details>
<summary>🐳 <strong>Option 1: Docker Hub (Recommended)</strong></summary>

Choose the appropriate image based on your platform and needs:

### For AMD64 (Regular Linux/Windows):
```bash
# Basic version (recommended)
docker pull unclecode/crawl4ai:basic-amd64
docker run -p 11235:11235 unclecode/crawl4ai:basic-amd64

# Full ML/LLM support
docker pull unclecode/crawl4ai:all-amd64
docker run -p 11235:11235 unclecode/crawl4ai:all-amd64

# With GPU support
docker pull unclecode/crawl4ai:gpu-amd64
docker run -p 11235:11235 unclecode/crawl4ai:gpu-amd64
```

### For ARM64 (M1/M2 Macs, ARM servers):
```bash
# Basic version (recommended)
docker pull unclecode/crawl4ai:basic-arm64
docker run -p 11235:11235 unclecode/crawl4ai:basic-arm64

# Full ML/LLM support
docker pull unclecode/crawl4ai:all-arm64
docker run -p 11235:11235 unclecode/crawl4ai:all-arm64

# With GPU support
docker pull unclecode/crawl4ai:gpu-arm64
docker run -p 11235:11235 unclecode/crawl4ai:gpu-arm64
```

Need more memory? Add `--shm-size`:
```bash
docker run --shm-size=2gb -p 11235:11235 unclecode/crawl4ai:basic-amd64
```

Test the installation:
```bash
curl http://localhost:11235/health
```

### For Raspberry Pi (32-bit) (Experimental)
```bash
# Pull and run basic version (recommended for Raspberry Pi)
docker pull unclecode/crawl4ai:basic-armv7
docker run -p 11235:11235 unclecode/crawl4ai:basic-armv7

# With increased shared memory if needed
docker run --shm-size=2gb -p 11235:11235 unclecode/crawl4ai:basic-armv7
```

Note: Due to hardware constraints, only the basic version is recommended for Raspberry Pi.

</details>

<details>
<summary>🐳 <strong>Option 2: Build from Repository</strong></summary>

Build the image locally based on your platform:

```bash
# Clone the repository
git clone https://github.com/unclecode/crawl4ai.git
cd crawl4ai

# For AMD64 (Regular Linux/Windows)
docker build --platform linux/amd64 \
  --tag crawl4ai:local \
  --build-arg INSTALL_TYPE=basic \
  .

# For ARM64 (M1/M2 Macs, ARM servers)
docker build --platform linux/arm64 \
  --tag crawl4ai:local \
  --build-arg INSTALL_TYPE=basic \
  .
```

Build options:

- INSTALL_TYPE=basic (default): Basic crawling features
- INSTALL_TYPE=all: Full ML/LLM support
- ENABLE_GPU=true: Add GPU support

Example with all options:
```bash
docker build --platform linux/amd64 \
  --tag crawl4ai:local \
  --build-arg INSTALL_TYPE=all \
  --build-arg ENABLE_GPU=true \
  .
```

Run your local build:
```bash
# Regular run
docker run -p 11235:11235 crawl4ai:local

# With increased shared memory
docker run --shm-size=2gb -p 11235:11235 crawl4ai:local
```

Test the installation:
```bash
curl http://localhost:11235/health
```

</details>

<details>
<summary>🐳 <strong>Option 3: Using Docker Compose</strong></summary>

Docker Compose provides a more structured way to run Crawl4AI, especially when dealing with environment variables and multiple configurations.

```bash
# Clone the repository
git clone https://github.com/unclecode/crawl4ai.git
cd crawl4ai
```

### For AMD64 (Regular Linux/Windows):
```bash
# Build and run locally
docker-compose --profile local-amd64 up

# Run from Docker Hub
VERSION=basic docker-compose --profile hub-amd64 up   # Basic version
VERSION=all docker-compose --profile hub-amd64 up     # Full ML/LLM support
VERSION=gpu docker-compose --profile hub-amd64 up     # GPU support
```

### For ARM64 (M1/M2 Macs, ARM servers):
```bash
# Build and run locally
docker-compose --profile local-arm64 up

# Run from Docker Hub
VERSION=basic docker-compose --profile hub-arm64 up   # Basic version
VERSION=all docker-compose --profile hub-arm64 up     # Full ML/LLM support
VERSION=gpu docker-compose --profile hub-arm64 up     # GPU support
```

Environment variables (optional):
```bash
# Create a .env file
CRAWL4AI_API_TOKEN=your_token
OPENAI_API_KEY=your_openai_key
CLAUDE_API_KEY=your_claude_key
```

The compose file includes:

- Memory management (4GB limit, 1GB reserved)
- Shared memory volume for browser support
- Health checks
- Auto-restart policy
- All necessary port mappings
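Those bullets mirror the per-service settings that this commit's `docker-compose.yml` diff removes from the old `crawl4ai` service (memory limits, `/dev/shm` volume, healthcheck, restart policy); presumably they now live in the shared `base-config` service, which the hunks further down show only in part. A sketch of those keys, with values taken from the removed block:

```yaml
# Sketch of the shared settings the compose file provides; values are
# copied from the per-service block removed later in this commit's diff,
# their placement under base-config is an assumption.
deploy:
  resources:
    limits:
      memory: 4G        # hard memory cap
    reservations:
      memory: 1G        # guaranteed memory
volumes:
  - /dev/shm:/dev/shm   # shared memory for the browser
restart: unless-stopped
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:11235/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 40s
```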

Test the installation:
```bash
curl http://localhost:11235/health
```

</details>

---

### Quick Test

```diff
@@ -278,11 +403,11 @@ response = requests.post(
 )
 task_id = response.json()["task_id"]
 
-# Get results
+# Continue polling until the task is complete (status="completed")
 result = requests.get(f"http://localhost:11235/task/{task_id}")
```
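The snippet above shows only a single GET; a complete polling loop, assuming the task endpoint reports `status: "completed"` when done (as the new comment indicates), might look like this. The helper name, timeout, and interval are choices made for this sketch, not part of the README:

```python
import time
import requests

def wait_for_task(task_id: str, timeout: float = 60.0) -> dict:
    """Poll the task endpoint until the crawl finishes or the timeout hits."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        result = requests.get(f"http://localhost:11235/task/{task_id}").json()
        if result.get("status") == "completed":
            return result
        time.sleep(2)  # back off between polls
    raise TimeoutError(f"Task {task_id} did not complete within {timeout}s")
```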

For more examples, see our [Docker Examples](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/docker_example.py). For advanced configuration, environment variables, and usage examples, see our [Docker Deployment Guide](https://crawl4ai.com/mkdocs/basic/docker-deployment/).

</details>
docker-compose.yml (restructured into platform-specific local and hub services)

```diff
@@ -1,5 +1,6 @@
 services:
-  crawl4ai:
+  # Local build services for different platforms
+  crawl4ai-amd64:
     build:
       context: .
       dockerfile: Dockerfile
@@ -7,35 +8,39 @@ services:
         PYTHON_VERSION: "3.10"
         INSTALL_TYPE: ${INSTALL_TYPE:-basic}
         ENABLE_GPU: false
-    profiles: ["local"]
-    ports:
-      - "11235:11235"
-      - "8000:8000"
-      - "9222:9222"
-      - "8080:8080"
-    environment:
-      - CRAWL4AI_API_TOKEN=${CRAWL4AI_API_TOKEN:-}
-      - OPENAI_API_KEY=${OPENAI_API_KEY:-}
-      - CLAUDE_API_KEY=${CLAUDE_API_KEY:-}
-    volumes:
-      - /dev/shm:/dev/shm
-    deploy:
-      resources:
-        limits:
-          memory: 4G
-        reservations:
-          memory: 1G
-    restart: unless-stopped
-    healthcheck:
-      test: ["CMD", "curl", "-f", "http://localhost:11235/health"]
-      interval: 30s
-      timeout: 10s
-      retries: 3
-      start_period: 40s
+      platforms:
+        - linux/amd64
+    profiles: ["local-amd64"]
+    extends: &base-config
+      file: docker-compose.yml
+      service: base-config
 
-  crawl4ai-hub:
-    image: unclecode/crawl4ai:basic
-    profiles: ["hub"]
+  crawl4ai-arm64:
+    build:
+      context: .
+      dockerfile: Dockerfile
+      args:
+        PYTHON_VERSION: "3.10"
+        INSTALL_TYPE: ${INSTALL_TYPE:-basic}
+        ENABLE_GPU: false
+      platforms:
+        - linux/arm64
+    profiles: ["local-arm64"]
+    extends: *base-config
+
+  # Hub services for different platforms and versions
+  crawl4ai-hub-amd64:
+    image: unclecode/crawl4ai:${VERSION:-basic}-amd64
+    profiles: ["hub-amd64"]
+    extends: *base-config
+
+  crawl4ai-hub-arm64:
+    image: unclecode/crawl4ai:${VERSION:-basic}-arm64
+    profiles: ["hub-arm64"]
+    extends: *base-config
+
+  # Base configuration to be extended
+  base-config:
     ports:
       - "11235:11235"
       - "8000:8000"
```
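One detail worth noting in the new layout: `extends: &base-config` both points the first service at the shared `base-config` service and defines a YAML anchor, so the other services can reuse the identical mapping via the `*base-config` alias. A stripped-down sketch of just that pattern, using the service names from this commit (the real file carries more configuration under `base-config`):

```yaml
services:
  base-config:
    ports:
      - "11235:11235"

  crawl4ai-amd64:
    # "&base-config" anchors this {file, service} mapping for reuse.
    extends: &base-config
      file: docker-compose.yml
      service: base-config
    profiles: ["local-amd64"]

  crawl4ai-arm64:
    # "*base-config" expands to the same mapping defined above.
    extends: *base-config
    profiles: ["local-arm64"]
```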
docs/examples/docker_example.py (previously commented-out test cases are now enabled)

```diff
@@ -78,20 +78,20 @@ def test_docker_deployment(version="basic"):
     time.sleep(5)
 
     # Test cases based on version
-    # test_basic_crawl(tester)
-    # test_basic_crawl(tester)
-    # test_basic_crawl_sync(tester)
     test_basic_crawl_direct(tester)
+    test_basic_crawl(tester)
+    test_basic_crawl(tester)
+    test_basic_crawl_sync(tester)
 
-    # if version in ["full", "transformer"]:
-    # test_cosine_extraction(tester)
+    if version in ["full", "transformer"]:
+        test_cosine_extraction(tester)
 
-    # test_js_execution(tester)
-    # test_css_selector(tester)
-    # test_structured_extraction(tester)
-    # test_llm_extraction(tester)
-    # test_llm_with_ollama(tester)
-    # test_screenshot(tester)
+    test_js_execution(tester)
+    test_css_selector(tester)
+    test_structured_extraction(tester)
+    test_llm_extraction(tester)
+    test_llm_with_ollama(tester)
+    test_screenshot(tester)
 
 
 def test_basic_crawl(tester: Crawl4AiTester):
```
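How `test_docker_deployment` gets invoked is not shown in this hunk; purely as an illustration, a `__main__` block that selects the image version from the command line might look like this (the argument handling is hypothetical, not the file's actual code):

```python
# Hypothetical driver for test_docker_deployment; the real
# docs/examples/docker_example.py may wire this up differently.
import sys

if __name__ == "__main__":
    # e.g. `python docker_example.py full` also runs the cosine-extraction
    # test gated behind `if version in ["full", "transformer"]`.
    version = sys.argv[1] if len(sys.argv) > 1 else "basic"
    test_docker_deployment(version=version)
```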