# Installation 💻
Crawl4AI offers flexible installation options to suit various use cases. You can install it as a Python package, use it with Docker, or run it as a local server.
## Option 1: Python Package Installation (Recommended)
Crawl4AI is available on PyPI. Choose the option that best fits your needs:
### Basic Installation
For basic web crawling and scraping tasks:
```bash
pip install crawl4ai
playwright install  # Download the browser binaries Playwright drives
```
### Installation with PyTorch
For advanced text clustering (includes the CosineSimilarity clustering strategy):
```bash
pip install "crawl4ai[torch]"  # Quotes keep shells such as zsh from expanding the brackets
```
### Installation with Transformers
For text summarization and Hugging Face models:
```bash
pip install "crawl4ai[transformer]"
```
### Full Installation
For all features:
```bash
pip install "crawl4ai[all]"
```
### Development Installation
For contributors who plan to modify the source code:
```bash
git clone https://github.com/unclecode/crawl4ai.git
cd crawl4ai
pip install -e ".[all]"
playwright install  # Download the browser binaries Playwright drives
```
💡 After installing with the `torch`, `transformer`, or `all` extras, it is recommended to run the following CLI command to download the required models:
```bash
crawl4ai-download-models
```
This step is optional but improves the crawler's speed. You only need to run it once after installation.
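
To see which optional packages actually landed in your environment before downloading models, a small check like the one below can help. It is a minimal sketch, not part of Crawl4AI: the helper name `check_optional_modules` is hypothetical, and the module names `torch` and `transformers` are assumptions about what the `torch` and `transformer` extras pull in, so adjust them if your install differs.

```python
import importlib.util


def check_optional_modules(names):
    """Return {module_name: bool} telling whether each top-level module can be found."""
    # find_spec returns None when a top-level module is not installed;
    # it locates the module without importing it.
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumed module names; they are not confirmed by the Crawl4AI docs.
    report = check_optional_modules(["crawl4ai", "torch", "transformers"])
    for name, found in report.items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

A module reported as missing suggests the matching extra was not installed into the interpreter you are running.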
## Option 2: Using Docker (Coming Soon)
Docker support for Crawl4AI is currently in progress and will be available soon. This will allow you to run Crawl4AI in a containerized environment, ensuring consistency across different systems.
## Option 3: Local Server Installation
For those who prefer to run Crawl4AI as a local server, instructions will be provided once the Docker implementation is complete.
## Verifying Your Installation
After installation, you can verify that Crawl4AI is working correctly by running a simple Python script:
```python
import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    async with AsyncWebCrawler(verbose=True) as crawler:
        result = await crawler.arun(url="https://www.example.com")
        print(result.markdown[:500])  # Print first 500 characters

if __name__ == "__main__":
    asyncio.run(main())
```
This script should successfully crawl the example website and print the first 500 characters of the extracted content.
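
If the script fails at the import, it can help to confirm that pip installed the package into the interpreter you are actually running. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and it assumes the distribution name is `crawl4ai`, matching the `pip install` commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("crawl4ai")
    if found:
        print(f"crawl4ai {found}")
    else:
        print("crawl4ai is not installed in this environment")
```

If this reports that the package is not installed, check that `pip` and `python` point to the same environment.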
## Getting Help
If you encounter any issues during installation or usage, please check the [documentation](https://crawl4ai.com/mkdocs/) or raise an issue on the [GitHub repository](https://github.com/unclecode/crawl4ai/issues).

Happy crawling! 🕷️🤖