Mirror of https://github.com/mendableai/firecrawl.git (synced 2025-11-04 03:53:17 +00:00)
Gemini 2.5 Web Crawler
A web crawler that uses Google's Gemini 2.5 Pro model to analyze web pages, PDFs, and images against a user-defined objective.
Features
- Intelligent URL mapping and ranking based on relevance to search objective
 - PDF content extraction and analysis
 - Image content analysis and description
 - Smart content filtering based on user objectives
 - Support for multiple content types (markdown, PDFs, images)
 - Color-coded console output for better readability
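
As a rough illustration of the multi-content-type support, a crawler could route each URL by file extension before deciding how to analyze it. The function name `classify_url` and the extension list are assumptions made for this sketch; the actual script may detect content types differently.

```python
import os
from urllib.parse import urlparse

# Hypothetical extension set, chosen for illustration only.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def classify_url(url: str) -> str:
    """Return "pdf", "image", or "markdown" based on the URL path's extension."""
    # Use only the path so query strings such as ?download=1 are ignored.
    path = urlparse(url).path
    ext = os.path.splitext(path)[1].lower()
    if ext == ".pdf":
        return "pdf"
    if ext in IMAGE_EXTENSIONS:
        return "image"
    # Anything else is treated as a regular page to be read as markdown.
    return "markdown"
```

Extension checks are cheap but imperfect: a URL without an extension can still serve a PDF, so a stricter version might also inspect the response's Content-Type header.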
 
Prerequisites
- Python 3.8+
 - Google Cloud API key with Gemini API access
 - Firecrawl API key
 
Installation
- Clone the repository:
 
git clone <your-repo-url>
cd <your-repo-directory>
- Install the required dependencies:
 
pip install -r requirements.txt
- Create a .env file based on .env.example:
cp .env.example .env
- Add your API keys to the .env file:
FIRECRAWL_API_KEY=your_firecrawl_api_key
GEMINI_API_KEY=your_gemini_api_key
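
A script can fail early with a clear message when either key is missing. The sketch below assumes the two variables are already in the process environment (for example, loaded from `.env` by a tool such as python-dotenv, which is an assumption about the setup). The helper name `read_api_keys` is hypothetical.

```python
import os

# The two variable names come from the .env example above.
REQUIRED_KEYS = ("FIRECRAWL_API_KEY", "GEMINI_API_KEY")


def read_api_keys(env=None):
    """Return the required keys from env, raising if any is missing or empty."""
    # Accept an explicit mapping so the check can be exercised without
    # touching the real environment.
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing API keys: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_KEYS}
```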
Usage
Run the script:
python gemini-2.5-crawler.py
The script will prompt you for:
- The website URL to crawl
 - Your search objective
 
The crawler will then:
- Map the website and find relevant pages
 - Analyze the content using Gemini 2.5 Pro
 - Extract and analyze any PDFs or images found
 - Return structured information related to your objective
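
The steps above can be sketched as a small pipeline. This is not the script's actual code: `run_crawl`, its parameters, and the `top_n` limit are hypothetical, and the Firecrawl and Gemini calls are passed in as plain functions so the sketch makes no claims about either API.

```python
from typing import Callable, Dict, List


def run_crawl(
    url: str,
    objective: str,
    map_site: Callable[[str], List[str]],
    rank_urls: Callable[[List[str], str], List[str]],
    analyze_page: Callable[[str, str], Dict],
    top_n: int = 3,
) -> List[Dict]:
    """Map the site, keep the top-ranked URLs, and analyze each one."""
    # Step 1: discover candidate pages on the site.
    candidates = map_site(url)
    # Step 2: order them by relevance to the objective and keep the best few.
    ranked = rank_urls(candidates, objective)[:top_n]
    # Step 3: analyze each page and keep only non-empty findings.
    results = []
    for page_url in ranked:
        finding = analyze_page(page_url, objective)
        if finding:
            results.append({"url": page_url, "finding": finding})
    return results
```

Passing the three stages in as callables keeps the flow testable with stubs and leaves the choice of mapping, ranking, and analysis backends open.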
 
Output
The script provides color-coded console output for:
- Process steps and progress
 - Debug information
 - Success and error messages
 - Final results in JSON format
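
One common way to produce this kind of output is with ANSI escape codes. The level names, color choices, and helper names below are assumptions for illustration and may not match the script.

```python
import json

# Standard ANSI escape sequences; the level-to-color mapping is hypothetical.
COLORS = {
    "info": "\033[96m",     # bright cyan for process steps
    "debug": "\033[95m",    # bright magenta for debug details
    "success": "\033[92m",  # bright green
    "error": "\033[91m",    # bright red
}
RESET = "\033[0m"


def colorize(message: str, level: str = "info") -> str:
    """Wrap message in the color for level; unknown levels get no color."""
    return f"{COLORS.get(level, '')}{message}{RESET}"


def format_results(results) -> str:
    """Render the final results as indented JSON."""
    return json.dumps(results, indent=2)
```

For example, `print(colorize("Mapping website...", "info"))` would print the message in cyan on terminals that support ANSI colors.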
 
Error Handling
The script handles the following error cases:
- API failures
 - Content extraction issues
 - Invalid URLs
 - Timeouts
 - JSON parsing errors
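
Two of these cases, invalid URLs and JSON parsing errors, can be illustrated with the standard library alone. The helper names are hypothetical, and the tolerant parsing strategy (taking the text between the first `{` and the last `}`) is an assumption about how model output might be cleaned up, not a description of the script.

```python
import json
from urllib.parse import urlparse


def is_valid_url(url: str) -> bool:
    """Accept only http(s) URLs that include a host."""
    try:
        parts = urlparse(url)
    except ValueError:
        # urlparse raises on some malformed inputs, e.g. a broken IPv6 host.
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def parse_model_json(text: str):
    """Parse a JSON object from model output; return None on failure."""
    # Model replies may wrap the JSON in extra prose, so isolate the
    # outermost braces before parsing.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
```

Returning `None` instead of raising lets the caller decide whether to skip the page, retry, or report the failure.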
 
Note
This script uses the experimental Gemini 2.5 Pro model (gemini-2.5-pro-exp-03-25). Make sure your API key has access to this model and sufficient quota.