- Add parsePDF field to ScrapeOptions class for Search API usage
- Add parse_pdf parameter to both sync and async scrape_url methods
- Add parameter handling logic to pass parsePDF to API requests
- Add comprehensive tests for parsePDF functionality
- Maintain backward compatibility with existing API
The parsePDF parameter controls PDF processing behavior:
- When true (the default): PDF content is extracted and converted to markdown
- When false: the PDF is returned base64-encoded and billed at a flat credit rate
Resolves missing parsePDF support in Python SDK v2.9.0
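A minimal usage sketch, assuming the parse_pdf keyword described above; the API key and URL are placeholders:

```python
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-YOUR_API_KEY")

# Default behavior: the PDF is parsed and its content returned as markdown.
parsed = app.scrape_url("https://example.com/whitepaper.pdf", parse_pdf=True)

# With parse_pdf=False the raw PDF comes back base64-encoded instead,
# billed at a flat credit rate as noted above.
raw = app.scrape_url("https://example.com/whitepaper.pdf", parse_pdf=False)
```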
Co-Authored-By: Micah Stairs <micah@sideguide.dev>
* feat: add followInternalLinks parameter as a semantic replacement for allowBackwardLinks
- Add followInternalLinks parameter to the crawl API with the same functionality as allowBackwardLinks
- Update transformation logic to use followInternalLinks with precedence over allowBackwardLinks
- Add parameter to Python SDK crawl methods with proper precedence handling
- Add parameter to Node.js SDK CrawlParams interface
- Add comprehensive tests for new parameter and backward compatibility
- Maintain full backward compatibility for existing allowBackwardLinks usage
- Add deprecation notices in documentation while preserving functionality
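A hypothetical sketch of the precedence rule described above (the real transformation lives in the API layer; the names mirror the request fields):

```python
# Hypothetical illustration of the precedence rule: followInternalLinks wins
# when both flags are present, otherwise allowBackwardLinks is honored.
def resolve_follow_internal_links(params: dict) -> bool:
    if "followInternalLinks" in params:
        return bool(params["followInternalLinks"])
    return bool(params.get("allowBackwardLinks", False))

assert resolve_follow_internal_links({"followInternalLinks": False, "allowBackwardLinks": True}) is False
```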
Co-Authored-By: Nick <nicolascamara29@gmail.com>
* fix: revert accidental cache=True changes to preserve original cache parameter handling
- Revert cache=True back to cache=cache in generate_llms_text methods
- Preserve original parameter passing behavior for cache parameter
- Fix accidental hardcoding of cache parameter to True
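A simplified, hypothetical view of the corrected behavior: the caller's cache value is forwarded into the request payload instead of a hardcoded True.

```python
# Hypothetical helper mirroring the fix: build the payload from the
# caller-supplied cache flag instead of always sending True.
def build_llmstxt_params(url: str, cache: bool = True) -> dict:
    return {"url": url, "cache": cache}  # was effectively {"url": url, "cache": True}

assert build_llmstxt_params("https://example.com", cache=False)["cache"] is False
```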
Co-Authored-By: Nick <nicolascamara29@gmail.com>
* refactor: rename followInternalLinks to crawlEntireDomain across API, SDKs, and tests
- Rename followInternalLinks parameter to crawlEntireDomain in API schema
- Update Node.js SDK CrawlParams interface to use crawlEntireDomain
- Update Python SDK methods to use crawl_entire_domain parameter
- Update test cases to use new crawlEntireDomain parameter name
- Maintain backward compatibility with allowBackwardLinks
- Update transformation logic to give crawlEntireDomain precedence over allowBackwardLinks
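A usage sketch, assuming the keyword-argument form of crawl_url in this SDK version; the URL is a placeholder:

```python
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-YOUR_API_KEY")

# New name: crawl the entire domain rather than only pages under the start URL.
result = app.crawl_url("https://example.com/blog", crawl_entire_domain=True)

# The old flag is still accepted for backward compatibility; if both are
# supplied, crawl_entire_domain takes precedence.
legacy = app.crawl_url("https://example.com/blog", allow_backward_links=True)
```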
Co-Authored-By: Nick <nicolascamara29@gmail.com>
* fix: add missing cache parameter to generate_llms_text and update documentation references
Co-Authored-By: Nick <nicolascamara29@gmail.com>
* Update apps/python-sdk/firecrawl/firecrawl.py
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Nick <nicolascamara29@gmail.com>
Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>
* feat(python-sdk/CrawlWatcher): remove max payload size from WebSocket
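For context, the client-side payload cap in the websockets library is its max_size argument; a hedged sketch of a watcher connection without that cap (not the SDK's exact code, and the URL is a placeholder):

```python
import asyncio
import websockets

async def watch(ws_url: str) -> None:
    # max_size=None disables the client-side message size limit, so large
    # crawl payloads are not rejected mid-stream.
    async with websockets.connect(ws_url, max_size=None) as ws:
        async for message in ws:
            print(message)

# asyncio.run(watch("wss://example.com/crawl-stream"))  # placeholder URL
```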
* Update __init__.py
---------
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
* Fix LLMs.txt cache bug with subdomains and add bypass option (#1519)
Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
* Nick:
* Update LLMs.txt test file to use helper functions and concurrent tests
Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
* Remove LLMs.txt test file as requested
Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
* Change parameter name to 'cache' and keep 7-day expiration
Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
* Update generate-llmstxt-supabase.ts
* Update JS and Python SDKs to include cache parameter
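A usage sketch for the SDK-side flag, assuming the cache keyword added here (cached results are reused within the 7-day window; cache=False forces regeneration):

```python
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-YOUR_API_KEY")

# Reuse a cached llms.txt result if one exists within the 7-day window.
cached = app.generate_llms_text("https://example.com", cache=True)

# Bypass the cache and generate a fresh result.
fresh = app.generate_llms_text("https://example.com", cache=False)
```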
Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
* Fix LLMs.txt cache implementation to use normalizeUrl and exact matching
Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
* Revert "Fix LLMs.txt cache implementation to use normalizeUrl and exact matching"
This reverts commit d05b9964677b7b2384453329d2ac99d841467053.
* Nick:
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
* This fixes issue #1512 by making the milliseconds field optional in WaitAction and adding a validator to ensure exactly one of milliseconds or selector is provided.
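A minimal sketch of that constraint using Pydantic v2 (the SDK's actual model and validator may differ):

```python
from typing import Literal, Optional

from pydantic import BaseModel, model_validator

class WaitAction(BaseModel):
    type: Literal["wait"] = "wait"
    milliseconds: Optional[int] = None  # now optional
    selector: Optional[str] = None

    @model_validator(mode="after")
    def _exactly_one(self):
        # Exactly one of milliseconds or selector must be set.
        if (self.milliseconds is None) == (self.selector is None):
            raise ValueError("provide exactly one of 'milliseconds' or 'selector'")
        return self
```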
Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
* Update firecrawl.py
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
* Fix: Handle both dict and model instances in actions parameter
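A hedged sketch of the normalization idea: accept plain dicts and Pydantic model instances in the same actions list and serialize both to dicts before building the request:

```python
from typing import Any, Dict, List, Union

from pydantic import BaseModel

def normalize_actions(actions: List[Union[Dict[str, Any], BaseModel]]) -> List[Dict[str, Any]]:
    # Model instances are dumped to plain dicts (Pydantic v2 shown here);
    # dicts pass through untouched, so callers can freely mix both styles.
    return [
        a.model_dump(exclude_none=True) if isinstance(a, BaseModel) else a
        for a in actions
    ]
```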
Co-Authored-By: Nicolas Camara <nicolascamara29@gmail.com>
* Update __init__.py
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Nicolas Camara <nicolascamara29@gmail.com>
* sdk-fix/schema-check
* version bump
* add schema validation for extract and jsonOptions parameters
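A hedged sketch of the kind of check this adds: accept either a JSON-schema dict or a Pydantic model class for the extract / jsonOptions schema, and reject anything else early:

```python
from typing import Any

from pydantic import BaseModel

def ensure_json_schema(schema: Any) -> dict:
    # Model classes are converted to their JSON schema (Pydantic v2 shown);
    # dicts are assumed to already be JSON schema; anything else fails fast.
    if isinstance(schema, type) and issubclass(schema, BaseModel):
        return schema.model_json_schema()
    if isinstance(schema, dict):
        return schema
    raise ValueError("schema must be a dict or a pydantic BaseModel subclass")
```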
* Update firecrawl.py
---------
Co-authored-by: Nicolas <nicolascamara29@gmail.com>