* feat(python-sdk): add parsePDF parameter support
- Add parsePDF field to ScrapeOptions class for Search API usage
- Add parse_pdf parameter to both sync and async scrape_url methods
- Add parameter handling logic to pass parsePDF to API requests
- Add comprehensive tests for parsePDF functionality
- Maintain backward compatibility with existing API
The parsePDF parameter controls PDF processing behavior (see the sketch below):
- When true (default): PDF content is extracted and converted to markdown
- When false: the PDF is returned base64-encoded and billed at a flat credit rate
Resolves missing parsePDF support in Python SDK v2.9.0
Co-Authored-By: Micah Stairs <micah@sideguide.dev>
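A minimal usage sketch of the behavior above, written against the HTTP API (endpoint path and field placement are assumptions; the Python SDK exposes the same flag as parse_pdf):

```ts
// Minimal sketch, assuming the v1 scrape endpoint accepts parsePDF in the request body.
async function scrapePdf(url: string, apiKey: string) {
  const res = await fetch("https://api.firecrawl.dev/v1/scrape", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      url,
      parsePDF: false, // false: the raw PDF comes back base64-encoded at a flat credit rate
    }),
  });
  return res.json(); // with parsePDF: true (the default) the PDF is parsed and converted to markdown
}
```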
* Update __init__.py
* Update test.py
* Update __init__.py
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Micah Stairs <micah@sideguide.dev>
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
* Add temporary exception for Faire team ID to bypass job expiration
- Add TEMP_FAIRE_TEAM_ID constant for team f96ad1a4-8102-4b35-9904-36fd517d3616
- Modify job expiration logic to skip the 24-hour timeout for this team (see the sketch below)
- Add tests to verify Faire team bypasses expiration and others don't
- Temporary solution to allow Faire team access to expired crawl jobs
Co-Authored-By: Micah Stairs <micah@sideguide.dev>
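A rough sketch of the bypass; the constant matches the team ID above, while the helper name and signature are illustrative rather than the actual code touched:

```ts
const TEMP_FAIRE_TEAM_ID = "f96ad1a4-8102-4b35-9904-36fd517d3616";
const JOB_EXPIRATION_MS = 24 * 60 * 60 * 1000; // 24-hour timeout

function isCrawlJobExpired(teamId: string, createdAt: Date, now: Date = new Date()): boolean {
  // Temporary exception: the Faire team keeps access to crawl jobs past the 24-hour window.
  if (teamId === TEMP_FAIRE_TEAM_ID) return false;
  return now.getTime() - createdAt.getTime() > JOB_EXPIRATION_MS;
}
```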
* Update apps/api/src/__tests__/snips/crawl.test.ts
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Micah Stairs <micah@sideguide.dev>
Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>
Update error handler to use RequestWithMaybeACUC type, allowing
access to optional ACUC properties on the request object. Include
team_id from ACUC in error logging to improve context for debugging.
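A hedged sketch of the change; the real RequestWithMaybeACUC type and logger live in the API codebase, so the shapes here are assumptions:

```ts
import type { NextFunction, Request, Response } from "express";

// Assumed shape: auth middleware may have attached an ACUC to the request.
type RequestWithMaybeACUC = Request & { acuc?: { team_id: string } };

export function errorHandler(err: Error, req: RequestWithMaybeACUC, res: Response, _next: NextFunction) {
  // Logging team_id from the optional ACUC adds context when triaging errors.
  console.error("Unhandled error", { message: err.message, team_id: req.acuc?.team_id });
  res.status(500).json({ success: false, error: "Internal server error" });
}
```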
* add scrapeTimeout parameter
* fix(api/ci): allow webhook server some time to settle
* fix(api/ci): extract time extension
* fix(api/ci): switch index location tests to a more reliable proxy
* check crawl errors + extend index cooldown
* fix lib
* feat(api): remove old indexes pt. 1
* feat(map): better subdomain support
* more culling
* adjust map maxage
* feat(api/tests): add tests for pdf caching
* fix(scrapeURL/index): pdf caching
* restore pdf cache
* fix __experimental_cache
* sitemap fetching
* remove extra var
* feat: add followInternalLinks parameter as semantic replacement for allowBackwardLinks
- Add followInternalLinks parameter to crawl API with same functionality as allowBackwardLinks
- Update transformation logic to use followInternalLinks with precedence over allowBackwardLinks
- Add parameter to Python SDK crawl methods with proper precedence handling
- Add parameter to Node.js SDK CrawlParams interface
- Add comprehensive tests for new parameter and backward compatibility
- Maintain full backward compatibility for existing allowBackwardLinks usage
- Add deprecation notices in documentation while preserving functionality
Co-Authored-By: Nick <nicolascamara29@gmail.com>
* fix: revert accidental cache=True changes to preserve original cache parameter handling
- Revert cache=True back to cache=cache in generate_llms_text methods
- Preserve original parameter passing behavior for cache parameter
- Fix accidental hardcoding of cache parameter to True
Co-Authored-By: Nick <nicolascamara29@gmail.com>
* refactor: rename followInternalLinks to crawlEntireDomain across API, SDKs, and tests
- Rename followInternalLinks parameter to crawlEntireDomain in API schema
- Update Node.js SDK CrawlParams interface to use crawlEntireDomain
- Update Python SDK methods to use crawl_entire_domain parameter
- Update test cases to use new crawlEntireDomain parameter name
- Maintain backward compatibility with allowBackwardLinks
- Update transformation logic to give crawlEntireDomain precedence over allowBackwardLinks (see the sketch below)
Co-Authored-By: Nick <nicolascamara29@gmail.com>
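A small sketch of the resulting precedence rule, covering this rename and the earlier followInternalLinks commit; the field names mirror the public crawl options, while the transformation itself is simplified:

```ts
type CrawlOptions = {
  crawlEntireDomain?: boolean;  // new name: follow internal links beyond the starting path
  allowBackwardLinks?: boolean; // deprecated alias, still accepted
};

// crawlEntireDomain wins when both are provided; older callers keep working unchanged.
function resolveCrawlEntireDomain(opts: CrawlOptions): boolean {
  return opts.crawlEntireDomain ?? opts.allowBackwardLinks ?? false;
}
```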
* fix: add missing cache parameter to generate_llms_text and update documentation references
Co-Authored-By: Nick <nicolascamara29@gmail.com>
* Update apps/python-sdk/firecrawl/firecrawl.py
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Nick <nicolascamara29@gmail.com>
Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>
- Add 'Deployment Type' field to Environment section
- Allows users to specify Cloud (firecrawl.dev) vs Self-hosted
- Helps maintainers better triage issues based on deployment context
- Positioned logically after OS field in existing template structure
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Nick <nicolascamara29@gmail.com>
* Improve URL filtering error messages with specific denial reasons
- Add FilterResult and FilterLinksResult interfaces for structured error reporting
- Define DenialReason enum with specific, human-readable error messages
- Update filterURL method to return structured results with denial reasons
- Update filterLinks method to collect and return denial reasons for each URL
- Modify error handling in queue-worker.ts to use specific denial reasons
- Add comprehensive tests for different URL filtering scenarios
- Maintain backward compatibility while improving error specificity
Fixes: the misleading 'includePaths/excludePaths rules' error now reports the actual denial reason (robots.txt, exclude patterns, depth limits, etc.); see the type sketch below
Co-Authored-By: mogery@sideguide.dev <mogery@sideguide.dev>
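A hedged sketch of the structured result types named above; the concrete enum members and message strings are assumptions, the real ones live in the crawler module:

```ts
enum DenialReason {
  ROBOTS_TXT = "URL is disallowed by the site's robots.txt",
  EXCLUDE_PATTERN = "URL matches an excludePaths pattern",
  INCLUDE_PATTERN = "URL does not match any includePaths pattern",
  DEPTH_LIMIT = "URL exceeds the configured maxDepth",
}

interface FilterResult {
  allowed: boolean;
  url?: string;                 // normalized URL when allowed
  denialReason?: DenialReason;  // set when allowed is false
}

interface FilterLinksResult {
  links: string[];                           // URLs that passed filtering
  denialReasons: Map<string, DenialReason>;  // reason for each rejected URL
}
```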
* Fix test compilation error for FilterLinksResult interface
- Update crawler.test.ts to use filteredLinks.links.length instead of filteredLinks.length
- Update test expectations to use filteredLinks.links array
- Resolves TypeScript compilation error preventing CI from passing
Co-Authored-By: mogery@sideguide.dev <mogery@sideguide.dev>
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: mogery@sideguide.dev <mogery@sideguide.dev>
* feat: add credits_billed everywhere
Also includes a small logging improvement for logJob
* fix(queue-worker): db auth check before doing rpc for crawl/batch_scrape