* add ZDR flag and v0 lockout
* zdr walls on v1 and propagation
* zdr within scrapeurl
* fixes
* more fixes
* zdr flag on queue-worker logging
* final stretch + testing, needs frontend changes
* fixes
* self-serve ZDR through request body
* request-level zdr
* improved zdrcleaner
* generalize schema to allow for different data retention times in the future
* update Go version on CI
* feat(api/tests/zdr): test that nothing is logged
* fix(api/tests/zdr): correct log name
* fix(ci): envs
* fix(zdrcleaner): lower bound on db query
* zdr test with idmux
* WIP Assignments
* fix bad merge
remove unused identity
* fix stupid jest globals thing
* feat(scrapeURL/zdr): blacklist pdf action
* fix(concurrency-limit): zdr logging enforcement
* temp: remove extra billing for zdr
* SDK support
* final zdr business logic
fix rename
* fix test log filtering
* fix log filtering... again
* fix(tests/zdr): more logging exceptions
---------
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
* Add parsePDF parameter to JS SDK (clean implementation)
- Add parsePDF boolean parameter to CrawlScrapeOptions interface
- Parameter automatically flows through scrape and crawl operations via spread operator
- Add comprehensive test cases for parsePDF functionality in both scrape and crawl scenarios
- Tests verify parsePDF=true and parsePDF=false behavior with PDF files
Co-Authored-By: Micah Stairs <micah@sideguide.dev>
* Fix parsePDF tests to match actual API behavior
- Update parsePDF=false test to expect base64 data instead of markdown
- Tests now properly verify the difference between parsePDF=true and parsePDF=false
- Address GitHub comment about 'hallucinated' tests by fixing unrealistic expectations
Co-Authored-By: Micah Stairs <micah@sideguide.dev>
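
A minimal sketch of how an option such as parsePDF can reach the request body through a spread, as the parsePDF commits above describe. The interface here is heavily simplified and `buildScrapeBody` is a hypothetical helper for illustration, not the SDK's actual code.

```typescript
// Simplified, hypothetical shape; the real CrawlScrapeOptions has many more fields.
interface CrawlScrapeOptions {
  formats?: string[];
  parsePDF?: boolean; // parameter named in the commits above
}

// Hypothetical helper: spreads the caller's options into the request body,
// so a newly added option is forwarded without any extra wiring.
function buildScrapeBody(
  url: string,
  options: CrawlScrapeOptions = {},
): Record<string, unknown> {
  return { url, ...options };
}

// parsePDF is only present in the body when the caller sets it.
const withFlag = buildScrapeBody("https://example.com/file.pdf", { parsePDF: false });
const withoutFlag = buildScrapeBody("https://example.com/file.pdf");
console.log(withFlag, withoutFlag);
```

Per the test fix above, parsePDF=false is expected to yield base64 data rather than markdown; that behavior lives in the API and is not modeled in this sketch.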
* Update index.test.ts
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Micah Stairs <micah@sideguide.dev>
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
* feat: add followInternalLinks parameter as semantic replacement for allowBackwardLinks
- Add followInternalLinks parameter to crawl API with same functionality as allowBackwardLinks
- Update transformation logic to use followInternalLinks with precedence over allowBackwardLinks
- Add parameter to Python SDK crawl methods with proper precedence handling
- Add parameter to Node.js SDK CrawlParams interface
- Add comprehensive tests for new parameter and backward compatibility
- Maintain full backward compatibility for existing allowBackwardLinks usage
- Add deprecation notices in documentation while preserving functionality
Co-Authored-By: Nick <nicolascamara29@gmail.com>
* fix: revert accidental cache=True changes to preserve original cache parameter handling
- Revert cache=True back to cache=cache in generate_llms_text methods
- Preserve original parameter passing behavior for cache parameter
- Fix accidental hardcoding of cache parameter to True
Co-Authored-By: Nick <nicolascamara29@gmail.com>
* refactor: rename followInternalLinks to crawlEntireDomain across API, SDKs, and tests
- Rename followInternalLinks parameter to crawlEntireDomain in API schema
- Update Node.js SDK CrawlParams interface to use crawlEntireDomain
- Update Python SDK methods to use crawl_entire_domain parameter
- Update test cases to use new crawlEntireDomain parameter name
- Maintain backward compatibility with allowBackwardLinks
- Update transformation logic to use crawlEntireDomain with precedence
Co-Authored-By: Nick <nicolascamara29@gmail.com>
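
A rough sketch of the precedence rule described in the commits above, where crawlEntireDomain wins over the deprecated allowBackwardLinks. `resolveCrawlEntireDomain` is a hypothetical name, and the fallback to `false` when neither is set is an assumption; the actual transformation logic is not shown in this log.

```typescript
// Simplified, hypothetical subset of the crawl parameters.
interface CrawlParams {
  crawlEntireDomain?: boolean;
  allowBackwardLinks?: boolean; // deprecated, kept for backward compatibility
}

// Hypothetical resolver: prefer the new name, fall back to the old one,
// then to an assumed default of false.
function resolveCrawlEntireDomain(params: CrawlParams): boolean {
  return params.crawlEntireDomain ?? params.allowBackwardLinks ?? false;
}

// The new parameter takes precedence even when the old one disagrees.
console.log(resolveCrawlEntireDomain({ crawlEntireDomain: false, allowBackwardLinks: true }));
// Existing callers that only set the old parameter keep working.
console.log(resolveCrawlEntireDomain({ allowBackwardLinks: true }));
```

Using `??` rather than `||` matters here: an explicit `crawlEntireDomain: false` must not fall through to the deprecated value.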
* fix: add missing cache parameter to generate_llms_text and update documentation references
Co-Authored-By: Nick <nicolascamara29@gmail.com>
* Update apps/python-sdk/firecrawl/firecrawl.py
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Nick <nicolascamara29@gmail.com>
Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>
* poc progress
* poc
* url splits and better url normalization
* feat(index): integrate into map
* fix on selfhost
* feat: modifiers
* separate index supa logic
* debug
* fix language comparison
* feat: dontStoreInCache
* feat(index): some rudimentary testing
* feat: use url split columns
* feat(queue-worker/kickoff): use index links to kickoff crawl
* feat(scrapeURL/index): behaviour on non-200 index entries
* feat: add benchmark for scrapes
* feat(map): ignoreIndex
* feat(index): batch insert
* fix(api/tests/scrape): fix index test to work with batching
* disable cacheable lookup for self hosting tests
* feat(js-sdk): dontStoreInCache
* chore(js-sdk): bump
* feat(index): FIRECRAWL_INDEX_WRITE_ONLY
* feat(api/test): index envs
* map benchmarks
* cleanup
* further fixes
* clean up on map
* remove extraneous log
* workflow test run
* asd
* improve fns
* try again
* wow i'm an idiot
* ok fixed
* wth
* revert
* async saving to index
* feat: enhance metadata extraction by including 'itemprop' attribute in HTML (#1624)
* feat(selfhost): deploy a playwright image (#1625)
* Testing improvements (FIR-2209) (#1623)
* yeet ad blocking tests until further notice
* feat: re-enable billing tests
* more timeout
* cache issues with billing test
* weird thing
* fix(api/tests/scrape/status): propagation time
* stupid
* no log
* sws
---------
Co-authored-by: rafaelmmiller <150964962+rafaelsideguide@users.noreply.github.com>
Co-authored-by: Ademílson Tonato <ademilsonft@outlook.com>
* Fix LLMs.txt cache bug with subdomains and add bypass option (#1519)
Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
* Nick:
* Update LLMs.txt test file to use helper functions and concurrent tests
Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
* Remove LLMs.txt test file as requested
Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
* Change parameter name to 'cache' and keep 7-day expiration
Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
* Update generate-llmstxt-supabase.ts
* Update JS and Python SDKs to include cache parameter
Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
* Fix LLMs.txt cache implementation to use normalizeUrl and exact matching
Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
* Revert "Fix LLMs.txt cache implementation to use normalizeUrl and exact matching"
This reverts commit d05b9964677b7b2384453329d2ac99d841467053.
* Nick:
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
* Add change tracking support to Python and JS SDKs
Co-Authored-By: Nicolas Camara <nick@sideguide.dev>
* Replace test API keys with TEST_API_KEY placeholder
Co-Authored-By: Nicolas Camara <nick@sideguide.dev>
* Replace API keys with dummy values for testing
Co-Authored-By: Nicolas Camara <nick@sideguide.dev>
* Use environment variables for API keys in tests
Co-Authored-By: Nicolas Camara <nick@sideguide.dev>
* Move JS SDK test to correct location and add dependencies
Co-Authored-By: Nicolas Camara <nick@sideguide.dev>
* Remove old test file location
Co-Authored-By: Nicolas Camara <nick@sideguide.dev>
* Update test file to use TEST_API_KEY environment variable
Co-Authored-By: Nicolas Camara <nick@sideguide.dev>
* Update Python SDK test to use TEST_API_KEY environment variable
Co-Authored-By: Nicolas Camara <nick@sideguide.dev>
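
One way the TEST_API_KEY lookup from the commits above could be sketched. `readTestApiKey` is a hypothetical helper, and taking the environment as an argument is a choice made here so the sketch runs without Node typings; the actual test files are not shown in this log.

```typescript
// Hypothetical helper: reads TEST_API_KEY from an environment-like object
// and fails loudly instead of silently falling back to a hardcoded key.
function readTestApiKey(env: Record<string, string | undefined>): string {
  const key = env["TEST_API_KEY"];
  if (key === undefined || key.trim() === "") {
    throw new Error("TEST_API_KEY is not set; export it before running the SDK tests");
  }
  return key;
}

// In a Node test file this would typically be called as readTestApiKey(process.env).
const exampleKey = readTestApiKey({ TEST_API_KEY: "dummy-key-for-illustration" });
console.log(exampleKey.length > 0);
```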
* Update package.json
* Update __init__.py
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Nicolas Camara <nick@sideguide.dev>
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
* Add git-diff support to change tracking format
Co-Authored-By: Nicolas Camara <nick@sideguide.dev>
* Fix type issues with parse-diff library
Co-Authored-By: Nicolas Camara <nick@sideguide.dev>
* Fix parse-diff type definitions to match actual library structure
Co-Authored-By: Nicolas Camara <nick@sideguide.dev>
* Add structured output/prompt support to change tracking
Co-Authored-By: Nicolas Camara <nick@sideguide.dev>
* (feat/change-tracking) Change Tracking Modes (#1447)
* Refactor change tracking to use modes array instead of separate formats
Co-Authored-By: Nicolas Camara <nick@sideguide.dev>
* Implement schema-based change tracking with old/new value comparison
Co-Authored-By: Nicolas Camara <nick@sideguide.dev>
* Nick:
* Nick: .json
* Update diff.ts
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Nicolas Camara <nick@sideguide.dev>
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
* Update index.ts
* Update types.ts
* Update diff.ts
* Update scrape.ts
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Nicolas Camara <nick@sideguide.dev>
Co-authored-by: Nicolas <nicolascamara29@gmail.com>