* add ZDR flag and v0 lockout
* zdr walls on v1 and propagation
* zdr within scrapeURL
* fixes
* more fixes
* zdr flag on queue-worker logging
* final stretch + testing, needs front-end changes
* fixes
* self-serve ZDR through request body
* request-level zdr
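For context, a minimal sketch of what the request-level flag might look like from a caller's side; the `zeroDataRetention` field name and endpoint shape here are assumptions, not confirmed API surface.

```ts
// Hypothetical sketch: opting into zero data retention for a single request.
// The `zeroDataRetention` field and endpoint shape are assumptions.
const response = await fetch("https://api.firecrawl.dev/v1/scrape", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.FIRECRAWL_API_KEY}`,
  },
  body: JSON.stringify({
    url: "https://example.com",
    zeroDataRetention: true, // assumed flag: do not persist logs/documents for this job
  }),
});
const result = await response.json();
```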
* improved zdrcleaner
* generalize schema to allow for different data retention times in the future
* update Go version on CI
* feat(api/tests/zdr): test that nothing is logged
* fix(api/tests/zdr): correct log name
* fix(ci): envs
* fix(zdrcleaner): lower bound on db query
* zdr test with idmux
* WIP Assignments
* fix bad merge
* remove unused identity
* fix Jest globals configuration
* feat(scrapeURL/zdr): blacklist pdf action
* fix(concurrency-limit): zdr logging enforcement
* temp: remove extra billing for zdr
* SDK support
* final zdr business logic
* fix rename
* fix test log filtering
* fix log filtering... again
* fix(tests/zdr): more logging exceptions
---------
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
* Add parsePDF parameter to JS SDK (clean implementation)
- Add parsePDF boolean parameter to CrawlScrapeOptions interface
- Parameter automatically flows through scrape and crawl operations via spread operator
- Add comprehensive test cases for parsePDF functionality in both scrape and crawl scenarios
- Tests verify parsePDF=true and parsePDF=false behavior with PDF files
Co-Authored-By: Micah Stairs <micah@sideguide.dev>
* Fix parsePDF tests to match actual API behavior
- Update parsePDF=false test to expect base64 data instead of markdown
- Tests now properly verify the difference between parsePDF=true and parsePDF=false
- Address GitHub comment about 'hallucinated' tests by fixing unrealistic expectations
Co-Authored-By: Micah Stairs <micah@sideguide.dev>
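Based on the behavior described above (markdown when `parsePDF` is true, base64 when false), a sketch of how the option reads from the JS SDK caller's side; the client setup follows the Firecrawl JS SDK, but treat the exact method shape as illustrative.

```ts
import FirecrawlApp from "@mendable/firecrawl-js";

const app = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });

// parsePDF: true (the default) – PDF content is extracted and returned as markdown.
const parsed = await app.scrapeUrl("https://example.com/file.pdf", {
  parsePDF: true,
});

// parsePDF: false – the raw PDF comes back base64-encoded instead of markdown.
const raw = await app.scrapeUrl("https://example.com/file.pdf", {
  parsePDF: false,
});
```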
* Update index.test.ts
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Micah Stairs <micah@sideguide.dev>
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
* Fix allowSubdomains filtering in crawler
- Move subdomain check logic before external link denial to make it reachable
- Add comprehensive tests for allowSubdomains functionality
- Ensure subdomain URLs are properly allowed/filtered based on configuration
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Micah Stairs <micah@sideguide.dev>
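A simplified sketch of the reordered check: the subdomain branch has to run before the external-link denial, otherwise subdomain URLs are rejected as external before the branch is ever reached. The function and helper names here are hypothetical.

```ts
// Hypothetical sketch of the link filter after the reorder; names are illustrative.
function isLinkAllowed(link: URL, base: URL, allowSubdomains: boolean): boolean {
  // Subdomain check must come first: a subdomain of the base host would
  // otherwise be denied as "external" and this branch would be unreachable.
  if (allowSubdomains && isSubdomainOf(link.hostname, base.hostname)) {
    return true;
  }
  if (link.hostname !== base.hostname) {
    return false; // deny external links
  }
  return true;
}

function isSubdomainOf(host: string, baseHost: string): boolean {
  return host === baseHost || host.endsWith("." + baseHost);
}
```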
* feat(python-sdk): add parsePDF parameter support
- Add parsePDF field to ScrapeOptions class for Search API usage
- Add parse_pdf parameter to both sync and async scrape_url methods
- Add parameter handling logic to pass parsePDF to API requests
- Add comprehensive tests for parsePDF functionality
- Maintain backward compatibility with existing API
The parsePDF parameter controls PDF processing behavior:
- When true (default): PDF content extracted and converted to markdown
- When false: PDF returned in base64 encoding with flat credit rate
Resolves missing parsePDF support in Python SDK v2.9.0
Co-Authored-By: Micah Stairs <micah@sideguide.dev>
* Update __init__.py
* Update test.py
* Update __init__.py
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Micah Stairs <micah@sideguide.dev>
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
* Add temporary exception for Faire team ID to bypass job expiration
- Add TEMP_FAIRE_TEAM_ID constant for team f96ad1a4-8102-4b35-9904-36fd517d3616
- Modify job expiration logic to skip 24-hour timeout for this team
- Add tests to verify Faire team bypasses expiration and others don't
- Temporary solution to allow Faire team access to expired crawl jobs
Co-Authored-By: Micah Stairs <micah@sideguide.dev>
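The team ID and the 24-hour window come from the commit above; the surrounding function and field names in this sketch are assumptions.

```ts
// Temporary exception: skip the 24-hour job expiration for one team.
// The team ID is from the commit message; other names are illustrative.
const TEMP_FAIRE_TEAM_ID = "f96ad1a4-8102-4b35-9904-36fd517d3616";
const JOB_TTL_MS = 24 * 60 * 60 * 1000;

function isJobExpired(job: { teamId: string; createdAt: Date }, now = new Date()): boolean {
  if (job.teamId === TEMP_FAIRE_TEAM_ID) {
    return false; // Faire team keeps access to crawl jobs past the 24h window
  }
  return now.getTime() - job.createdAt.getTime() > JOB_TTL_MS;
}
```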
* Update apps/api/src/__tests__/snips/crawl.test.ts
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Micah Stairs <micah@sideguide.dev>
Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>
* Use RequestWithMaybeACUC type in error handler
Update error handler to use RequestWithMaybeACUC type, allowing
access to optional ACUC properties on the request object. Include
team_id from ACUC in error logging to improve context for debugging.
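Roughly, the change reads like the following Express error handler; `RequestWithMaybeACUC` and `team_id` come from the description above, while the rest of the shape is assumed.

```ts
import type { NextFunction, Request, Response } from "express";

// Assumed shape: the ACUC (auth/credit usage context) may or may not be attached.
type RequestWithMaybeACUC = Request & {
  acuc?: { team_id: string };
};

function errorHandler(err: Error, req: RequestWithMaybeACUC, res: Response, _next: NextFunction) {
  // Include team_id from the optional ACUC to give error logs more context.
  console.error("request failed", {
    path: req.path,
    team_id: req.acuc?.team_id,
    error: err.message,
  });
  res.status(500).json({ success: false, error: "Internal server error" });
}
```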
* add scrapeTimeout parameter
* fix(api/ci): allow webhook server some time to settle
* fix(api/ci): extract time extension
* fix(api/ci): switch index location tests to a more reliable proxy
* check crawl errors + extend index cooldown
* fix lib