326 Commits

Author SHA1 Message Date
Abimael Martell
e4eb79f16c
python-sdk: Update Agent Client (#2579)
* python-sdk: Update Agent Client

* bump version

* cr comment

* validate schema type
2025-12-18 21:38:57 -08:00
Gergő Móricz
e1e9c38a7d feat(api): a/sctu 2025-12-17 22:21:28 +01:00
Gergő Móricz
6a2425b776 feat(api): a/mc 2025-12-17 20:24:11 +01:00
Gergő Móricz
0a5cc475ff feat(api): ab/c 2025-12-16 21:12:27 +01:00
Gergő Móricz
2efd0ae225 feat: ab 2025-12-14 22:37:28 +01:00
Gergő Móricz
419b80a4c7 feat(sdk): e3 2025-12-13 16:43:06 +01:00
Rafael Miller
02d9b5fe0c
(python-sdk)fix/max_pages (#2527)
* (python-sdk)fix/max_pages

* bump version
2025-12-09 15:09:54 -03:00
Rafael Miller
aecce165b5
added timezone to metadata response (#2526)
* added timezone to metadata response

* removed hallucination tests
2025-12-09 13:50:02 -03:00
rafaelmmiller
7e2328ad1d chore: bump sdk versions 2025-12-07 14:57:14 -03:00
Rafael Miller
d496714d44
(sdks)feat/added concurrency info to metadata (#2502) 2025-12-07 14:48:29 -03:00
Rafael Miller
3d418df1bc
(sdk)fix/same timeout as api now (#2503) 2025-12-07 14:48:14 -03:00
Rafael Miller
cdd1e897c1
Python sdk fix/batch validate limit (#2399)
* (python-sdk) fix: removed 1000 url limit from batch scrape validator

* fixed tests

* Update apps/python-sdk/firecrawl/v2/methods/aio/batch.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* Update apps/python-sdk/firecrawl/v2/methods/aio/batch.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

---------

Co-authored-by: Micah Stairs <micah.stairs@gmail.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
2025-12-04 18:03:41 -03:00
Gergő Móricz
eebe41623e
feat(extract): port to billing credits (#2482)
* feat(extract): port to billing credits

* added normalization

---------

Co-authored-by: rafaelmmiller <150964962+rafaelsideguide@users.noreply.github.com>
2025-12-03 19:25:53 +01:00
tom
9f4f011a78
Add minAge parameter to scrape (#2452)
* feat(api): add minAge parameter to scrape options

* test(api): update minAge test to use scrapeRaw for error handling
2025-11-30 20:35:49 +00:00
Gaurav Chadha
08aee42275
fix: Add support for ignoreQueryParameter in map SDKs (#2429)
* add-support-for-ignoreQueryParameter-map

Signed-off-by: Gaurav Chadha <gauravchadha1676@gmail.com>

* Update package.json

---------

Signed-off-by: Gaurav Chadha <gauravchadha1676@gmail.com>
Co-authored-by: rafaelmmiller <150964962+rafaelsideguide@users.noreply.github.com>
2025-11-26 15:47:11 -03:00
Rafael Miller
4c860515df
(python-sdk) feat: added extra fields to metadata (#2441)
* (python-sdk) feat: added extra fields to metadata

* Update apps/python-sdk/example.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* fixes metadata coerce function for unknown-keys

* Update types.py

---------

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
2025-11-26 10:41:42 -03:00
rafaelmmiller
881f566bf9 fixed json input on python-sdk 2025-11-12 13:47:16 -03:00
rafaelmmiller
d4b7472397 bump python-sdk version 2025-11-12 11:52:49 -03:00
Neha Prasad
cc411b6a3b
fix: image search field mapping in Python SDK (#2244)
* added field mapping function

* applied field normalization to search results
2025-11-12 11:51:22 -03:00
Gaurav Chadha
8fd980f343
update: Adds support for recursive schema for python-sdk with model selection (#2266)
* add-support-for-recursive-schema-python-sdk
rebase
Signed-off-by: Gaurav Chadha <chadha93@192.168.1.13>

* remove-weak-map

Signed-off-by: Gaurav Chadha <gauravchadha1676@gmail.com>

* Update apps/python-sdk/firecrawl/v1/client.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* handle-circular-ref

Signed-off-by: Gaurav Chadha <gauravchadha1676@gmail.com>

* added unit tests. added some ignores to config for WIPs

---------

Signed-off-by: Gaurav Chadha <gauravchadha1676@gmail.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Co-authored-by: rafaelmmiller <150964962+rafaelsideguide@users.noreply.github.com>
2025-11-12 11:17:45 -03:00
devin-ai-integration[bot]
dabf991ca9
Add branding format support to JS and Python SDKs (#2360)
* Add branding format support to JS and Python SDKs

Co-Authored-By: abi@sideguide.dev <abimex@gmail.com>

* Add comprehensive unit tests for branding format in both SDKs

- Add JS SDK unit tests in branding.test.ts with 4 test cases
- Add Python SDK unit tests in test_branding.py with 5 test cases
- Update Python SDK normalize_document_input to handle colorScheme -> color_scheme conversion
- Add model_config extra='allow' to BrandingProfile for future extensibility
- All tests pass locally (25 JS tests, 5 Python tests)

Co-Authored-By: abi@sideguide.dev <abimex@gmail.com>

* Bump SDK versions for branding format release

- Bump JS SDK version from 4.4.1 to 4.5.0
- Bump Python SDK version from 4.5.0 to 4.6.0

Version bumps reflect the addition of branding format support and comprehensive unit tests.

Co-Authored-By: abi@sideguide.dev <abimex@gmail.com>

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: abi@sideguide.dev <abimex@gmail.com>
2025-11-04 16:41:29 -08:00
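Note on the commit above: it mentions that normalize_document_input converts colorScheme to color_scheme. The following is a minimal sketch of that kind of key conversion, not the SDK's actual code. The helper names camel_to_snake and normalize_keys are hypothetical, and the sketch only handles top-level keys.

```python
import re
from typing import Any, Dict


def camel_to_snake(name: str) -> str:
    """Convert a camelCase key such as 'colorScheme' to 'color_scheme'."""
    # Insert an underscore before every uppercase letter that is not at the start.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def normalize_keys(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: return a shallow copy with top-level keys in snake_case."""
    return {camel_to_snake(key): value for key, value in payload.items()}


if __name__ == "__main__":
    print(normalize_keys({"colorScheme": "dark", "fonts": []}))
```

Keys that are already snake_case pass through unchanged, so the conversion is safe to apply to mixed input.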
Abimael Martell
ee562fc2c7
Merge pull request #2290 from firecrawl/fix-python-sdk-req-key
python-sdk: Don't require API Key when running Self Hosted
2025-10-17 08:17:38 -07:00
Abimael Martell
69f7e588be better documentation 2025-10-17 00:17:46 -07:00
Abimael Martell
61b3c71963 increase feat version 2025-10-16 15:30:11 -07:00
Abimael Martell
7240c41d9a cr comments 2025-10-16 08:41:22 -07:00
Abimael Martell
8c2189f557 add default None to api key 2025-10-16 08:28:49 -07:00
Abimael Martell
26d65c87bf python-sdk: Don't require API Key when running Self Hosted 2025-10-15 16:23:16 -07:00
Abimael Martell
66abf0f8d0 python-sdk: Fix timeout handling across api calls 2025-10-15 15:46:31 -07:00
Nicolas
8a3936fdc0 Nick: pdf search category 2025-10-13 11:09:10 -03:00
devin-ai-integration[bot]
0b8d87caf0
chore(python-sdk): bump version to 4.3.7 for poll_interval fix (#2265)
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: gaurav@sideguide.dev <gauravchadha1676@gmail.com>
2025-10-10 12:44:09 -03:00
Jeel Rupareliya
57babbaf09
python-sdk: include "cancelled" in CrawlJob.status and exit wait loop on cancel (fixes #2190) (#2240)
* cancelled added in stats if job is cancelled

* chore: stop tracking local venv in apps/python-sdk/.venv

* revert: exclude local docker-compose port mapping change from PR

* chore: ignore local SDK venv and remove from tracking

* redis rate limit url added

* removed redis rate limit

* Update crawl.py
2025-10-01 12:32:52 -03:00
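Note on the commit above: it adds "cancelled" to CrawlJob.status and makes the wait loop exit on cancel. A rough sketch of such a loop follows, under stated assumptions: wait_for_job and get_status are hypothetical names, "completed" and "failed" are assumed terminal statuses, and only "cancelled" comes from the commit message. The non-negative poll_interval check is likewise an illustration, not the SDK's code.

```python
import time
from typing import Callable, Optional

# "cancelled" is from the commit message; the other two are assumed for illustration.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_job(
    get_status: Callable[[], str],
    poll_interval: float = 2.0,
    timeout: Optional[float] = None,
) -> str:
    """Poll get_status() until it reports a terminal status, then return it."""
    if poll_interval < 0:
        raise ValueError("poll_interval must be non-negative")
    start = time.monotonic()
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            # Without "cancelled" in the set, a cancelled job would loop forever.
            return status
        if timeout is not None and time.monotonic() - start > timeout:
            raise TimeoutError(f"job still '{status}' after {timeout} seconds")
        time.sleep(poll_interval)


if __name__ == "__main__":
    fake = iter(["scraping", "scraping", "cancelled"])
    print(wait_for_job(lambda: next(fake), poll_interval=0))
```

Keeping the terminal statuses in one set means a newly introduced status only needs to be added in one place.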
Gaurav Chadha
e661bd2b7c
fix: add missing poll_interval param in watcher (#2155)
* add-poll_interval-param-in-watcher

Signed-off-by: Chadha93 <gauravchadha1676@gmail.com>

* add-guard-for-non-negative-values

Signed-off-by: Chadha93 <gauravchadha1676@gmail.com>

---------

Signed-off-by: Chadha93 <gauravchadha1676@gmail.com>
2025-09-22 17:10:05 -03:00
Nicolas
18c4b13b22 Nick: fixed integrations bug in search method 2025-09-07 16:09:58 -03:00
Rafael Miller
6aae67bd8c
feat(sdk): added agent option (#2108)
* feat(sdk): added agent option

* Update apps/python-sdk/firecrawl/v2/methods/extract.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

---------

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
2025-09-05 18:21:40 -03:00
devin-ai-integration[bot]
9e6fc1b54d
Update Type Annotations for v2 Async Search (SearchResponse → SearchData) (#2097)
* Update v2 async search type annotations from SearchResponse to SearchData

- Remove SearchResponse export from firecrawl.types for v2 usage
- Aligns type annotations with actual runtime behavior
- v2 async search methods already return SearchData directly
- v1 methods continue to use SearchResponse as expected
- Resolves Linear ticket ENG-3321

Co-Authored-By: rafael@sideguide.dev <rafael@sideguide.dev>

* bump version

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: rafael@sideguide.dev <rafael@sideguide.dev>
Co-authored-by: rafaelmmiller <150964962+rafaelsideguide@users.noreply.github.com>
2025-09-05 09:37:16 -03:00
Rafael Miller
a2517a8855
Feat(sdks): integration param (#2096)
* feat(sdks): integration param

* added underline to integration param in sdk tests

* removed param that didn't make sense

* chore(sdks): bump versions

* Update apps/js-sdk/firecrawl/src/v2/methods/search.ts

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* Update apps/python-sdk/firecrawl/v2/methods/extract.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* Update apps/python-sdk/firecrawl/v2/methods/aio/extract.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* Update apps/python-sdk/firecrawl/v2/methods/aio/crawl.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* Update apps/python-sdk/firecrawl/v2/methods/aio/batch.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* cubic's review

* Update apps/python-sdk/firecrawl/v2/methods/extract.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* Update apps/python-sdk/firecrawl/v2/methods/aio/extract.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* Update apps/js-sdk/firecrawl/src/v2/methods/crawl.ts

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* Update apps/python-sdk/firecrawl/v2/utils/validation.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* Update apps/python-sdk/firecrawl/v2/methods/search.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* Update apps/python-sdk/firecrawl/v2/methods/search.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* Update apps/python-sdk/firecrawl/v2/methods/map.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* Update apps/python-sdk/firecrawl/v2/methods/aio/search.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* cubics fixes

* Update apps/js-sdk/firecrawl/src/v2/methods/batch.ts

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* here we go cubic

---------

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
2025-09-03 16:39:43 -03:00
Rafael Miller
471feacb22
feat(python-sdk): normalize docs in search results (#2098) 2025-09-03 16:00:50 -03:00
tom
9a3bd6ca50
Add proxy location support to crawl and map endpoints (ENG-3361) (#2092) 2025-09-03 16:19:00 +02:00
Rafael Miller
3968a602fe
fix(python-sdk): added missing get_queue_status in aio and added to t… (#2081)
Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>
2025-09-01 17:44:23 +02:00
Gergő Móricz
90778e4604
feat: historical credit/token usage endpoints + more data in existing usage endpoints (#2077) 2025-09-01 13:23:38 +02:00
Gergő Móricz
76cc2decd0
feat(api): add /team/queue-status endpoint (#2063)
* feat(api): add /team/queue-status endpoint

* chore: bump SDKs

* fix bad imports

* various fixes (ty cubic)

* rebase fix
2025-09-01 11:03:05 +02:00
Nicolas
b05327dbbe Nick: fix py sdk validation error 2025-08-30 18:41:47 -04:00
devin-ai-integration[bot]
7bea613ec0
feat: add maxPages parameter to PDF parser in v2 scrape API (#2047)
* feat: add maxPages parameter to PDF parser

- Extend parsersSchema to support both string array ['pdf'] and object array [{'type':'pdf','maxPages':10}] formats
- Add shouldParsePDF and getPDFMaxPages helper functions for consistent parser handling
- Update PDF processing to respect maxPages limit in both RunPod MU and PdfParse processors
- Modify billing calculation to use actual pages processed instead of total pages
- Add comprehensive tests for object format parsers, page limiting, and validation
- Maintain backward compatibility with existing string array format

The maxPages parameter is optional and defaults to unlimited when not specified.
Page limiting occurs before processing to avoid unnecessary computation and billing
is based on the effective page count for fairness.

Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev>

* fix: correct parsersSchema to handle individual parser items

- Change union from array-level to item-level in parsersSchema
- Now accepts array where each item is either string 'pdf' or object {'type':'pdf','maxPages':10}
- When parser is string 'pdf', maxPages is undefined (no limit)
- When parser is object, use specified maxPages value
- Maintains backward compatibility with existing ['pdf'] format

Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev>

* fix: remove maxPages logic from scrapePDFWithParsePDF per PR feedback

- Remove maxPages parameter and truncation logic from scrapePDFWithParsePDF
- Keep maxPages logic only in scrapePDFWithRunPodMU where it provides cost savings
- Addresses feedback from mogery: pdf-parse doesn't cost anything extra to process all pages

Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev>

* test: add maxPages parameter tests for crawl and search endpoints

- Add crawl endpoint test with PDF maxPages parameter
- Add search endpoint test with PDF maxPages parameter
- Verify maxPages works end-to-end across all endpoints (scrape, crawl, search)
- Ensure schema inheritance and data flow work correctly

Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev>

* fix: remove problematic crawl and search tests for maxPages

- Remove crawl test that incorrectly uses direct PDF URL
- Remove search test that relies on unreliable external search results
- maxPages functionality verified through schema inheritance and data flow analysis
- Comprehensive tests already exist in parsers.test.ts for core functionality

Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev>

* feat: add maxPages parameter support to Python and JavaScript SDKs

- Add PDFParser class to Python SDK with max_pages field validation (1-1000)
- Update Python SDK parsers field to support Union[List[str], List[Union[str, PDFParser]]]
- Add parsers preprocessing in Python SDK to convert snake_case to camelCase
- Update JavaScript SDK parsers type to Array<string | { type: 'pdf'; maxPages?: number }>
- Add maxPages validation to JavaScript SDK ensureValidScrapeOptions
- Maintain backward compatibility with existing ['pdf'] string array format
- Support mixed formats in both SDKs
- Add comprehensive test files for both SDKs

Addresses GitHub comment requesting SDK support for maxPages parameter.

Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev>

* cleanup: remove temporary test files

Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev>

* fix: correct parsers schema to support mixed string and object arrays

- Fix parsers schema to properly handle mixed arrays like ['pdf', {type: 'pdf', maxPages: 5}]
- Resolves backward compatibility issue that was causing webhook test failures
- All parser formats now work: ['pdf'], [{type: 'pdf'}], [{type: 'pdf', maxPages: 10}], mixed arrays

Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev>

* Delete SDK_MAXPAGES_IMPLEMENTATION.md

* feat: increase maxPages limit from 1000 to 10000 pages

- Update backend Zod schema validation in types.ts
- Update JavaScript SDK client-side validation
- Update API test cases to use new 10000 limit
- Addresses GitHub comment feedback from nickscamara

Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev>

* fix: update Python SDK maxPages limit from 1000 to 10000

- Fix validation discrepancy between Python SDK (1000) and backend/JS SDK (10000)
- Ensures consistent maxPages validation across all SDKs
- Addresses critical bug identified in PR review

Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev>

* fix: remove SDK-side maxPages validation per PR feedback

- Remove maxPages range validation from JavaScript SDK validation.ts
- Remove maxPages range validation from Python SDK types.py
- Keep backend API validation as single source of truth
- Addresses GitHub comment from mogery

Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev>

* Nick:

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: thomas@sideguide.dev <thomas@sideguide.dev>
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
2025-08-29 20:13:58 -04:00
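Note on the commit above: it describes a parsers array whose items are either the string 'pdf' or an object such as {'type': 'pdf', 'maxPages': 10}, plus two helpers named shouldParsePDF and getPDFMaxPages. The sketch below is a hypothetical Python rendering of those helpers, not the API's implementation. How a mixed array resolves is an assumption: here the first object entry that sets maxPages wins.

```python
from typing import Any, Dict, List, Optional, Union

# Each item is either the string "pdf" or an object like {"type": "pdf", "maxPages": 10}.
ParserItem = Union[str, Dict[str, Any]]


def should_parse_pdf(parsers: List[ParserItem]) -> bool:
    """Return True if any entry selects the PDF parser, in either format."""
    for item in parsers:
        if item == "pdf":
            return True
        if isinstance(item, dict) and item.get("type") == "pdf":
            return True
    return False


def get_pdf_max_pages(parsers: List[ParserItem]) -> Optional[int]:
    """Return maxPages from the first object entry that sets it, else None (no limit)."""
    for item in parsers:
        if isinstance(item, dict) and item.get("type") == "pdf":
            max_pages = item.get("maxPages")
            if max_pages is not None:
                return max_pages
    return None


if __name__ == "__main__":
    for parsers in (["pdf"], [{"type": "pdf", "maxPages": 10}], []):
        print(parsers, should_parse_pdf(parsers), get_pdf_max_pages(parsers))
```

The sketch does no range validation on maxPages, in line with the final commits in the series, which leave that check to the backend.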
Rafael Miller
815963890f
feat(sdks): next cursor pagination (#2067)
* feat(sdks): next cursor pagination

- Default auto-pagination enabled; pass { autoPaginate: false } (JS) or PaginationConfig(auto_paginate=False) (Python) to restore single-page behavior.
- Potentially larger responses and fewer calls by default.

* good to go

* bump sdks version

* fixed tests and endpoints

* addressed cubic's appointments

* docs

* cubic's appointments

* Update apps/js-sdk/firecrawl/src/v2/methods/crawl.ts

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* here we go

* Update apps/python-sdk/firecrawl/v2/utils/http_client.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* Update apps/python-sdk/firecrawl/v2/methods/crawl.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* Update apps/python-sdk/firecrawl/__tests__/unit/v2/methods/test_pagination.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* rafa:

* Update apps/python-sdk/firecrawl/v2/methods/crawl.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* Update example_pagination.py

---------

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
2025-08-29 20:01:51 -03:00
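Note on the commit above: it makes auto-pagination the default and keeps single-page behavior available as an opt-out. The sketch below shows next-cursor pagination in general terms against an in-memory fake. It does not call the SDK; collect, fetch_page, fake_fetch and FAKE_PAGES are hypothetical names, and the SDK's real PaginationConfig may behave differently.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A page is (items, next_cursor); next_cursor is None on the last page.
Page = Tuple[List[str], Optional[str]]


def collect(
    fetch_page: Callable[[Optional[str]], Page],
    auto_paginate: bool = True,
) -> Tuple[List[str], Optional[str]]:
    """Fetch the first page and, if auto_paginate is set, follow cursors to the end."""
    items, cursor = fetch_page(None)
    results = list(items)
    while auto_paginate and cursor is not None:
        items, cursor = fetch_page(cursor)
        results.extend(items)
    # With auto_paginate=False the caller gets the cursor back to continue manually.
    return results, cursor


# In-memory stand-in for a paginated endpoint (illustrative data only).
FAKE_PAGES: Dict[Optional[str], Page] = {
    None: (["a", "b"], "c1"),
    "c1": (["c"], "c2"),
    "c2": (["d"], None),
}


def fake_fetch(cursor: Optional[str]) -> Page:
    return FAKE_PAGES[cursor]


if __name__ == "__main__":
    print(collect(fake_fetch))
    print(collect(fake_fetch, auto_paginate=False))
```

This illustrates the trade-off the commit message names: with auto-pagination on, the caller makes one call and receives a potentially larger response.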
rafaelmmiller
293b532629 chore(sdks): bumped sdks 2025-08-27 11:23:13 -03:00
Rafael Miller
7ac607617f
fix(python-sdk): missing methods in client (#2050)
added get_active_crawls and start_extract methods that were missing in the sync client
2025-08-27 08:09:12 -03:00
Vishnu Krishnan
0589994ec6
feat(api): support extraction of data-* attributes in scrape endpoints (#2006)
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>
2025-08-27 09:53:11 +02:00
Vishnu Krishnan
30c6bdd938
feat(api): add image extraction support to v2 scrape endpoint (#2008) 2025-08-27 09:50:06 +02:00
Nicolas
54d8d92c99 Nick: sdks now have search categories 2025-08-23 16:59:39 -07:00
Nicolas
44715278b3 Nick: updated sdks with search categories 2025-08-23 16:47:43 -07:00