299 Commits

Author SHA1 Message Date
Abimael Martell
26d65c87bf python-sdk: Don't require API Key when running Self Hosted 2025-10-15 16:23:16 -07:00
Nicolas
8a3936fdc0 Nick: pdf search category 2025-10-13 11:09:10 -03:00
devin-ai-integration[bot]
0b8d87caf0
chore(python-sdk): bump version to 4.3.7 for poll_interval fix (#2265)
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: gaurav@sideguide.dev <gauravchadha1676@gmail.com>
2025-10-10 12:44:09 -03:00
Jeel Rupareliya
57babbaf09
python-sdk: include "cancelled" in CrawlJob.status and exit wait loop on cancel (fixes #2190) (#2240)
* cancelled added in stats if job is cancelled

* chore: stop tracking local venv in apps/python-sdk/.venv

* revert: exclude local docker-compose port mapping change from PR

* chore: ignore local SDK venv and remove from tracking

* redis rate limit url added

* removed redis rate limit

* Update crawl.py
2025-10-01 12:32:52 -03:00
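With this change a cancelled job ends the SDK's wait loop instead of polling forever. A minimal caller-side sketch, assuming a `Firecrawl` client with `start_crawl`/`get_crawl_status` methods (those names are illustrative, not taken from this commit):

```python
# Hedged sketch: treat "cancelled" as a terminal status while polling a crawl.
# The client constructor and method names below are assumptions for illustration.
import time
from firecrawl import Firecrawl

client = Firecrawl(api_key="fc-YOUR-KEY")
job = client.start_crawl("https://example.com")  # assumed method name

while True:
    snapshot = client.get_crawl_status(job.id)   # assumed method name
    if snapshot.status in ("completed", "failed", "cancelled"):  # "cancelled" now exits the loop
        break
    time.sleep(2)
```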
Gaurav Chadha
e661bd2b7c
fix: add missing poll_interval param in watcher (#2155)
* add-poll_interval-param-in-watcher

Signed-off-by: Chadha93 <gauravchadha1676@gmail.com>

* add-guard-for-non-negative-values

Signed-off-by: Chadha93 <gauravchadha1676@gmail.com>

---------

Signed-off-by: Chadha93 <gauravchadha1676@gmail.com>
2025-09-22 17:10:05 -03:00
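The forwarded parameter can then be tuned by callers. A rough sketch under the assumption that the SDK exposes a `watcher()` helper taking `poll_interval` in seconds; only the parameter name and its non-negative guard come from this PR:

```python
# Hedged sketch: pass a custom poll_interval to the job watcher.
# watcher() and its other arguments are assumptions; poll_interval must be non-negative.
from firecrawl import Firecrawl

client = Firecrawl(api_key="fc-YOUR-KEY")
job = client.start_crawl("https://example.com")    # assumed method name
watcher = client.watcher(job.id, poll_interval=3)  # check job status every 3 seconds
```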
Nicolas
18c4b13b22 Nick: fixed integrations bug in search method 2025-09-07 16:09:58 -03:00
Rafael Miller
6aae67bd8c
feat(sdk): added agent option (#2108)
* feat(sdk): added agent option

* Update apps/python-sdk/firecrawl/v2/methods/extract.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

---------

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
2025-09-05 18:21:40 -03:00
devin-ai-integration[bot]
9e6fc1b54d
Update Type Annotations for v2 Async Search (SearchResponse → SearchData) (#2097)
* Update v2 async search type annotations from SearchResponse to SearchData

- Remove SearchResponse export from firecrawl.types for v2 usage
- Aligns type annotations with actual runtime behavior
- v2 async search methods already return SearchData directly
- v1 methods continue to use SearchResponse as expected
- Resolves Linear ticket ENG-3321

Co-Authored-By: rafael@sideguide.dev <rafael@sideguide.dev>

* bump version

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: rafael@sideguide.dev <rafael@sideguide.dev>
Co-authored-by: rafaelmmiller <150964962+rafaelsideguide@users.noreply.github.com>
2025-09-05 09:37:16 -03:00
Rafael Miller
a2517a8855
Feat(sdks): integration param (#2096)
* feat(sdks): integration param

* added underline to integration param in sdk tests

* removed param that didn't make sense

* chore(sdks): bump versions

* Update apps/js-sdk/firecrawl/src/v2/methods/search.ts

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* Update apps/python-sdk/firecrawl/v2/methods/extract.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* Update apps/python-sdk/firecrawl/v2/methods/aio/extract.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* Update apps/python-sdk/firecrawl/v2/methods/aio/crawl.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* Update apps/python-sdk/firecrawl/v2/methods/aio/batch.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* cubic's review

* Update apps/python-sdk/firecrawl/v2/methods/extract.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* Update apps/python-sdk/firecrawl/v2/methods/aio/extract.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* Update apps/js-sdk/firecrawl/src/v2/methods/crawl.ts

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* Update apps/python-sdk/firecrawl/v2/utils/validation.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* Update apps/python-sdk/firecrawl/v2/methods/search.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* Update apps/python-sdk/firecrawl/v2/methods/search.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* Update apps/python-sdk/firecrawl/v2/methods/map.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* Update apps/python-sdk/firecrawl/v2/methods/aio/search.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* cubic's fixes

* Update apps/js-sdk/firecrawl/src/v2/methods/batch.ts

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* here we go cubic

---------

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
2025-09-03 16:39:43 -03:00
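The new parameter is plumbed through the scrape, crawl, search, map, and extract methods in both SDKs. A hedged Python sketch, where the client setup and `search()` signature are assumptions and only the `integration` keyword comes from this PR:

```python
# Hedged sketch: identify the integration making the request.
# Only the `integration` parameter is taken from this PR; the rest is assumed.
from firecrawl import Firecrawl

client = Firecrawl(api_key="fc-YOUR-KEY")
results = client.search("firecrawl sdk changelog", integration="my-integration")
```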
Rafael Miller
471feacb22
feat(python-sdk): normalize docs in search results (#2098) 2025-09-03 16:00:50 -03:00
tom
9a3bd6ca50
Add proxy location support to crawl and map endpoints (ENG-3361) (#2092) 2025-09-03 16:19:00 +02:00
Rafael Miller
3968a602fe
fix(python-sdk): added missing get_queue_status in aio and added to t… (#2081)
Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>
2025-09-01 17:44:23 +02:00
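A hedged sketch of the async variant this fix adds; the `AsyncFirecrawl` client name is an assumption, while `get_queue_status` comes from the commit title:

```python
# Hedged sketch: call the queue-status method from the async client.
# AsyncFirecrawl is an assumed class name; get_queue_status comes from the commit.
import asyncio
from firecrawl import AsyncFirecrawl

async def main():
    client = AsyncFirecrawl(api_key="fc-YOUR-KEY")
    print(await client.get_queue_status())

asyncio.run(main())
```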
Gergő Móricz
90778e4604
feat: historical credit/token usage endpoints + more data in existing usage endpoints (#2077) 2025-09-01 13:23:38 +02:00
Gergő Móricz
76cc2decd0
feat(api): add /team/queue-status endpoint (#2063)
* feat(api): add /team/queue-status endpoint

* chore: bump SDKs

* fix bad imports

* various fixes (ty cubic)

* rebase fix
2025-09-01 11:03:05 +02:00
Nicolas
b05327dbbe Nick: fix py sdk validation error 2025-08-30 18:41:47 -04:00
devin-ai-integration[bot]
7bea613ec0
feat: add maxPages parameter to PDF parser in v2 scrape API (#2047)
* feat: add maxPages parameter to PDF parser

- Extend parsersSchema to support both string array ['pdf'] and object array [{'type':'pdf','maxPages':10}] formats
- Add shouldParsePDF and getPDFMaxPages helper functions for consistent parser handling
- Update PDF processing to respect maxPages limit in both RunPod MU and PdfParse processors
- Modify billing calculation to use actual pages processed instead of total pages
- Add comprehensive tests for object format parsers, page limiting, and validation
- Maintain backward compatibility with existing string array format

The maxPages parameter is optional and defaults to unlimited when not specified.
Page limiting occurs before processing to avoid unnecessary computation and billing
is based on the effective page count for fairness.

Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev>

* fix: correct parsersSchema to handle individual parser items

- Change union from array-level to item-level in parsersSchema
- Now accepts array where each item is either string 'pdf' or object {'type':'pdf','maxPages':10}
- When parser is string 'pdf', maxPages is undefined (no limit)
- When parser is object, use specified maxPages value
- Maintains backward compatibility with existing ['pdf'] format

Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev>

* fix: remove maxPages logic from scrapePDFWithParsePDF per PR feedback

- Remove maxPages parameter and truncation logic from scrapePDFWithParsePDF
- Keep maxPages logic only in scrapePDFWithRunPodMU where it provides cost savings
- Addresses feedback from mogery: pdf-parse doesn't cost anything extra to process all pages

Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev>

* test: add maxPages parameter tests for crawl and search endpoints

- Add crawl endpoint test with PDF maxPages parameter
- Add search endpoint test with PDF maxPages parameter
- Verify maxPages works end-to-end across all endpoints (scrape, crawl, search)
- Ensure schema inheritance and data flow work correctly

Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev>

* fix: remove problematic crawl and search tests for maxPages

- Remove crawl test that incorrectly uses direct PDF URL
- Remove search test that relies on unreliable external search results
- maxPages functionality verified through schema inheritance and data flow analysis
- Comprehensive tests already exist in parsers.test.ts for core functionality

Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev>

* feat: add maxPages parameter support to Python and JavaScript SDKs

- Add PDFParser class to Python SDK with max_pages field validation (1-1000)
- Update Python SDK parsers field to support Union[List[str], List[Union[str, PDFParser]]]
- Add parsers preprocessing in Python SDK to convert snake_case to camelCase
- Update JavaScript SDK parsers type to Array<string | { type: 'pdf'; maxPages?: number }>
- Add maxPages validation to JavaScript SDK ensureValidScrapeOptions
- Maintain backward compatibility with existing ['pdf'] string array format
- Support mixed formats in both SDKs
- Add comprehensive test files for both SDKs

Addresses GitHub comment requesting SDK support for maxPages parameter.

Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev>

* cleanup: remove temporary test files

Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev>

* fix: correct parsers schema to support mixed string and object arrays

- Fix parsers schema to properly handle mixed arrays like ['pdf', {type: 'pdf', maxPages: 5}]
- Resolves backward compatibility issue that was causing webhook test failures
- All parser formats now work: ['pdf'], [{type: 'pdf'}], [{type: 'pdf', maxPages: 10}], mixed arrays

Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev>

* Delete SDK_MAXPAGES_IMPLEMENTATION.md

* feat: increase maxPages limit from 1000 to 10000 pages

- Update backend Zod schema validation in types.ts
- Update JavaScript SDK client-side validation
- Update API test cases to use new 10000 limit
- Addresses GitHub comment feedback from nickscamara

Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev>

* fix: update Python SDK maxPages limit from 1000 to 10000

- Fix validation discrepancy between Python SDK (1000) and backend/JS SDK (10000)
- Ensures consistent maxPages validation across all SDKs
- Addresses critical bug identified in PR review

Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev>

* fix: remove SDK-side maxPages validation per PR feedback

- Remove maxPages range validation from JavaScript SDK validation.ts
- Remove maxPages range validation from Python SDK types.py
- Keep backend API validation as single source of truth
- Addresses GitHub comment from mogery

Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev>

* Nick:

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: thomas@sideguide.dev <thomas@sideguide.dev>
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
2025-08-29 20:13:58 -04:00
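The parser shapes described above can be exercised from the Python SDK roughly as follows; the client setup and `scrape()` signature are assumptions, while the `['pdf']` and `{'type': 'pdf', 'maxPages': N}` forms are quoted from the PR:

```python
# Hedged sketch: both parser formats supported after this PR.
# scrape() signature is assumed; the parser shapes come from the commit body.
from firecrawl import Firecrawl

client = Firecrawl(api_key="fc-YOUR-KEY")

# Backward-compatible string form: no page limit.
full_doc = client.scrape("https://example.com/report.pdf", parsers=["pdf"])

# Object form: process (and bill for) at most the first 10 pages.
capped_doc = client.scrape(
    "https://example.com/report.pdf",
    parsers=[{"type": "pdf", "maxPages": 10}],  # per the PR, the SDK also accepts a PDFParser(max_pages=...) model
)
```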
Rafael Miller
815963890f
feat(sdks): next cursor pagination (#2067)
* feat(sdks): next cursor pagination

- Default auto-pagination enabled; pass { autoPaginate: false } (JS) or PaginationConfig(auto_paginate=False) (Python) to restore single-page behavior.
- Potentially larger responses and fewer calls by default.

* good to go

* bump sdks version

* fixed tests and endpoints

* addressed cubic's comments

* docs

* cubic's comments

* Update apps/js-sdk/firecrawl/src/v2/methods/crawl.ts

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* here we go

* Update apps/python-sdk/firecrawl/v2/utils/http_client.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* Update apps/python-sdk/firecrawl/v2/methods/crawl.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* Update apps/python-sdk/firecrawl/__tests__/unit/v2/methods/test_pagination.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* rafa:

* Update apps/python-sdk/firecrawl/v2/methods/crawl.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* Update example_pagination.py

---------

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
2025-08-29 20:01:51 -03:00
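To restore the old single-page behavior, the PR body points to `PaginationConfig(auto_paginate=False)` on the Python side. A hedged sketch, where the import path and the keyword argument name are assumptions:

```python
# Hedged sketch: opt out of the default auto-pagination introduced in this PR.
# PaginationConfig(auto_paginate=False) is quoted from the PR body; the import
# path and keyword argument name are assumptions.
from firecrawl import Firecrawl
from firecrawl.v2.types import PaginationConfig  # assumed import path

client = Firecrawl(api_key="fc-YOUR-KEY")
page = client.get_crawl_status(
    "your-crawl-job-id",
    pagination_config=PaginationConfig(auto_paginate=False),  # single page + next cursor
)
```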
rafaelmmiller
293b532629 chore(sdks): bumped sdks 2025-08-27 11:23:13 -03:00
Rafael Miller
7ac607617f
fix(python-sdk): missing methods in client (#2050)
added get_active_crawls and start_extract methods that were missing in the sync client
2025-08-27 08:09:12 -03:00
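A hedged sketch of the two sync-client calls restored here; the method names come from the commit, but their arguments and return shapes are assumptions:

```python
# Hedged sketch: the sync-client methods added by this fix.
# Method names are from the commit message; arguments/returns are assumed.
from firecrawl import Firecrawl

client = Firecrawl(api_key="fc-YOUR-KEY")
active = client.get_active_crawls()          # crawls still in progress for the team
extract_job = client.start_extract(
    ["https://example.com"],                 # assumed: list of URLs
    prompt="Extract the page title",         # assumed keyword
)
```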
Vishnu Krishnan
0589994ec6
feat(api): support extraction of data-* attributes in scrape endpoints (#2006)
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>
2025-08-27 09:53:11 +02:00
Vishnu Krishnan
30c6bdd938
feat(api): add image extraction support to v2 scrape endpoint (#2008) 2025-08-27 09:50:06 +02:00
Nicolas
54d8d92c99 Nick: sdks now have search categories 2025-08-23 16:59:39 -07:00
Nicolas
44715278b3 Nick: updated sdks with search categories 2025-08-23 16:47:43 -07:00
Nicolas
f3c73d00c0
(feat/search) Search Categories (#2019)
* Nick: search categories

* Nick:

* Update apps/api/src/lib/search-query-builder.ts

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* Update apps/api/src/lib/search-query-builder.ts

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* Update types.ts

---------

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
2025-08-23 16:28:56 -07:00
rafaelmmiller
46567088f4 (sdks): fix unit tests 2025-08-22 17:09:04 -03:00
devin-ai-integration[bot]
85a2402dde
fix: add explicit pydantic>=2.0 dependency requirement (#2010)
- Add pydantic>=2.0 constraint to pyproject.toml, setup.py, and requirements.txt
- Bump version from 3.2.0 to 3.2.1
- Fixes ImportError when importing field_validator from pydantic v2
- Resolves issue where users with pydantic v1 would get import errors

Fixes: https://linear.app/firecrawl/issue/ENG-3267

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: rafael@sideguide.dev <rafael@sideguide.dev>
2025-08-22 15:31:14 -03:00
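The ImportError described above comes from environments that resolve pydantic v1; with the new `pydantic>=2.0` constraint the import the SDK relies on is guaranteed to be available:

```python
# field_validator exists only in pydantic v2; the >=2.0 constraint added here
# ensures this import succeeds in the SDK's environment.
from pydantic import field_validator
```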
Rafael Miller
61c3abcb5b
(python-sdk): fixed search types + added max_keepalive_connections to fix event loop is closed bug (#2005) 2025-08-22 15:29:47 +02:00
Micah Stairs
ac278ccc1b
Fix search validation of custom date ranges and "qdr:h" (#1993)
* Fix search validation to support custom date ranges

- Add regex import to search.py
- Update _validate_search_request to support cdr:1,cd_min:MM/DD/YYYY,cd_max:MM/DD/YYYY format
- Add missing qdr:h (hourly) predefined value
- Maintain backward compatibility with existing predefined values

Co-Authored-By: Micah Stairs <micah.stairs@gmail.com>

* Add verification script for custom date range validation

- Comprehensive test script that validates the fix works correctly
- Tests both valid and invalid custom date range formats
- Verifies backward compatibility with predefined values
- All tests pass confirming the fix is working

Co-Authored-By: Micah Stairs <micah.stairs@gmail.com>

* Add test cases for custom date range validation

- Add test_validate_custom_date_ranges for valid custom date formats
- Add test_validate_invalid_custom_date_ranges for invalid formats
- Include test_custom_date_range.py for additional verification
- Ensure comprehensive test coverage for the validation fix

Co-Authored-By: Micah Stairs <micah.stairs@gmail.com>

* Fix tests

Co-Authored-By: Micah Stairs <micah.stairs@gmail.com>

* version bump

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Rafael Miller <150964962+rafaelsideguide@users.noreply.github.com>
2025-08-22 09:23:31 -03:00
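The values the validator now accepts are passed through the search `tbs` parameter. A hedged sketch, where the client setup and `search()` signature are assumptions and the `qdr:h` / `cdr:1,cd_min:MM/DD/YYYY,cd_max:MM/DD/YYYY` formats are quoted from the PR:

```python
# Hedged sketch: time-based search values accepted after this fix.
# search() signature is assumed; the tbs formats come from the commit body.
from firecrawl import Firecrawl

client = Firecrawl(api_key="fc-YOUR-KEY")

# Predefined range: results from the past hour.
recent = client.search("firecrawl", tbs="qdr:h")

# Custom date range in the cdr:1,cd_min:MM/DD/YYYY,cd_max:MM/DD/YYYY format.
january = client.search("firecrawl", tbs="cdr:1,cd_min:01/01/2025,cd_max:01/31/2025")
```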
Rafael Miller
f5d901b9d9
(python-sdk)fix: added normalizer for document.metadata, also added m… (#1986)
* (python-sdk)fix: added normalizer for document.metadata, also added metadata_dict and metadata_typed properties

* address cubic's comment

* Update __init__.py

---------

Co-authored-by: Nicolas <nicolascamara29@gmail.com>
2025-08-20 18:31:38 -07:00
Nicolas
e3eebfb7db Update __init__.py 2025-08-18 21:53:15 -07:00
rafaelmmiller
99dcc36e6e (sdks): fixed origin, batch aio and tests 2025-08-18 16:14:20 -03:00
Gergő Móricz
2f3bc4e7a7 mendableai -> firecrawl 2025-08-18 20:46:41 +02:00
Nicolas
4ef24355f7 Create example_v2.py 2025-08-17 20:13:56 -07:00
rafaelmmiller
0e3b9d2ffb fix python types + json/pydantic 2025-08-17 18:45:35 -03:00
Nicolas
6849d938c4 Nick: FirecrawlApp compatible 2025-08-17 14:15:50 -07:00
rafaelmmiller
c4c2bbd803 delete cache 2025-08-14 09:22:18 -03:00
rafaelmmiller
55e8d443fd (sdks): added summary, fixed usage tests and v2 client for python 2025-08-14 09:21:54 -03:00
Gergő Móricz
e251516a8e fix usage stuff 2025-08-13 20:07:02 +02:00
Gergő Móricz
c1700e06c6 fix in python sdk 2025-08-13 19:37:03 +02:00
Gergő Móricz
356b04fb65 further crawl-errors improvements 2025-08-13 17:49:29 +02:00
rafaelmmiller
537f6c4ec0 (python/js/ts-sdks): readmes, e2e tests w idmux etc all good 2025-08-12 18:00:37 -03:00
rafaelmmiller
d44baed8f2 (js-sdk): mostly done 2025-08-12 13:50:52 -03:00
rafaelmmiller
ec69c30992 (js-sdk): methods done. todo: e2e/unit tests 2025-08-12 10:25:41 -03:00
rafaelmmiller
10b7202898 (python-sdk): extract v2 2025-08-11 16:35:49 -03:00
rafaelmmiller
d31d39d664 (python-sdk): batch, map, ws improv and aio methods. e2e tests done. 2025-08-11 14:29:26 -03:00
Gergő Móricz
21cbfec398 Merge branch 'main' into nsc/v2 2025-08-10 18:25:08 +02:00
rafaelmmiller
dd6b46d373 (python-sdk): removed client duplication, bunch of type fixing, added map method + e2e/unit tests 2025-08-08 11:56:05 -03:00
rafaelmmiller
08c8f42091 (python-sdk): scrape is done! 2025-08-07 18:10:32 -03:00
rafaelmmiller
d2b325f815 (python-sdk): get_crawl_errors and active_crawls, got rid of useless tests 2025-08-07 15:52:07 -03:00
rafaelmmiller
7a85b9f433 (python-sdk): crawl done 2025-08-07 11:18:07 -03:00