268 Commits

Author SHA1 Message Date
Gergő Móricz
2f3bc4e7a7 mendableai -> firecrawl 2025-08-18 20:46:41 +02:00
Nicolas
4ef24355f7 Create example_v2.py 2025-08-17 20:13:56 -07:00
rafaelmmiller
0e3b9d2ffb fix python types + json/pydantic 2025-08-17 18:45:35 -03:00
Nicolas
6849d938c4 Nick: FirecrawlApp compatible 2025-08-17 14:15:50 -07:00
rafaelmmiller
c4c2bbd803 delete cache 2025-08-14 09:22:18 -03:00
rafaelmmiller
55e8d443fd (sdks): added summary, fixed usage tests and v2 client for python 2025-08-14 09:21:54 -03:00
Gergő Móricz
e251516a8e fix usage stuff 2025-08-13 20:07:02 +02:00
Gergő Móricz
c1700e06c6 fix in python sdk 2025-08-13 19:37:03 +02:00
Gergő Móricz
356b04fb65 further crawl-errors improvements 2025-08-13 17:49:29 +02:00
rafaelmmiller
537f6c4ec0 (python/js/ts-sdks): readmes, e2e tests w idmux etc all good 2025-08-12 18:00:37 -03:00
rafaelmmiller
d44baed8f2 (js-sdk): mostly done 2025-08-12 13:50:52 -03:00
rafaelmmiller
ec69c30992 (js-sdk): methods done. todo: e2e/unit tests 2025-08-12 10:25:41 -03:00
rafaelmmiller
10b7202898 (python-sdk): extract v2 2025-08-11 16:35:49 -03:00
rafaelmmiller
d31d39d664 (python-sdk): batch, map, ws improv and aio methods. e2e tests done. 2025-08-11 14:29:26 -03:00
Gergő Móricz
21cbfec398 Merge branch 'main' into nsc/v2 2025-08-10 18:25:08 +02:00
rafaelmmiller
dd6b46d373 (python-sdk): removed client duplication, bunch of type fixing, added map method + e2e/unit tests 2025-08-08 11:56:05 -03:00
rafaelmmiller
08c8f42091 (python-sdk): scrape is done! 2025-08-07 18:10:32 -03:00
rafaelmmiller
d2b325f815 (python-sdk): get_crawl_errors and active_crawls, got rid of useless tests 2025-08-07 15:52:07 -03:00
rafaelmmiller
7a85b9f433 (python-sdk): crawl done 2025-08-07 11:18:07 -03:00
rafaelmmiller
631dc981e3 (python-sdk): wip - crawl endpoints
a few tests still failing
2025-08-06 18:41:54 -03:00
devin-ai-integration[bot]
71829dbde3
feat(python-sdk): add agent parameter support to scrape_url method (#1919)
* feat(python-sdk): add agent parameter support to scrape_url method

- Add agent parameter to FirecrawlApp.scrape_url method signature
- Add agent parameter to AsyncFirecrawlApp.scrape_url method signature
- Update validation logic to allow agent parameter for scrape_url
- Add agent parameter handling following batch methods pattern
- Add test case for agent parameter functionality

Resolves issue where agent parameter was only supported in batch methods

Co-Authored-By: Micah Stairs <micah.stairs@gmail.com>

* chore: revert test file changes and bump version to 2.16.5

- Remove test case for agent parameter from test file
- Bump Python SDK version from 2.16.4 to 2.16.5

Co-Authored-By: Micah Stairs <micah.stairs@gmail.com>

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Micah Stairs <micah.stairs@gmail.com>
2025-08-06 15:34:56 -04:00
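
The agent-parameter commit above only describes the change; below is a minimal sketch of the payload handling it implies. The helper name build_scrape_payload and the agent dict shape are assumptions for illustration, not the SDK's actual internals.

```python
# Illustrative sketch only: the real logic lives in FirecrawlApp.scrape_url /
# AsyncFirecrawlApp.scrape_url; names below are hypothetical.
from typing import Any, Dict, Optional

def build_scrape_payload(url: str, agent: Optional[Dict[str, Any]] = None,
                         **kwargs: Any) -> Dict[str, Any]:
    """Fold an optional agent parameter into the scrape request body,
    following the pattern the batch methods already used."""
    payload: Dict[str, Any] = {"url": url, **kwargs}
    if agent is not None:
        payload["agent"] = agent  # previously only honored by batch methods
    return payload

print(build_scrape_payload("https://example.com", agent={"model": "FIRE-1"}))  # agent shape assumed
```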
rafaelmmiller
f22fba6295 (python-sdk): wip - base structure and search endpoint 2025-08-05 17:41:12 -03:00
Gergő Móricz
7400e10eac
feat(v2): parsers + merge w/ main (#1907)
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Micah Stairs <micah.stairs@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: mogery <mo.gery@gmail.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Rafael Miller <150964962+rafaelsideguide@users.noreply.github.com>
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
Co-authored-by: Chetan Goti <chetan.goti7@gmail.com>
fix: improve robots.txt HTML filtering to check content structure (#1880)
fix(html-to-markdown): reinitialize converter lib for every conversion (#1872)
fix(go): add mutex to prevent concurrent access issues in html-to-markdown (#1883)
fix(js-sdk): add retry logic for socket hang up errors in monitorJobStatus (ENG-3029) (#1893)
Fix Pydantic field name shadowing issues causing import NameError (#1800)
fix(crawl-redis): attempt to cleanup crawl memory post finish (#1901)
fix(docs): correct link to Map (#1904)
2025-08-01 20:36:32 +02:00
Rafael Miller
4c0234079c
Improve error handling in Python SDK for non-JSON responses (#1827)
* Improve error handling in Python SDK for non-JSON responses

- Enhanced _handle_error method to gracefully handle non-JSON server responses
- Added fallback error messages when JSON parsing fails
- Improved error details with response content preview (limited to 500 chars)
- Fixed return statement bug in _get_error_message for 403 status code
- Better user experience when server returns HTML error pages or empty responses

* version bump

* Update apps/python-sdk/firecrawl/firecrawl.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

---------

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
2025-07-31 15:56:02 -03:00
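
For context on the fix above, here is a hedged standalone sketch of the JSON-with-fallback pattern the commit describes; the real method is FirecrawlApp._handle_error, and the 500-character preview cap comes straight from the commit message.

```python
import requests

def describe_error(response: requests.Response) -> str:
    """Return a readable error message even when the server replies with
    HTML or an empty body instead of JSON."""
    try:
        body = response.json()
    except ValueError:
        # Non-JSON response (HTML error page, empty body): fall back to a
        # preview of the raw content, capped at 500 chars as in the commit.
        preview = (response.text or "")[:500]
        return f"HTTP {response.status_code}: non-JSON response: {preview!r}"
    if isinstance(body, dict):
        return body.get("error", f"HTTP {response.status_code}")
    return str(body)
```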
devin-ai-integration[bot]
269c7097cb
feat(python-sdk): implement missing crawl_entire_domain parameter (#1896)
* feat(python-sdk): implement missing crawl_entire_domain parameter

- Add crawlEntireDomain field to CrawlParams schema
- Fix crawl_url_and_watch to pass crawl_entire_domain parameter
- Add test for crawl_entire_domain functionality

The parameter was already implemented in most places but was missing:
1. crawlEntireDomain field in the CrawlParams schema
2. crawl_entire_domain parameter passing in crawl_url_and_watch method

Co-Authored-By: rafael@sideguide.dev <rafael@sideguide.dev>

* fix(python-sdk): add missing SearchParams import in test file

Co-Authored-By: rafael@sideguide.dev <rafael@sideguide.dev>

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: rafael@sideguide.dev <rafael@sideguide.dev>
2025-07-31 15:55:20 -03:00
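
The two missing pieces this commit names are easy to picture; the sketch below shows the schema side, assuming a pydantic v2 model (the SDK's actual CrawlParams may differ).

```python
from typing import Optional
from pydantic import BaseModel  # assumes pydantic v2

class CrawlParams(BaseModel):
    url: str
    # The field this commit adds; camelCase to match the API wire format.
    crawlEntireDomain: Optional[bool] = None

params = CrawlParams(url="https://example.com", crawlEntireDomain=True)
print(params.model_dump(exclude_none=True))
```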
devin-ai-integration[bot]
21103b5a58
fix: convert timeout from milliseconds to seconds in Python SDK (#1894)
* fix: convert timeout from milliseconds to seconds in Python SDK

- Fix timeout conversion in scrape_url method (line 596)
- Fix timeout conversion in _post_request method (line 2207)
- Add comprehensive tests for timeout functionality
- Resolves issue #1848

The Python SDK was incorrectly passing timeout values in milliseconds
directly to requests.post() which expects seconds, causing timeouts
to be 1000x longer than intended (e.g. 60s became 16.6 hours).

Co-Authored-By: rafael@sideguide.dev <rafael@sideguide.dev>

* fix: handle timeout=0 edge case in conversion logic

- Change condition from 'if timeout' to 'if timeout is not None'
- Ensures timeout=0 is converted to 5.0 seconds instead of None
- All timeout conversion tests now pass (5/5)

Co-Authored-By: rafael@sideguide.dev <rafael@sideguide.dev>

* feat: change default timeout from None to 30s (30000ms)

- Update all timeout parameter defaults from None to 30000ms across SDK
- ScrapeOptions, MapParams, and all method signatures now default to 30s
- Update tests to verify new default timeout behavior (35s total with 5s buffer)
- Add test for _post_request when no timeout key is present in data
- Maintains backward compatibility for explicit timeout values
- All 6 timeout conversion tests pass

Co-Authored-By: rafael@sideguide.dev <rafael@sideguide.dev>

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: rafael@sideguide.dev <rafael@sideguide.dev>
2025-07-31 15:55:00 -03:00
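
The three commits above fully describe the timeout bug and its fix; a compact sketch of the resulting conversion logic is below. Applying the 5-second buffer to explicit values is an assumption extrapolated from the "35s total with 5s buffer" test note.

```python
from typing import Optional

DEFAULT_TIMEOUT_MS = 30000  # new default per the commit: 30s instead of None

def to_request_timeout(timeout_ms: Optional[int]) -> float:
    """Convert an API-style millisecond timeout into the seconds that
    requests.post() expects."""
    if timeout_ms is None:
        timeout_ms = DEFAULT_TIMEOUT_MS
    if timeout_ms == 0:
        return 5.0  # edge case from the commit: 0 becomes 5.0s, not None
    # Without the division, 60000 ms was passed as 60000 s (~16.6 hours).
    return timeout_ms / 1000 + 5.0  # 5 s buffer on top of the user value

assert to_request_timeout(None) == 35.0  # matches the "35s total" test note
```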
devin-ai-integration[bot]
a7aa0cb2f4
Fix Pydantic field name shadowing issues causing import NameError (#1800) 2025-07-30 16:00:25 -03:00
devin-ai-integration[bot]
26926e56e7
fix(python-sdk): add max_age parameter to scrape_url validation (#1825)
* fix(python-sdk): add max_age parameter to scrape_url validation

- Add max_age to allowed parameters in _validate_kwargs for scrape_url method
- Add comprehensive tests for max_age parameter validation
- Fixes issue where max_age parameter was implemented but not validated properly

The max_age parameter is used for caching and speeding up scrapes. It was already
implemented in the scrape_url method and converted to maxAge in API requests,
but was missing from the validation whitelist in _validate_kwargs method.

Co-Authored-By: Micah Stairs <micah@sideguide.dev>

* Remove test file as requested - keep only core validation fix

Co-Authored-By: Micah Stairs <micah@sideguide.dev>

* fix(python-sdk): add missing validation call to scrape_url method

- Add self._validate_kwargs(kwargs, 'scrape_url') call to main FirecrawlApp.scrape_url method
- Follows the same pattern as all other methods in the codebase
- Addresses PR reviewer feedback that the fix was incomplete

Co-Authored-By: Micah Stairs <micah@sideguide.dev>

* chore(python-sdk): bump version to 2.16.3

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Micah Stairs <micah@sideguide.dev>
Co-authored-by: Micah Stairs <micah.stairs@gmail.com>
2025-07-23 09:18:13 -03:00
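
The validation fix above hinges on a whitelist pattern; here is a hedged reconstruction. The allowed-parameter set is invented for illustration, and the real method is FirecrawlApp._validate_kwargs.

```python
# Hypothetical whitelist: the real one is larger and lives inside the SDK.
ALLOWED_KWARGS = {
    "scrape_url": {"formats", "headers", "timeout", "max_age"},  # max_age newly allowed
}

def validate_kwargs(kwargs: dict, method: str) -> None:
    unknown = set(kwargs) - ALLOWED_KWARGS.get(method, set())
    if unknown:
        raise ValueError(f"Unsupported parameter(s) for {method}: {sorted(unknown)}")

validate_kwargs({"max_age": 3600000}, "scrape_url")  # no longer raises
```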
Rafael Miller
e6f0b1ec16
fixes actions dict attributeError (#1824) 2025-07-22 16:19:45 -03:00
Rafael Miller
a818946eae
sdk-fix: ensure async error handling in AsyncFirecrawlApp methods, update version to 2.16.1 (#1802) 2025-07-15 15:20:16 -03:00
Rafael Miller
ad967d4dbb
[sdk] fixes missing headers param in scrape_url (#1795)
* [sdk] fixes missing headers param in scrape_url

* Update __init__.py

---------

Co-authored-by: Nicolas <nicolascamara29@gmail.com>
2025-07-14 17:59:10 -03:00
Nicolas
87baa23197 Nick: reverting ssl changes to py sdk 2025-07-04 17:57:09 -03:00
Rafael Miller
179737104d
bugfix zero_data_retention param and certifi dependency (#1749) 2025-07-02 14:04:21 -03:00
Ademílson Tonato
30b7e17327
feat(firecrawl): add integration parameter support and enhance kwargs handling 2025-07-02 14:15:39 +01:00
Gergő Móricz
4506b21185
feat(api): zero data retention (ENG-2376) (#1687)
* add ZDR flag and v0 lockout

* zdr walls on v1 and propagation

* zdr within scrapeurl

* fixes

* more fixes

* zdr flag on queue-worker logging

* final stretch + testing, needs f-e changes

* fixes

* self-serve ZDR through request body

* request-level zdr

* improved zdrcleaner

* generalize schema to allow for different data retention times in the future

* update Go version on CI

* feat(api/tests/zdr): test that nothing is logged

* fix(api/tests/zdr): correct log name

* fix(ci): envs

* fix(zdrcleaner): lower bound on db query

* zdr test with idmux

* WIP Assignments

* fix bad merge
remove unused identity

* fix stupid jest globals thing

* feat(scrapeURL/zdr): blacklist pdf action

* fix(concurrency-limit): zdr logging enforcement

* temp: remove extra billing for zdr

* SDK support

* final zdr business logic
fix rename

* fix test log filtering

* fix log filtering... again

* fix(tests/zdr): more logging exceptions

---------

Co-authored-by: Nicolas <nicolascamara29@gmail.com>
2025-07-01 20:07:26 +02:00
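
For the request-level ZDR mentioned in the bullets above ("self-serve ZDR through request body"), a hedged sketch of what a caller might send is below; the zeroDataRetention field name is an assumption inferred from the feature name, and the endpoint path is illustrative.

```python
import requests

def scrape_with_zdr(api_url: str, api_key: str, target: str) -> dict:
    resp = requests.post(
        f"{api_url}/v1/scrape",  # illustrative endpoint path
        headers={"Authorization": f"Bearer {api_key}"},
        json={"url": target, "zeroDataRetention": True},  # assumed field name
        timeout=35,
    )
    resp.raise_for_status()
    return resp.json()
```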
Nicolas
3e09f9fb8a
Nick: init (#1740) 2025-07-01 11:39:45 -03:00
Nicolas
17ff8be67b
Nick; (#1726) 2025-06-27 12:02:15 -03:00
Nicolas
caec228f60 Nick: version bump 2025-06-27 11:33:36 -03:00
Gergő Móricz
9ed26e1e07
feat(sdk/python): add pdf action (ENG-2515) (#1722)
* feat(sdk/python): add pdf action
result

* bump

---------

Co-authored-by: Nicolas <nicolascamara29@gmail.com>
2025-06-27 11:26:27 -03:00
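
A hedged sketch of what the new pdf action might look like in an actions list; the action schema here is assumed from the commit title only.

```python
# Assumed action shapes; only "pdf" as a type is taken from the commit title.
actions = [
    {"type": "wait", "milliseconds": 1000},
    {"type": "pdf"},  # new action added by this commit (ENG-2515)
]
payload = {"url": "https://example.com", "actions": actions}
print(payload)
```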
Nicolas
d8796e4536
feat: Screenshot quality (#1721)
* Nick: init

* Update index.ts

* Nick: sdks support
2025-06-27 11:07:14 -03:00
devin-ai-integration[bot]
1919799bed
feat(python-sdk): add parsePDF parameter support (#1713)
* feat(python-sdk): add parsePDF parameter support

- Add parsePDF field to ScrapeOptions class for Search API usage
- Add parse_pdf parameter to both sync and async scrape_url methods
- Add parameter handling logic to pass parsePDF to API requests
- Add comprehensive tests for parsePDF functionality
- Maintain backward compatibility with existing API

The parsePDF parameter controls PDF processing behavior:
- When true (default): PDF content extracted and converted to markdown
- When false: PDF returned in base64 encoding with flat credit rate

Resolves missing parsePDF support in Python SDK v2.9.0

Co-Authored-By: Micah Stairs <micah@sideguide.dev>

* Update __init__.py

* Update test.py

* Update __init__.py

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Micah Stairs <micah@sideguide.dev>
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
2025-06-26 16:34:43 +00:00
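
The parsePDF commit spells out the semantics (markdown extraction by default, base64 when disabled); the sketch below shows the snake_case-to-camelCase handoff it describes. The helper is hypothetical.

```python
def build_scrape_body(url: str, parse_pdf: bool = True) -> dict:
    """Map the SDK-side parse_pdf kwarg onto the API's parsePDF field."""
    body = {"url": url}
    if parse_pdf is not True:
        body["parsePDF"] = parse_pdf  # False => PDF returned base64-encoded
    return body

print(build_scrape_body("https://example.com/report.pdf", parse_pdf=False))
```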
Nicolas
80f7177473 Nick: bump version 2025-06-20 12:05:15 -03:00
devin-ai-integration[bot]
09aabbedb5
feat: add followInternalLinks parameter as semantic replacement for allowBackwardLinks (#1684)
* feat: add followInternalLinks parameter as semantic replacement for allowBackwardLinks

- Add followInternalLinks parameter to crawl API with same functionality as allowBackwardLinks
- Update transformation logic to use followInternalLinks with precedence over allowBackwardLinks
- Add parameter to Python SDK crawl methods with proper precedence handling
- Add parameter to Node.js SDK CrawlParams interface
- Add comprehensive tests for new parameter and backward compatibility
- Maintain full backward compatibility for existing allowBackwardLinks usage
- Add deprecation notices in documentation while preserving functionality

Co-Authored-By: Nick <nicolascamara29@gmail.com>

* fix: revert accidental cache=True changes to preserve original cache parameter handling

- Revert cache=True back to cache=cache in generate_llms_text methods
- Preserve original parameter passing behavior for cache parameter
- Fix accidental hardcoding of cache parameter to True

Co-Authored-By: Nick <nicolascamara29@gmail.com>

* refactor: rename followInternalLinks to crawlEntireDomain across API, SDKs, and tests

- Rename followInternalLinks parameter to crawlEntireDomain in API schema
- Update Node.js SDK CrawlParams interface to use crawlEntireDomain
- Update Python SDK methods to use crawl_entire_domain parameter
- Update test cases to use new crawlEntireDomain parameter name
- Maintain backward compatibility with allowBackwardLinks
- Update transformation logic to use crawlEntireDomain with precedence

Co-Authored-By: Nick <nicolascamara29@gmail.com>

* fix: add missing cache parameter to generate_llms_text and update documentation references

Co-Authored-By: Nick <nicolascamara29@gmail.com>

* Update apps/python-sdk/firecrawl/firecrawl.py

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Nick <nicolascamara29@gmail.com>
Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>
2025-06-20 12:02:23 -03:00
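
The precedence rule this commit establishes (crawlEntireDomain wins over the deprecated allowBackwardLinks) reduces to a few lines; the sketch below assumes a False default when neither flag is set.

```python
from typing import Optional

def resolve_internal_links(crawl_entire_domain: Optional[bool],
                           allow_backward_links: Optional[bool]) -> bool:
    if crawl_entire_domain is not None:
        return crawl_entire_domain  # new name takes precedence
    if allow_backward_links is not None:
        return allow_backward_links  # deprecated, kept for compatibility
    return False  # assumed default

assert resolve_internal_links(True, False) is True
assert resolve_internal_links(None, True) is True
```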
Gergő Móricz
f8983fffb7
Concurrency limit refactor + maxConcurrency parameter (FIR-2191) (#1643) 2025-06-20 10:45:36 +02:00
Nicolas
07b77e1a1e Update __init__.py 2025-06-06 17:23:57 -03:00
Gergő Móricz
4337992636
feat(sdk): Index parameters + other missing parameters (#1638) 2025-06-05 22:22:22 +02:00
Gergő Móricz
3557c90210
feat(js-sdk): auto mode proxy (FIR-2145) (#1602)
* feat(js-sdk): auto mode proxy

* Nick: py sdk

---------

Co-authored-by: Nicolas <nicolascamara29@gmail.com>
2025-05-28 14:31:48 -03:00
Gergő Móricz
513f469b0f
feat(python-sdk/CrawlWatcher): remove max payload size from WebSocket (FIR-2038) (#1577)
* feat(python-sdk/CrawlWatcher): remove max payload size from WebSocket

* Update __init__.py

---------

Co-authored-by: Nicolas <nicolascamara29@gmail.com>
2025-05-20 16:59:08 -03:00
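
Removing the WebSocket payload cap typically means passing max_size=None; the sketch below assumes the websockets library (which defaults to a 1 MiB frame limit) rather than whatever CrawlWatcher actually wraps.

```python
import asyncio
import websockets

async def watch_crawl(ws_url: str) -> None:
    # max_size=None disables the library's default 1 MiB frame limit, so
    # large crawl payloads no longer close the socket with error 1009.
    async with websockets.connect(ws_url, max_size=None) as ws:
        async for message in ws:
            print(str(message)[:80])

# asyncio.run(watch_crawl("wss://api.firecrawl.dev/..."))  # URL illustrative
```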
devin-ai-integration[bot]
7ccbbec488
Fix LLMs.txt cache bug with subdomains and add bypass option (#1557)
* Fix LLMs.txt cache bug with subdomains and add bypass option (#1519)

Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>

* Nick:

* Update LLMs.txt test file to use helper functions and concurrent tests

Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>

* Remove LLMs.txt test file as requested

Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>

* Change parameter name to 'cache' and keep 7-day expiration

Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>

* Update generate-llmstxt-supabase.ts

* Update JS and Python SDKs to include cache parameter

Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>

* Fix LLMs.txt cache implementation to use normalizeUrl and exact matching

Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>

* Revert "Fix LLMs.txt cache implementation to use normalizeUrl and exact matching"

This reverts commit d05b9964677b7b2384453329d2ac99d841467053.

* Nick:

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
2025-05-16 16:29:09 -03:00
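
The cache parameter this commit threads through the SDKs is a simple flag; the sketch below is hypothetical in shape, with only the parameter name and the 7-day expiration taken from the commit messages.

```python
def generate_llms_text_request(url: str, cache: bool = True) -> dict:
    """Build a request body for LLMs.txt generation; cache=False bypasses
    the 7-day cache that previously mishandled subdomains."""
    return {"url": url, "cache": cache}

print(generate_llms_text_request("https://docs.example.com", cache=False))
```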
Nicolas
907cf1cf41 Update __init__.py 2025-05-08 20:29:20 -03:00