firecrawl

mirror of https://github.com/mendableai/firecrawl.git synced 2026-01-01 09:44:33 +00:00

Author	SHA1	Message	Date
Abimael Martell	e4eb79f16c	python-sdk: Update Agent Client (#2579) * python-sdk: Update Agent Client * bump version * cr comment * validate schema type	2025-12-18 21:38:57 -08:00
Gergő Móricz	e1e9c38a7d	feat(api): a/sctu	2025-12-17 22:21:28 +01:00
Gergő Móricz	6a2425b776	feat(api): a/mc	2025-12-17 20:24:11 +01:00
Gergő Móricz	0a5cc475ff	feat(api): ab/c	2025-12-16 21:12:27 +01:00
Gergő Móricz	2efd0ae225	feat: ab	2025-12-14 22:37:28 +01:00
Gergő Móricz	419b80a4c7	feat(sdk): e3	2025-12-13 16:43:06 +01:00
Rafael Miller	02d9b5fe0c	(python-sdk)fix/max_pages (#2527) * (python-sdk)fix/max_pages * bump version	2025-12-09 15:09:54 -03:00
Rafael Miller	aecce165b5	added timezone to metadata response (#2526) * added timezone to metadata response * removed hallucination tests	2025-12-09 13:50:02 -03:00
rafaelmmiller	7e2328ad1d	chore: bump sdk versions	2025-12-07 14:57:14 -03:00
Rafael Miller	d496714d44	(sdks)feat/added concurrency info to metadata (#2502)	2025-12-07 14:48:29 -03:00
Rafael Miller	3d418df1bc	(sdk)fix/same timeout as api now (#2503)	2025-12-07 14:48:14 -03:00
Rafael Miller	cdd1e897c1	Python sdk fix/batch validate limit (#2399 ) * (python-sdk) fix: removed 1000 url limit from batch scrape validator * fixed tests * Update apps/python-sdk/firecrawl/v2/methods/aio/batch.py Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com> * Update apps/python-sdk/firecrawl/v2/methods/aio/batch.py Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com> --------- Co-authored-by: Micah Stairs <micah.stairs@gmail.com> Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>	2025-12-04 18:03:41 -03:00
Gergő Móricz	eebe41623e	feat(extract): port to billing credits (#2482) * feat(extract): port to billing credits * added normalization --------- Co-authored-by: rafaelmmiller <150964962+rafaelsideguide@users.noreply.github.com>	2025-12-03 19:25:53 +01:00
tom	9f4f011a78	Add minAge parameter to scrape (#2452) * feat(api): add minAge parameter to scrape options * test(api): update minAge test to use scrapeRaw for error handling	2025-11-30 20:35:49 +00:00
Gaurav Chadha	08aee42275	fix: Add support for ignoreQueryParameter in map SDKs (#2429 ) * add-support-for-ignoreQueryParameter-map Signed-off-by: Gaurav Chadha <gauravchadha1676@gmail.com> * Update package.json --------- Signed-off-by: Gaurav Chadha <gauravchadha1676@gmail.com> Co-authored-by: rafaelmmiller <150964962+rafaelsideguide@users.noreply.github.com>	2025-11-26 15:47:11 -03:00
Rafael Miller	4c860515df	(python-sdk) feat: added extra fields to metadata (#2441 ) * (python-sdk) feat: added extra fields to metadata * Update apps/python-sdk/example.py Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com> * fixes metadata coerce function for unknown-keys * Update types.py --------- Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>	2025-11-26 10:41:42 -03:00
rafaelmmiller	881f566bf9	fixed json input on python-sdk	2025-11-12 13:47:16 -03:00
rafaelmmiller	d4b7472397	bump python-sdk version	2025-11-12 11:52:49 -03:00
Neha Prasad	cc411b6a3b	fix: image search field mapping in Python SDK (#2244) * added field mapping function * applied field normalization to search results	2025-11-12 11:51:22 -03:00
Gaurav Chadha	8fd980f343	update: Adds support for recursive schema for `python-sdk` with model selection (#2266 ) * add-support-for-recursive-schema-python-sdk rebase Signed-off-by: Gaurav Chadha <chadha93@192.168.1.13> * remove-weak-map Signed-off-by: Gaurav Chadha <gauravchadha1676@gmail.com> * Update apps/python-sdk/firecrawl/v1/client.py Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com> * handle-circular-ref Signed-off-by: Gaurav Chadha <gauravchadha1676@gmail.com> * added unit tests. added some ignores to config for WIPs --------- Signed-off-by: Gaurav Chadha <gauravchadha1676@gmail.com> Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com> Co-authored-by: rafaelmmiller <150964962+rafaelsideguide@users.noreply.github.com>	2025-11-12 11:17:45 -03:00
devin-ai-integration[bot]	dabf991ca9	Add branding format support to JS and Python SDKs (#2360 ) * Add branding format support to JS and Python SDKs Co-Authored-By: abi@sideguide.dev <abimex@gmail.com> * Add comprehensive unit tests for branding format in both SDKs - Add JS SDK unit tests in branding.test.ts with 4 test cases - Add Python SDK unit tests in test_branding.py with 5 test cases - Update Python SDK normalize_document_input to handle colorScheme -> color_scheme conversion - Add model_config extra='allow' to BrandingProfile for future extensibility - All tests pass locally (25 JS tests, 5 Python tests) Co-Authored-By: abi@sideguide.dev <abimex@gmail.com> * Bump SDK versions for branding format release - Bump JS SDK version from 4.4.1 to 4.5.0 - Bump Python SDK version from 4.5.0 to 4.6.0 Version bumps reflect the addition of branding format support and comprehensive unit tests. Co-Authored-By: abi@sideguide.dev <abimex@gmail.com> --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: abi@sideguide.dev <abimex@gmail.com>	2025-11-04 16:41:29 -08:00
Abimael Martell	ee562fc2c7	Merge pull request #2290 from firecrawl/fix-python-sdk-req-key python-sdk: Don't require API Key when running Self Hosted	2025-10-17 08:17:38 -07:00
Abimael Martell	69f7e588be	better documentation	2025-10-17 00:17:46 -07:00
Abimael Martell	61b3c71963	increase feat version	2025-10-16 15:30:11 -07:00
Abimael Martell	7240c41d9a	cr comments	2025-10-16 08:41:22 -07:00
Abimael Martell	8c2189f557	add default None to api key	2025-10-16 08:28:49 -07:00
Abimael Martell	26d65c87bf	python-sdk: Don't require API Key when running Self Hosted	2025-10-15 16:23:16 -07:00
Abimael Martell	66abf0f8d0	python-sdk: Fix timeout handling across api calls	2025-10-15 15:46:31 -07:00
Nicolas	8a3936fdc0	Nick: pdf search category	2025-10-13 11:09:10 -03:00
devin-ai-integration[bot]	0b8d87caf0	chore(python-sdk): bump version to 4.3.7 for poll_interval fix (#2265) Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: gaurav@sideguide.dev <gauravchadha1676@gmail.com>	2025-10-10 12:44:09 -03:00
Jeel Rupareliya	57babbaf09	python-sdk: include "cancelled" in CrawlJob.status and exit wait loop on cancel (fixes #2190) (#2240) * cancelled added in stats if job is cancelled * chore: stop tracking local venv in apps/python-sdk/.venv * revert: exclude local docker-compose port mapping change from PR * chore: ignore local SDK venv and remove from tracking * redis rate limit url added * removed redis rate limit * Update crawl.py	2025-10-01 12:32:52 -03:00
Gaurav Chadha	e661bd2b7c	fix: add missing `poll_interval` param in watcher (#2155) * add-poll_interval-param-in-watcher Signed-off-by: Chadha93 <gauravchadha1676@gmail.com> * add-guard-for-non-negative-values Signed-off-by: Chadha93 <gauravchadha1676@gmail.com> --------- Signed-off-by: Chadha93 <gauravchadha1676@gmail.com>	2025-09-22 17:10:05 -03:00
Nicolas	18c4b13b22	Nick: fixed integrations bug in search method	2025-09-07 16:09:58 -03:00
Rafael Miller	6aae67bd8c	feat(sdk): added agent option (#2108 ) * feat(sdk): added agent option * Update apps/python-sdk/firecrawl/v2/methods/extract.py Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com> --------- Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>	2025-09-05 18:21:40 -03:00
devin-ai-integration[bot]	9e6fc1b54d	Update Type Annotations for v2 Async Search (SearchResponse → SearchData) (#2097 ) * Update v2 async search type annotations from SearchResponse to SearchData - Remove SearchResponse export from firecrawl.types for v2 usage - Aligns type annotations with actual runtime behavior - v2 async search methods already return SearchData directly - v1 methods continue to use SearchResponse as expected - Resolves Linear ticket ENG-3321 Co-Authored-By: rafael@sideguide.dev <rafael@sideguide.dev> * bump version --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: rafael@sideguide.dev <rafael@sideguide.dev> Co-authored-by: rafaelmmiller <150964962+rafaelsideguide@users.noreply.github.com>	2025-09-05 09:37:16 -03:00
Rafael Miller	a2517a8855	Feat(sdks): integration param (#2096 ) * feat(sdks): integration param * added underline to integration param in sdk tests * removed param that didn't make sense * chore(sdks): bump versions * Update apps/js-sdk/firecrawl/src/v2/methods/search.ts Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com> * Update apps/python-sdk/firecrawl/v2/methods/extract.py Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com> * Update apps/python-sdk/firecrawl/v2/methods/aio/extract.py Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com> * Update apps/python-sdk/firecrawl/v2/methods/aio/crawl.py Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com> * Update apps/python-sdk/firecrawl/v2/methods/aio/batch.py Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com> * cubic's review * Update apps/python-sdk/firecrawl/v2/methods/extract.py Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com> * Update apps/python-sdk/firecrawl/v2/methods/aio/extract.py Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com> * Update apps/js-sdk/firecrawl/src/v2/methods/crawl.ts Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com> * Update apps/python-sdk/firecrawl/v2/utils/validation.py Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com> * Update apps/python-sdk/firecrawl/v2/methods/search.py Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com> * Update apps/python-sdk/firecrawl/v2/methods/search.py Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com> * Update apps/python-sdk/firecrawl/v2/methods/map.py Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com> * Update 
apps/python-sdk/firecrawl/v2/methods/aio/search.py Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com> * cubics fixes * Update apps/js-sdk/firecrawl/src/v2/methods/batch.ts Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com> * here we go cubic --------- Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>	2025-09-03 16:39:43 -03:00
Rafael Miller	471feacb22	feat(python-sdk): normalize docs in search results (#2098)	2025-09-03 16:00:50 -03:00
tom	9a3bd6ca50	Add proxy location support to crawl and map endpoints (ENG-3361) (#2092)	2025-09-03 16:19:00 +02:00
Rafael Miller	3968a602fe	fix(python-sdk): added missing get_queue_status in aio and added to t… (#2081) Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>	2025-09-01 17:44:23 +02:00
Gergő Móricz	90778e4604	feat: historical credit/token usage endpoints + more data in existing usage endpoints (#2077)	2025-09-01 13:23:38 +02:00
Gergő Móricz	76cc2decd0	feat(api): add /team/queue-status endpoint (#2063) * feat(api): add /team/queue-status endpoint * chore: bump SDKs * fix bad imports * various fixes (ty cubic) * rebase fix	2025-09-01 11:03:05 +02:00
Nicolas	b05327dbbe	Nick: fix py sdk validation error	2025-08-30 18:41:47 -04:00
devin-ai-integration[bot]	7bea613ec0	feat: add maxPages parameter to PDF parser in v2 scrape API (#2047 ) * feat: add maxPages parameter to PDF parser - Extend parsersSchema to support both string array ['pdf'] and object array [{'type':'pdf','maxPages':10}] formats - Add shouldParsePDF and getPDFMaxPages helper functions for consistent parser handling - Update PDF processing to respect maxPages limit in both RunPod MU and PdfParse processors - Modify billing calculation to use actual pages processed instead of total pages - Add comprehensive tests for object format parsers, page limiting, and validation - Maintain backward compatibility with existing string array format The maxPages parameter is optional and defaults to unlimited when not specified. Page limiting occurs before processing to avoid unnecessary computation and billing is based on the effective page count for fairness. Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev> * fix: correct parsersSchema to handle individual parser items - Change union from array-level to item-level in parsersSchema - Now accepts array where each item is either string 'pdf' or object {'type':'pdf','maxPages':10} - When parser is string 'pdf', maxPages is undefined (no limit) - When parser is object, use specified maxPages value - Maintains backward compatibility with existing ['pdf'] format Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev> * fix: remove maxPages logic from scrapePDFWithParsePDF per PR feedback - Remove maxPages parameter and truncation logic from scrapePDFWithParsePDF - Keep maxPages logic only in scrapePDFWithRunPodMU where it provides cost savings - Addresses feedback from mogery: pdf-parse doesn't cost anything extra to process all pages Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev> * test: add maxPages parameter tests for crawl and search endpoints - Add crawl endpoint test with PDF maxPages parameter - Add search endpoint test with PDF maxPages parameter - Verify 
maxPages works end-to-end across all endpoints (scrape, crawl, search) - Ensure schema inheritance and data flow work correctly Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev> * fix: remove problematic crawl and search tests for maxPages - Remove crawl test that incorrectly uses direct PDF URL - Remove search test that relies on unreliable external search results - maxPages functionality verified through schema inheritance and data flow analysis - Comprehensive tests already exist in parsers.test.ts for core functionality Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev> * feat: add maxPages parameter support to Python and JavaScript SDKs - Add PDFParser class to Python SDK with max_pages field validation (1-1000) - Update Python SDK parsers field to support Union[List[str], List[Union[str, PDFParser]]] - Add parsers preprocessing in Python SDK to convert snake_case to camelCase - Update JavaScript SDK parsers type to Array<string \| { type: 'pdf'; maxPages?: number }> - Add maxPages validation to JavaScript SDK ensureValidScrapeOptions - Maintain backward compatibility with existing ['pdf'] string array format - Support mixed formats in both SDKs - Add comprehensive test files for both SDKs Addresses GitHub comment requesting SDK support for maxPages parameter. 
Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev> * cleanup: remove temporary test files Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev> * fix: correct parsers schema to support mixed string and object arrays - Fix parsers schema to properly handle mixed arrays like ['pdf', {type: 'pdf', maxPages: 5}] - Resolves backward compatibility issue that was causing webhook test failures - All parser formats now work: ['pdf'], [{type: 'pdf'}], [{type: 'pdf', maxPages: 10}], mixed arrays Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev> * Delete SDK_MAXPAGES_IMPLEMENTATION.md * feat: increase maxPages limit from 1000 to 10000 pages - Update backend Zod schema validation in types.ts - Update JavaScript SDK client-side validation - Update API test cases to use new 10000 limit - Addresses GitHub comment feedback from nickscamara Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev> * fix: update Python SDK maxPages limit from 1000 to 10000 - Fix validation discrepancy between Python SDK (1000) and backend/JS SDK (10000) - Ensures consistent maxPages validation across all SDKs - Addresses critical bug identified in PR review Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev> * fix: remove SDK-side maxPages validation per PR feedback - Remove maxPages range validation from JavaScript SDK validation.ts - Remove maxPages range validation from Python SDK types.py - Keep backend API validation as single source of truth - Addresses GitHub comment from mogery Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev> * Nick: --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: thomas@sideguide.dev <thomas@sideguide.dev> Co-authored-by: Nicolas <nicolascamara29@gmail.com>	2025-08-29 20:13:58 -04:00
Rafael Miller	815963890f	feat(sdks): next cursor pagination (#2067 ) * feat(sdks): next cursor pagination - Default auto-pagination enabled; pass { autoPaginate: false } (JS) or PaginationConfig(auto_paginate=False) (Python) to restore single-page behavior. - Potentially larger responses and fewer calls by default. * good to go * bump sdks version * fixed tests and endpoints * addressed cubic's appointments * docs * cubic's appointments * Update apps/js-sdk/firecrawl/src/v2/methods/crawl.ts Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com> * here we go * Update apps/python-sdk/firecrawl/v2/utils/http_client.py Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com> * Update apps/python-sdk/firecrawl/v2/methods/crawl.py Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com> * Update apps/python-sdk/firecrawl/__tests__/unit/v2/methods/test_pagination.py Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com> * rafa: * Update apps/python-sdk/firecrawl/v2/methods/crawl.py Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com> * Update example_pagination.py --------- Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>	2025-08-29 20:01:51 -03:00
rafaelmmiller	293b532629	chore(sdks): bumped sdks	2025-08-27 11:23:13 -03:00
Rafael Miller	7ac607617f	fix(python-sdk): missing methods in client (#2050) added get_active_crawls and start_extract methods that were missing in the sync client	2025-08-27 08:09:12 -03:00
Vishnu Krishnan	0589994ec6	feat(api): support extraction of data-* attributes in scrape endpoints (#2006) Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com> Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>	2025-08-27 09:53:11 +02:00
Vishnu Krishnan	30c6bdd938	feat(api): add image extraction support to v2 scrape endpoint (#2008)	2025-08-27 09:50:06 +02:00
Nicolas	54d8d92c99	Nick: sdks now have search categories	2025-08-23 16:59:39 -07:00
Nicolas	44715278b3	Nick: updated sdks with search categories	2025-08-23 16:47:43 -07:00

326 Commits