firecrawl

mirror of https://github.com/mendableai/firecrawl.git synced 2025-11-06 21:29:34 +00:00

Author	SHA1	Message	Date
Ademílson Tonato	30b7e17327	feat(firecrawl): add integration parameter support and enhance kwargs handling	2025-07-02 14:15:39 +01:00
Nicolas	dcbed186a9	Add local environment configuration to docker-compose services (#1742) Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2025-07-02 00:27:01 +02:00
Gergő Móricz	e3dc2e87db	chore: bump js sdk	2025-07-01 20:29:00 +02:00
Gergő Móricz	4506b21185	feat(api): zero data retention (ENG-2376) (#1687) * add ZDR flag and v0 lockout * zdr walls on v1 and propagation * zdr within scrapeurl * fixes * more fixes * zdr flag on queue-worker logging * final stretch + testing, needs f-e changes * fixes * self-serve ZDR through request body * request-level zdr * improved zdrcleaner * generalize schema to allow for different data retention times in the future * update Go version on CI * feat(api/tests/zdr): test that nothing is logged * fix(api/tests/zdr): correct log name * fix(ci): envs * fix(zdrcleaner): lower bound on db query * zdr test with idmux * WIP Assignments * fix bad merge remove unused identity * fix stupid jest globals thing * feat(scrapeURL/zdr): blacklist pdf action * fix(concurrency-limit): zdr logging enforcement * temp: remove extra billing for zdr * SDK support * final zdr business logic fix rename * fix test log filtering * fix log filtering... again * fix(tests/zdr): more logging exceptions --------- Co-authored-by: Nicolas <nicolascamara29@gmail.com>	2025-07-01 20:07:26 +02:00
Gergő Móricz	1f1f733011	fix(map): pass timeout to sitemap fetch (#1741)	2025-07-01 14:26:43 -03:00
Nicolas	ec298f58b6	Update search.ts	2025-07-01 13:43:00 -03:00
Nicolas	3e09f9fb8a	Nick: init (#1740)	2025-07-01 11:39:45 -03:00
Gergő Móricz	6c2f432d49	feat(crawl-status): better creditsUsed field (#1738)	2025-07-01 16:34:30 +02:00
Gergő Móricz	ebf98e3c16	feat(queue-worker): decrease job lock duration to pick up jobs on dead workers faster (#1737)	2025-07-01 11:27:38 -03:00
Gergő Móricz	400d497fca	feat(scrapeURL): ask user to increase timeout if there's a DOM.getDocument or queryAXTree error (#1739) * feat(scrapeURL): ask user to increase timeout if there's a DOM.getDocument or queryAXTree error * fix: move result tracking to meta	2025-07-01 16:25:49 +02:00
devin-ai-integration[bot]	1816cfc4c8	feat: implement IDN support with Punycode encoding (#1735) - Update URL validation regex to accept xn-- prefixed domains - Add normalizeHostnameForComparison utility for consistent IDN handling - Update domain comparison functions to use Punycode normalization - Expand test coverage for various IDN scripts (Chinese, Arabic, Russian) - Ensure backward compatibility with existing URL processing Fixes ENG-2510 Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: mogery@sideguide.dev <mogery@sideguide.dev>	2025-07-01 15:39:34 +02:00
Gergő Móricz	8a282e3fc8	fix(auto_charge): bad hourly counter logic (#1736)	2025-07-01 10:37:37 -03:00
Nicolas	b4eedce3e0	(feat/ledger) Ledger events (#1728) * Nick: ledger init * Update email_notification.ts * Update tracking.ts * Nick: removed unused events * Update email_notification.ts * Apply suggestions from code review * Update tracking.ts * Update tracking.ts * Update email_notification.ts * Nick: conc limit ledger --------- Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>	2025-06-30 12:48:06 -03:00
Gergő Móricz	13f012c583	add pdf prefetch log for debugging (ENG-2542) (#1734) * feat * feat: pdf prefetch anti-loop error	2025-06-30 17:37:31 +02:00
Gergő Móricz	9162952744	proxy used improvement (#1727)	2025-06-30 17:37:19 +02:00
Gergő Móricz	9b95a17c0d	fix json format on search (#1729)	2025-06-30 12:17:52 -03:00
Nicolas	17ff8be67b	Nick; (#1726) v1.13.0	2025-06-27 12:02:15 -03:00
Gergő Móricz	57b8e66bc8	feat(api/worker): liveness check in queueing -- don't take jobs when the worker is dying (#1725)	2025-06-27 11:51:40 -03:00
Nicolas	c4adc687ea	Update index.ts	2025-06-27 11:42:26 -03:00
Nicolas	caec228f60	Nick: version bump	2025-06-27 11:33:36 -03:00
devin-ai-integration[bot]	fa5b96c521	Add parsePDF parameter to JS SDK (#1720) * Add parsePDF parameter to JS SDK (clean implementation) - Add parsePDF boolean parameter to CrawlScrapeOptions interface - Parameter automatically flows through scrape and crawl operations via spread operator - Add comprehensive test cases for parsePDF functionality in both scrape and crawl scenarios - Tests verify parsePDF=true and parsePDF=false behavior with PDF files Co-Authored-By: Micah Stairs <micah@sideguide.dev> * Fix parsePDF tests to match actual API behavior - Update parsePDF=false test to expect base64 data instead of markdown - Tests now properly verify the difference between parsePDF=true and parsePDF=false - Address GitHub comment about 'hallucinated' tests by fixing unrealistic expectations Co-Authored-By: Micah Stairs <micah@sideguide.dev> * Update index.test.ts --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: Micah Stairs <micah@sideguide.dev> Co-authored-by: Nicolas <nicolascamara29@gmail.com>	2025-06-27 11:30:49 -03:00
devin-ai-integration[bot]	070d1c1d98	Fix unreachable allowSubdomains code in crawler filterURL method (#1719) - Move subdomain check logic before external link denial to make it reachable - Add comprehensive tests for allowSubdomains functionality - Ensure subdomain URLs are properly allowed/filtered based on configuration Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: Micah Stairs <micah@sideguide.dev>	2025-06-27 11:28:58 -03:00
Gergő Móricz	2b87ea6599	feat: improve DNS resolution error message (#1724)	2025-06-27 11:27:25 -03:00
Gergő Móricz	55d5c1f41d	feat(scrapeURL/skipTlsVerification): improve error message (#1723)	2025-06-27 11:27:04 -03:00
Gergő Móricz	9ed26e1e07	feat(sdk/python): add pdf action (ENG-2515) (#1722) * feat(sdk/python): add pdf action result * bump --------- Co-authored-by: Nicolas <nicolascamara29@gmail.com>	2025-06-27 11:26:27 -03:00
Nicolas	d8796e4536	feat: Screenshot quality (#1721) * Nick: init * Update index.ts * Nick: sdks support	2025-06-27 11:07:14 -03:00
Micah Stairs	9a5d40c3cf	Allow international URLs to pass validation (#1717)	2025-06-26 13:16:42 -04:00
devin-ai-integration[bot]	1919799bed	feat(python-sdk): add parsePDF parameter support (#1713) * feat(python-sdk): add parsePDF parameter support - Add parsePDF field to ScrapeOptions class for Search API usage - Add parse_pdf parameter to both sync and async scrape_url methods - Add parameter handling logic to pass parsePDF to API requests - Add comprehensive tests for parsePDF functionality - Maintain backward compatibility with existing API The parsePDF parameter controls PDF processing behavior: - When true (default): PDF content extracted and converted to markdown - When false: PDF returned in base64 encoding with flat credit rate Resolves missing parsePDF support in Python SDK v2.9.0 Co-Authored-By: Micah Stairs <micah@sideguide.dev> * Update __init__.py * Update test.py * Update __init__.py --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: Micah Stairs <micah@sideguide.dev> Co-authored-by: Nicolas <nicolascamara29@gmail.com>	2025-06-26 16:34:43 +00:00
devin-ai-integration[bot]	89e57ace3c	Add temporary exception for Faire team ID to bypass job expiration (#1716) * Add temporary exception for Faire team ID to bypass job expiration - Add TEMP_FAIRE_TEAM_ID constant for team f96ad1a4-8102-4b35-9904-36fd517d3616 - Modify job expiration logic to skip 24-hour timeout for this team - Add tests to verify Faire team bypasses expiration and others don't - Temporary solution to allow Faire team access to expired crawl jobs Co-Authored-By: Micah Stairs <micah@sideguide.dev> * Update apps/api/src/__tests__/snips/crawl.test.ts --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: Micah Stairs <micah@sideguide.dev> Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>	2025-06-26 13:42:34 +00:00
Gergő Móricz	f4714f4849	fix(js-sdk/extract): use same zod fallback logic (#1711)	2025-06-25 17:59:58 +00:00
Gergő Móricz	3d04c2087e	fix(api): cached acuc didn't have the is_extract flag set (#1712) cosmetic issue only (error message), no behavioural change	2025-06-25 16:43:53 +02:00
Gergő Móricz	bc9065810d	fix(concurrency-limit): overlogging (#1709)	2025-06-24 17:01:32 +00:00
Gergő Móricz	cc3afa2578	fix(concurrency-limit): scan instead of taking jobs (#1708)	2025-06-24 13:32:22 -03:00
Gergő Móricz	ae94edd43e	feat(api/ci): idmux (#1707) * feat(api/ci): idmux * fix: bad merge * no more default identity * fix change tracking test * fix httpstatus going down lol * fix change tracking tests * bump timeout * fix ct self-hosted * further fixes * one more httpstatus bug * bs * it's being weird, blockAds testing	2025-06-24 15:36:05 +02:00
Gergő Móricz	86603de664	fix(api): instantiate Storage only once (#1706)	2025-06-24 00:18:07 +02:00
Gergő Móricz	11f469488e	fix(api/batch/scrape): maxConcurrency field support when using ignoreInvalidURLs (#1705) * fix(api/batch/scrape): maxConcurrency field support when using ignoreInvalidURLs * fix(tests): timeouts	2025-06-23 21:44:55 +02:00
Gergő Móricz	e7a62dd490	fix(api): pdf bug + testing bugs (#1704)	2025-06-23 19:57:27 +02:00
Gergő Móricz	fe9057559b	fix(v1): check credits variable scope collision (#1703) This is what’s been causing the weird insufficient credits errors.	2025-06-23 19:19:01 +02:00
Gergő Móricz	e3948ae5b1	feat(api): pdf action + housekeeping (#1702) * feat(api): pdf action + housekeeping * fix TS build	2025-06-23 19:03:35 +02:00
Ademílson Tonato	78a3579d6e	feat: add relevanceai as part of the integrations	2025-06-23 16:41:19 +01:00
Gergő Móricz	439619ffc6	fix(api/v1/crawl/ongoing): only crawls, no batch scrape (#1701)	2025-06-23 16:33:02 +02:00
Gergő Móricz	1fdf95913d	feat(api): optimize job count query and improve error handling (#1700)	2025-06-23 16:18:55 +02:00
Gergő Móricz	c31172493e	fix(api): handle errors better in redis-less crawl status (#1699)	2025-06-23 15:56:20 +02:00
Gergő Móricz	66cde50a2a	fix(api): enhance error handler with optional ACUC data (#1698) Update error handler to use RequestWithMaybeACUC type, allowing access to optional ACUC properties on the request object. Include team_id from ACUC in error logging to improve context for debugging.	2025-06-23 15:42:38 +02:00
Gergő Móricz	e06ec2d047	fix(api): improve error logging with structured error object (#1697)	2025-06-23 15:26:15 +02:00
Gergő Móricz	7ed19c0ac0	feat(scrapeURL): separate URL rewrites to different function	2025-06-21 02:02:29 +02:00
Gergő Móricz	9174e0c8a0	fix(api): CI (#1692) * add scrapeTimeout parameter * fix(api/ci): allow webhook server some time to settle * fix(api/ci): extract time extension * fix(api/ci): switch index location tests to a more reliable proxy * check crawl errors + extend index cooldown * fix lib	2025-06-20 22:57:23 +02:00
Meet Soni	2082243cb5	feat(scrape): support Google Slides (#1693) * feat(scrape): support Google Slides * feat(scrape): add test for scraping Google Slides links	2025-06-20 21:58:45 +02:00
Gergő Móricz	4b03ffca36	fix(search): respect parsePDF in pricing (#1690)	2025-06-20 21:15:14 +02:00
Gergő Móricz	125e1ada45	feat(scrapeURL): support cookies in safeFetch (#1688)	2025-06-20 20:43:04 +02:00

3580 Commits