firecrawl

mirror of https://github.com/mendableai/firecrawl.git synced 2025-06-27 00:41:33 +00:00

Author	SHA1	Message	Date
Micah Stairs	9a5d40c3cf	Allow international URLs to pass validation (#1717 )	2025-06-26 13:16:42 -04:00
devin-ai-integration[bot]	1919799bed	feat(python-sdk): add parsePDF parameter support (#1713 ) * feat(python-sdk): add parsePDF parameter support - Add parsePDF field to ScrapeOptions class for Search API usage - Add parse_pdf parameter to both sync and async scrape_url methods - Add parameter handling logic to pass parsePDF to API requests - Add comprehensive tests for parsePDF functionality - Maintain backward compatibility with existing API The parsePDF parameter controls PDF processing behavior: - When true (default): PDF content extracted and converted to markdown - When false: PDF returned in base64 encoding with flat credit rate Resolves missing parsePDF support in Python SDK v2.9.0 Co-Authored-By: Micah Stairs <micah@sideguide.dev> * Update __init__.py * Update test.py * Update __init__.py --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: Micah Stairs <micah@sideguide.dev> Co-authored-by: Nicolas <nicolascamara29@gmail.com>	2025-06-26 16:34:43 +00:00
devin-ai-integration[bot]	89e57ace3c	Add temporary exception for Faire team ID to bypass job expiration (#1716 ) * Add temporary exception for Faire team ID to bypass job expiration - Add TEMP_FAIRE_TEAM_ID constant for team f96ad1a4-8102-4b35-9904-36fd517d3616 - Modify job expiration logic to skip 24-hour timeout for this team - Add tests to verify Faire team bypasses expiration and others don't - Temporary solution to allow Faire team access to expired crawl jobs Co-Authored-By: Micah Stairs <micah@sideguide.dev> * Update apps/api/src/__tests__/snips/crawl.test.ts --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: Micah Stairs <micah@sideguide.dev> Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>	2025-06-26 13:42:34 +00:00
Gergő Móricz	f4714f4849	fix(js-sdk/extract): use same zod fallback logic (#1711 )	2025-06-25 17:59:58 +00:00
Gergő Móricz	3d04c2087e	fix(api): cached acuc didn't have the is_extract flag set (#1712 ) cosmetic issue only (error message), no behavioural change	2025-06-25 16:43:53 +02:00
Gergő Móricz	bc9065810d	fix(concurrency-limit): overlogging (#1709 )	2025-06-24 17:01:32 +00:00
Gergő Móricz	cc3afa2578	fix(concurrency-limit): scan instead of taking jobs (#1708 )	2025-06-24 13:32:22 -03:00
Gergő Móricz	ae94edd43e	feat(api/ci): idmux (#1707 ) * feat(api/ci): idmux * fix: bad merge * no more default identity * fix change tracking test * fix httpstatus going down lol * fix change tracking tests * bump timeout * fix ct self-hosted * further fixes * one more httpstatus bug * bs * it's being weird, blockAds testing	2025-06-24 15:36:05 +02:00
Gergő Móricz	86603de664	fix(api): instantiate Storage only once (#1706 )	2025-06-24 00:18:07 +02:00
Gergő Móricz	11f469488e	fix(api/batch/scrape): maxConcurrency field support when using ignoreInvalidURLs (#1705 ) * fix(api/batch/scrape): maxConcurrency field support when using ignoreInvalidURLs * fix(tests): timeouts	2025-06-23 21:44:55 +02:00
Gergő Móricz	e7a62dd490	fix(api): pdf bug + testing bugs (#1704 )	2025-06-23 19:57:27 +02:00
Gergő Móricz	fe9057559b	fix(v1): check credits variable scope collision (#1703 ) This is what’s been causing the weird insufficient credits errors.	2025-06-23 19:19:01 +02:00
Gergő Móricz	e3948ae5b1	feat(api): pdf action + housekeeping (#1702 ) * feat(api): pdf action + housekeeping * fix TS build	2025-06-23 19:03:35 +02:00
Ademílson Tonato	78a3579d6e	feat: add relevanceai as part of the integrations	2025-06-23 16:41:19 +01:00
Gergő Móricz	439619ffc6	fix(api/v1/crawl/ongoing): only crawls, no batch scrape (#1701 )	2025-06-23 16:33:02 +02:00
Gergő Móricz	1fdf95913d	feat(api): optimize job count query and improve error handling (#1700 )	2025-06-23 16:18:55 +02:00
Gergő Móricz	c31172493e	fix(api): handle errors better in redis-less crawl status (#1699 )	2025-06-23 15:56:20 +02:00
Gergő Móricz	66cde50a2a	fix(api): enhance error handler with optional ACUC data (#1698 ) Update error handler to use RequestWithMaybeACUC type, allowing access to optional ACUC properties on the request object. Include team_id from ACUC in error logging to improve context for debugging.	2025-06-23 15:42:38 +02:00
Gergő Móricz	e06ec2d047	fix(api): improve error logging with structured error object (#1697 )	2025-06-23 15:26:15 +02:00
Gergő Móricz	7ed19c0ac0	feat(scrapeURL): separate URL rewrites to different function	2025-06-21 02:02:29 +02:00
Gergő Móricz	9174e0c8a0	fix(api): CI (#1692 ) * add scrapeTimeout parameter * fix(api/ci): allow webhook server some time to settle * fix(api/ci): extract time extension * fix(api/ci): switch index location tests to a more reliable proxy * check crawl errors + extend index cooldown * fix lib	2025-06-20 22:57:23 +02:00
Meet Soni	2082243cb5	feat(scrape): support Google Slides (#1693 ) * feat(scrape): support Google Slides * feat(scrape): add test for scraping Google Slides links	2025-06-20 21:58:45 +02:00
Gergő Móricz	4b03ffca36	fix(search): respect parsePDF in pricing (#1690 )	2025-06-20 21:15:14 +02:00
Gergő Móricz	125e1ada45	feat(scrapeURL): support cookies in safeFetch (#1688 )	2025-06-20 20:43:04 +02:00
Gergő Móricz	3f0b8b8e27	Remove old cache mechanisms (redis cache, PDF cache, crawl maps, etc.) (FIR-2266) (#1667 ) * feat(api): remove old indexes pt. 1 * feat(map): better subdomain support * more culling * adjust map maxage * feat(api/tests): add tests for pdf caching * fix(scrapeURL/index): pdf caching * restore pdf cache * fix __experimental_cache * sitemap fetching * remove extra var	2025-06-20 19:40:28 +02:00
Nicolas	363afb8048	Nick: updated openapi specs	2025-06-20 14:30:37 -03:00
Nicolas	80f7177473	Nick: bump version v1.12.0	2025-06-20 12:05:15 -03:00
devin-ai-integration[bot]	09aabbedb5	feat: add followInternalLinks parameter as semantic replacement for allowBackwardLinks (#1684 ) * feat: add followInternalLinks parameter as semantic replacement for allowBackwardLinks - Add followInternalLinks parameter to crawl API with same functionality as allowBackwardLinks - Update transformation logic to use followInternalLinks with precedence over allowBackwardLinks - Add parameter to Python SDK crawl methods with proper precedence handling - Add parameter to Node.js SDK CrawlParams interface - Add comprehensive tests for new parameter and backward compatibility - Maintain full backward compatibility for existing allowBackwardLinks usage - Add deprecation notices in documentation while preserving functionality Co-Authored-By: Nick <nicolascamara29@gmail.com> * fix: revert accidental cache=True changes to preserve original cache parameter handling - Revert cache=True back to cache=cache in generate_llms_text methods - Preserve original parameter passing behavior for cache parameter - Fix accidental hardcoding of cache parameter to True Co-Authored-By: Nick <nicolascamara29@gmail.com> * refactor: rename followInternalLinks to crawlEntireDomain across API, SDKs, and tests - Rename followInternalLinks parameter to crawlEntireDomain in API schema - Update Node.js SDK CrawlParams interface to use crawlEntireDomain - Update Python SDK methods to use crawl_entire_domain parameter - Update test cases to use new crawlEntireDomain parameter name - Maintain backward compatibility with allowBackwardLinks - Update transformation logic to use crawlEntireDomain with precedence Co-Authored-By: Nick <nicolascamara29@gmail.com> * fix: add missing cache parameter to generate_llms_text and update documentation references Co-Authored-By: Nick <nicolascamara29@gmail.com> * Update apps/python-sdk/firecrawl/firecrawl.py --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: Nick <nicolascamara29@gmail.com> Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>	2025-06-20 12:02:23 -03:00
Gergő Móricz	f939428264	feat(scrape): support Google Docs (FIR-1365) (#1686 ) * feat(scrape): support Google Docs * fixes	2025-06-20 11:42:41 +02:00
Gergő Móricz	f8983fffb7	Concurrency limit refactor + `maxConcurrency` parameter (FIR-2191) (#1643 )	2025-06-20 10:45:36 +02:00
Gergő Móricz	a8e3c29664	feat(scrape, extract): creditsUsed, tokensUsed fields (FIR-2336) (#1683 ) * fix(scrape): log FIRE-1 credits billed on failures properly * fix dumb thinbgs * feat(scrape, extract): creditsUsed fields * fix(extract): call it tokensUsed * Trigger Build * dumb mistake, search does separate billing	2025-06-18 21:49:20 +02:00
Gergő Móricz	fbd81b4168	fix(scrape): log FIRE-1 credits billed on failures properly (FIR-2331) (#1682 ) * fix(scrape): log FIRE-1 credits billed on failures properly * fix dumb thinbgs	2025-06-18 21:47:58 +02:00
Gergő Móricz	ebc1de9d60	feat(crawl-status): refactor to work after a redis flush (#1664 )	2025-06-18 18:58:04 +02:00
devin-ai-integration[bot]	cd2e0f868c	Add deployment type field to bug report template (#1681 ) - Add 'Deployment Type' field to Environment section - Allows users to specify Cloud (firecrawl.dev) vs Self-hosted - Helps maintainers better triage issues based on deployment context - Positioned logically after OS field in existing template structure Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: Nick <nicolascamara29@gmail.com>	2025-06-18 12:26:15 -03:00
Thomas Kosmas	199115c7be	stop testing new mu	2025-06-18 00:48:50 +03:00
Thomas Kosmas	f46f845efc	fix: send the request to new mu version before the main one to achieve better sync	2025-06-17 20:37:45 +03:00
Thomas Kosmas	ee7b29b3f6	feat: Test mu v3 (#1678 ) * Test mu v3 * fix env	2025-06-17 20:13:19 +03:00
Gergő Móricz	5ca8e2e98e	feat(index): store short titles and descriptions (#1677 )	2025-06-17 19:09:07 +02:00
devin-ai-integration[bot]	9710bdffc0	Improve URL filtering error messages with specific denial reasons (FIR-2352) (#1676 ) * Improve URL filtering error messages with specific denial reasons - Add FilterResult and FilterLinksResult interfaces for structured error reporting - Define DenialReason enum with specific, human-readable error messages - Update filterURL method to return structured results with denial reasons - Update filterLinks method to collect and return denial reasons for each URL - Modify error handling in queue-worker.ts to use specific denial reasons - Add comprehensive tests for different URL filtering scenarios - Maintain backward compatibility while improving error specificity Fixes: Misleading 'includePaths/excludePaths rules' error now shows actual denial reason (robots.txt, exclude patterns, depth limits, etc.) Co-Authored-By: mogery@sideguide.dev <mogery@sideguide.dev> * Fix test compilation error for FilterLinksResult interface - Update crawler.test.ts to use filteredLinks.links.length instead of filteredLinks.length - Update test expectations to use filteredLinks.links array - Resolves TypeScript compilation error preventing CI from passing Co-Authored-By: mogery@sideguide.dev <mogery@sideguide.dev> --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: mogery@sideguide.dev <mogery@sideguide.dev>	2025-06-17 19:00:29 +02:00
Nicolas	c6482eaf2d	Nick: prevent additional logging on /extract scrapes	2025-06-13 18:17:17 -03:00
Gergő Móricz	ea321b4936	fix search test timeouts	2025-06-13 17:42:55 +02:00
Thomas Kosmas	38c5795282	feat(vertex): fix vertex ai provider bug and update model references to use "gemini-2.5-pro" (#1668 )	2025-06-13 18:29:03 +03:00
Gergő Móricz	0bf23071ff	feat(index): add domain splitting for improved map querying (#1666 ) v1.11.0	2025-06-13 15:22:45 +02:00
Gergő Móricz	07224b8cd4	feat: use index in search and extract (#1660 )	2025-06-13 12:30:28 +02:00
Gergő Móricz	f296342731	feat(index): remove unused columns (#1662 )	2025-06-12 16:51:40 +02:00
Gergő Móricz	89e42b1137	fix(api): remove query parameter sanitization that was breaking extracts (#1661 )	2025-06-12 15:37:45 +02:00
Gergő Móricz	3c03d07051	feat: add credits_billed everywhere (FIR-2286) (#1655 ) * feat: add credits_billed everywhere also a bit of logging improvement for logJob * fix(queue-worker): db auth check before doing rpc for crawl/batch_scrape	2025-06-11 23:06:55 +02:00
Nicolas	bf3b2a359a	Improve concurrency limit email notifications (#1658 ) * Update email_notification.ts * Update email_notification.ts * Update email_notification.ts	2025-06-11 17:14:54 -03:00
Pulkit Saini	255be2a2ff	Fix PLAYWRIGHT_MICROSERVICE_URL env var to use /scrape endpoint (#1654 ) The correct environment variable should be PLAYWRIGHT_MICROSERVICE_URL=http://playwright-service:3000/scrape instead of PLAYWRIGHT_MICROSERVICE_URL=http://playwright-service:3000/html	2025-06-11 16:53:32 +02:00
Gergő Móricz	19dd086eb3	improve auto recharge logging	2025-06-11 16:26:06 +02:00
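
Commit 09aabbedb5 above says the transformation logic uses crawlEntireDomain "with precedence" while keeping allowBackwardLinks working. A minimal sketch of that kind of precedence rule follows. The function name and the dict-based options are hypothetical and chosen only for illustration; this is not taken from the Firecrawl code, and the real implementation may differ.

```python
from typing import Any, Dict, Optional


def resolve_crawl_entire_domain(options: Dict[str, Any]) -> Optional[bool]:
    """Hypothetical helper: pick the effective flag from crawl options.

    Assumption: an explicitly set crawlEntireDomain always wins, and the
    deprecated allowBackwardLinks is only consulted when the new field
    is absent (or None).
    """
    new_value = options.get("crawlEntireDomain")
    if new_value is not None:
        return new_value
    # Fall back to the legacy field so existing callers keep working.
    return options.get("allowBackwardLinks")


# The new field takes precedence, even when it is explicitly False.
print(resolve_crawl_entire_domain(
    {"crawlEntireDomain": False, "allowBackwardLinks": True}
))
# Legacy-only callers still get their value.
print(resolve_crawl_entire_domain({"allowBackwardLinks": True}))
```

Checking `is not None` rather than truthiness matters here: an explicit `False` on the new field should still override a legacy `True`.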


3554 Commits
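
Commit 89e57ace3c in the log above describes skipping the 24-hour job expiration for one team ID. A rough sketch of such a check is shown below. The names `JOB_TTL`, `EXEMPT_TEAM_IDS`, and `is_job_expired` are hypothetical, and measuring the timeout from job creation is an assumption; only the team ID and the 24-hour figure come from the commit message.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: jobs expire 24 hours after creation.
JOB_TTL = timedelta(hours=24)

# Team ID quoted in the commit message; the set name is hypothetical.
EXEMPT_TEAM_IDS = frozenset({"f96ad1a4-8102-4b35-9904-36fd517d3616"})


def is_job_expired(
    created_at: datetime, team_id: str, now: Optional[datetime] = None
) -> bool:
    """Return True if the job is past its TTL and the team is not exempt."""
    if team_id in EXEMPT_TEAM_IDS:
        # Exempt teams bypass expiration entirely.
        return False
    if now is None:
        now = datetime.now(timezone.utc)
    return now - created_at > JOB_TTL


created = datetime(2025, 6, 1, tzinfo=timezone.utc)
later = created + timedelta(hours=25)
print(is_job_expired(created, "some-other-team", later))
print(is_job_expired(created, "f96ad1a4-8102-4b35-9904-36fd517d3616", later))
```

Keeping the exemption in a set rather than a single constant makes it easy to remove or extend once the temporary exception is no longer needed.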