1997 Commits

Author SHA1 Message Date
Gergő Móricz
9ad947884d
feat(tests/snips): add billing tests + misc billing fixes (FIR-1280) (#1283)
* feat(tests/snips): add billing tests + misc billing fixes

* add testing key

* asd
2025-03-02 16:51:42 -03:00
Gergő Móricz
4f25f12a12
fix(ai): handle if AI returns a JSON code block (#1280) 2025-03-02 15:25:24 -03:00
Gergő Móricz
e8c698d613
feat(crawler): handle cross-origin redirects differently than same-origin redirects (#1279) 2025-03-02 13:32:46 +01:00
Nicolas
fea249c568 Update auth.ts 2025-03-02 02:40:34 -03:00
Gergő Móricz
904e69bfbc
feat(supabase): add read replica routing (#1274) 2025-02-28 09:52:26 +01:00
Gergő Móricz
44bf59229a fix(acuc): cache for 1 hour 2025-02-27 21:36:33 +01:00
Nicolas
b72e21a697
Nick: batch billing (#1264) 2025-02-27 20:18:03 +01:00
Nicolas
289e351c14
(feat/deep-research-alpha) Added Max Urls, Sources and Fixes (#1271)
* Nick: fixes

* Nick:

* Update deep-research-status.ts
2025-02-27 13:24:40 -03:00
Gergő Móricz
1d3757b391 bump map to 30k 2025-02-27 12:44:23 +01:00
Grass Huang
7bf04d409a
fix(scraper): improve charset detection regex to accurately parse meta tags (#1265) 2025-02-26 17:31:06 +01:00
Nicolas
31df234127 Update log_job.ts 2025-02-25 21:01:05 -03:00
Nicolas
ec90aaffd6 Update log_job.ts 2025-02-25 21:01:00 -03:00
Nicolas
59d09f5c45 Update log_job.ts 2025-02-25 19:32:16 -03:00
Gergő Móricz
115b6b61c4 add initial codeowners 2025-02-25 14:28:09 +01:00
Gergő Móricz
8c42b08b7e
feat(v1/crawl-status-ws): update behavior to ignore errors like regular crawl-status (#1234) 2025-02-24 21:44:29 +01:00
Gergő Móricz
15489be542
feat(self-host/ai): use any OpenAI-compatible API (#1245) 2025-02-23 09:07:32 +01:00
Nicolas
b24ac0f6b5
Nick: done (#1237) 2025-02-22 20:18:46 -03:00
Nicolas
5ab86b8b43
(fix/token-slicer) Fixes extract token limit issues (#1236)
* Nick: fixes extract token limit errors

* Update llmExtract.ts

* Update llmExtract.ts
2025-02-21 20:44:42 +01:00
Gergő Móricz
76e1f29ae8
Update Dockerfile (#1231) (#1232)
* Update Dockerfile (#1231)

* Dockerfile: re-add prod-deps stage and fix copies

---------

Co-authored-by: Loris <loris.rion@gmail.com>
2025-02-21 17:44:01 +01:00
Nicolas
6c51ef401e Update rate-limiter.ts 2025-02-20 22:31:43 -03:00
Nicolas
25d9bdb1f6
(feat/ai-sdk) Migrate to AI-SDK (#1220)
* Nick: init

* Update llmExtract.ts

* Update llmExtract.ts

* Nick rename

* fix(v1/types): extract json schema validation

* Update url-processor.ts

* feat(ai-sdk): ollama support

* feat(ai-sdk): further ollama support

* Nick: it is broken btw

* feat(ai-sdk): abstract model adapter

* Update pnpm-lock.yaml

* Update analyzeSchemaAndPrompt.ts

* Nick:

* feat(ai-sdk): ollama support

* doc(SELF_HOST): update with embedding param

* Nick:

* Update ranker.ts

* Nick:

* feat(ai-sdk): fixes

* Update llmExtract.ts

* feat: remove zod-to-json-schema

* fix

* Update llmExtract.ts

* use openai

* fixes

---------

Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>
2025-02-20 22:48:58 +01:00
Gergő Móricz
16c305775e
fix(crawl-redis): ignore empty includes/excludes (#1223)
* fix(crawl-redis): ignore empty includes/excludes

* fix(snips/scrape): bump timeouts
2025-02-20 19:06:02 +01:00
Gergő Móricz
283a3bfef3
fix(scrapeURL/engines/fetch): discover charset and re-decode (#1221)
* fix(scrapeURL/engines/fetch): discover charset and re-decode

* fix(snips/scrape): allow more time for stealth proxy
2025-02-20 18:56:15 +01:00
Gergő Móricz
e417f83c28
feat(self-host): ollama support (#1219) 2025-02-20 16:59:19 +01:00
Gergő Móricz
e84c7325d9 chore: remove dead code
Fixes #1149
Fixes #1150
2025-02-20 15:44:34 +01:00
Nicolas
2151ca846c Merge branch 'main' of https://github.com/mendableai/firecrawl 2025-02-20 10:50:32 -03:00
Nicolas
7db2d25efa Nick: 2025-02-20 10:50:22 -03:00
Gergő Móricz
c38dcd0432
feat(self-host): proxy support (FIR-1111) (#1212)
* feat(self-host): proxy support

* fix(playwright-service-ts): return untreated text/plain
2025-02-20 14:20:03 +01:00
Loris
100168ddf3
Add searxng for search endpoint (#1193)
* add searxng.ts

* update to add searxng endpoint

* Apply suggestions from code review

* feat(ci/self-host): add tests with searxng

* feat(ci/self-host): bootstrap searxng for testing

* feat(ci): improvements in syntax

---------

Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>
2025-02-20 12:36:53 +01:00
Gergő Móricz
da1670b78c
feat(map): mock support (FIR-1109) (#1213)
* feat(map,fetch): mock support

* feat(snips/map): mock out long-running test

* fix(snips/scrape): use more reliable site for adblock testing
2025-02-20 10:41:43 +01:00
Gergő Móricz
11ed679274 feat(scrapeURL/pdf): support PDF prefetch when parsePDF is off 2025-02-20 09:28:13 +01:00
Gergő Móricz
5eb0235ccb feat(apps/api): remove Sentry builds 2025-02-20 08:19:06 +01:00
Gergő Móricz
55d047b6b3
feat(scrapeURL): handle PDFs behind anti-bot (#1198) 2025-02-20 04:11:30 +01:00
Gergő Móricz
c39cc27866
feat(ci/self-host): add playwright microservice tests (#1210)
* feat(ci/self-host): add playwright microservice tests

* fix ci

* fix ci 2

* fix ci 3

* fix(playwright-service): get raw JSON if response is JSON
2025-02-20 02:06:13 +01:00
Gergő Móricz
46b187bc64
feat(v1/map): stop mapping if timed out via AbortController (#1205) 2025-02-20 00:42:13 +01:00
Gergő Móricz
2200f084f3
SELFHOST FIXES (#1207)
* fix(extract): construct OpenAI on demand

Fixes hard-crash if api key not specified in a self-hosting environment.

* fix(ci): try sleeping

* fix(ci): override host

* fix(ci): wait for server to start

* Support /extract and /crawl for self-hosted (FIR-1097) (#1137)

* Support /extract for self-hosted

This returns the job response from redis rather than supabase when db auth is disabled (self hosted mode)

* Use getJob for extract and use correct types

* fix(v1/crawl-status): only poll DB for total count if DB is enabled

* feat(snips): TEST_SUITE_SELF_HOSTED

* fix(ci/test-server-self-host): use pr trigger

* fix(scrapeURL): f-e mocking in selfhosted env

* fix(snips): do not try to eval json format on selfhost

* fix(scrapeURL): further f-e mocking

* fix(snips): don't timeout on hard fail polling

* fix(v1/extract-status): fix-up the db-agnostic impl

unfortunately had to separate the functions since the schema
was too divergent :(

* fix(snips): boost screenshot delay

* feat(ci): test with openai

* feat(ci): extract, search testing

* fix(ci): matrix

* fix(ci): bleh

* Update: fix default google search (#1174)

* fix log title

* search should always work

* asd

* fix ci

---------

Co-authored-by: Nick Roth <nlr06886@gmail.com>
Co-authored-by: William <sdustusun@gmail.com>
2025-02-20 00:41:22 +01:00
Gergő Móricz
055f7d2da0
fix(concurrency-limit): move to renewing a lock on each job instead of estimating time to complete (#1197) 2025-02-19 20:13:22 +01:00
Nicolas
acf1e60608 Nick: llmstxt improvements 2025-02-19 16:09:46 -03:00
Nicolas
d4cf2269ed Update generate-llmstxt-service.ts 2025-02-19 15:50:59 -03:00
Nicolas
f5de803a9d Nick: fixes 2025-02-19 15:21:52 -03:00
Nicolas
a60f3ff645 Nick: fixes 2025-02-19 15:01:47 -03:00
Eric Ciarla
d984b50400
Add llmstxt generator endpoint (#1201)
* Nick:

* Revert "fix(v1/types): fix extract -> json rename (FIR-1072) (#1195)"

This reverts commit 586a10f40d354a038afc2b67809f20a7a829f8cb.

* Update deep-research-service.ts

* Nick:

* init

* part 2

* Update generate-llmstxt-service.ts

* Fix queue

* Update queue-worker.ts

* Almost there

* Final touches

* Update requests.http

* final touches

* Update requests.http

* Improve logging

* Change endpoint to /llmstxt

* Update queue-worker.ts

* Update generate-llmstxt-service.ts

* Nick: cache

* Update index.ts

* Update firecrawl.py

* Update package.json

---------

Co-authored-by: Nicolas <nicolascamara29@gmail.com>
Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>
2025-02-19 14:42:33 -03:00
Nicolas
5c47e97db2
(feat/deep-research) Alpha implementation of deep research (#1202)
* Nick:

* Revert "fix(v1/types): fix extract -> json rename (FIR-1072) (#1195)"

This reverts commit 586a10f40d354a038afc2b67809f20a7a829f8cb.

* Update deep-research-service.ts

* Nick:

* Nick:

* Nick:

* Nick:

* Nick:

* Nick:

* Update deep-research-service.ts

* Nick:

* Update deep-research-service.ts

* Apply suggestions from code review

---------

Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>
2025-02-19 12:44:21 -03:00
Gergő Móricz
fc64f436ed
fix(v1/types): fix extract -> json rename, ROUND II (FIR-1072) (#1199)
* Revert "Revert "fix(v1/types): fix extract -> json rename (FIR-1072) (#1195)""

This reverts commit e28a44463ae49ffc195507204492cc7c15c438c4.

* fix(v1/types): fix bad transform

* feat(v1): proxy option / stealthProxy flag (FIR-1050) (#1196)

* feat(v1): proxy option / stealthProxy flag

* feat(js-sdk): add proxy option

* fix

* fix extract tests
2025-02-19 12:07:55 -03:00
Gergő Móricz
42050d3d6e fix 2025-02-18 18:06:39 +01:00
Gergő Móricz
b136e42b53
feat(v1): proxy option / stealthProxy flag (FIR-1050) (#1196)
* feat(v1): proxy option / stealthProxy flag

* feat(js-sdk): add proxy option
2025-02-18 18:03:10 +01:00
Nicolas
e28a44463a Revert "fix(v1/types): fix extract -> json rename (FIR-1072) (#1195)"
This reverts commit 586a10f40d354a038afc2b67809f20a7a829f8cb.
2025-02-18 11:31:23 -03:00
Gergő Móricz
586a10f40d
fix(v1/types): fix extract -> json rename (FIR-1072) (#1195)
* fix(v1/types): fix extract -> json rename

* fix(types/v1): bad transform
2025-02-18 10:32:19 -03:00
Tetsuro Yokoyama
5ac6eb7440
Update self-hosted Kubernetes deployments examples for compatibility and consistency (#1177)
* fix: Quote variables in `docker-entrypoint.sh`

- This commit adds double quotes around variables in the docker-entrypoint.sh script to prevent word splitting and globbing issues, ensuring the script behaves correctly in all cases.

* fix: Ensure worker/api deployment starts with `OPENAI_API_KEY`

* fix: Add missing `FLY_PROCESS_GROUP` env var to deployments

* fix: Correct `PLAYWRIGHT_MICROSERVICE_URL` in `firecrawl-config`

* fix: Update Docker build options for Apple Silicon compatibility

* fix: Correct `PLAYWRIGHT_MICROSERVICE_URL` in `firecrawl-config`
2025-02-18 13:33:20 +01:00
Gergő Móricz
aacbea1d9e fix(tests/snips/map): remove flaky useless test 2025-02-18 13:20:11 +01:00