Gergő Móricz
6e8873762a
feat(apps/test-suite): add Rafa's index benchmark notebook
2025-06-05 23:22:48 +02:00
Gergő Móricz
6d1b9bf1fe
debug(api/scrape): more logging
2025-06-05 22:52:41 +02:00
Gergő Móricz
0c7f864ea4
debug(api/scrape): increased logging to diagnose scrape fluke length
2025-06-05 22:51:25 +02:00
Gergő Móricz
4337992636
feat(sdk): Index parameters + other missing parameters (#1638)
2025-06-05 22:22:22 +02:00
Gergő Móricz
1de0ae392c
Index testing improvements (FIR-2214) (#1637)
* feat(api/tests/scrape): index improvements
* fix(api/test/scrape): add waits to allow batch insert to happen
* fix: ...
2025-06-05 22:10:06 +02:00
Gergő Móricz
78580f65df
feat(webhook): refactor callWebhook and add logWebhook (FIR-2218) (#1629)
* feat(webhook): refactor callWebhook and add logWebhook
* feat(queue-worker): fix crawl pre-finishing logic (#1628)
* feat(ci): verify typescript errors
* fix(ci):
* feat(api/tests): add webhook tests + refactor batch scrape lib (#1630)
* feat(api/tests): add webhook tests + refactor batch scrape lib
* fix(ci):
* feat(webhook/log): insert queue
2025-06-05 22:04:22 +02:00
Gergő Móricz
f050b169e2
feat(api/index): port queryIndexAtSplitLevel to RPC (FIR-2241) (#1640)
* feat(api/index): port queryIndexAtSplitLevel to RPC
* Update apps/api/src/services/index.ts
2025-06-05 22:02:41 +02:00
Gergő Móricz
a08d52e45d
feat(scrapeURL/index): don't put results by "dumb" engines into the index
2025-06-05 22:01:29 +02:00
Thomas Kosmas
af88218fad
feat: update mu (#1639)
* update to mu v2
* feat(ci): add RUNPOD_MUV2_POD_ID
* stupid change to make CI run
---------
Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>
2025-06-05 22:27:00 +03:00
Nicolas
6ca551a887
Merge branch 'main' of https://github.com/mendableai/firecrawl
2025-06-05 15:39:50 -03:00
Nicolas
8c40271796
Update map.ts
2025-06-05 15:39:48 -03:00
Thomas Kosmas
4bf64d2c01
feat(scraper): runpod v2 parallel testing (#1636)
* feat(scraper): runpod v2 parallel testing
* fix catch
2025-06-05 20:28:01 +03:00
Ademílson Tonato
b2e0f657bd
Merge pull request #1635 from mendableai/refactor/api-integration-parameter
feat(api): propagate integration field in queue worker job processing
2025-06-05 17:31:13 +01:00
Ademílson Tonato
6e1f8d6c10
feat(api): propagate integration field in queue worker job processing
2025-06-05 16:39:20 +01:00
Ademílson Tonato
71caf8ae57
Merge pull request #1632 from mendableai/feat/api-integration-parameter
feat(api): add integration field to jobs and update related controllers and types
2025-06-05 11:44:02 +01:00
Gergő Móricz
abb8919e2e
feat(js-sdk): changeTrackingOptions.tag
2025-06-04 23:51:53 +02:00
Gergő Móricz
34a18a2d2f
feat(changeTracking): support tags (FIR-1940) (#1631)
* feat(changeTracking): support tags
* test 408 fixes
2025-06-04 23:50:54 +02:00
Gergő Móricz
6e63528b61
fix(crawl-redis): bad logic
2025-06-04 20:46:04 +02:00
Ademílson Tonato
4c49bb9fc6
refactor: remove unnecessary logs and set integration as default to null
2025-06-04 19:29:47 +01:00
Ademílson Tonato
57a0aed484
feat(api): add integration field to jobs and update related controllers and types
2025-06-04 19:17:57 +01:00
Gergő Móricz
a05c4ae97d
feat(api): GET /crawl/ongoing (FIR-2189) (#1620)
* feat(api): GET /crawl/ongoing
* fix: routers in wrong order
* feat(api/crawl/ongoing): return more details
---------
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
2025-06-04 18:14:23 +02:00
Gergő Móricz
077c5dd8ec
feat(api/tests): add webhook tests + refactor batch scrape lib (#1630)
* feat(api/tests): add webhook tests + refactor batch scrape lib
* fix(ci):
2025-06-04 16:11:47 +02:00
Gergő Móricz
122ccd5eb0
fix(ci):
2025-06-04 15:49:45 +02:00
Gergő Móricz
11c1178ca1
feat(ci): verify typescript errors
2025-06-04 15:20:53 +02:00
Gergő Móricz
0f394a10c6
feat(queue-worker): fix crawl pre-finishing logic (#1628)
2025-06-04 15:19:48 +02:00
Gergő Móricz
8dd5bf7bd9
feat(api/tests/scrape): Playwright test improvements (#1626)
* feat(api/tests/scrape): verify that proxy works on Playwright
* debug: logs
* remove logs
* feat(playwright): add contentType relaying
* fix tests
* debug
* fix json
2025-06-04 01:24:19 +02:00
Gergő Móricz
95f204aab7
Index (FIR-2177) (#1605)
* poc progress
* poc
* url splits and better url normalization
* feat(index): integrate into map
* fix on selfhost
* feat: modifiers
* separate index supa logic
* debug
* fix language comparison
* feat: dontStoreInCache
* feat(index): some rudimentary testing
* feat: use url split columns
* feat(queue-worker/kickoff): use index links to kickoff crawl
* feat(scrapeURL/index): behaviour on non-200 index entries
* feat/added benchmark for scrapes
* feat(map): ignoreIndex
* feat(index): batch insert
* fix(api/tests/scrape): fix index test to work with batching
* disable cacheable lookup for self hosting tests
* feat(js-sdk): dontStoreInCache
* chore(js-sdk): bump
* feat(index): FIRECRAWL_INDEX_WRITE_ONLY
* feat(api/test): index envs
* map benchmarks
* cleanup
* further fixes
* clean up on map
* remove extraneous log
* workflow test run
* asd
* improve fns
* try again
* wow i'm an idiot
* ok fixed
* wth
* revert
* async saving to index
* feat: enhance metadata extraction by including 'itemprop' attribute in HTML (#1624)
* feat(selfhost): deploy a playwright image (#1625)
* Testing improvements (FIR-2209) (#1623)
* yeet ad blocking tests until further notice
* feat: re-enable billing tests
* more timeout
* cache issues with billing test
* weird thing
* fix(api/tests/scrape/status): propagation time
* stupid
* no log
* sws
---------
Co-authored-by: rafaelmmiller <150964962+rafaelsideguide@users.noreply.github.com>
Co-authored-by: Ademílson Tonato <ademilsonft@outlook.com>
v1.10.0
2025-06-03 21:30:19 +02:00
Gergő Móricz
406d696667
Testing improvements (FIR-2209) (#1623)
* yeet ad blocking tests until further notice
* feat: re-enable billing tests
* more timeout
* cache issues with billing test
* weird thing
* fix(api/tests/scrape/status): propagation time
* stupid
* no log
* sws
2025-06-03 21:16:36 +02:00
Gergő Móricz
e297cf8a0d
feat(selfhost): deploy a playwright image (#1625)
2025-06-03 19:19:08 +02:00
Ademílson Tonato
41897139da
feat: enhance metadata extraction by including 'itemprop' attribute in HTML (#1624)
2025-06-03 18:16:46 +02:00
Nicolas
e108ff3525
Update search.ts
2025-06-02 23:46:55 -03:00
Nicolas
9347de6a41
Update scrape.ts
2025-06-02 23:15:59 -03:00
Nicolas
86a9d3525b
Update queue-jobs.ts
2025-06-02 23:09:09 -03:00
Nicolas
cbc47305cc
Update search.ts
2025-06-02 23:09:02 -03:00
Nicolas
ce425d966f
Merge branch 'nsc/bypass-billing-internal'
2025-06-02 22:37:56 -03:00
Nicolas
8c661f5329
Update scrape.ts
2025-06-02 22:37:49 -03:00
Nicolas
dc8cc99b1d
Nick: bypass billing (#1622)
2025-06-02 21:57:28 -03:00
Nicolas
8967b31465
Nick: bypass billing
2025-06-02 21:51:46 -03:00
Nicolas
bf919ceb82
Nick: __searchPreviewToken
2025-06-02 21:16:34 -03:00
Nicolas
ef789ce8d7
Nick: __experimental
2025-06-02 19:58:56 -03:00
Gergő Móricz
72be73473f
feat(api/scrape): credits_billed column + handle billing for /scrape calls on worker side with stricter timeout enforcement (FIR-2162) (#1607)
* feat(api/scrape): stricten timeout and handle billing and logging on queue-worker
* fix: abortsignal pre-check
* fix: proper level
* add comment to clarify is_scrape
* reenable billing tests
* Revert "reenable billing tests"
This reverts commit 98236fdfa03dde8cecdd6b763fcf86810e468a28.
* oof
* fix searxng logging
---------
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
2025-06-02 17:56:27 -03:00
Gergő Móricz
4167ec53eb
fix(scrapeURL): only allow disabling the adblock on playwright (FIR-2200) (#1616)
* fix(scrapeURL): only allow disabling the adblock on playwright
* feat(api/tests/scrape): re-enable ad blocking tests
2025-06-02 22:48:16 +02:00
Gergő Móricz
7a8be13220
remove indexes that are no longer used
2025-06-02 22:09:55 +02:00
Gergő Móricz
98ceda9bd5
feat(search): ignore concurrency limit for search (FIR-2187) (#1617)
* feat(search): ignore concurrency limit for search (temp)
* feat(search): only for low tier users for good DX
2025-06-02 17:07:44 -03:00
Gergő Móricz
1396451d31
bump rust version pt.2
2025-06-02 18:10:14 +02:00
Gergő Móricz
07fb651a91
bump rust version
2025-06-02 18:09:12 +02:00
Supasin Liulak
6a76ccfacb
webhook param for crawl (#1609)
2025-06-02 18:08:32 +02:00
Nicolas
9297afd1ff
Nick: search
2025-05-29 17:00:13 -03:00
Gergő Móricz
a8e0482718
feat(search): bill for PDFs properly
2025-05-29 20:59:15 +02:00
Gergő Móricz
a2f41fb650
feat(api/server): wait 60s for GCE load balancer drain timeout
To minimize 502s.
2025-05-29 20:08:52 +02:00