67 Commits

Author SHA1 Message Date
Nicolas
2c391b0105 Nick: 2025-01-24 18:09:25 -03:00
Nicolas
d547192f37 Nick: fixed spread schemas 2025-01-24 17:55:16 -03:00
rafaelmmiller
3184e91f66 layers 2025-01-24 10:25:45 -03:00
rafaelmmiller
64d116540f rerank with lower threshold + back to map if length = 0 2025-01-24 09:08:16 -03:00
Móricz Gergő
05d79a875a fix(extract): oops 2025-01-24 11:55:41 +01:00
Móricz Gergő
4db9a4a675 fix(extraction-service): allow no multiEntityKeys if isMultiEntity is false 2025-01-24 11:33:49 +01:00
rafaelmmiller
f1cd891a70 added today to extract prompts 2025-01-23 17:14:45 -03:00
Gergő Móricz
6f696d32ae feat(extract): add log on 0 links 2025-01-23 19:25:12 +01:00
Gergő Móricz
5d56627bfa feat(extraction-service): highlight req schema generation 2025-01-23 19:24:24 +01:00
Móricz Gergő
9da51a7514 feat(extract): add original schema to logs 2025-01-23 14:59:54 +01:00
Móricz Gergő
561f0186ef fix build error 2025-01-23 12:07:37 +01:00
Móricz Gergő
d3518e85a8 feat(extract): add logging 2025-01-23 12:05:15 +01:00
Nicolas
ccb74a2b43 Nick: increased timeouts on extract + reduced extract redis usage 2025-01-23 01:28:26 -03:00
Nicolas
498558d358 Nick: formatting done 2025-01-22 18:47:44 -03:00
Nicolas
994e1eb502 Nick: rm logs 2025-01-22 17:27:48 -03:00
Nicolas
56f048aeff Reapply "Nick:"
This reverts commit 4b4385c520c7223cf79ebba981dded8ffaefde11.
2025-01-22 17:26:32 -03:00
Nicolas
4b4385c520 Revert "Nick:"
This reverts commit 6718ce89085339eaaceb1e88a0aa45ecff3216ac.
2025-01-22 17:26:09 -03:00
Nicolas
e1ef826ac6 Merge branch 'main' of https://github.com/mendableai/firecrawl 2025-01-22 17:25:49 -03:00
Nicolas
6718ce8908 Nick: 2025-01-22 17:25:48 -03:00
Gergő Móricz
208bd4ca0c fix(extraction-service): marginally improve logging 2025-01-22 19:38:09 +01:00
Nicolas
2b9f63cf10 Nick: more permissive re-ranker 2025-01-21 11:30:54 -03:00
Nicolas
5030fea634 Update document-scraper.ts 2025-01-20 13:28:59 -03:00
Nicolas
d786949639 Reapply "Merge pull request #1068 from mendableai/nsc/llm-usage-extract"
This reverts commit 8b17af40018688c34f95727ceaec289b02ab2023.
2025-01-19 22:04:12 -03:00
Nicolas
8b17af4001 Revert "Merge pull request #1068 from mendableai/nsc/llm-usage-extract"
This reverts commit 406f28c04aff2ba3ae65f483627da13f02943cc3, reversing
changes made to 34ad9ec25d73f37deb1e3adec2315a121ec52f0e.
2025-01-19 22:00:28 -03:00
Nicolas
64607f3f20 Update extraction-service.ts 2025-01-18 22:40:53 -03:00
Nicolas
b8a30a50e2 Update llm-cost.ts 2025-01-18 21:25:25 -03:00
Nicolas
9cd48d7f73 Nick: 2025-01-17 23:47:22 -03:00
Nicolas
1f6abf95e8 Nick: extract billing works 2025-01-17 20:59:44 -03:00
Nicolas
ca14c651da Update model-prices.ts 2025-01-15 21:07:53 -03:00
Nicolas
4db023280d Nick: introduce llm-usage cost analysis 2025-01-15 21:01:29 -03:00
Nicolas
957eea4113 Nick: extract without a schema should work as expected 2025-01-14 11:37:00 -03:00
Nicolas
61e6af2b16 Nick: streaming callback experimental 2025-01-14 02:13:42 -03:00
Nicolas
c323c64671 Update extract-redis.ts 2025-01-14 02:00:47 -03:00
Nicolas
2dc87a2e1c Update extraction-service.ts 2025-01-14 01:59:52 -03:00
Nicolas
033e9bbf29 Nick: __experimental_streamSteps 2025-01-14 01:45:50 -03:00
Nicolas
5e5b5ee0e2 (feat/extract) New re-ranker + multi entity extraction (#1061)
* agent that decides if splits schema or not

* split and merge properties done

* wip

* wip

* changes

* ch

* array merge working!

* comment

* wip

* dereferentiate schema

* dereference schemas

* Nick: new re-ranker

* Create llm-links.txt

* Nick: format

* Update extraction-service.ts

* wip: cooking schema mix and spread functions

* wip

* wip getting there!!!

* nick:

* moved functions to helpers

* nick:

* cant reproduce the error anymore

* error handling all scrapes failed

* fix

* Nick: added the sitemap index

* Update sitemap-index.ts

* Update map.ts

* deduplicate and merge arrays

* added error handler for object transformations

* Update url-processor.ts

* Nick:

* Nick: fixes

* Nick: big improvements to rerank of multi-entity

* Nick: working

* Update reranker.ts

* fixed transformations for nested objs

* fix merge nulls

* Nick: fixed error piping

* Update queue-worker.ts

* Update extraction-service.ts

* Nick: format

* Update queue-worker.ts

* Update pnpm-lock.yaml

* Update queue-worker.ts

---------

Co-authored-by: rafaelmmiller <150964962+rafaelsideguide@users.noreply.github.com>
Co-authored-by: Thomas Kosmas <thomas510111@gmail.com>
2025-01-13 22:30:15 -03:00
Nicolas
9a13c1dede Nick: fixes to extract rephrase prompt 2025-01-11 20:22:36 -03:00
Nicolas
f4d10c5031 Nick: formatting fixes 2025-01-10 18:35:10 -03:00
Nicolas
aa31508ccd Nick: links-billed update (temp) 2025-01-08 15:13:33 -03:00
Nicolas
b98e289f03 Nick: 2025-01-07 17:49:21 -03:00
Nicolas
51636352a6 Merge branch 'nsc/extract-queue' of https://github.com/mendableai/firecrawl into nsc/extract-queue 2025-01-07 16:21:58 -03:00
Nicolas
11af214db1 Nick: update extract in case there is an error 2025-01-07 16:21:51 -03:00
Gergő Móricz
1f2a76fc23 Update apps/api/src/lib/extract/extraction-service.ts 2025-01-07 20:18:10 +01:00
Nicolas
eb254547e5 Nick: 2025-01-07 16:16:01 -03:00
Nicolas
bb27594443 Merge branch 'main' into nsc/extract-queue 2025-01-06 13:01:15 -03:00
Nicolas
499479c85e Update url-processor.ts 2025-01-03 21:28:52 -03:00
Nicolas
6b2e1cbb28 Nick: cache /extract scrapes 2025-01-03 21:19:40 -03:00
Nicolas
27457ed5db Nick: init 2025-01-03 20:44:27 -03:00
rafaelmmiller
ef0fc8d0d3 broader search if didnt find results 2025-01-02 18:00:18 -03:00
Nicolas
c3fd13a82b Nick: fixed re-ranker and enabled url cache of 2hrs 2024-12-31 18:06:07 -03:00