Gergő Móricz
|
5a039e7b64
|
fix(v1/map): add wrapper around tryGetSitemap
|
2025-01-22 19:00:46 +01:00 |
|
Nicolas
|
5aad21b35a
|
Update extract.ts
|
2025-01-22 11:01:10 -03:00 |
|
Nicolas
|
3604f2a3ae
|
Nick: misc improvements
|
2025-01-21 16:57:45 -03:00 |
|
Nicolas
|
ac0d10c451
|
Nick: sitemap fetch only below threshold for /map
|
2025-01-21 16:28:57 -03:00 |
|
Gergő Móricz
|
dcbe0b319c
|
fix(v1/crawl-status-ws): wait to send catchup before closing
|
2025-01-20 20:01:27 +01:00 |
|
Nicolas
|
ef69b1ac88
|
Nick: allowExternalLinks is now enableWebSearch
|
2025-01-20 13:41:30 -03:00 |
|
Móricz Gergő
|
2cf7a4f57a
|
fix(batch-scrape): auto finish "kickoff" (no kickoff)
|
2025-01-20 09:40:59 +01:00 |
|
Nicolas
|
d786949639
|
Reapply "Merge pull request #1068 from mendableai/nsc/llm-usage-extract"
This reverts commit 8b17af40018688c34f95727ceaec289b02ab2023.
|
2025-01-19 22:04:12 -03:00 |
|
Nicolas
|
8b17af4001
|
Revert "Merge pull request #1068 from mendableai/nsc/llm-usage-extract"
This reverts commit 406f28c04aff2ba3ae65f483627da13f02943cc3, reversing
changes made to 34ad9ec25d73f37deb1e3adec2315a121ec52f0e.
|
2025-01-19 22:00:28 -03:00 |
|
Nicolas
|
406f28c04a
|
Merge pull request #1068 from mendableai/nsc/llm-usage-extract
(feat/extract) - LLMs usage analysis + billing
|
2025-01-19 21:36:33 -03:00 |
|
Nicolas
|
34ad9ec25d
|
Merge pull request #1073 from mendableai/nsc/index-queue
(feat/index) Index/Insertion queue
|
2025-01-19 17:45:57 -03:00 |
|
Gergő Móricz
|
6637dce626
|
fix: status
|
2025-01-19 17:34:09 +01:00 |
|
Nicolas
|
92b8d97be3
|
Nick:
|
2025-01-19 13:09:29 -03:00 |
|
Nicolas
|
513f61a2d1
|
Nick: map improvements
|
2025-01-19 12:33:44 -03:00 |
|
Nicolas
|
c19af6ef42
|
Update map.ts
|
2025-01-19 12:27:08 -03:00 |
|
Nicolas
|
2e5785d8d9
|
Nick: fetch sitemap timeout param
|
2025-01-19 11:40:13 -03:00 |
|
Nicolas
|
34b40f6a23
|
Nick:
|
2025-01-18 17:17:42 -03:00 |
|
Nicolas
|
260a726f37
|
Merge branch 'main' into nsc/llm-usage-extract
|
2025-01-17 23:02:12 -03:00 |
|
Gergő Móricz
|
078c0679aa
|
fix(crawl-status): improve finished checking
|
2025-01-17 17:18:36 +01:00 |
|
Gergő Móricz
|
e6531278f6
|
feat(v1): crawl/batch scrape errors route
|
2025-01-17 17:12:04 +01:00 |
|
Gergő Móricz
|
23bb172592
|
fix(crawler): recognize sitemaps in robots.txt
|
2025-01-17 15:45:52 +01:00 |
|
Gergő Móricz
|
655753cd27
|
fix(url): allow domains with ports
|
2025-01-16 16:30:14 +01:00 |
|
Nicolas
|
4db023280d
|
Nick: introduce llm-usage cost analysis
|
2025-01-15 21:01:29 -03:00 |
|
Gergő Móricz
|
dde3aebac4
|
fix(v1/crawl-status): fix stuck on 0 jobs
|
2025-01-15 18:51:39 +01:00 |
|
Nicolas
|
033e9bbf29
|
Nick: __experimental_streamSteps
|
2025-01-14 01:45:50 -03:00 |
|
Nicolas
|
5e5b5ee0e2
|
(feat/extract) New re-ranker + multi entity extraction (#1061)
* agent that decides if splits schema or not
* split and merge properties done
* wip
* wip
* changes
* ch
* array merge working!
* comment
* wip
* dereferentiate schema
* dereference schemas
* Nick: new re-ranker
* Create llm-links.txt
* Nick: format
* Update extraction-service.ts
* wip: cooking schema mix and spread functions
* wip
* wip getting there!!!
* nick:
* moved functions to helpers
* nick:
* cant reproduce the error anymore
* error handling all scrapes failed
* fix
* Nick: added the sitemap index
* Update sitemap-index.ts
* Update map.ts
* deduplicate and merge arrays
* added error handler for object transformations
* Update url-processor.ts
* Nick:
* Nick: fixes
* Nick: big improvements to rerank of multi-entity
* Nick: working
* Update reranker.ts
* fixed transformations for nested objs
* fix merge nulls
* Nick: fixed error piping
* Update queue-worker.ts
* Update extraction-service.ts
* Nick: format
* Update queue-worker.ts
* Update pnpm-lock.yaml
* Update queue-worker.ts
---------
Co-authored-by: rafaelmmiller <150964962+rafaelsideguide@users.noreply.github.com>
Co-authored-by: Thomas Kosmas <thomas510111@gmail.com>
|
2025-01-13 22:30:15 -03:00 |
|
Gergő Móricz
|
5c62bb1195
|
feat: new snips test framework (FIR-414) (#1033)
* feat: new snips test framework
* Update mock.ts
---------
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
|
2025-01-13 20:50:47 +01:00 |
|
Nicolas
|
f4d10c5031
|
Nick: formatting fixes
|
2025-01-10 18:35:10 -03:00 |
|
Gergő Móricz
|
d1f3b96388
|
feat: add scrapeId in document.metadata
|
2025-01-09 20:52:12 +01:00 |
|
Gergő Móricz
|
29c1f126ab
|
feat(scrape-status): adapt
|
2025-01-09 19:14:00 +01:00 |
|
Nicolas
|
f82a742cd1
|
Merge pull request #1044 from mendableai/nsc/extract-queue
(feat/extract) Move extract to a queue system
|
2025-01-07 18:10:46 -03:00 |
|
Nicolas
|
b98e289f03
|
Nick:
|
2025-01-07 17:49:21 -03:00 |
|
Nicolas
|
9ec08d7020
|
Nick: fixed the sdks
|
2025-01-07 17:20:49 -03:00 |
|
Nicolas
|
dd14744850
|
Update types.ts
|
2025-01-07 16:55:55 -03:00 |
|
Nicolas
|
11af214db1
|
Nick: update extract in case there is an error
|
2025-01-07 16:21:51 -03:00 |
|
Nicolas
|
eb254547e5
|
Nick:
|
2025-01-07 16:16:01 -03:00 |
|
Gergő Móricz
|
ccfada98ca
|
various queue fixes
|
2025-01-07 19:15:23 +01:00 |
|
Nicolas
|
86e34d7c6c
|
Nick: wip
|
2025-01-07 12:13:12 -03:00 |
|
Móricz Gergő
|
b96b97ed72
|
fix(crawl): don't push rawhtml to db unless requested
|
2025-01-07 10:09:15 +01:00 |
|
Nicolas
|
bb27594443
|
Merge branch 'main' into nsc/extract-queue
|
2025-01-06 13:01:15 -03:00 |
|
Gergő Móricz
|
461842fe8c
|
fix(v1/crawl-status): handle job's returnvalue being explicitly null (db race)
|
2025-01-04 17:24:33 +01:00 |
|
Gergő Móricz
|
b92a4eb79b
|
fix(queue-worker): only do redirect handling logic on crawls, not batch scrape
|
2025-01-04 16:59:35 +01:00 |
|
Nicolas
|
27457ed5db
|
Nick: init
|
2025-01-03 20:44:27 -03:00 |
|
Nicolas
|
ad49503f8a
|
Update search.ts
|
2025-01-02 21:15:47 -03:00 |
|
Nicolas
|
cbe0716439
|
Update search.ts
|
2025-01-02 21:13:24 -03:00 |
|
Nicolas
|
e37ab8431a
|
Update search.ts
|
2025-01-02 21:07:14 -03:00 |
|
Nicolas
|
8b64e915b3
|
Update search.ts
|
2025-01-02 21:02:55 -03:00 |
|
Nicolas
|
7ce780ac81
|
Update search.ts
|
2025-01-02 20:40:38 -03:00 |
|
Nicolas
|
21bf89b6cc
|
Update search.ts
|
2025-01-02 19:57:51 -03:00 |
|
Nicolas
|
22ae1730bd
|
Update search.ts
|
2025-01-02 19:57:41 -03:00 |
|