2271 Commits

Author SHA1 Message Date
Gergő Móricz
ed929221ab feat(sitemap): switch around engine order 2025-01-22 19:10:27 +01:00
Gergő Móricz
5a039e7b64 fix(v1/map): add wrapper around tryGetSitemap 2025-01-22 19:00:46 +01:00
Nicolas
5aad21b35a Update extract.ts 2025-01-22 11:01:10 -03:00
Nicolas
04916f17e2 Nick: bug fixes + acuc fixes + cache fixes 2025-01-21 19:17:06 -03:00
Nicolas
3604f2a3ae Nick: misc improvements 2025-01-21 16:57:45 -03:00
Nicolas
ac0d10c451 Nick: sitemap fetch only below threshold for /map 2025-01-21 16:28:57 -03:00
Nicolas
c7b219169b Nick: fixed crawl maps index dedup 2025-01-21 16:22:27 -03:00
Nicolas
720a429115 Nick: temp fix 2025-01-21 13:23:34 -03:00
Nicolas
2b9f63cf10 Nick: more permissive re-ranker 2025-01-21 11:30:54 -03:00
Gergő Móricz
dcbe0b319c fix(v1/crawl-status-ws): wait to send catchup before closing 2025-01-20 20:01:27 +01:00
Nicolas
ef69b1ac88 Nick: allowExternalLinks is now enableWebSearch 2025-01-20 13:41:30 -03:00
Nicolas
5030fea634 Update document-scraper.ts 2025-01-20 13:28:59 -03:00
Móricz Gergő
2d4f4de0ab fix(credit_billing): logs 2025-01-20 10:16:47 +01:00
Móricz Gergő
ae0d705f5d fix(v0/crawl): force kickoff 2025-01-20 09:55:00 +01:00
Móricz Gergő
2cf7a4f57a fix(batch-scrape): auto finish "kickoff" (no kickoff) 2025-01-20 09:40:59 +01:00
Nicolas
f385b250be Update html-to-markdown.ts 2025-01-20 00:20:20 -03:00
Nicolas
240e4e4702 Update auth.ts 2025-01-19 23:17:12 -03:00
Nicolas
1ca50e6e8f Update llmExtract.ts 2025-01-19 22:18:51 -03:00
Nicolas
d786949639 Reapply "Merge pull request #1068 from mendableai/nsc/llm-usage-extract"
This reverts commit 8b17af40018688c34f95727ceaec289b02ab2023.
2025-01-19 22:04:12 -03:00
Nicolas
8b17af4001 Revert "Merge pull request #1068 from mendableai/nsc/llm-usage-extract"
This reverts commit 406f28c04aff2ba3ae65f483627da13f02943cc3, reversing
changes made to 34ad9ec25d73f37deb1e3adec2315a121ec52f0e.
2025-01-19 22:00:28 -03:00
Nicolas
406f28c04a Merge pull request #1068 from mendableai/nsc/llm-usage-extract
(feat/extract) - LLMs usage analysis + billing
2025-01-19 21:36:33 -03:00
Nicolas
02dea23892 Update auth.ts 2025-01-19 21:35:32 -03:00
Nicolas
34ad9ec25d Merge pull request #1073 from mendableai/nsc/index-queue
(feat/index) Index/Insertion queue
2025-01-19 17:45:57 -03:00
Gergő Móricz
6637dce626 fix: status 2025-01-19 17:34:09 +01:00
Nicolas
e4b45e9e7c Update auth.ts 2025-01-19 13:23:51 -03:00
Nicolas
baa2f94765 Update crawl-maps-index.ts 2025-01-19 13:15:20 -03:00
Nicolas
92b8d97be3 Nick: 2025-01-19 13:09:29 -03:00
Nicolas
513f61a2d1 Nick: map improvements 2025-01-19 12:33:44 -03:00
Nicolas
c19af6ef42 Update map.ts 2025-01-19 12:27:08 -03:00
Nicolas
2e5785d8d9 Nick: fetch sitemap timeout param 2025-01-19 11:40:13 -03:00
Nicolas
24ddcd4a6d Update check-fire-engine.ts 2025-01-18 23:53:33 -03:00
Nicolas
382476cb36 Nick: auth extract 2025-01-18 23:16:25 -03:00
Nicolas
81c347f538 Update llmExtract.ts 2025-01-18 22:49:03 -03:00
Nicolas
64607f3f20 Update extraction-service.ts 2025-01-18 22:40:53 -03:00
Nicolas
b8a30a50e2 Update llm-cost.ts 2025-01-18 21:25:25 -03:00
Nicolas
0ec52613e2 Nick: 2025-01-18 21:10:11 -03:00
Nicolas
34b40f6a23 Nick: 2025-01-18 17:17:42 -03:00
Nicolas
9cd48d7f73 Nick: 2025-01-17 23:47:22 -03:00
Nicolas
260a726f37 Merge branch 'main' into nsc/llm-usage-extract 2025-01-17 23:02:12 -03:00
Nicolas
6e3ceccb5c Nick: fixed billing and acuc cache 2025-01-17 21:27:56 -03:00
Nicolas
1f6abf95e8 Nick: extract billing works 2025-01-17 20:59:44 -03:00
Gergő Móricz
dbc6d07871 fix(queue-worker): bring done add to earlier 2025-01-17 17:46:29 +01:00
Gergő Móricz
13abb2bc0e fix(crawl-redis/finishCrawl): increase logging to hunt down race condition 2025-01-17 17:23:13 +01:00
Gergő Móricz
078c0679aa fix(crawl-status): improve finished checking 2025-01-17 17:18:36 +01:00
Gergő Móricz
e6531278f6 feat(v1): crawl/batch scrape errors route 2025-01-17 17:12:04 +01:00
Gergő Móricz
dcd3d6d98d fix(kickoff): mark as finished if it errors out 2025-01-17 17:11:19 +01:00
Gergő Móricz
5992c57158 fix(crawler): bad urls from sitemap 2025-01-17 17:07:44 +01:00
Gergő Móricz
237d0dc197 fix(requests.http): map 2025-01-17 16:21:57 +01:00
Gergő Móricz
d5929af010 fix(queue-worker/kickoff): make crawls wait for kickoff to finish (matters on big sitemapped sites) 2025-01-17 16:04:01 +01:00
Gergő Móricz
23bb172592 fix(crawler): recognize sitemaps in robots.txt 2025-01-17 15:45:52 +01:00