507 Commits

Author SHA1 Message Date
Nicolas
664ba69f08 Nick: f-eng monitoring test 2024-12-14 21:40:46 -03:00
Nicolas
ccbae4b155 Update auth.ts 2024-12-14 00:20:14 -03:00
Gergő Móricz
4b5014d7fe feat(v1/batch/scrape): add ignoreInvalidURLs option 2024-12-14 01:11:43 +01:00
Gergő Móricz
e74e4bcefc feat(runWebScraper): retry a scrape max 3 times in a crawl if the status code is failure 2024-12-14 00:54:05 +01:00
Nicolas
3b0d192d1b Update types.ts 2024-12-12 18:14:11 -03:00
Nicolas
e22a0b596c Nick: custom metadata 2024-12-12 13:30:00 -03:00
Nicolas
8a1c404918 Nick: revert trailing comma 2024-12-11 19:51:08 -03:00
Nicolas
52f2e733e2 Nick: fixes 2024-12-11 19:48:22 -03:00
Nicolas
00335e2ba9 Nick: fixed prettier 2024-12-11 19:46:11 -03:00
Gergő Móricz
85cbfbb5bb fix(crawl): disable smart wait. This increases the reliability/deterministic-ness of crawls. 2024-12-10 21:12:31 +01:00
Gergő Móricz
6776aee1c3 feat(auth): extend rate limiter logging to make it easier to debug 2024-12-09 19:29:32 +01:00
Nicolas
4d287bb77f Nick: moving acuc temp to read replica 2024-12-06 13:06:26 -03:00
Gergő Móricz
845c2744a9 feat(app): add extra crawl logging (app-side only for now) 2024-12-05 20:50:36 +01:00
Gergő Móricz
cce94289ee fix(v1/batch/scrape): horrid memory usage 2024-12-05 20:49:28 +01:00
Gergő Móricz
f8e619b5df fix(crawl-status): returnvalue filtering on active jobs 2024-12-05 18:20:21 +01:00
Gergő Móricz
41d859203f feat(v1/batch/scrape): appendToId 2024-12-04 23:35:29 +01:00
Gergő Móricz
7bde034020 auth: log team id 2024-12-04 23:12:55 +01:00
Nicolas
64546f1259 Update types.ts 2024-12-04 18:00:51 -03:00
Nicolas
f7207f91b4 Nick: temp e-s-1 2024-12-04 16:25:43 -03:00
Gergő Móricz
88a16b18a3 fix(crawl-status): ts error 2024-12-04 17:55:51 +01:00
Gergő Móricz
d8613899e3 fix(crawl-status): handle failed jobs (oops) 2024-12-04 17:52:47 +01:00
Gergő Móricz
712a138404 fix(crawl-status): hard error bug 2024-12-04 17:47:37 +01:00
Nicolas
52806807a1 Nick: crawl fixes 2024-12-03 16:25:55 -03:00
Nicolas
1477ab2359 Nick: log clear ACUC cache 2024-12-03 12:15:09 -03:00
Nicolas
4bb46ed152 Nick: extract prompt fixes and limit the number of urls 2024-12-01 20:29:03 -03:00
rafaelmmiller
5ddb7eb922 parameter 2024-11-29 16:44:54 -03:00
rafaelmmiller
943bbae88d fixed nested data inside extract 2024-11-27 18:29:37 -03:00
Nicolas
5522d6af7d Update extract.ts 2024-11-26 15:01:42 -03:00
Nicolas
8a26f08b14 Update extract.ts 2024-11-24 20:37:58 -08:00
Nicolas
2513efc971 Update extract.ts 2024-11-24 20:31:38 -08:00
Nicolas
30def84c0a Nick: scrape timeout + warnings 2024-11-24 19:44:51 -08:00
Nicolas
b693c6c23b Update extract.ts 2024-11-24 19:36:18 -08:00
Nicolas
6fbfeafe38 Nick: fixed map settings 2024-11-20 16:51:13 -08:00
Nicolas
aaddbdc1bc Update map.ts 2024-11-20 16:47:07 -08:00
Nicolas
c78dae178b Merge branch 'main' into nsc/new-extract 2024-11-20 16:41:13 -08:00
Nicolas
945183ffbd Update extract.ts 2024-11-20 16:40:55 -08:00
Nicolas
d196b9d93d Update extract.ts 2024-11-20 13:16:36 -08:00
Nicolas
9512d81e05 Update extract.ts 2024-11-20 13:15:52 -08:00
Nicolas
3de4997f4d Loggin num tokens 2024-11-20 13:09:46 -08:00
Nicolas
769f08c10d Billing and log for extract 2024-11-20 13:08:09 -08:00
Nicolas
0e4e9a3b37 Nick: 2024-11-20 13:01:36 -08:00
Nicolas
67a2989874 Nick: fixes 2024-11-20 12:48:10 -08:00
Gergő Móricz
79a75e088a feat(crawl): allowSubdomain 2024-11-19 18:38:59 +01:00
rafaelmmiller
53134b7c85 Rafa: removed throw error and added map to requests 2024-11-19 09:34:52 -03:00
rafaelmmiller
36cf49c959 Merge remote-tracking branch 'origin/main' into nsc/new-extract 2024-11-19 09:34:08 -03:00
rafaelmmiller
77e152cba8 added team_id to scrape-status endpoint 2024-11-18 15:02:00 -03:00
Gergő Móricz
1b032b05fa fix(map): make sitemapOnly simpler 2024-11-15 21:14:32 +01:00
Gergő Móricz
a4d3dba865 fix(map): ignore limit when using sitemapOnly 2024-11-15 21:03:20 +01:00
Gergő Móricz
7b02c45dd0 fix(v1/types): better timeout primitives 2024-11-15 19:35:54 +01:00
Gergő Móricz
c95a4a26c9 fix(v1/batch/scrape): raise default timeout 2024-11-15 18:58:03 +01:00