Gergő Móricz
11ed679274
feat(scrapeURL/pdf): support PDF prefetch when parsePDF is off
2025-02-20 09:28:13 +01:00
Gergő Móricz
55d047b6b3
feat(scrapeURL): handle PDFs behind anti-bot (#1198)
2025-02-20 04:11:30 +01:00
Gergő Móricz
46b187bc64
feat(v1/map): stop mapping if timed out via AbortController (#1205)
2025-02-20 00:42:13 +01:00
Gergő Móricz
2200f084f3
SELFHOST FIXES (#1207)
* fix(extract): construct OpenAI on demand
Fixes hard-crash if api key not specified in a self-hosting environment.
* fix(ci): try sleeping
* fix(ci): override host
* fix(ci): wait for server to start
* Support /extract and /crawl for self-hosted (FIR-1097) (#1137)
* Support /extract for self-hosted
This returns the job response from redis rather than supabase when db auth is disabled (self hosted mode)
* Use getJob for extract and use correct types
* fix(v1/crawl-status): only poll DB for total count if DB is enabled
* feat(snips): TEST_SUITE_SELF_HOSTED
* fix(ci/test-server-self-host): use pr trigger
* fix(scrapeURL): f-e mocking in selfhosted env
* fix(snips): do not try to eval json format on selfhost
* fix(scrapeURL): further f-e mocking
* fix(snips): don't timeout on hard fail polling
* fix(v1/extract-status): fix-up the db-agnostic impl
unfortunately had to separate the functions since the schema
was too divergent :(
* fix(snips): boost screenshot delay
* feat(ci): test with openai
* feat(ci): extract, search testing
* fix(ci): matrix
* fix(ci): bleh
* Update: fix default google search (#1174)
* fix log title
* search should always work
* asd
* fix ci
---------
Co-authored-by: Nick Roth <nlr06886@gmail.com>
Co-authored-by: William <sdustusun@gmail.com>
2025-02-20 00:41:22 +01:00
Gergő Móricz
b136e42b53
feat(v1): proxy option / stealthProxy flag (FIR-1050) (#1196)
* feat(v1): proxy option / stealthProxy flag
* feat(js-sdk): add proxy option
2025-02-18 18:03:10 +01:00
Gergő Móricz
5c4021e8cf
fix(scrapeURL/fire-engine): screenshot broken hotfix
2025-02-17 18:59:16 +01:00
Gergő Móricz
445b906af1
fix(scrapeURL/fire-engine): perform format screenshot after specified actions (#1192)
2025-02-17 12:35:55 -03:00
Gergő Móricz
1491b5b141
fix(scrapeURL/sb): enforce timeout (FIR-980) (#1183)
* fix(scrapeURL/scrapingbee): enforce timeout
* fix(scrapeURL/sb): types
* fix the test
* fixup: remove nix files
2025-02-16 15:55:03 +01:00
Gergő Móricz
892f3a41f3
fix(scrape): allow getting valid JSON via rawHtml (FIR-852) (#1138)
* fix(scrape): allow getting valid JSON via rawHtml
* fix(scrape/test):
2025-02-06 18:35:28 +01:00
Nicolas
6bfd24d903
Nick: waitFor fixes
2025-01-30 23:23:03 -03:00
Gergő Móricz
d09e0603f8
feat(scrapeUrl/fire-engine): add blockAds flag (FIR-692) (#1106)
* feat(scrapeUrl/fire-engine): add blockAds flag
* feat(v1/scrape): blockAds test
2025-01-29 15:03:37 +01:00
Móricz Gergő
bee2b2873e
fix(sitemap): better ordering
2025-01-23 08:58:18 +01:00
Nicolas
498558d358
Nick: formatting done
2025-01-22 18:47:44 -03:00
Nicolas
04916f17e2
Nick: bug fixes + acuc fixes + cache fixes
2025-01-21 19:17:06 -03:00
Gergő Móricz
5c62bb1195
feat: new snips test framework (FIR-414) (#1033)
* feat: new snips test framework
* Update mock.ts
---------
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
2025-01-13 20:50:47 +01:00
Nicolas
f4d10c5031
Nick: formatting fixes
2025-01-10 18:35:10 -03:00
Móricz Gergő
3c614a2e5c
fix(scrapeURL/engines/pdf,docx): support authorization
2025-01-09 10:03:27 +01:00
Nicolas
aef040b41e
Nick: from cache fixes
2025-01-03 23:07:15 -03:00
Nicolas
a4f7c38834
Nick: fixed
2025-01-03 22:15:23 -03:00
Nicolas
6b2e1cbb28
Nick: cache /extract scrapes
2025-01-03 21:19:40 -03:00
Nicolas
c3fd13a82b
Nick: fixed re-ranker and enabled url cache of 2hrs
2024-12-31 18:06:07 -03:00
Gergő Móricz
4d1f92f4c8
fix(scrapeURL/fetch): block loopback and link-local IPs
2024-12-29 17:35:14 +01:00
Nicolas
e255301005
Update index.ts
2024-12-27 21:31:29 -03:00
Nicolas
1eca61bffb
Update index.ts
2024-12-27 20:59:18 -03:00
Nicolas
f9d55efba8
Update index.ts
2024-12-27 20:54:26 -03:00
Nicolas
b8d7f9f257
Nick: we are using runpod
2024-12-27 19:59:05 -03:00
Nicolas
5fcf3fa97e
Merge branch 'main' into mog/mineru
2024-12-27 19:53:09 -03:00
Gergő Móricz
4772951313
feat(scrapeURL/fire-engine): explicitly delete job after scrape
2024-12-27 16:44:41 +01:00
Gergő Móricz
0b55fb836b
feat(scrapeURL/pdf): switch to MinerU
2024-12-27 16:37:32 +01:00
Gergő Móricz
c543f4f76c
feat(scrapeURL/pdf): update mock Blob implementation to pass TypeScript
2024-12-26 20:31:51 +01:00
Gergő Móricz
f15ef0e758
feat(scrapeURL/fire-engine/chrome-cdp): handle file downloads
2024-12-26 20:29:09 +01:00
Gergő Móricz
071b9a01c3
fix(scrapeURL/fire-engine): pass geolocation
2024-12-19 18:23:21 +01:00
Nicolas
3b6edef9fa
chore: formatting
2024-12-17 16:58:57 -03:00
Eric Ciarla
a20a003c74
revert to pdf parse
2024-12-17 12:12:22 -05:00
Eric Ciarla
1402831a0a
Replace pdf parse with pdf to md
2024-12-17 09:59:52 -05:00
Eric Ciarla
ed7d15d2af
Update index.ts
2024-12-17 09:50:29 -05:00
Gergő Móricz
47b968fede
fix(scrapeURL/fire-engine): timeout calculation issues
2024-12-17 13:17:55 +01:00
Nicolas
1214d219e1
Nick: fix actions errors
2024-12-15 15:43:12 -03:00
Gergő Móricz
0f3a27bf27
fix(scrapeURL/engines): better timeouts
2024-12-15 18:58:29 +01:00
Nicolas
a5256827c0
Update index.ts
2024-12-15 14:36:09 -03:00
Gergő Móricz
b4a5e1a6e9
fix(scrapeURL/fire-engine): timeout handling
2024-12-15 16:04:17 +01:00
Gergő Móricz
afbd01299a
fix(scrapeURL/fire-engine): timeouts
2024-12-15 15:58:27 +01:00
Nicolas
8a1c404918
Nick: revert trailing comma
2024-12-11 19:51:08 -03:00
Nicolas
00335e2ba9
Nick: fixed prettier
2024-12-11 19:46:11 -03:00
Gergő Móricz
d276a23da0
fix(scrapeURL/pdf): handle if a presumed PDF link returns HTML (e.g. 404)
2024-12-10 23:24:33 +01:00
Gergő Móricz
ecad76978d
feat(scrapeURL/pdf): extend amount of time we're willing to wait for PDFs in crawl/batch scrape mode
2024-12-10 21:43:00 +01:00
Gergő Móricz
85cbfbb5bb
fix(crawl): disable smart wait
This increases the reliability and determinism of crawls.
2024-12-10 21:12:31 +01:00
Nicolas
4d2f4aad11
Update index.ts
2024-12-03 21:07:45 -03:00
Nicolas
f3aa32863f
Revert "Merge branch 'nsc/crawl-n--1-fixes'"
This reverts commit 6d325b7ce7af912b326369eace62f89f897b536b, reversing
changes made to 3d5704b73e6c4802f0344dc2e17042af9b6de0f5.
2024-12-03 20:53:14 -03:00
Nicolas
64800a1c02
Nick: rm fe for test
2024-12-03 20:34:14 -03:00