474 Commits

Author SHA1 Message Date
Nicolas
819ad50af3 Update fireEngine.ts 2024-08-20 21:16:33 -03:00
rafaelsideguide
e9d6ca197e tests passing now 2024-08-20 20:00:41 -03:00
Nicolas
1b3ad60a2c Reapply "Merge pull request #561 from mendableai/bug/dealing-with-dns-error"
This reverts commit ffe11a5bf73e3c57657972cd36c3af1d0b9a432c.
2024-08-20 19:22:09 -03:00
Nicolas
441628998f Reapply "Merge pull request #561 from mendableai/bug/dealing-with-dns-error"
This reverts commit ffe11a5bf73e3c57657972cd36c3af1d0b9a432c.
2024-08-20 19:16:48 -03:00
Nicolas
ffe11a5bf7 Revert "Merge pull request #561 from mendableai/bug/dealing-with-dns-error"
This reverts commit 2030ec603109d6ce8786a011d431bc5c83917f1b, reversing
changes made to f494d2b707d40b690ae41611d17f77f683570fc2.
2024-08-20 18:16:11 -03:00
Gergő Móricz
1368f9a87f fix: treat existing screenshot as a scraper success condition 2024-08-20 22:24:18 +02:00
rafaelsideguide
f98be7d94e Update fireEngine.ts 2024-08-20 16:53:01 -03:00
rafaelsideguide
1f27182a13 added try catch 2024-08-20 15:42:39 -03:00
rafaelsideguide
e326249a57 added check job and cancel to fire-engine requests 2024-08-20 14:26:42 -03:00
rafaelsideguide
e1c9cbf709 bug fixed. crawl should not stop if sitemap url is invalid 2024-08-20 09:11:58 -03:00
rafaelsideguide
ecd472356b added variables to beta customers 2024-08-19 16:41:54 -03:00
rafaelsideguide
b8170aaa47 Update blocklist.ts 2024-08-19 08:51:48 -03:00
Nicolas
47123be783 Nick: weird activity block 2024-08-16 22:01:56 -04:00
rafaelsideguide
086ba6280b fixed markdown format 2024-08-16 18:39:13 -03:00
Gergő Móricz
aabfaf0ac5 clean up crawl-status, fix db ddos 2024-08-16 23:29:39 +02:00
rafaelsideguide
7a61325500 map + search + scrape markdown bug 2024-08-16 17:57:11 -03:00
Nicolas
23a033fe61 Nick: fixes and more e2e tests 2024-08-16 16:03:35 -04:00
rafaelsideguide
3f998b688d scrape ready 2024-08-16 15:14:37 -03:00
Nicolas
81b2479db3 Merge pull request #459 from mendableai/feat/queue-scrapes
feat: Move scraper to queue
2024-08-15 14:19:55 -04:00
Nicolas
86326f34e9 Update single_url.test.ts 2024-08-15 13:48:42 -04:00
Gergő Móricz
29f0d9ec94 propagate priority to fire-engine 2024-08-15 19:04:46 +02:00
Nicolas
6e1074cdd1 Update website_params.ts 2024-08-14 17:39:54 -04:00
Thomas Kosmas
6410e1a81d Update params 2024-08-15 00:10:14 +03:00
Gergo Moricz
d7549d4dc5 feat: remove webScraperQueue 2024-08-13 21:03:24 +02:00
Gergő Móricz
4a2c37dcf5 Merge branch 'main' into feat/queue-scrapes 2024-08-13 20:53:49 +02:00
Gergo Moricz
86e136beca feat: crawl to scrape conversion 2024-08-13 20:51:43 +02:00
Rafael Miller
76160a38db Update single_url.ts 2024-08-12 17:57:00 -03:00
Rafael Miller
7c339ea125 Update single_url.ts 2024-08-12 17:55:10 -03:00
Thomas Kosmas
98be29c963 Update parameters for platform.openai.com 2024-08-12 22:49:28 +03:00
rafaelsideguide
c3aeed510b Update single_url.ts 2024-08-12 16:40:31 -03:00
Kevin Swiber
ba2af74adf Ensuring USE_DB_AUTHENTICATION is true in single URL scraper. 2024-08-09 15:29:18 -07:00
rafaelsideguide
0591000b64 bugfix includes excludes 2024-08-09 14:30:41 -03:00
Nicolas
f1f5605010 Update website_params.ts 2024-08-08 12:31:58 -04:00
Gergő Móricz
5fc7fcb77c Merge branch 'main' into feat/queue-scrapes 2024-08-07 16:35:44 +02:00
Gergo Moricz
fe9fdb578b revert bad hotfixes 2024-08-07 16:34:25 +02:00
Gergo Moricz
7bb922071c fix(queue-worker): manually renew lock (testing) 2024-08-07 14:35:20 +02:00
Nicolas
3321ca9398 Merge pull request #504 from mendableai/feat/fullpage-screenshot
[Feat] Added fullpagescreenshot capabilities
2024-08-06 13:52:29 -04:00
Gergo Moricz
b60ee30dba fix(single_url): accept 500 2024-08-06 18:00:56 +02:00
rafaelsideguide
4d24a99d50 fix params 2024-08-06 09:34:43 -03:00
rafaelsideguide
3edc3a3d15 added fullpagescreenshot capabilities, wip on fire-engine side 2024-08-05 18:17:37 -03:00
rafaelsideguide
f32e8de156 fixes the empty excludes.filter undefined bug 2024-08-05 18:13:31 -03:00
Nicolas
1742e4ceae Nick: 2024-08-02 19:25:15 -04:00
Nicolas
b448e3c3ad Update website_params.ts 2024-08-02 14:26:35 -04:00
rafaelsideguide
4051630632 Update sitemap.ts 2024-08-02 11:32:48 -03:00
rafaelsideguide
8568b61015 bugfix for sitemaps 2024-08-02 11:03:01 -03:00
Nicolas
af68b7a785 Merge pull request #475 from mendableai/bugfix/issue-466
[Bug] pdfs and logging pdf events, also added trycatchs for docx
2024-08-01 22:05:26 -04:00
rafaelsideguide
f48ff36b32 added .inc files and forced lower case comparison 2024-07-31 09:28:43 -03:00
Nicolas
ad6f6eff4b Update fireEngine.ts 2024-07-30 19:15:54 -04:00
Nicolas
6d99dedd3c Nick: fixed tests 2024-07-30 19:11:01 -04:00
rafaelsideguide
d25d7e7244 special case: developer.apple.com 2024-07-30 10:13:09 -03:00