303 Commits

Author SHA1 Message Date
Nicolas
47123be783 Nick: weird activity block 2024-08-16 22:01:56 -04:00
Nicolas
81b2479db3 Merge pull request #459 from mendableai/feat/queue-scrapes (feat: Move scraper to queue) 2024-08-15 14:19:55 -04:00
Nicolas
86326f34e9 Update single_url.test.ts 2024-08-15 13:48:42 -04:00
Gergő Móricz
29f0d9ec94 propagate priority to fire-engine 2024-08-15 19:04:46 +02:00
Nicolas
6e1074cdd1 Update website_params.ts 2024-08-14 17:39:54 -04:00
Thomas Kosmas
6410e1a81d Update params 2024-08-15 00:10:14 +03:00
Gergo Moricz
d7549d4dc5 feat: remove webScraperQueue 2024-08-13 21:03:24 +02:00
Gergő Móricz
4a2c37dcf5 Merge branch 'main' into feat/queue-scrapes 2024-08-13 20:53:49 +02:00
Gergo Moricz
86e136beca feat: crawl to scrape conversion 2024-08-13 20:51:43 +02:00
Thomas Kosmas
98be29c963 Update parameters for platform.openai.com 2024-08-12 22:49:28 +03:00
rafaelsideguide
0591000b64 bugfix includes excludes 2024-08-09 14:30:41 -03:00
Nicolas
f1f5605010 Update website_params.ts 2024-08-08 12:31:58 -04:00
Gergő Móricz
5fc7fcb77c Merge branch 'main' into feat/queue-scrapes 2024-08-07 16:35:44 +02:00
Gergo Moricz
fe9fdb578b revert bad hotfixes 2024-08-07 16:34:25 +02:00
Gergo Moricz
7bb922071c fix(queue-worker): manually renew lock (testing) 2024-08-07 14:35:20 +02:00
Nicolas
3321ca9398 Merge pull request #504 from mendableai/feat/fullpage-screenshot ([Feat] Added fullpagescreenshot capabilities) 2024-08-06 13:52:29 -04:00
Gergo Moricz
b60ee30dba fix(single_url): accept 500 2024-08-06 18:00:56 +02:00
rafaelsideguide
4d24a99d50 fix params 2024-08-06 09:34:43 -03:00
rafaelsideguide
3edc3a3d15 added fullpagescreenshot capabilities, wip on fire-engine side 2024-08-05 18:17:37 -03:00
rafaelsideguide
f32e8de156 fixes the empty excludes.filter undefined bug 2024-08-05 18:13:31 -03:00
Nicolas
1742e4ceae Nick: 2024-08-02 19:25:15 -04:00
Nicolas
b448e3c3ad Update website_params.ts 2024-08-02 14:26:35 -04:00
rafaelsideguide
4051630632 Update sitemap.ts 2024-08-02 11:32:48 -03:00
rafaelsideguide
8568b61015 bugfix for sitemaps 2024-08-02 11:03:01 -03:00
Nicolas
af68b7a785 Merge pull request #475 from mendableai/bugfix/issue-466 ([Bug] pdfs and logging pdf events, also added trycatchs for docx) 2024-08-01 22:05:26 -04:00
rafaelsideguide
f48ff36b32 added .inc files and forced lower case comparison 2024-07-31 09:28:43 -03:00
Nicolas
ad6f6eff4b Update fireEngine.ts 2024-07-30 19:15:54 -04:00
Nicolas
6d99dedd3c Nick: fixed tests 2024-07-30 19:11:01 -04:00
rafaelsideguide
d25d7e7244 special case: developer.apple.com 2024-07-30 10:13:09 -03:00
Nicolas
5e8ffcf505 Update website_params.ts 2024-07-29 20:43:47 -04:00
Nicolas
7b813883ef Nick: first layer 2024-07-29 20:31:51 -04:00
Nicolas
968a2dc753 Nick: 2024-07-29 18:37:09 -04:00
rafaelsideguide
49e3e64787 bugfix for pdfs and logging pdf events, also added trycatchs for docx 2024-07-29 14:13:46 -03:00
Nicolas
4c9d62f6d3 Nick: fixing sitemap fallback 2024-07-26 18:25:44 -04:00
Nicolas
cb97871ff9 Merge branch 'main' of https://github.com/mendableai/firecrawl 2024-07-26 17:21:11 -04:00
Nicolas
ff4266f09e Update pdfProcessor.ts 2024-07-26 17:21:09 -04:00
rafaelsideguide
96cec2a673 fix checking scrape log success content length 2024-07-26 12:00:52 -03:00
Nicolas
f82ca3be17 Nick: 2024-07-25 19:53:29 -04:00
Nicolas
01fab6e036 Update single_url.ts 2024-07-25 17:51:41 -04:00
Nicolas
56042d090c Update single_url.ts 2024-07-25 17:48:44 -04:00
Nicolas
3242872503 Update single_url.ts 2024-07-25 17:43:55 -04:00
Nicolas
e5b797549e Merge branch 'main' into feat/scrape-monitoring 2024-07-25 16:21:02 -04:00
rafaelsideguide
e720e1bacf Merge remote-tracking branch 'origin/main' into feat/logger 2024-07-25 09:49:27 -03:00
rafaelsideguide
309728a482 updated logs 2024-07-25 09:48:06 -03:00
Nicolas
2c1221750b Merge pull request #449 from mendableai/bugfix/malformed-url-sitemap (Added regex for links in sitemap) 2024-07-24 20:37:35 -04:00
Nicolas
3a1b8a9797 Update website_params.ts 2024-07-24 11:04:47 -04:00
Nicolas
8b48ec8d30 Update website_params.ts 2024-07-24 11:02:20 -04:00
Gergo Moricz
4d35ad073c feat(monitoring/scrape): include url, worker, response_size 2024-07-24 16:43:39 +02:00
Gergo Moricz
64bcedeefc fix(monitoring): bad success check on scrape 2024-07-24 16:21:59 +02:00
Gergo Moricz
7cd9bf92e3 feat: scrape event logging to DB 2024-07-24 14:31:25 +02:00