35 Commits

Author SHA1 Message Date
Nicolas
5abd26a267 Nick: set the crawl limit to the remaining credits 2024-08-20 14:16:54 -03:00
Nicolas
8e4ca86463 Update crawl.ts 2024-08-19 11:02:24 -03:00
Nicolas
36b35dbc67 Update crawl.ts 2024-08-19 11:01:26 -03:00
Gergő Móricz
dad9d353d9 use thomas's url validation 2024-08-15 19:19:02 +02:00
Gergő Móricz
c5597bc722 fix: robots.txt laoding 2024-08-15 19:11:07 +02:00
Gergő Móricz
57730f6a35 priority changes 2024-08-15 18:58:07 +02:00
Gergő Móricz
846610681b fix: fix posthog, add dummy crawl DB items 2024-08-15 18:55:18 +02:00
Gergő Móricz
b8ec40dd72 fix(crawl): submit sitemapped jobs in bulk 2024-08-14 20:34:19 +02:00
Gergő Móricz
2ca1017fc3 fix(crawl): make request 0 of crawl jobs higher priority 2024-08-14 19:34:18 +02:00
Gergo Moricz
d7549d4dc5 feat: remove webScraperQueue 2024-08-13 21:03:24 +02:00
Gergo Moricz
86e136beca feat: crawl to scrape conversion 2024-08-13 20:51:43 +02:00
Nicolas
7e002a8b06 Nick: bull mq 2024-07-30 13:27:23 -04:00
Nicolas
e5b797549e Merge branch 'main' into feat/scrape-monitoring 2024-07-25 16:21:02 -04:00
rafaelsideguide
309728a482 updated logs 2024-07-25 09:48:06 -03:00
Gergo Moricz
7cd9bf92e3 feat: scrape event logging to DB 2024-07-24 14:31:25 +02:00
rafaelsideguide
4381109dd8 added default values and fixed pdf bug 2024-06-26 09:00:54 -03:00
Rafael Miller
3e2e76311c Merge branch 'main' into feat/issue-205 2024-06-14 11:25:20 -03:00
rafaelsideguide
676d6e8ab5 Added pageOptions.removeTags 2024-06-13 10:51:05 -03:00
rafaelsideguide
e37d151404 added parsePDF option to pageOptions (user can decide if they are going to let us take care of the parse or they are going to parse the pdf by themselves) 2024-06-12 15:06:47 -03:00
rafaelsideguide
01c9f071fa fixed 2024-06-12 11:27:06 -03:00
rafaelsideguide
ee282c3d55 Added allowBackwardCrawling option 2024-06-11 15:24:39 -03:00
rafaelsideguide
184e4678f1 bugfix on idempotency key check 2024-05-23 11:47:04 -03:00
rafaelsideguide
3f460af6c5 Added idempotency key to crawl route 2024-05-07 15:29:27 -03:00
Nicolas
bdbee963f7 Merge branch 'main' into nsc/cancel-job 2024-05-07 10:13:43 -07:00
rafaelsideguide
e1f52c538f nested includeHtml inside pageOptions 2024-05-07 13:40:24 -03:00
Nicolas
6d5da358cc Nick: cancel job 2024-05-06 17:16:43 -07:00
rafaelsideguide
509250c4ef changed to includeHtml 2024-05-06 19:45:56 -03:00
rafaelsideguide
538355f1af Added toMarkdown option 2024-05-06 11:36:44 -03:00
Nicolas
f3c190c21c Nick: 2024-04-23 16:47:24 -07:00
rafaelsideguide
849c0b6ebf [Feat] Added blocklist for social media urls 2024-04-23 18:50:35 -03:00
Nicolas
898d729a84 Nick: tests 2024-04-21 11:27:31 -07:00
Nicolas
5cdbf3a0ac Nick: cleaner functions to handle authenticated requests that dont require ifs everywhere 2024-04-21 10:36:48 -07:00
Caleb Peffer
be75aaa195 Caleb: first version of supabase proxy to make db authentication optional 2024-04-21 09:31:22 -07:00
Nicolas
0db0874b00 Nick: 2024-04-20 19:37:45 -07:00
Nicolas
23b2190e5d Nick: 2024-04-20 16:38:05 -07:00