98 Commits

Author SHA1 Message Date
rafaelsideguide
9ad06fdf56 added fire-engine fallback for getting sitemaps 2024-07-09 16:07:53 -03:00
Rafael Miller
f0f449fe51
Merge pull request #336 from snippet/allow-external-content-links
[Proposal] new feature allowExternalContentLinks
2024-07-02 09:45:21 -03:00
Jeff Pereira
a5fb45988c new feature allowExternalContentLinks 2024-06-28 17:23:40 -07:00
Eric Ciarla
70fcf2ce03 init 2024-06-28 16:39:09 -04:00
rafaelsideguide
3ebdf93342 removed console.logs 2024-06-24 16:43:12 -03:00
rafaelsideguide
21d29de819 testing crawl with new.abb.com case
many unnecessary console.logs for tracing the code execution
2024-06-24 16:25:07 -03:00
Eric Ciarla
34e37c5671 Add unit tests to replace e2e 2024-06-15 16:43:37 -04:00
Eric Ciarla
a6b7197737 Fix for maxDepth 2024-06-14 19:40:37 -04:00
Eric Ciarla
2c5f5c0ea2
Merge branch 'main' into feat/maxDepthRelative 2024-06-14 11:49:12 -04:00
Rafael Miller
f9c7ca9388
Merge branch 'main' into feat/issue-266 2024-06-14 11:47:58 -03:00
Rafael Miller
3e2e76311c
Merge branch 'main' into feat/issue-205 2024-06-14 11:25:20 -03:00
Eric Ciarla
59451754f5 Add tests 2024-06-14 10:14:07 -04:00
Eric Ciarla
71c98d8b80 Update logic 2024-06-13 18:00:52 -04:00
Eric Ciarla
095951aa4d Update test 2024-06-13 17:40:00 -04:00
Eric Ciarla
5e8aa92788 Update index.ts 2024-06-13 17:33:13 -04:00
Eric Ciarla
65d63bae45 Update index.ts 2024-06-13 17:17:44 -04:00
Eric Ciarla
32e814bedc Update index.ts 2024-06-13 17:02:30 -04:00
rafaelsideguide
bb859ae9a7 Added metadata.pageStatusCode and metadata.pageError properties to the responses 2024-06-13 17:08:40 -03:00
rafaelsideguide
676d6e8ab5 Added pageOptions.removeTags 2024-06-13 10:51:05 -03:00
rafaelsideguide
e37d151404 added parsePDF option to pageOptions
user can decide if they are going to let us take care of the parse or they are going to parse the pdf by themselves
2024-06-12 15:06:47 -03:00
rafaelsideguide
dc6acbf1f0 Merge remote-tracking branch 'origin/main' into feat/allowbackwardcrawling-option 2024-06-12 11:01:05 -03:00
Nicolas
520739c9f4 Nick: fixed bugs associated with absolute path replacements 2024-06-11 12:43:16 -07:00
rafaelsideguide
ee282c3d55 Added allowBackwardCrawling option 2024-06-11 15:24:39 -03:00
Nicolas
f6b06ac27a Nick: ignoreSitemap, better crawling algo 2024-06-10 18:12:41 -07:00
Nicolas
3091f0134c Nick: 2024-06-10 16:27:10 -07:00
Nicolas
b4c6819a54 Nick: 2024-06-05 11:11:09 -07:00
Rafael Miller
02fe470e20
Merge pull request #148 from mendableai/nsc/improvemnts-fixes-misc
Better fallbacks for initial crawl start
2024-06-04 14:31:10 -03:00
rafaelsideguide
6920ec8a61 bugfixing. already on main 2024-06-04 11:05:50 -03:00
Nicolas
918059ee9e Merge branch 'main' into nsc/improvemnts-fixes-misc 2024-06-03 16:46:02 -07:00
Nicolas
df6c3d1e7d Merge branch 'main' into detect-pdfs 2024-05-17 09:55:51 -07:00
Nicolas
9d635cb2a3 Nick: docx support 2024-05-16 11:48:02 -07:00
Nicolas
098db17913 Update index.ts 2024-05-15 17:37:09 -07:00
Nicolas
6ca368327f Merge branch 'main' into test/crawl-options 2024-05-15 17:18:25 -07:00
Nicolas
ade4e05cff Nick: working 2024-05-15 17:13:04 -07:00
Nicolas
bfccaf670d Nick: fixes most of it 2024-05-15 15:30:37 -07:00
rafaelsideguide
d91043376c not working yet 2024-05-15 18:54:40 -03:00
rafaelsideguide
fa014defc7 Fixing child links only bug 2024-05-15 18:35:09 -03:00
Nicolas
2ba743fb1a
Merge pull request #27 from eltociear/patch-1
refactor: fix typo in WebScraper/index.ts
2024-05-15 13:28:38 -07:00
Nicolas
1b0d6341d3 Update index.ts 2024-05-15 11:48:12 -07:00
Nicolas
d10f81e7fe Nick: fixes 2024-05-15 11:28:20 -07:00
Nicolas
87570bdfa1 Update index.ts 2024-05-15 11:06:03 -07:00
Ikko Eltociear Ashimine
e91c122c69
Merge branch 'main' into patch-1 2024-05-15 12:14:52 +09:00
Nicolas
a0fdc6f7c6 Nick: 2024-05-14 12:12:40 -07:00
Nicolas
7f31959be7 Nick: 2024-05-14 12:04:36 -07:00
Nicolas
8a72cf556b Nick: 2024-05-13 21:10:58 -07:00
Nicolas
26a092f780 Update index.ts 2024-05-13 21:04:49 -07:00
Nicolas
8101cbee37 Update index.ts 2024-05-13 21:02:47 -07:00
Nicolas
86b8439844 Nick: 2024-05-13 20:51:42 -07:00
Nicolas
a96fc5b96d Nick: 4x speed 2024-05-13 20:45:11 -07:00
rafaelsideguide
8eb2e95f19 Cleaned up 2024-05-13 16:13:10 -03:00