| Author | Commit | Message | Date |
|---|---|---|---|
| Nicolas | 042f81ddf2 | Update removeUnwantedElements.test.ts | 2024-06-26 21:20:11 -03:00 |
| Nicolas | 388ce3cbce | Nick: small changes | 2024-06-26 21:15:42 -03:00 |
| Nicolas | 1d4907acc9 | Nick: | 2024-06-26 21:02:58 -03:00 |
| rafaelsideguide | 5f69fc7677 | Fixed the regex test | 2024-06-25 18:24:01 -03:00 |
| Nicolas | e7be17db92 | Nick: metadata fixes and lock duration for bull decreased to 2 hrs | 2024-06-25 15:21:14 -03:00 |
| Nicolas | 90b7fff366 | Update crawler.ts | 2024-06-24 16:52:01 -03:00 |
| rafaelsideguide | 3ebdf93342 | removed console.logs | 2024-06-24 16:43:12 -03:00 |
| Nicolas | 56d42d9c9b | Nick: | 2024-06-24 16:33:07 -03:00 |
| rafaelsideguide | 21d29de819 | testing crawl with new.abb.com case; many unnecessary console.logs for tracing the code execution | 2024-06-24 16:25:07 -03:00 |
| rafaelsideguide | 9c539e9113 | Fixed includeHTML to use cleanedHtml as response | 2024-06-18 16:26:54 -03:00 |
| Rafael Miller | f5a9acc4c6 | Merge branch 'main' into feat/removeTags-regex | 2024-06-18 14:39:59 -03:00 |
| rafaelsideguide | 9f7afd1e88 | fix for some complex cases | 2024-06-18 14:36:51 -03:00 |
| rafaelsideguide | 6c726a02eb | Moved to utils/removeUnwantedElements, added unit tests | 2024-06-18 09:46:42 -03:00 |
| AndyMik90 | 8b3c3aae91 | Added support for RegEx in removeTags | 2024-06-18 07:31:46 +02:00 |
| rafaelsideguide | b2bd562bb2 | transcribed from e2e to unit tests for many cases | 2024-06-17 17:09:44 -03:00 |
| Eric Ciarla | 519ab1aecb | Update unit tests | 2024-06-15 17:14:09 -04:00 |
| Eric Ciarla | b1eb608295 | Merge branch 'main' into feat/maxDepthRelative | 2024-06-15 16:50:27 -04:00 |
| Eric Ciarla | 34e37c5671 | Add unit tests to replace e2e | 2024-06-15 16:43:37 -04:00 |
| Eric Ciarla | a6b7197737 | Fix for maxDepth | 2024-06-14 19:40:37 -04:00 |
| Nicolas | 4ec863718b | Merge pull request #283 from mendableai/nsc/crawler-fixes: Fixes crawler getting confused with base paths that contain www. | 2024-06-14 13:50:32 -07:00 |
| Nicolas | e88cb314c8 | Update crawler.ts | 2024-06-14 13:44:54 -07:00 |
| rafaelsideguide | ad7795f973 | Merge remote-tracking branch 'origin/main' into test/load-testing | 2024-06-14 15:14:01 -03:00 |
| Eric Ciarla | 2c5f5c0ea2 | Merge branch 'main' into feat/maxDepthRelative | 2024-06-14 11:49:12 -04:00 |
| Rafael Miller | f9c7ca9388 | Merge branch 'main' into feat/issue-266 | 2024-06-14 11:47:58 -03:00 |
| Rafael Miller | 3e2e76311c | Merge branch 'main' into feat/issue-205 | 2024-06-14 11:25:20 -03:00 |
| Eric Ciarla | 59451754f5 | Add tests | 2024-06-14 10:14:07 -04:00 |
| rafaelsideguide | 5dd18ca79b | fixed edge cases | 2024-06-14 09:46:55 -03:00 |
| Eric Ciarla | ab9de0f5ab | Update maxDepth tests | 2024-06-13 18:46:30 -04:00 |
| Eric Ciarla | 71c98d8b80 | Update logic | 2024-06-13 18:00:52 -04:00 |
| Eric Ciarla | 095951aa4d | Update test | 2024-06-13 17:40:00 -04:00 |
| Eric Ciarla | 5e8aa92788 | Update index.ts | 2024-06-13 17:33:13 -04:00 |
| Eric Ciarla | 65d63bae45 | Update index.ts | 2024-06-13 17:17:44 -04:00 |
| Eric Ciarla | 32e814bedc | Update index.ts | 2024-06-13 17:02:30 -04:00 |
| rafaelsideguide | bb859ae9a7 | Added metadata.pageStatusCode and metadata.pageError properties to the responses | 2024-06-13 17:08:40 -03:00 |
| rafaelsideguide | 676d6e8ab5 | Added pageOptions.removeTags | 2024-06-13 10:51:05 -03:00 |
| rafaelsideguide | e37d151404 | added parsePDF option to pageOptions; user can decide if they are going to let us take care of the parse or they are going to parse the pdf by themselves | 2024-06-12 15:06:47 -03:00 |
| rafaelsideguide | dc6acbf1f0 | Merge remote-tracking branch 'origin/main' into feat/allowbackwardcrawling-option | 2024-06-12 11:01:05 -03:00 |
| Nicolas | 1e3e06a1d5 | Update replacePaths.test.ts | 2024-06-11 13:02:39 -07:00 |
| Nicolas | 2239e03269 | Update replacePaths.test.ts | 2024-06-11 12:54:02 -07:00 |
| Nicolas | 520739c9f4 | Nick: fixed bugs associated with absolute path replacements | 2024-06-11 12:43:16 -07:00 |
| rafaelsideguide | ee282c3d55 | Added allowBackwardCrawling option | 2024-06-11 15:24:39 -03:00 |
| Nicolas | f6b06ac27a | Nick: ignoreSitemap, better crawling algo | 2024-06-10 18:12:41 -07:00 |
| Nicolas | 1bd0327e1a | Merge branch 'main' into nsc/pageoptions-crawler | 2024-06-10 17:15:10 -07:00 |
| Nicolas | 7ae9778642 | Update single_url.ts | 2024-06-10 16:57:31 -07:00 |
| Nicolas | 913c1dd568 | Nick: fetch -> axios and fix timeouts | 2024-06-10 16:49:03 -07:00 |
| Nicolas | 3091f0134c | Nick: | 2024-06-10 16:27:10 -07:00 |
| rafaelsideguide | 164676c70a | bugfix screenshot for readme pages | 2024-06-05 15:34:42 -03:00 |
| Nicolas | b4c6819a54 | Nick: | 2024-06-05 11:11:09 -07:00 |
| rafaelsideguide | 0d51b11dcd | missing breaks | 2024-06-05 15:02:28 -03:00 |
| Nicolas | 7cb14edec8 | Nick: | 2024-06-05 10:13:52 -07:00 |