Caleb Peffer
|
0b3c0ede49
|
Added tests per @nicks request
|
2024-07-16 21:15:59 -07:00 |
|
Caleb Peffer
|
98c788ca7a
|
Caleb: added a test to ensure the list of links on the page exists and isn't empty on mendable
|
2024-07-16 21:13:52 -07:00 |
|
Caleb Peffer
|
d39d3be649
|
Caleb: now extracting and returning a list of all links on the page for a customer
|
2024-07-16 18:38:03 -07:00 |
|
Nicolas
|
949791049f
|
Nick:
|
2024-07-12 23:20:26 -04:00 |
|
Nicolas
|
d0c8d3ecde
|
Merge branch 'main' into nsc/sitemap-fix-fire-engine
|
2024-07-12 22:15:06 -04:00 |
|
Nicolas
|
a3b1703b68
|
Update fireEngine.ts
|
2024-07-12 22:15:00 -04:00 |
|
Nicolas
|
e098e88ea7
|
Nick:
|
2024-07-12 22:02:08 -04:00 |
|
Nicolas
|
5da03a8fbd
|
Update fireEngine.ts
|
2024-07-12 14:59:49 -04:00 |
|
rafaelsideguide
|
9ad06fdf56
|
added fire-engine fallback for getting sitemaps
|
2024-07-09 16:07:53 -03:00 |
|
rafaelsideguide
|
c2bba54b4f
|
Added veeva to special case params
|
2024-07-05 16:58:07 -03:00 |
|
rafaelsideguide
|
0ab6cef471
|
Merge remote-tracking branch 'origin/main' into dependabot/npm_and_yarn/apps/api/prod-deps-5b38a50718
|
2024-07-05 14:00:10 -03:00 |
|
rafaelsideguide
|
538dc63035
|
Fixing rate-limiter-flexible package version
Redis version <3.0.2 throws TS bug:
https://github.com/animir/node-rate-limiter-flexible/issues/228
|
2024-07-05 12:12:00 -03:00 |
|
Nicolas
|
32849b017f
|
Nick:
|
2024-07-03 20:18:11 -03:00 |
|
Nicolas
|
066d92f643
|
Update single_url.ts
|
2024-07-03 18:38:17 -03:00 |
|
Nicolas
|
f5b2fbd7e8
|
Nick: revision
|
2024-07-03 18:06:53 -03:00 |
|
Nicolas
|
2d30cc6117
|
Nick: comments
|
2024-07-03 18:01:54 -03:00 |
|
Nicolas
|
90c54c32fd
|
Nick: refactor
|
2024-07-03 18:01:17 -03:00 |
|
Nicolas
|
90cf799a3c
|
Update single_url.ts
|
2024-07-03 17:56:21 -03:00 |
|
Nicolas
|
b36406e465
|
Nick: log scrapers
|
2024-07-03 17:28:53 -03:00 |
|
rafaelsideguide
|
0175152577
|
Fixed PDF match custom scraping
Now it's working for both the `https://getgc.ai/privacy` and `https://prairie.cards/products/wood-designs` use cases.
|
2024-07-02 11:25:17 -03:00 |
|
rafaelsideguide
|
7b7154ba1e
|
bugfixed pageStatusCode
|
2024-07-02 10:51:35 -03:00 |
|
Rafael Miller
|
f0f449fe51
|
Merge pull request #336 from snippet/allow-external-content-links
[Proposal] new feature allowExternalContentLinks
|
2024-07-02 09:45:21 -03:00 |
|
Nicolas
|
42cd58a679
|
Merge pull request #332 from mendableai/feat/rawHtmlExtraction
Adds pageOptions.includeRawHtml and new extraction mode "llm-extraction-from-raw-html"
|
2024-07-01 18:23:26 -03:00 |
|
rafaelsideguide
|
16aac7f8c5
|
Update single_url.ts
|
2024-07-01 18:21:15 -03:00 |
|
Nicolas
|
6d0c7a9ccd
|
Merge pull request #323 from mendableai/tests/crawl-limit-unit-tests
[Tests] Added crawl limit unit test
|
2024-07-01 17:56:04 -03:00 |
|
rafaelsideguide
|
4d6e25619b
|
minor spacing and comment stuff
|
2024-07-01 16:05:34 -03:00 |
|
Jeff Pereira
|
a5fb45988c
|
new feature allowExternalContentLinks
|
2024-06-28 17:23:40 -07:00 |
|
Eric Ciarla
|
87b54488d3
|
update to includeRawHtml
|
2024-06-28 17:07:47 -04:00 |
|
Eric Ciarla
|
70fcf2ce03
|
init
|
2024-06-28 16:39:09 -04:00 |
|
Nicolas
|
9bf74bc774
|
Update single_url.ts
|
2024-06-28 15:51:18 -03:00 |
|
Nicolas
|
7e17498bcf
|
Update single_url.ts
|
2024-06-28 15:45:16 -03:00 |
|
Nicolas
|
042f81ddf2
|
Update removeUnwantedElements.test.ts
|
2024-06-26 21:20:11 -03:00 |
|
Nicolas
|
388ce3cbce
|
Nick: small changes
|
2024-06-26 21:15:42 -03:00 |
|
Nicolas
|
1d4907acc9
|
Nick:
|
2024-06-26 21:02:58 -03:00 |
|
rafaelsideguide
|
009df6c930
|
Added crawl limit unit test
I think this test over-relies on mocks, but I have no idea how to fix this without changing the code architecture.
|
2024-06-26 09:54:25 -03:00 |
|
rafaelsideguide
|
5f69fc7677
|
Fixed the regex test
|
2024-06-25 18:24:01 -03:00 |
|
Nicolas
|
e7be17db92
|
Nick: metadata fixes and lock duration for bull decreased to 2 hrs
|
2024-06-25 15:21:14 -03:00 |
|
Nicolas
|
90b7fff366
|
Update crawler.ts
|
2024-06-24 16:52:01 -03:00 |
|
rafaelsideguide
|
3ebdf93342
|
removed console.logs
|
2024-06-24 16:43:12 -03:00 |
|
Nicolas
|
56d42d9c9b
|
Nick:
|
2024-06-24 16:33:07 -03:00 |
|
rafaelsideguide
|
21d29de819
|
testing crawl with new.abb.com case
many unnecessary console.logs for tracing the code execution
|
2024-06-24 16:25:07 -03:00 |
|
rafaelsideguide
|
9c539e9113
|
Fixed includeHTML to use cleanedHtml as response
|
2024-06-18 16:26:54 -03:00 |
|
Rafael Miller
|
f5a9acc4c6
|
Merge branch 'main' into feat/removeTags-regex
|
2024-06-18 14:39:59 -03:00 |
|
rafaelsideguide
|
9f7afd1e88
|
fix for some complex cases
|
2024-06-18 14:36:51 -03:00 |
|
rafaelsideguide
|
6c726a02eb
|
Moved to utils/removeUnwantedElements, added unit tests
|
2024-06-18 09:46:42 -03:00 |
|
AndyMik90
|
8b3c3aae91
|
Added support for RegEx in removeTags
|
2024-06-18 07:31:46 +02:00 |
|
rafaelsideguide
|
b2bd562bb2
|
converted many cases from e2e to unit tests
|
2024-06-17 17:09:44 -03:00 |
|
Eric Ciarla
|
519ab1aecb
|
Update unit tests
|
2024-06-15 17:14:09 -04:00 |
|
Eric Ciarla
|
b1eb608295
|
Merge branch 'main' into feat/maxDepthRelative
|
2024-06-15 16:50:27 -04:00 |
|
Eric Ciarla
|
34e37c5671
|
Add unit tests to replace e2e
|
2024-06-15 16:43:37 -04:00 |
|