rafaelmmiller
|
f07bbef78e
|
added trycatch and removed redundancy
|
2024-11-05 08:11:49 -03:00 |
|
Rafael Miller
|
8297e5beef
|
Merge branch 'main' into fix/remove-base64-images
|
2024-11-04 10:35:09 -03:00 |
|
rafaelmmiller
|
4c5bb21a6f
|
added remove base64 images options (true by default)
|
2024-11-04 10:31:44 -03:00 |
|
Eric Ciarla
|
2eff27ba43
|
Merge pull request #847 from mendableai/nsc/mobile-support
Adds support for mobile web scraping + mobile screenshot
|
2024-11-02 11:16:33 -04:00 |
|
Nicolas
|
446acfccde
|
Nick: support for the new actions
|
2024-10-31 20:01:52 -03:00 |
|
rafaelsideguide
|
367af9512f
|
added iframe links to extractLinksFromHTML
|
2024-10-31 10:53:47 -03:00 |
|
Thomas Kosmas
|
fe02101a12
|
Iframe support
|
2024-10-31 14:40:33 +02:00 |
|
Nicolas
|
c00cd21308
|
Nick: adds support for mobile web scraping
|
2024-10-29 14:10:40 -03:00 |
|
Nicolas
|
fa8875d64d
|
Update single_url.ts
|
2024-10-28 15:09:50 -03:00 |
|
Thomas Kosmas
|
acde353e56
|
skipTlsVerification on robots.txt scraping
|
2024-10-23 01:07:03 +03:00 |
|
Thomas Kosmas
|
bd55464b52
|
skipTlsVerification
|
2024-10-22 22:28:02 +03:00 |
|
Nicolas
|
a73b06589c
|
Merge pull request #785 from mendableai/nsc/support-for-all-metadata
Return all the website metadata
|
2024-10-16 23:37:26 -03:00 |
|
Nicolas
|
2ac50a16f5
|
Update metadata.ts
|
2024-10-16 23:37:07 -03:00 |
|
Nicolas
|
8974230db4
|
Nick: formatting + error handling
|
2024-10-16 23:35:03 -03:00 |
|
Nicolas
|
417c7697c3
|
Update metadata.ts
|
2024-10-16 23:26:46 -03:00 |
|
Nicolas
|
ff906f7750
|
Update excludeTags.ts
|
2024-10-16 13:40:34 -03:00 |
|
Nicolas
|
2c1a98f019
|
Update excludeTags.ts
|
2024-10-16 13:37:40 -03:00 |
|
Nicolas
|
027158fa44
|
Nick:
|
2024-10-15 21:47:27 -03:00 |
|
Nicolas
|
795e5a9228
|
Update metadata.ts
|
2024-10-15 21:36:13 -03:00 |
|
Nicolas
|
b4f6a0f919
|
Nick: geolocation
|
2024-10-15 21:12:33 -03:00 |
|
rafaelsideguide
|
180801225b
|
fix/check files on crawl
|
2024-10-14 15:44:45 -03:00 |
|
rafaelsideguide
|
2bf7b433e2
|
fixed file blocking process
|
2024-10-14 12:18:26 -03:00 |
|
rafaelsideguide
|
c1f98d0371
|
fixed developer.notion special case
|
2024-10-11 10:54:59 -03:00 |
|
Nicolas
|
abb5ec7439
|
Update playwright.ts
|
2024-10-09 22:55:01 -03:00 |
|
Nicolas
|
f6ec45f046
|
Merge pull request #747 from Harsh0707005/timeout-parameter-not-passed
Fixed Issue #734
|
2024-10-09 22:53:26 -03:00 |
|
Nicolas
|
222a34cae8
|
Update playwright.ts
|
2024-10-09 22:53:03 -03:00 |
|
Nicolas
|
064ce482c2
|
Update blocklist.ts
|
2024-10-09 14:41:23 -03:00 |
|
Harsh Master
|
aa3d4b8d6c
|
Fixed Issue #734
|
2024-10-08 11:36:12 +05:30 |
|
Nicolas
|
5c0c952a27
|
Update website_params.ts
|
2024-10-07 14:51:05 -03:00 |
|
Nicolas
|
dba96998e3
|
Update fetch.ts
|
2024-10-03 18:56:51 -03:00 |
|
Nicolas
|
668ff3c71b
|
Update fetch.ts
|
2024-10-03 18:55:39 -03:00 |
|
Nicolas
|
25dd16bf2a
|
Nick: removed 401
|
2024-10-03 18:52:17 -03:00 |
|
Nicolas
|
ddd774ed68
|
Nick:
|
2024-10-03 17:20:57 -03:00 |
|
Nicolas
|
a150aa820c
|
Nick: shouldn't fall back on a 400 + error code should be correct on page status code
|
2024-10-03 15:21:42 -03:00 |
|
Nicolas
|
ac5e1fc194
|
Update sitemap.ts
|
2024-10-01 16:14:43 -03:00 |
|
Nicolas
|
8aa07afb6d
|
Nick: fixes
|
2024-10-01 15:15:49 -03:00 |
|
Nicolas
|
ff4b7a835b
|
Merge pull request #685 from devflowinc/main
bugfix: using onlyIncludeTags and removeTags together
|
2024-09-30 17:18:30 -03:00 |
|
Nicolas
|
975f0575b4
|
Nick: max retries with axios-retry
|
2024-09-27 12:58:57 -04:00 |
|
Nicolas
|
1fdff87b3e
|
Update single_url.ts
|
2024-09-27 12:23:44 -04:00 |
|
Nicolas
|
a9773a24a3
|
Nick: increased timeout for chrome-cdp due to smart wait
|
2024-09-25 19:27:02 -04:00 |
|
Nicolas
|
1da026b26e
|
Update single_url.ts
|
2024-09-24 23:29:48 -04:00 |
|
Nicolas
|
b8266cc329
|
Update website_params.ts
|
2024-09-24 23:28:58 -04:00 |
|
Nicolas
|
3f138e559e
|
Update website_params.ts
|
2024-09-24 15:14:26 -04:00 |
|
Gergő Móricz
|
43730b5db6
|
feat(WebScraper): always report error of last scraper in order
|
2024-09-24 20:03:49 +02:00 |
|
Gergő Móricz
|
4194525640
|
fix(blocklist): unblock TikTok Business page
This is just a regular business site, not social media.
|
2024-09-24 16:55:19 +02:00 |
|
Gergő Móricz
|
a59b5836d5
|
Revert error tallying
|
2024-09-24 10:27:49 +02:00 |
|
Gergő Móricz
|
677faa27f3
|
fix(WebScraper): explicitly ignore 404s
|
2024-09-23 18:47:07 +02:00 |
|
Gergő Móricz
|
d2f7031069
|
fix(WebScraper): fatal error handler triggering for 404s
|
2024-09-23 18:33:10 +02:00 |
|
Nicolas
|
dfdbae74c6
|
Update fireEngine.ts
|
2024-09-21 21:10:05 -04:00 |
|
Nicolas
|
0690cfeaad
|
Merge branch 'main' into feat/actions
|
2024-09-20 18:24:13 -04:00 |
|