474 Commits

Author SHA1 Message Date
rafaelmmiller
f07bbef78e added trycatch and removed redundancy 2024-11-05 08:11:49 -03:00
Rafael Miller
8297e5beef Merge branch 'main' into fix/remove-base64-images 2024-11-04 10:35:09 -03:00
rafaelmmiller
4c5bb21a6f added remove base64 images options (true by default) 2024-11-04 10:31:44 -03:00
Eric Ciarla
2eff27ba43 Merge pull request #847 from mendableai/nsc/mobile-support
Adds support for mobile web scraping + mobile screenshot
2024-11-02 11:16:33 -04:00
Nicolas
446acfccde Nick: support for the new actions 2024-10-31 20:01:52 -03:00
rafaelsideguide
367af9512f added iframe links to extractLinksFromHTML 2024-10-31 10:53:47 -03:00
Thomas Kosmas
fe02101a12 Iframe support 2024-10-31 14:40:33 +02:00
Nicolas
c00cd21308 Nick: adds support for mobile web scraping 2024-10-29 14:10:40 -03:00
Nicolas
fa8875d64d Update single_url.ts 2024-10-28 15:09:50 -03:00
Thomas Kosmas
acde353e56 skipTlsVerification on robots.txt scraping 2024-10-23 01:07:03 +03:00
Thomas Kosmas
bd55464b52 skipTlsVerification 2024-10-22 22:28:02 +03:00
Nicolas
a73b06589c Merge pull request #785 from mendableai/nsc/support-for-all-metadata
Return all the website metadata
2024-10-16 23:37:26 -03:00
Nicolas
2ac50a16f5 Update metadata.ts 2024-10-16 23:37:07 -03:00
Nicolas
8974230db4 Nick: formatting + error handling 2024-10-16 23:35:03 -03:00
Nicolas
417c7697c3 Update metadata.ts 2024-10-16 23:26:46 -03:00
Nicolas
ff906f7750 Update excludeTags.ts 2024-10-16 13:40:34 -03:00
Nicolas
2c1a98f019 Update excludeTags.ts 2024-10-16 13:37:40 -03:00
Nicolas
027158fa44 Nick: 2024-10-15 21:47:27 -03:00
Nicolas
795e5a9228 Update metadata.ts 2024-10-15 21:36:13 -03:00
Nicolas
b4f6a0f919 Nick: geolocation 2024-10-15 21:12:33 -03:00
rafaelsideguide
180801225b fix/check files on crawl 2024-10-14 15:44:45 -03:00
rafaelsideguide
2bf7b433e2 fixed file blocking process 2024-10-14 12:18:26 -03:00
rafaelsideguide
c1f98d0371 fixed developer.notion special case 2024-10-11 10:54:59 -03:00
Nicolas
abb5ec7439 Update playwright.ts 2024-10-09 22:55:01 -03:00
Nicolas
f6ec45f046 Merge pull request #747 from Harsh0707005/timeout-parameter-not-passed
Fixed Issue #734
2024-10-09 22:53:26 -03:00
Nicolas
222a34cae8 Update playwright.ts 2024-10-09 22:53:03 -03:00
Nicolas
064ce482c2 Update blocklist.ts 2024-10-09 14:41:23 -03:00
Harsh Master
aa3d4b8d6c Fixed Issue #734 2024-10-08 11:36:12 +05:30
Nicolas
5c0c952a27 Update website_params.ts 2024-10-07 14:51:05 -03:00
Nicolas
dba96998e3 Update fetch.ts 2024-10-03 18:56:51 -03:00
Nicolas
668ff3c71b Update fetch.ts 2024-10-03 18:55:39 -03:00
Nicolas
25dd16bf2a Nick: removed 401 2024-10-03 18:52:17 -03:00
Nicolas
ddd774ed68 Nick: 2024-10-03 17:20:57 -03:00
Nicolas
a150aa820c Nick: shouldnt fallback on a 400 + error code should be correct on page status code 2024-10-03 15:21:42 -03:00
Nicolas
ac5e1fc194 Update sitemap.ts 2024-10-01 16:14:43 -03:00
Nicolas
8aa07afb6d Nick: fixes 2024-10-01 15:15:49 -03:00
Nicolas
ff4b7a835b Merge pull request #685 from devflowinc/main
bugfix: using onlyIncludeTags and removeTags together
2024-09-30 17:18:30 -03:00
Nicolas
975f0575b4 Nick: max retries with axios-retry 2024-09-27 12:58:57 -04:00
Nicolas
1fdff87b3e Update single_url.ts 2024-09-27 12:23:44 -04:00
Nicolas
a9773a24a3 Nick: increased timeout for chrome-cdp due to smart wait 2024-09-25 19:27:02 -04:00
Nicolas
1da026b26e Update single_url.ts 2024-09-24 23:29:48 -04:00
Nicolas
b8266cc329 Update website_params.ts 2024-09-24 23:28:58 -04:00
Nicolas
3f138e559e Update website_params.ts 2024-09-24 15:14:26 -04:00
Gergő Móricz
43730b5db6 feat(WebScraper): always report error of last scraper in order 2024-09-24 20:03:49 +02:00
Gergő Móricz
4194525640 fix(blocklist): unblock TikTok Business page
This is just a regular business site, not social media.
2024-09-24 16:55:19 +02:00
Gergő Móricz
a59b5836d5 Revert error tallying 2024-09-24 10:27:49 +02:00
Gergő Móricz
677faa27f3 fix(WebScraper): explicitly ignore 404s 2024-09-23 18:47:07 +02:00
Gergő Móricz
d2f7031069 fix(WebScraper): fatal error handler triggering for 404s 2024-09-23 18:33:10 +02:00
Nicolas
dfdbae74c6 Update fireEngine.ts 2024-09-21 21:10:05 -04:00
Nicolas
0690cfeaad Merge branch 'main' into feat/actions 2024-09-20 18:24:13 -04:00