69 Commits

Author SHA1 Message Date
Nicolas
c00cd21308 Nick: adds support for mobile web scraping 2024-10-29 14:10:40 -03:00
Thomas Kosmas
bd55464b52 skipTlsVerification 2024-10-22 22:28:02 +03:00
Nicolas
b4f6a0f919 Nick: geolocation 2024-10-15 21:12:33 -03:00
Nicolas
db161ac55a Nick: press + write 2024-09-20 19:45:23 -04:00
Gergő Móricz
d663bbf0ca feat(actions): add scroll 2024-09-20 21:41:53 +02:00
Gergő Móricz
3dd912ec91 feat(actions): add typeText, pressKey, fix playwright screenshot/waitFor 2024-09-20 21:02:53 +02:00
Gergő Móricz
093c064bff feat(v1): add public actions api 2024-09-18 20:39:25 +02:00
Gergő Móricz
42d677fe3c feat(fire-engine): port waitFor and screenshot to use actions 2024-09-18 20:04:54 +02:00
rafaelsideguide
8c1097e9e1 fix: pageOptions 2024-09-05 14:16:31 -03:00
Nicolas
41eb620959 Nick: prompt option, still need to convert to new structured outputs 2024-08-29 21:00:57 -03:00
Nicolas
49e1cb7ca0 Nick: 2024-08-29 20:08:06 -03:00
Gergő Móricz
e7f267b6fe Merge branch 'main' into v1-webscraper 2024-08-23 17:21:54 +02:00
rafaelsideguide
ecd472356b added variables to beta customers 2024-08-19 16:41:54 -03:00
rafaelsideguide
7a61325500 map + search + scrape markdown bug 2024-08-16 17:57:11 -03:00
rafaelsideguide
3f998b688d scrape ready 2024-08-16 15:14:37 -03:00
Gergő Móricz
29f0d9ec94 propagate priority to fire-engine 2024-08-15 19:04:46 +02:00
Nicolas
3321ca9398
Merge pull request #504 from mendableai/feat/fullpage-screenshot
[Feat] Added fullpagescreenshot capabilities
2024-08-06 13:52:29 -04:00
rafaelsideguide
3edc3a3d15 added fullpagescreenshot capabilities, wip on fire-engine side 2024-08-05 18:17:37 -03:00
rafaelsideguide
f32e8de156 fixes the empty excludes.filter undefined bug 2024-08-05 18:13:31 -03:00
Nicolas
4c9d62f6d3 Nick: fixing sitemap fallback 2024-07-26 18:25:44 -04:00
Gergo Moricz
7cd9bf92e3 feat: scrape event logging to DB 2024-07-24 14:31:25 +02:00
Caleb Peffer
d39d3be649 Caleb: now extracting and returning a list of all links on the page for a customer 2024-07-16 18:38:03 -07:00
Nicolas
e098e88ea7 Nick: 2024-07-12 22:02:08 -04:00
Rafael Miller
f0f449fe51
Merge pull request #336 from snippet/allow-external-content-links
[Proposal] new feature allowExternalContentLinks
2024-07-02 09:45:21 -03:00
Jeff Pereira
a5fb45988c new feature allowExternalContentLinks 2024-06-28 17:23:40 -07:00
Eric Ciarla
87b54488d3 update to includeRawHtml 2024-06-28 17:07:47 -04:00
Eric Ciarla
70fcf2ce03 init 2024-06-28 16:39:09 -04:00
Nicolas
1d4907acc9 Nick: 2024-06-26 21:02:58 -03:00
Rafael Miller
f9c7ca9388
Merge branch 'main' into feat/issue-266 2024-06-14 11:47:58 -03:00
Rafael Miller
3e2e76311c
Merge branch 'main' into feat/issue-205 2024-06-14 11:25:20 -03:00
rafaelsideguide
bb859ae9a7 Added metadata.pageStatusCode and metadata.pageError properties to the responses 2024-06-13 17:08:40 -03:00
rafaelsideguide
676d6e8ab5 Added pageOptions.removeTags 2024-06-13 10:51:05 -03:00
rafaelsideguide
e37d151404 added parsePDF option to pageOptions
user can decide if they are going to let us take care of the parse or they are going to parse the pdf by themselves
2024-06-12 15:06:47 -03:00
rafaelsideguide
dc6acbf1f0 Merge remote-tracking branch 'origin/main' into feat/allowbackwardcrawling-option 2024-06-12 11:01:05 -03:00
Nicolas
520739c9f4 Nick: fixed bugs associated with absolute path replacements 2024-06-11 12:43:16 -07:00
rafaelsideguide
ee282c3d55 Added allowBackwardCrawling option 2024-06-11 15:24:39 -03:00
Nicolas
f6b06ac27a Nick: ignoreSitemap, better crawling algo 2024-06-10 18:12:41 -07:00
Nicolas
b4c6819a54 Nick: 2024-06-05 11:11:09 -07:00
Nicolas
6bea803120 Nick: 2024-05-31 15:39:54 -07:00
Nicolas
6c939d534d Nick: small refactor 2024-05-29 19:43:51 -07:00
Eric Ciarla
a0e404f94e init commit 2024-05-29 18:56:57 -04:00
Nicolas
1b3547dcf2 Nick: 2024-05-28 12:56:24 -07:00
Nicolas
77a79b5a79 Nick: max num tokens for llm extract (for now) + slice the max 2024-05-20 17:07:38 -07:00
Nicolas
8a72cf556b Nick: 2024-05-13 21:10:58 -07:00
Nicolas
a96fc5b96d Nick: 4x speed 2024-05-13 20:45:11 -07:00
Nicolas
dcedb8d798 Merge branch 'main' into feat/max-depth 2024-05-07 10:20:49 -07:00
Nicolas
6505bf6bf2 Merge branch 'main' into feat/max-depth 2024-05-07 10:20:44 -07:00
Nicolas
bdbee963f7 Merge branch 'main' into nsc/cancel-job 2024-05-07 10:13:43 -07:00
rafaelsideguide
e1f52c538f nested includeHtml inside pageOptions 2024-05-07 13:40:24 -03:00
rafaelsideguide
83f3408634 Added max depth option 2024-05-07 11:06:26 -03:00