Eric Ciarla | 71c98d8b80 | Update logic | 2024-06-13 18:00:52 -04:00
Eric Ciarla | 095951aa4d | Update test | 2024-06-13 17:40:00 -04:00
Eric Ciarla | 5e8aa92788 | Update index.ts | 2024-06-13 17:33:13 -04:00
Eric Ciarla | 65d63bae45 | Update index.ts | 2024-06-13 17:17:44 -04:00
Eric Ciarla | 32e814bedc | Update index.ts | 2024-06-13 17:02:30 -04:00
rafaelsideguide | bb859ae9a7 | Added metadata.pageStatusCode and metadata.pageError properties to the responses | 2024-06-13 17:08:40 -03:00
rafaelsideguide | 676d6e8ab5 | Added pageOptions.removeTags | 2024-06-13 10:51:05 -03:00
rafaelsideguide | e37d151404 | added parsePDF option to pageOptions - user can decide if they are going to let us take care of the parse or they are going to parse the pdf by themselves | 2024-06-12 15:06:47 -03:00
rafaelsideguide | dc6acbf1f0 | Merge remote-tracking branch 'origin/main' into feat/allowbackwardcrawling-option | 2024-06-12 11:01:05 -03:00
Nicolas | 1e3e06a1d5 | Update replacePaths.test.ts | 2024-06-11 13:02:39 -07:00
Nicolas | 2239e03269 | Update replacePaths.test.ts | 2024-06-11 12:54:02 -07:00
Nicolas | 520739c9f4 | Nick: fixed bugs associated with absolute path replacements | 2024-06-11 12:43:16 -07:00
rafaelsideguide | ee282c3d55 | Added allowBackwardCrawling option | 2024-06-11 15:24:39 -03:00
Nicolas | f6b06ac27a | Nick: ignoreSitemap, better crawling algo | 2024-06-10 18:12:41 -07:00
Nicolas | 1bd0327e1a | Merge branch 'main' into nsc/pageoptions-crawler | 2024-06-10 17:15:10 -07:00
Nicolas | 7ae9778642 | Update single_url.ts | 2024-06-10 16:57:31 -07:00
Nicolas | 913c1dd568 | Nick: fetch -> axios and fix timeouts | 2024-06-10 16:49:03 -07:00
Nicolas | 3091f0134c | Nick: | 2024-06-10 16:27:10 -07:00
rafaelsideguide | 164676c70a | bugfix screenshot for readme pages | 2024-06-05 15:34:42 -03:00
Nicolas | b4c6819a54 | Nick: | 2024-06-05 11:11:09 -07:00
rafaelsideguide | 0d51b11dcd | missing breaks | 2024-06-05 15:02:28 -03:00
Nicolas | 7cb14edec8 | Nick: | 2024-06-05 10:13:52 -07:00
Rafael Miller | 9e000ded03 | Merge branch 'main' into feat/better-gdrive-pdf-fetch | 2024-06-05 14:07:56 -03:00
rafaelsideguide | ccc55127d6 | Added scroll xpaths on fire-engine for handling readme docs | 2024-06-05 11:48:41 -03:00
rafaelsideguide | b5045d1661 | [feat] improved the scrape for gdrive pdfs | 2024-06-04 17:47:28 -03:00
Nicolas | 96257b7b17 | Update handleCustomScraping.ts | 2024-06-04 12:22:46 -07:00
Nicolas | 674500affa | Nick: | 2024-06-04 12:15:39 -07:00
rafaelsideguide | 5ae4d1caf5 | Update single_url.ts | 2024-06-04 15:28:09 -03:00
rafaelsideguide | 64a4338ff0 | Update single_url.ts | 2024-06-04 14:40:05 -03:00
Rafael Miller | 02fe470e20 | Merge pull request #148 from mendableai/nsc/improvemnts-fixes-misc - Better fallbacks for initial crawl start | 2024-06-04 14:31:10 -03:00
Rafael Miller | b80fb374e5 | Merge branch 'main' into playwright-service-bug-222 | 2024-06-04 11:57:17 -03:00
rafaelsideguide | 6920ec8a61 | bugfixing. already on main | 2024-06-04 11:05:50 -03:00
Nicolas | cbf8d79cce | Update pdfProcessor.ts | 2024-06-04 00:13:37 -07:00
Nicolas | 2ea01f1456 | Update single_url.ts | 2024-06-03 23:42:39 -07:00
Nicolas | 854d5b3cb3 | Update single_url.ts | 2024-06-03 23:32:55 -07:00
Nicolas | 99059814a8 | Nick: | 2024-06-03 21:32:48 -07:00
Nicolas | 918059ee9e | Merge branch 'main' into nsc/improvemnts-fixes-misc | 2024-06-03 16:46:02 -07:00
Nicolas | 38e583f66c | Update socialBlockList.test.ts | 2024-06-03 16:44:23 -07:00
Nicolas | c69c89f838 | Nick: | 2024-06-03 16:42:42 -07:00
Nicolas | 48d1ec05b2 | Merge branch 'main' into nsc/improved-blocklist | 2024-06-03 16:38:03 -07:00
Nicolas | d30ced4394 | Merge pull request #221 from mendableai/nsc/fwd-header-auth - feat: Ability to forward headers to reliable providers for auth etc... | 2024-06-03 16:33:40 -07:00
rafaelsideguide | 1fc3a15149 | Update single_url.ts | 2024-06-03 15:24:40 -03:00
Nicolas | fde522c3e1 | Update single_url.ts | 2024-06-02 20:23:45 -07:00
Matt Joyce | deefe65cbe | Change the way the playwright response is parsed - Was failing with a Type Error, but actually looked ok. This fixes the type error, and stop scraper fallback. | 2024-06-01 19:16:56 +10:00
Nicolas | 8cb62dde92 | Update website_params.ts | 2024-05-31 16:09:39 -07:00
Nicolas | 3b8059edb6 | Update single_url.ts | 2024-05-31 15:43:06 -07:00
Nicolas | 6bea803120 | Nick: | 2024-05-31 15:39:54 -07:00
Nicolas | 6c939d534d | Nick: small refactor | 2024-05-29 19:43:51 -07:00
Eric Ciarla | 37915e11e8 | Final push | 2024-05-29 21:18:24 -04:00
Eric Ciarla | a0e404f94e | init commit | 2024-05-29 18:56:57 -04:00