Gergo Moricz | 7bb922071c | fix(queue-worker): manually renew lock (testing) | 2024-08-07 14:35:20 +02:00
Nicolas | 3321ca9398 | Merge pull request #504 from mendableai/feat/fullpage-screenshot; [Feat] Added fullpagescreenshot capabilities | 2024-08-06 13:52:29 -04:00
Gergo Moricz | b60ee30dba | fix(single_url): accept 500 | 2024-08-06 18:00:56 +02:00
rafaelsideguide | 4d24a99d50 | fix params | 2024-08-06 09:34:43 -03:00
rafaelsideguide | 3edc3a3d15 | added fullpagescreenshot capabilities, wip on fire-engine side | 2024-08-05 18:17:37 -03:00
rafaelsideguide | f32e8de156 | fixes the empty excludes.filter undefined bug | 2024-08-05 18:13:31 -03:00
Nicolas | 1742e4ceae | Nick: | 2024-08-02 19:25:15 -04:00
Nicolas | b448e3c3ad | Update website_params.ts | 2024-08-02 14:26:35 -04:00
rafaelsideguide | 4051630632 | Update sitemap.ts | 2024-08-02 11:32:48 -03:00
rafaelsideguide | 8568b61015 | bugfix for sitemaps | 2024-08-02 11:03:01 -03:00
Nicolas | af68b7a785 | Merge pull request #475 from mendableai/bugfix/issue-466; [Bug] pdfs and logging pdf events, also added trycatchs for docx | 2024-08-01 22:05:26 -04:00
rafaelsideguide | f48ff36b32 | added .inc files and forced lower case comparison | 2024-07-31 09:28:43 -03:00
Nicolas | ad6f6eff4b | Update fireEngine.ts | 2024-07-30 19:15:54 -04:00
Nicolas | 6d99dedd3c | Nick: fixed tests | 2024-07-30 19:11:01 -04:00
rafaelsideguide | d25d7e7244 | special case: developer.apple.com | 2024-07-30 10:13:09 -03:00
Nicolas | 5e8ffcf505 | Update website_params.ts | 2024-07-29 20:43:47 -04:00
Nicolas | 7b813883ef | Nick: first layer | 2024-07-29 20:31:51 -04:00
Nicolas | 968a2dc753 | Nick: | 2024-07-29 18:37:09 -04:00
rafaelsideguide | 49e3e64787 | bugfix for pdfs and logging pdf events, also added trycatchs for docx | 2024-07-29 14:13:46 -03:00
Nicolas | 4c9d62f6d3 | Nick: fixing sitemap fallback | 2024-07-26 18:25:44 -04:00
Nicolas | cb97871ff9 | Merge branch 'main' of https://github.com/mendableai/firecrawl | 2024-07-26 17:21:11 -04:00
Nicolas | ff4266f09e | Update pdfProcessor.ts | 2024-07-26 17:21:09 -04:00
rafaelsideguide | 96cec2a673 | fix checking scrape log success content length | 2024-07-26 12:00:52 -03:00
Nicolas | f82ca3be17 | Nick: | 2024-07-25 19:53:29 -04:00
Nicolas | 01fab6e036 | Update single_url.ts | 2024-07-25 17:51:41 -04:00
Nicolas | 56042d090c | Update single_url.ts | 2024-07-25 17:48:44 -04:00
Nicolas | 3242872503 | Update single_url.ts | 2024-07-25 17:43:55 -04:00
Nicolas | e5b797549e | Merge branch 'main' into feat/scrape-monitoring | 2024-07-25 16:21:02 -04:00
rafaelsideguide | e720e1bacf | Merge remote-tracking branch 'origin/main' into feat/logger | 2024-07-25 09:49:27 -03:00
rafaelsideguide | 309728a482 | updated logs | 2024-07-25 09:48:06 -03:00
Nicolas | 2c1221750b | Merge pull request #449 from mendableai/bugfix/malformed-url-sitemap; Added regex for links in sitemap | 2024-07-24 20:37:35 -04:00
Nicolas | 3a1b8a9797 | Update website_params.ts | 2024-07-24 11:04:47 -04:00
Nicolas | 8b48ec8d30 | Update website_params.ts | 2024-07-24 11:02:20 -04:00
Gergo Moricz | 4d35ad073c | feat(monitoring/scrape): include url, worker, response_size | 2024-07-24 16:43:39 +02:00
Gergo Moricz | 64bcedeefc | fix(monitoring): bad success check on scrape | 2024-07-24 16:21:59 +02:00
Gergo Moricz | 7cd9bf92e3 | feat: scrape event logging to DB | 2024-07-24 14:31:25 +02:00
Rafael Miller | 5e728c1a4d | Update apps/api/src/scraper/WebScraper/crawler.ts; no need for regex; Co-authored-by: Gergő Móricz <mo.geryy@gmail.com> | 2024-07-24 08:33:00 -03:00
rafaelsideguide | 6208ecdbc0 | added logger | 2024-07-23 17:30:46 -03:00
Nicolas | f0b07b509b | Update index.ts | 2024-07-23 15:15:56 -04:00
rafaelsideguide | a684bd3c5d | added regex for links in sitemap | 2024-07-23 09:07:23 -03:00
Nicolas | 8916fec66c | Update index.ts | 2024-07-22 19:14:53 -04:00
Nicolas | e31a5007d5 | Nick: speed improvements | 2024-07-22 18:30:58 -04:00
rafaelsideguide | 5c02dbe20c | fix(isFile): added .tiff extension | 2024-07-18 17:07:21 -03:00
Gergo Moricz | f0e95ce399 | fix(WebCrawler): filter out file URLs when taking URLs from sitemap | 2024-07-18 21:49:37 +02:00
Nicolas | 5f14f4f788 | Update blocklist.ts | 2024-07-18 14:20:19 -04:00
Nicolas | f10f3f886b | Merge pull request #410 from mendableai/feat/fire-engine-chrome-cdp; Support chrome-cdp and restructure sitemap fire-engine support. | 2024-07-18 13:52:08 -04:00
Nicolas | d2de01d342 | Nick: fixes | 2024-07-18 13:19:44 -04:00
Gergo Moricz | 0b8047c7a0 | fix(WebScraper): infinite regex leading to fly.io instance hangs | 2024-07-18 19:13:43 +02:00
Nicolas | f11137352c | Merge branch 'main' into feat/fire-engine-chrome-cdp | 2024-07-18 12:48:42 -04:00
Caleb Peffer | 8d5ebc9b9f | Merge pull request #423 from mendableai/cjp/linksOnPage; Caleb: Return a list of links on a page by default | 2024-07-17 12:36:07 -06:00