| Author | Commit | Message | Date |
|---|---|---|---|
| Nicolas | 86e34d7c6c | Nick: wip | 2025-01-07 12:13:12 -03:00 |
| Móricz Gergő | 7a03275575 | add comment | 2025-01-07 13:57:47 +01:00 |
| Móricz Gergő | 7d73ebdbf1 | fix(crawl): never invalidate first crawl scrape if redirects | 2025-01-07 13:57:23 +01:00 |
| Móricz Gergő | b96b97ed72 | fix(crawl): don't push rawhtml to db unless requested | 2025-01-07 10:09:15 +01:00 |
| Móricz Gergő | 35d1d85978 | fix(crawler): also take the hostname of the base url when determining isInternalLink | 2025-01-07 09:29:58 +01:00 |
| Nicolas | bb27594443 | Merge branch 'main' into nsc/extract-queue | 2025-01-06 13:01:15 -03:00 |
| Gergő Móricz | 461842fe8c | fix(v1/crawl-status): handle job's returnvalue being explicitly null (db race) | 2025-01-04 17:24:33 +01:00 |
| Gergő Móricz | b92a4eb79b | fix(queue-worker): only do redirect handling logic on crawls, not batch scrape | 2025-01-04 16:59:35 +01:00 |
| Nicolas | d48ddb8820 | Update canonical-url.test.ts | 2025-01-03 23:55:05 -03:00 |
| Nicolas | f2e0bfbfe3 | Nick: url normalization | 2025-01-03 23:54:03 -03:00 |
| Nicolas | f25c0c6d21 | Nick: added canonical tests | 2025-01-03 23:16:33 -03:00 |
| Nicolas | aef040b41e | Nick: from cache fixes | 2025-01-03 23:07:15 -03:00 |
| Nicolas | e8a9d8ddcd | Merge branch 'main' of https://github.com/mendableai/firecrawl | 2025-01-03 22:55:42 -03:00 |
| Nicolas | 05e845a971 | Update cache.ts | 2025-01-03 22:55:38 -03:00 |
| Nicolas | c655c6859f | Nick: fixed | 2025-01-03 22:50:53 -03:00 |
| Nicolas | a4f7c38834 | Nick: fixed | 2025-01-03 22:15:23 -03:00 |
| Nicolas | 8df1c67961 | Update queue-worker.ts | 2025-01-03 21:48:28 -03:00 |
| Nicolas | 499479c85e | Update url-processor.ts | 2025-01-03 21:28:52 -03:00 |
| Nicolas | 432b410678 | Update queue-worker.ts | 2025-01-03 21:26:05 -03:00 |
| Nicolas | 6b2e1cbb28 | Nick: cache /extract scrapes | 2025-01-03 21:19:40 -03:00 |
| Nicolas | 27457ed5db | Nick: init | 2025-01-03 20:44:27 -03:00 |
| Nicolas | 81cf05885b | Merge branch 'main' into nsc/semantic-index-extract | 2025-01-03 19:57:29 -03:00 |
| Nicolas | ad49503f8a | Update search.ts | 2025-01-02 21:15:47 -03:00 |
| Nicolas | cbe0716439 | Update search.ts | 2025-01-02 21:13:24 -03:00 |
| Nicolas | e37ab8431a | Update search.ts | 2025-01-02 21:07:14 -03:00 |
| Nicolas | 8b64e915b3 | Update search.ts | 2025-01-02 21:02:55 -03:00 |
| Nicolas | 7ce780ac81 | Update search.ts | 2025-01-02 20:40:38 -03:00 |
| Nicolas | 21bf89b6cc | Update search.ts | 2025-01-02 19:57:51 -03:00 |
| Nicolas | 22ae1730bd | Update search.ts | 2025-01-02 19:57:41 -03:00 |
| Nicolas | a0dbf20c40 | Update types.ts | 2025-01-02 19:55:28 -03:00 |
| Nicolas | 35d7202894 | Update search.ts | 2025-01-02 19:33:21 -03:00 |
| Nicolas | d2742bec4d | Nick: v1 search | 2025-01-02 19:31:03 -03:00 |
| rafaelmmiller | ef0fc8d0d3 | broader search if didnt find results | 2025-01-02 18:00:18 -03:00 |
| Nicolas | c9d91af86f | Merge branch 'main' into nsc/semantic-index-extract | 2025-01-02 15:26:40 -03:00 |
| Nicolas | c3fd13a82b | Nick: fixed re-ranker and enabled url cache of 2hrs | 2024-12-31 18:06:07 -03:00 |
| Nicolas | 07f4b714af | Update removeUnwantedElements.ts | 2024-12-31 15:23:02 -03:00 |
| Nicolas | 33632d2fe3 | Update extraction-service.ts | 2024-12-31 15:22:50 -03:00 |
| Nicolas | bd81b41d5f | Update queue-worker.ts | 2024-12-30 21:43:59 -03:00 |
| Nicolas | e6da214aeb | Nick: async background index | 2024-12-30 21:42:01 -03:00 |
| Nicolas | 7a31306be5 | Nick: url normalization + max metadata size | 2024-12-30 20:04:22 -03:00 |
| Nicolas | bf9d41d0b2 | Nick: index exploration | 2024-12-30 19:37:48 -03:00 |
| Nicolas | 0847a6038e | Merge pull request #1014 from mendableai/nsc/extract-url-trace: /extract URL trace | 2024-12-30 19:00:58 -03:00 |
| Gergő Móricz | 71a8f7452c | fix(WebScraper/sitemap): await urlsHandler to fix race condition | 2024-12-30 16:09:22 +01:00 |
| Nicolas | 8ae34a0d31 | Nick: rm .xml from isFile | 2024-12-30 11:57:01 -03:00 |
| Gergő Móricz | 9005757de3 | fix(queue-worker): do not follow redirect URLs if they are not allowed by the crawl options | 2024-12-30 14:41:31 +01:00 |
| Gergő Móricz | 4d1f92f4c8 | fix(scrapeURL/fetch): block loopback and link-local IPs | 2024-12-29 17:35:14 +01:00 |
| Nicolas | e255301005 | Update index.ts | 2024-12-27 21:31:29 -03:00 |
| Nicolas | 1eca61bffb | Update index.ts | 2024-12-27 20:59:18 -03:00 |
| Nicolas | f9d55efba8 | Update index.ts | 2024-12-27 20:54:26 -03:00 |
| Nicolas | b8d7f9f257 | Nick: we are using runpod | 2024-12-27 19:59:05 -03:00 |