adikalra
acde4ed173
Add custom chunking function.
2025-01-09 17:20:24 +05:30
zrguo
b93203804c
Merge branch 'main' into main
2025-01-09 15:28:57 +08:00
zrguo
92ccfa2770
Merge pull request #555 from ParisNeo/main
...
Restore backwards compatibility for LightRAG's ainsert method
2025-01-09 15:27:09 +08:00
童石渊
dd213c95be
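Add a character-only split parameter: when enabled, only character splitting is used; when disabled, any chunk that is still too large after character splitting is further split by token size. Update the test file.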
2025-01-09 11:55:49 +08:00
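
The splitting behavior described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from lightrag.py: the function name `split_text`, the flag name `character_only`, and the use of whitespace-separated words as a stand-in for real tokenizer tokens are all assumptions made for the example.

```python
def split_text(content, split_by_character=None, character_only=False,
               max_token_size=8):
    """Hypothetical sketch: whitespace words stand in for tokenizer tokens."""

    def by_tokens(text):
        # Fallback: cut the text into windows of at most max_token_size words.
        words = text.split()
        return [" ".join(words[i:i + max_token_size])
                for i in range(0, len(words), max_token_size)]

    # No separator given: plain token-size chunking.
    if split_by_character is None:
        return by_tokens(content)

    chunks = []
    for piece in content.split(split_by_character):
        piece = piece.strip()
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        if character_only or len(piece.split()) <= max_token_size:
            # Character-only mode keeps the piece whole, however long it is.
            chunks.append(piece)
        else:
            # Oversized piece: split it again by token size.
            chunks.extend(by_tokens(piece))
    return chunks
```

With `character_only=True` an oversized piece is kept intact; otherwise it is re-split into token-size windows.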
Saifeddine ALOUI
65c1450c66
fixed retro compatibility with ainsert by making split_by_character get a None default value
2025-01-08 20:50:22 +01:00
Gurjot Singh
9565a4663a
Fix trailing whitespace and formatting issues in lightrag.py
2025-01-09 00:39:22 +05:30
Gurjot Singh
a940251390
Implement custom chunking feature
2025-01-07 20:57:39 +05:30
童石渊
6b19401dc6
chunk split retry
2025-01-07 16:26:12 +08:00
童石渊
536d6f2283
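Add character splitting: if the split_by_character parameter is passed to the "insert" function, the text is split on split_by_character; if any resulting chunk has more tokens than max_token_size, it is further split by token_size (todo: consider how to handle chunks that are too short after character splitting)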
2025-01-07 00:28:15 +08:00
zrguo
990b684a85
Update lightrag.py
2025-01-06 15:27:31 +08:00
Samuel Chan
6ae27d8f06
Some enhancements:
...
- Enable the llm_cache storage to support get_by_mode_and_id, to improve performance when using a real KV server
- Provide an option for developers to cache the LLM response when extracting entities from a document. This addresses the pain point that when the process fails partway through, the already-processed chunks must be sent to the LLM again, wasting money and time. With the new option enabled (it is disabled by default), those results are cached, which can save significant time and money, especially for beginners.
2025-01-06 12:50:05 +08:00
Samuel Chan
60e8a355f0
Merge branch 'HKUDS:main' into main
2025-01-03 21:18:17 +08:00
Samuel Chan
b17cb2aa95
With a draft for progres_impl
2025-01-01 22:43:59 +08:00
zrguo
d489d9dec0
fix linting errors
2024-12-31 17:32:04 +08:00
zrguo
cee5b2fbb0
add delete by doc id
2024-12-31 17:15:57 +08:00
Magic_yuan
aaaf617451
feat(lightrag): Implement mix search mode combining knowledge graph and vector retrieval
...
- Add 'mix' mode to QueryParam for hybrid search functionality
- Implement mix_kg_vector_query to combine knowledge graph and vector search results
- Update LightRAG class to handle 'mix' mode queries
- Enhance README with examples and explanations for the new mix search mode
- Introduce new prompt structure for generating responses based on combined search results
2024-12-28 11:56:28 +08:00
Magic_yuan
650b8e38b7
feat(lightrag): Add document status tracking and checkpoint support
...
- Add DocStatus enum and DocProcessingStatus class for document processing state management
- Implement JsonDocStatusStorage for persistent status storage
- Add document-level deduplication in batch processing
- Add checkpoint support in ainsert method for resumable document processing
- Add status query methods for monitoring processing progress
- Update LightRAG initialization to support document status tracking
2024-12-28 00:11:25 +08:00
zrguo
457e683acd
Update lightrag.py
2024-12-26 22:14:04 +08:00
Alex Potapenko
6f71293c83
Add Gremlin graph storage
2024-12-19 17:47:42 +01:00
Weaxs
344d8f277b
support TiDBGraphStorage
2024-12-18 10:57:33 +08:00
GG
2d048b5eb0
fix(llm): fix hashing_kv initialization
...
- In hybrid mode, hashing_kv depends on more than global_config, so simply reuse the initialization structure of llm_response_cache
2024-12-17 16:44:42 +08:00
Alex Potapenko
7564841450
Add Apache AGE graph storage
2024-12-13 20:41:38 +01:00
Weaxs
288985eab4
pre-commit fix tidb
2024-12-12 10:22:31 +08:00
Weaxs
8ef5a6b8cd
support TiDB: add TiDBKVStorage, TiDBVectorDBStorage
2024-12-11 16:23:50 +08:00
zrguo
504a3c233b
Merge branch 'main' into pkaushal/vectordb-chroma
2024-12-11 14:21:36 +08:00
Pankaj Kaushal
ca788463cc
feat: Add ChromaDB integration for vector storage
...
- Implemented `ChromaVectorDBStorage` class in `lightrag/kg/chroma_impl.py` to support ChromaDB as a vector storage backend.
- Updated `lightrag.py` to include `ChromaVectorDBStorage` in the storage class mapping.
- Added a test script `test_chromadb.py` to demonstrate the usage of ChromaDB with LightRAG, including configuration for embedding functions and ChromaDB connection settings.
- fix lazy import function to support package context for dynamic class loading.
288d4b8355
2024-12-10 16:23:05 +01:00
david
288d4b8355
fix lazy import
2024-12-10 17:16:21 +08:00
zrguo
3e112c0d05
Merge pull request #432 from ChenZiHong-Gavin/main
...
fix(lightrag): use is_closed() instead of _closed
2024-12-09 18:08:43 +08:00
zrguo
4c89a1a620
Merge pull request #429 from davidleon/improvement/lazy_external_load
...
fix extra kwargs error: keyword_extraction.
2024-12-09 18:07:30 +08:00
chenzihong
9dd51f1f35
fix(lightrag): use is_closed() instead of _closed
2024-12-09 17:10:13 +08:00
david
9717ad87fc
fix extra kwargs error: keyword_extraction.
...
add lazy_external_load to reduce external lib deps whenever it's not necessary for user.
2024-12-09 15:35:35 +08:00
Magic_yuan
ccf44dc334
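feat(cache): Add LLM similarity check and optimize the caching mechanism
...
- Add the use_llm_check parameter to the embedding cache configuration
- Implement LLM similarity check logic as a secondary verification of cache hits
- Optimize the cache handling flow for naive mode
- Adjust the cache data structure, removing the unnecessary model field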
2024-12-08 17:35:52 +08:00
magicyuan876
d48c6e4588
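feat(lightrag): Add embedding cache support at query time
...
- Add the embedding_cache_config option to the LightRAG class
- Implement cache lookup and storage based on embedding similarity
- Add quantization and dequantization functions to compress embedding data
- Add an example demonstrating use of the embedding cache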
2024-12-06 08:17:20 +08:00
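
To illustrate what quantization and dequantization functions for compressing embedding data might look like, here is a minimal sketch using uniform min/max scaling to 8-bit codes. The function names, signatures, and the choice of scheme are assumptions for illustration only; the commit does not say which scheme the actual implementation uses.

```python
def quantize(values, bits=8):
    """Hypothetical sketch: map floats to integer codes in [0, 2**bits - 1]."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    # Guard against a constant vector, where hi == lo would give scale 0.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Approximate inverse of quantize; error is at most about scale / 2."""
    return [lo + c * scale for c in codes]
```

Storing the integer codes plus `lo` and `scale` takes far less space than the original floats, at the cost of a small, bounded rounding error per component.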
partoneplay
d8ba7c57f3
Add MongoDB as KV storage
2024-12-05 13:57:43 +08:00
zrguo
6d274019dd
Merge pull request #393 from partoneplay/main
...
Add Milvus as vector storage
2024-12-05 12:05:30 +08:00
partoneplay
052322b213
Add Milvus as vector storage
2024-12-05 08:48:41 +08:00
LarFii
44d441a951
update insert custom kg
2024-12-04 19:44:04 +08:00
zrguo
6927b57520
Merge pull request #378 from doosenn/main
...
fix neo4jstorage bug
2024-12-04 11:11:19 +08:00
magicyuan876
607d4f9555
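Change the log file path
...
- Nearly all of LightRAG imports the global logger object from utils, so with multiple rag instances the logs cannot be fully routed to the corresponding working_dir; in addition, when an application deletes working_dir, the deletion fails because of the logger's open file handle
- This change simplifies the log file path so that it no longer depends on the working_dir attribute; the log file is independent of working_dir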
2024-12-04 08:44:13 +08:00
zuoluo
801619084f
fix neo4jstorage bug
2024-12-03 16:04:58 +08:00
Tasha Upchurch
eae310cd68
fix for #209
...
function was returning a closed event loop.
2024-11-29 13:27:08 -07:00
jin
9f3c0581ac
Merge branch 'HKUDS:main' into main
2024-11-27 15:16:28 +08:00
Larfii
cb492ccb04
Add custom KG insertion
2024-11-25 18:06:19 +08:00
Larfii
8562ecdebc
Add a progress bar
2024-11-25 15:04:38 +08:00
jin
1dbe803521
Merge branch 'main' of https://github.com/jin38324/LightRAG
2024-11-25 13:32:33 +08:00
jin
89c2de54a2
Optimization logic
2024-11-25 13:29:55 +08:00
LarFii
ce7f524174
Update
2024-11-19 16:52:26 +08:00
Richard
6bdf693b85
fix neo4j bug
2024-11-15 13:11:43 +08:00
Rick Battle
d4a27c901e
Only update storage if there was something to insert
...
Before, the `finally` block would always call `_insert_done()`, which writes out the `vdb_*` and `kv_store_*` files ... even if there was nothing to insert (because all docs had already been inserted). This was causing the speed of skippable inserts to become very slow as the graph grew.
2024-11-12 09:30:21 -07:00
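
The fix described in the commit above can be sketched as a flag that gates the `finally` block. This is a simplified illustration under stated assumptions: the class `DocStore`, its `docs` dictionary, the `update_storage` flag, and the `flush_count` counter are hypothetical; only the `_insert_done()` name comes from the commit message.

```python
class DocStore:
    """Hypothetical store that persists to disk only when something changed."""

    def __init__(self):
        self.docs = {}
        self.flush_count = 0  # counts how often storage was written out

    def _insert_done(self):
        # Stand-in for the expensive step that writes the storage files.
        self.flush_count += 1

    def insert(self, new_docs):
        update_storage = False
        try:
            # Keep only documents that have not been inserted before.
            fresh = {k: v for k, v in new_docs.items() if k not in self.docs}
            if not fresh:
                return 0  # nothing new: the finally block skips the flush
            update_storage = True
            self.docs.update(fresh)
            return len(fresh)
        finally:
            if update_storage:
                self._insert_done()
```

Re-inserting documents that are already stored returns early and leaves the storage files untouched, so skippable inserts stay cheap as the graph grows.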
jin
41599897fb
fix pre commit
2024-11-12 13:32:40 +08:00