470 Commits

Author SHA1 Message Date
zrguo
265fc2c87f fix linting 2025-03-02 14:24:54 +08:00
zrguo
5198240cf9 remove " and upper() 2025-03-02 14:23:06 +08:00
yangdx
7124845e55 Optimize document processing pipeline with better status tracking & batch handling
• Add upfront doc processing check
• Optimize pipeline status updates
2025-03-02 11:09:32 +08:00
yangdx
1a5eb20003 Fix history_messages clearing in LightRAG pipeline status initialization 2025-03-02 04:43:41 +08:00
yangdx
c0b22a8ae2 Merge branch 'main' into add-multi-worker-support 2025-03-02 02:54:57 +08:00
Abyl Ikhsanov
bf0ddc6450 making optional for ainsert 2025-03-01 13:34:49 +01:00
Abyl Ikhsanov
baa505eeb0 adding full_doc_id to insert 2025-03-01 13:26:02 +01:00
zrguo
9aa438cf79 fix linting 2025-03-01 18:35:12 +08:00
zrguo
a8f4385c05 Add clear_cache 2025-03-01 18:30:58 +08:00
yangdx
e3a40c2fdb Fix linting 2025-03-01 16:23:34 +08:00
yangdx
3507e894d9 Merge branch 'main' into add-multi-worker-support 2025-03-01 15:55:37 +08:00
yangdx
c07a5039b7 Refactor shared storage locks into separate pipeline, storage and internal locks to prevent deadlocks 2025-03-01 10:48:55 +08:00
yangdx
b3328542c7 refactor: migrate synchronous locks to async locks for improved concurrency
• Add UnifiedLock wrapper class
• Convert with blocks to async with
2025-03-01 02:22:35 +08:00
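The sync-to-async lock migration in b3328542c7 can be pictured with a minimal sketch like the one below. It is an illustration only, not the repository's UnifiedLock: the class name `UnifiedLockSketch`, the `demo` coroutine, and the choice to run a blocking `acquire` in a worker thread are all assumptions made here.

```python
import asyncio
import threading
from typing import Any, List


class UnifiedLockSketch:
    """Hypothetical wrapper: one `async with` interface over async and sync locks."""

    def __init__(self, lock: Any) -> None:
        self._lock = lock
        self._is_async = isinstance(lock, asyncio.Lock)

    async def __aenter__(self) -> "UnifiedLockSketch":
        if self._is_async:
            await self._lock.acquire()
        else:
            # Assumption: run the blocking acquire in a worker thread so the
            # event loop is not stalled while waiting for the lock.
            await asyncio.to_thread(self._lock.acquire)
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Both asyncio.Lock and threading.Lock release synchronously.
        self._lock.release()


async def demo() -> List[bool]:
    """Record whether each lock is held inside and after the `async with` block."""
    results = []
    for raw in (asyncio.Lock(), threading.Lock()):
        async with UnifiedLockSketch(raw):
            results.append(raw.locked())
        results.append(raw.locked())
    return results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Under this design, call sites that previously used `with lock:` would become `async with UnifiedLockSketch(lock):` regardless of which kind of lock is underneath.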
yangdx
731d820bcc Remove redundant set_logger function and related calls 2025-02-28 21:46:45 +08:00
yangdx
c973498c34 Fix linting 2025-02-28 21:35:04 +08:00
yangdx
8cd45161f2 feat: add history_messages to track pipeline processing progress
• Add shared history_messages list
• Track pipeline progress with messages
2025-02-28 13:53:40 +08:00
yangdx
b2da69b7f1 Add pipeline status control for concurrent document indexing processes
• Add shared pipeline status namespace
• Implement concurrent process control
• Add request queuing for pending jobs
2025-02-28 11:52:42 +08:00
yangdx
b4bcd76599 Remove useless scan progress tracking functionality and related code 2025-02-28 10:53:36 +08:00
Huỳnh Triệu Vĩ
2f7fe5e4b6 feat: fix delete by document id 2025-02-27 23:34:57 +07:00
yangdx
27500191b4 Standardize scan progress namespace initialization 2025-02-27 19:08:36 +08:00
yangdx
64f22966a3 Fix linting 2025-02-27 19:05:51 +08:00
yangdx
946095ef80 Fix multiprocess dict creation logic, add process safety locks for namespace creation. 2025-02-27 19:03:53 +08:00
yangdx
92ecb0da97 Refactor document scanning progress shared variable initialization 2025-02-27 16:07:00 +08:00
yangdx
7c237920b1 Refactor shared storage to support both single and multi-process modes
• Initialize storage based on worker count
• Remove redundant global variable checks
• Add explicit mutex initialization
• Centralize shared storage initialization
• Fix process/thread lock selection logic
2025-02-27 08:48:33 +08:00
Yannick Stephan
3c9908b94a fixed lint 2025-02-26 12:11:28 +01:00
Yannick Stephan
4963305dc3 Merge pull request #950 from cnjack/feat/custom_doc_ids
add support for the single document and custom chunks method
2025-02-26 12:08:02 +01:00
yangdx
7436c06f6c Fix linting 2025-02-26 18:11:16 +08:00
jack
fee90ddd9d add support for the single document and custom chunks method 2025-02-26 14:41:10 +08:00
yangdx
2752a764ae Refactor storage implementations to support both single and multi-process modes
• Add shared storage management module
• Support process/thread lock based on mode
2025-02-26 05:38:38 +08:00
yangdx
8050b0f91b feat: automatically initialize API manager in single process mode
• Add manager init check in __post_init__
• Call initialize_manager if needed
• Add info log message for init
• Ensure API manager is ready for use
2025-02-25 12:09:30 +08:00
Huỳnh Triệu Vĩ
4a3c6de4ba remove character ticks 2025-02-25 04:18:52 +07:00
Huỳnh Triệu Vĩ
a4d88b8cd4 fix this event loop is already running 2025-02-25 04:16:22 +07:00
yangdx
f29628125b Fix typo in parameter name from 'nodel_label' to 'node_label' 2025-02-24 02:36:36 +08:00
yangdx
f5efe5977b Merge branch 'clear-text-before-insert' into simplify-cli-arguments 2025-02-23 17:06:39 +08:00
yangdx
845e914f1b fix: make ids parameter optional and optimize input text cleaning
• Add default None value for ids parameter
• Move text cleaning into else branch
• Only clean text when auto-generating ids
• Preserve original text with custom ids
• Improve code readability
2025-02-23 15:46:47 +08:00
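The behaviour listed in 845e914f1b (optional ids, cleaning only when ids are auto-generated) might look roughly like the sketch below. The helper name `prepare_docs`, the `doc-` prefix, the MD5-based id, and the exact cleaning steps are assumptions for illustration, not the project's code.

```python
import hashlib
from typing import Dict, List, Optional


def prepare_docs(texts: List[str], ids: Optional[List[str]] = None) -> Dict[str, str]:
    """Hypothetical helper that maps document ids to document text."""
    if ids is not None:
        if len(ids) != len(texts):
            raise ValueError("ids and texts must have the same length")
        # Custom ids: keep the caller's text exactly as given.
        return dict(zip(ids, texts))
    # Auto-generated ids: clean first so equal content maps to the same id.
    cleaned = [t.replace("\x00", "").strip() for t in texts]
    return {
        "doc-" + hashlib.md5(t.encode("utf-8")).hexdigest(): t
        for t in cleaned
    }
```

With custom ids the text passes through unchanged, while auto-generated ids deduplicate inputs that differ only in surrounding whitespace or null bytes.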
yangdx
e935fed50e Add automatic comment handling in .env files 2025-02-22 13:25:12 +08:00
yangdx
351c8db849 Fix linting 2025-02-22 10:27:20 +08:00
yangdx
411782797b Fix linting 2025-02-22 10:18:39 +08:00
yangdx
3c866eec16 Merge branch 'refactor-api-server' into clear-text-before-insert 2025-02-22 10:04:56 +08:00
yangdx
dff07e50a4 Merge branch 'main' into refactor-api-server 2025-02-21 21:12:02 +08:00
zrguo
6ed81ed1c6 Merge pull request #906 from konrad-woj/fix-insert-custom-chunks
fix insert_custom_chunks skipping every new doc
2025-02-21 18:45:40 +08:00
zrguo
84f975f63f Merge pull request #892 from PiochU19/main
add support of providing ids for documents insert
2025-02-21 18:42:52 +08:00
yangdx
5fa6982d36 Merge branch 'refactor-api-server' into clear-text-before-insert 2025-02-21 14:57:11 +08:00
yangdx
cff229a806 fix: respect user-specified log level in set_logger
Previously, the set_logger function would always set the log level to DEBUG, overriding any user-specified log level.
2025-02-21 14:46:27 +08:00
yangdx
f5bd3f2b16 Fix linting 2025-02-21 13:23:55 +08:00
yangdx
bee4622052 fix: handle null bytes (0x00) in text processing
• Fix PostgreSQL encoding error by properly handling null bytes (0x00) in text processing.
• The clean_text function now removes null bytes from all input text during the indexing phase.
2025-02-21 13:18:26 +08:00
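The null-byte handling described in bee4622052 amounts to something like the following minimal sketch. The function name `strip_null_bytes` is hypothetical, and the repository's clean_text may do more than this.

```python
def strip_null_bytes(text: str) -> str:
    """Hypothetical helper: remove NUL characters (0x00) from the input text."""
    return text.replace("\x00", "")
```

PostgreSQL text values cannot contain the 0x00 character, so removing it before indexing avoids the encoding error mentioned in the commit message.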
Konrad Wojciechowski
50eb97762a fix insert_custom_chunks skipping every new doc with "This document is already in the storage." 2025-02-20 23:08:36 +01:00
Yannick Stephan
678e0f9aea Revert "Cleanup of code" 2025-02-20 15:09:43 +01:00
Yannick Stephan
439685e69c Revert "removed get_knowledge_graph" 2025-02-20 14:29:36 +01:00
Yannick Stephan
c4562f71b9 cleanup extraction 2025-02-20 14:17:26 +01:00