4593 Commits

Author SHA1 Message Date
yangdx
1ed77a2e53 Remove openai-ollama binding from LightRAG level args 2025-08-17 02:13:50 +08:00
SJ
f7ca9ae16a Ruff formatted 2025-08-15 22:21:34 +00:00
SJ
3aa3332505
Merge pull request #1 from HKUDS/main
merge
2025-08-15 17:09:03 -05:00
Daniel.y
bdd1169cfb
Merge pull request #1959 from danielaskdd/pick-trunk-by-vector
Feat: add KG related chunks selection by vector similarity
2025-08-15 19:33:51 +08:00
yangdx
2a781dfb91 Update Neo4j database naming in env.example 2025-08-15 19:14:38 +08:00
yangdx
3a227e37b8 Add get_vectors_by_ids method to MongoVectorDBStorage 2025-08-15 16:53:14 +08:00
yangdx
7a7385a200 Add efficient vector retrieval by IDs to PGVectorStorage 2025-08-15 16:51:41 +08:00
yangdx
8f7031b882 Add get_vectors_by_ids method to QdrantVectorDBStorage 2025-08-15 16:46:52 +08:00
yangdx
a71499a180 Add get_vectors_by_ids method to MilvusVectorDBStorage 2025-08-15 16:36:50 +08:00
yangdx
1e2d5252d7 Add get_vectors_by_ids method and filter out vector data from query results 2025-08-15 16:32:26 +08:00
yangdx
6cab68bb47 Improve KG chunk selection documentation and configuration clarity 2025-08-15 10:09:44 +08:00
yangdx
3acb32f547 Add comments explaining chunk deduplication behavior in query context 2025-08-15 02:19:01 +08:00
yangdx
0b45d463df Add .clinerules to .gitignore 2025-08-15 00:43:45 +08:00
yangdx
f733ac829c Remove debug logging statements from query context building 2025-08-14 23:44:34 +08:00
yangdx
4a19d0de25 Add chunk tracking system to monitor chunk sources and frequencies
• Track chunk sources (E/R/C types)
• Log frequency and order metadata
• Preserve chunk_id through processing
• Add debug logging for chunk tracking
• Handle rerank and truncation operations
2025-08-14 22:58:26 +08:00
yangdx
a8b7890470 Rename chunk selection functions for better clarity 2025-08-14 16:01:13 +08:00
yangdx
a11e8d77eb Improve missing-vector warning logic in vector similarity
- Check for any missing vectors
- Separate no-vector vs partial-vector warnings
- Ensure early return on empty vectors
2025-08-14 14:24:15 +08:00
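The warning logic described in the commit above could look roughly like the following sketch. The function name `usable_chunk_ids`, the logger name, and the dict-based vector lookup are assumptions made for illustration; they are not taken from the repository.

```python
import logging
from typing import Dict, List, Sequence

logger = logging.getLogger("chunk_selection_sketch")  # hypothetical logger name


def usable_chunk_ids(
    chunk_ids: List[str], vectors: Dict[str, Sequence[float]]
) -> List[str]:
    """Return the chunk IDs that have a vector, warning about any that do not."""
    missing = [cid for cid in chunk_ids if cid not in vectors]

    # No vectors at all: warn once and return early with an empty result.
    if len(missing) == len(chunk_ids):
        logger.warning("No vectors found for any of %d chunks", len(chunk_ids))
        return []

    # Only some vectors missing: emit a separate, more specific warning.
    if missing:
        logger.warning(
            "Vectors missing for %d of %d chunks", len(missing), len(chunk_ids)
        )

    return [cid for cid in chunk_ids if cid in vectors]
```

A caller could treat an empty return value as the signal to skip similarity ranking altogether.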
yangdx
5c7ae8721b Merge branch 'main' into pick-trunk-by-vector 2025-08-14 13:11:14 +08:00
Daniel.y
79d5210988
Merge pull request #1954 from danielaskdd/pipeline-refactor
Feat: Reprocessing of failed documents without the original file being present
2025-08-14 13:09:23 +08:00
yangdx
3bba5fc506 Fix linting 2025-08-14 13:03:23 +08:00
yangdx
772f981e7e fix: check and process queued docs even when upload directory is empty 2025-08-14 12:35:39 +08:00
yangdx
65a4437f78 Fix: Persist document data immediately after index update 2025-08-14 12:33:36 +08:00
yangdx
28fc075c59 Simplify inconsistency logging and cleanup messages 2025-08-14 11:49:58 +08:00
yangdx
17faeb2fb8 refactor: integrate document consistency validation into pipeline processing
This ensures data consistency validation is part of the main processing pipeline and provides better monitoring of inconsistent document cleanup operations.
2025-08-14 11:38:36 +08:00
yangdx
a3f7bc5b7e Merge branch 'main' into pick-trunk-by-vector 2025-08-14 06:19:57 +08:00
yangdx
b5ae84fac6 fix: Add data consistency validation to document processing pipeline
- Add _validate_and_fix_document_consistency() method to detect and fix documents with missing content in full_docs storage
- Integrate consistency check into apipeline_process_enqueue_documents() to automatically mark inconsistent documents as FAILED before processing
- Prevent processing errors caused by documents having status records but missing actual content data
2025-08-14 06:18:34 +08:00
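As a rough illustration of the consistency check described above, the sketch below marks documents that have a status record but no stored content as FAILED. It uses plain dicts in place of the real storages, and `mark_inconsistent_docs`, the `"content"` key, and the `"error"` field are hypothetical names rather than the project's actual API.

```python
from typing import Dict, List


def mark_inconsistent_docs(
    doc_status: Dict[str, dict], full_docs: Dict[str, dict]
) -> List[str]:
    """Mark documents whose content is missing from full_docs as FAILED.

    Returns the IDs of the documents that were marked.
    """
    failed: List[str] = []
    for doc_id, status in doc_status.items():
        content = full_docs.get(doc_id, {}).get("content")
        if not content:
            # A status record exists but the content does not: flag the
            # document before processing instead of failing later on.
            status["status"] = "FAILED"
            status["error"] = "content missing from full_docs"
            failed.append(doc_id)
    return failed
```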
yangdx
cb122c63e4 Merge branch 'main' into pick-trunk-by-vector 2025-08-14 05:34:15 +08:00
Daniel.y
dc76ae02d6
Merge pull request #1952 from danielaskdd/fix-pipeline
Fixes crash when processing files with UTF-8 encoding error
2025-08-14 05:33:08 +08:00
yangdx
fd0ae4646f Fixes crash when processing files with UTF-8 encoding error
- Fix TypeError "cannot unpack non-iterable bool object" in document processing
- Change all error returns from `False` to `(False, "")` for consistency
- Ensure pipeline_enqueue_file always returns tuple (bool, str)
- Add missing return statement for no-content-extracted case
- Improve error handling for UTF-8 encoding issues and unsupported file types
2025-08-14 05:31:38 +08:00
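The TypeError named in this commit can be reproduced with a minimal sketch. The two functions below are hypothetical stand-ins for pipeline_enqueue_file, not the project's code: one returns a bare `False` on failure, the other always returns a `(bool, str)` tuple.

```python
from typing import Tuple


def enqueue_buggy(data: bytes):
    """Returns a bare bool on failure, so tuple unpacking at the call site breaks."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True, text


def enqueue_fixed(data: bytes) -> Tuple[bool, str]:
    """Always returns (success, text), including on every error path."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False, ""
    if not text.strip():
        # No content extracted: still return a tuple.
        return False, ""
    return True, text


def unpack_error(data: bytes) -> str:
    """Return the TypeError message raised when unpacking the buggy result."""
    try:
        ok, text = enqueue_buggy(data)
    except TypeError as exc:
        return str(exc)
    return ""
```

With invalid UTF-8 input such as `b"\xff"`, unpacking the buggy result raises the TypeError, while `enqueue_fixed` returns `(False, "")`.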
yangdx
042637d6a3 Merge branch 'main' into pick-trunk-by-vector 2025-08-14 05:09:14 +08:00
yangdx
3ccd10f1e4 Update webui assets 2025-08-14 05:03:43 +08:00
yangdx
6969038fd5 Update mermaid version to 11.9.0 2025-08-14 05:02:53 +08:00
yangdx
160a40dc04 Bump api version to 0201 2025-08-14 05:02:20 +08:00
yangdx
ae517181ad Bump api version to 0200 2025-08-14 05:01:13 +08:00
yangdx
f85e2aa4bf Merge branch 'main' into pick-trunk-by-vector 2025-08-14 03:54:26 +08:00
Daniel.y
2bbb19143a
Merge pull request #1951 from danielaskdd/main
Refac: uniformly protect all storage initializations with get_data_init_lock
2025-08-14 03:52:37 +08:00
yangdx
0b22ffb252 Refac: uniformly protect all storage initializations with get_data_init_lock 2025-08-14 03:46:19 +08:00
yangdx
2e5487305e Merge branch 'main' into pick-trunk-by-vector 2025-08-14 03:12:38 +08:00
Daniel.y
1be1649f75
Merge pull request #1949 from danielaskdd/main
Fix: remove query params from cache key generation for keyword extraction
2025-08-14 03:09:09 +08:00
yangdx
7fb11193b0 Fix linting 2025-08-14 03:07:29 +08:00
yangdx
331dcf0509 Remove query params from cache key generation for keyword extraction 2025-08-14 02:57:39 +08:00
yangdx
9a62101e9d Add OpenAI frequency penalty sample env params 2025-08-14 02:57:23 +08:00
yangdx
3343833571 Remove query params from cache key generation for keyword extraction 2025-08-14 02:36:01 +08:00
yangdx
2a46667ac9 Add OpenAI frequency penalty sample env params 2025-08-14 01:50:27 +08:00
yangdx
bac09118d5 Simplify embedding func extraction 2025-08-14 01:09:18 +08:00
yangdx
ac3b5605a1 Refactor logging for relation chunk discovery with dedup info 2025-08-14 00:41:58 +08:00
yangdx
edac10906c fix: Add total_relation_chunks statistics and improve logging in _find_related_text_unit_from_relations 2025-08-13 23:45:31 +08:00
yangdx
5a40ff654e Change KG chunk selection default to VECTOR
- Set KG_CHUNK_PICK_METHOD default to VECTOR
- Update env.example with new config option
2025-08-13 23:10:42 +08:00
yangdx
947e826e61 Bump api version to 0200 2025-08-13 18:29:07 +08:00
yangdx
f1dafa0d01 feat: KG related chunks selection by vector similarity
- Add env switch to toggle weighted polling vs vector-similarity strategy
- Implement similarity-based sorting with fallback to weighted
- Introduce batch vector read API for vector storage
- Implement vector store and retrieve function for Nanovector DB
- Preserve default behavior (weighted polling selection method)
2025-08-13 18:16:42 +08:00
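The similarity-based selection with fallback described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: `pick_chunks` and its parameters are hypothetical, vectors are plain Python lists, and the fallback simply keeps the incoming order as a stand-in for the weighted polling strategy.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_chunks(
    chunk_ids: List[str],
    query_vec: Optional[Sequence[float]],
    vectors: Dict[str, Sequence[float]],
    top_k: int,
    method: str = "VECTOR",
) -> List[str]:
    """Select up to top_k chunk IDs, ranked by similarity to the query."""
    # Fallback: without vectors (or with another method) keep the incoming
    # order, which stands in for the weighted polling strategy here.
    if method != "VECTOR" or query_vec is None or not vectors:
        return chunk_ids[:top_k]

    scored = [
        (cosine(query_vec, vectors[cid]), cid)
        for cid in chunk_ids
        if cid in vectors
    ]
    # Sort by score only; the stable sort keeps input order for ties.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cid for _, cid in scored[:top_k]]
```

In this sketch the `method` argument plays the role of the KG_CHUNK_PICK_METHOD switch, and `vectors` is the kind of mapping a batch vector read could return.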