yangdx
1ed77a2e53
Remove openai-ollama binding from LightRAG level args
2025-08-17 02:13:50 +08:00
SJ
f7ca9ae16a
Ruff formatted
2025-08-15 22:21:34 +00:00
SJ
3aa3332505
Merge pull request #1 from HKUDS/main
...
merge
2025-08-15 17:09:03 -05:00
Daniel.y
bdd1169cfb
Merge pull request #1959 from danielaskdd/pick-trunk-by-vector
...
Feat: add KG related chunks selection by vector similarity
2025-08-15 19:33:51 +08:00
yangdx
2a781dfb91
Update Neo4j database naming in env.example
2025-08-15 19:14:38 +08:00
yangdx
3a227e37b8
Add get_vectors_by_ids method to MongoVectorDBStorage
2025-08-15 16:53:14 +08:00
yangdx
7a7385a200
Add efficient vector retrieval by IDs to PGVectorStorage
2025-08-15 16:51:41 +08:00
yangdx
8f7031b882
Add get_vectors_by_ids method to QdrantVectorDBStorage
2025-08-15 16:46:52 +08:00
yangdx
a71499a180
Add get_vectors_by_ids method to MilvusVectorDBStorage
2025-08-15 16:36:50 +08:00
yangdx
1e2d5252d7
Add get_vectors_by_ids method and filter out vector data from query results
2025-08-15 16:32:26 +08:00
yangdx
6cab68bb47
Improve KG chunk selection documentation and configuration clarity
2025-08-15 10:09:44 +08:00
yangdx
3acb32f547
Add comments explaining chunk deduplication behavior in query context
2025-08-15 02:19:01 +08:00
yangdx
0b45d463df
Add .clinerules to .gitignore
2025-08-15 00:43:45 +08:00
yangdx
f733ac829c
Remove debug logging statements from query context building
2025-08-14 23:44:34 +08:00
yangdx
4a19d0de25
Add chunk tracking system to monitor chunk sources and frequencies
...
• Track chunk sources (E/R/C types)
• Log frequency and order metadata
• Preserve chunk_id through processing
• Add debug logging for chunk tracking
• Handle rerank and truncation operations
2025-08-14 22:58:26 +08:00
yangdx
a8b7890470
Rename chunk selection functions for better clarity
2025-08-14 16:01:13 +08:00
yangdx
a11e8d77eb
Improve missing-vector warning logic in vector similarity
...
- Check for any missing vectors
- Separate no-vector vs partial-vector warnings
- Ensure early return on empty vectors
2025-08-14 14:24:15 +08:00
yangdx
5c7ae8721b
Merge branch 'main' into pick-trunk-by-vector
2025-08-14 13:11:14 +08:00
Daniel.y
79d5210988
Merge pull request #1954 from danielaskdd/pipeline-refactor
...
Feat: Reprocessing of failed documents without the original file being present
2025-08-14 13:09:23 +08:00
yangdx
3bba5fc506
Fix linting
2025-08-14 13:03:23 +08:00
yangdx
772f981e7e
fix: check and process queued docs even when upload directory is empty
2025-08-14 12:35:39 +08:00
yangdx
65a4437f78
Fix: Persist document data immediately after index update
2025-08-14 12:33:36 +08:00
yangdx
28fc075c59
Simplify inconsistency logging and cleanup messages
2025-08-14 11:49:58 +08:00
yangdx
17faeb2fb8
refactor: integrate document consistency validation into pipeline processing
...
This ensures data consistency validation is part of the main processing pipeline and provides better monitoring of inconsistent document cleanup operations.
2025-08-14 11:38:36 +08:00
yangdx
a3f7bc5b7e
Merge branch 'main' into pick-trunk-by-vector
2025-08-14 06:19:57 +08:00
yangdx
b5ae84fac6
fix: Add data consistency validation to document processing pipeline
...
- Add _validate_and_fix_document_consistency() method to detect and fix documents with missing content in full_docs storage
- Integrate consistency check into apipeline_process_enqueue_documents() to automatically mark inconsistent documents as FAILED before processing
- Prevent processing errors caused by documents having status records but missing actual content data
2025-08-14 06:18:34 +08:00
yangdx
cb122c63e4
Merge branch 'main' into pick-trunk-by-vector
2025-08-14 05:34:15 +08:00
Daniel.y
dc76ae02d6
Merge pull request #1952 from danielaskdd/fix-pipeline
...
Fixes crash when processing files with UTF-8 encoding error
2025-08-14 05:33:08 +08:00
yangdx
fd0ae4646f
Fixes crash when processing files with UTF-8 encoding error
...
- Fix TypeError "cannot unpack non-iterable bool object" in document processing
- Change all error returns from `False` to `(False, "")` for consistency
- Ensure pipeline_enqueue_file always returns tuple (bool, str)
- Add missing return statement for no-content-extracted case
- Improve error handling for UTF-8 encoding issues and unsupported file types
2025-08-14 05:31:38 +08:00
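The TypeError named in the commit above is what Python raises when a caller unpacks two values but an error path returns a bare `False`. A minimal sketch of the before/after shape follows; the function names are hypothetical stand-ins, not LightRAG's actual `pipeline_enqueue_file` code, and the file handling is reduced to a UTF-8 decode.

```python
from typing import Tuple


def enqueue_file_buggy(data: bytes):
    """Hypothetical stand-in: the error path returns a bare bool."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False  # a caller doing `ok, text = ...` fails on this
    return True, text


def enqueue_file_fixed(data: bytes) -> Tuple[bool, str]:
    """Hypothetical stand-in: every path returns (success, payload)."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False, ""  # encoding error
    if not text.strip():
        return False, ""  # no content extracted
    return True, text
```

With the fixed shape, a caller can always write `ok, text = enqueue_file_fixed(data)` and branch on `ok`, whichever error path was taken.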
yangdx
042637d6a3
Merge branch 'main' into pick-trunk-by-vector
2025-08-14 05:09:14 +08:00
yangdx
3ccd10f1e4
Update webui assets
2025-08-14 05:03:43 +08:00
yangdx
6969038fd5
Update mermaid version to 11.9.0
2025-08-14 05:02:53 +08:00
yangdx
160a40dc04
Bump api version to 0201
2025-08-14 05:02:20 +08:00
yangdx
ae517181ad
Bump api version to 0200
2025-08-14 05:01:13 +08:00
yangdx
f85e2aa4bf
Merge branch 'main' into pick-trunk-by-vector
2025-08-14 03:54:26 +08:00
Daniel.y
2bbb19143a
Merge pull request #1951 from danielaskdd/main
...
Refac: uniformly protect all storage initializations with get_data_init_lock
2025-08-14 03:52:37 +08:00
yangdx
0b22ffb252
Refac: uniformly protect all storage initializations with get_data_init_lock
2025-08-14 03:46:19 +08:00
yangdx
2e5487305e
Merge branch 'main' into pick-trunk-by-vector
2025-08-14 03:12:38 +08:00
Daniel.y
1be1649f75
Merge pull request #1949 from danielaskdd/main
...
Fix: remove query params from cache key generation for keyword extraction
2025-08-14 03:09:09 +08:00
yangdx
7fb11193b0
Fix linting
2025-08-14 03:07:29 +08:00
yangdx
331dcf0509
Remove query params from cache key generation for keyword extraction
2025-08-14 02:57:39 +08:00
yangdx
9a62101e9d
Add OpenAI frequency penalty sample env params
2025-08-14 02:57:23 +08:00
yangdx
3343833571
Remove query params from cache key generation for keyword extraction
2025-08-14 02:36:01 +08:00
yangdx
2a46667ac9
Add OpenAI frequency penalty sample env params
2025-08-14 01:50:27 +08:00
yangdx
bac09118d5
Simplify embedding func extraction
2025-08-14 01:09:18 +08:00
yangdx
ac3b5605a1
Refactor logging for relation chunk discovery with dedup info
2025-08-14 00:41:58 +08:00
yangdx
edac10906c
fix: Add total_relation_chunks statistics and improve logging in _find_related_text_unit_from_relations
2025-08-13 23:45:31 +08:00
yangdx
5a40ff654e
Change KG chunk selection default to VECTOR
...
- Set KG_CHUNK_PICK_METHOD default to VECTOR
- Update env.example with new config option
2025-08-13 23:10:42 +08:00
yangdx
947e826e61
Bump api version to 0200
2025-08-13 18:29:07 +08:00
yangdx
f1dafa0d01
feat: KG related chunks selection by vector similarity
...
- Add env switch to toggle weighted polling vs vector-similarity strategy
- Implement similarity-based sorting with fallback to weighted
- Introduce batch vector read API for vector storage
- Implement vector store and retrieve function for Nanovector DB
- Preserve default behavior (weighted polling selection method)
2025-08-13 18:16:42 +08:00
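The selection strategy described in the feature commit above (rank KG-related chunks by vector similarity, with a fallback when vectors are unavailable) can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: `pick_chunks_by_vector`, its parameters, and the plain-dict vector lookup are hypothetical, and the fallback here simply keeps the incoming order as a stand-in for the weighted polling method.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pick_chunks_by_vector(
    query_vec: Sequence[float],
    chunk_ids: List[str],
    vectors: Dict[str, Sequence[float]],  # assumed result of a batch read by ID
    top_k: int,
) -> List[str]:
    """Return up to top_k chunk IDs, most similar to the query first."""
    scored = [
        (cosine(query_vec, vectors[cid]), cid)
        for cid in chunk_ids
        if cid in vectors  # chunks without a stored vector are skipped
    ]
    if not scored:
        # Fallback: no vectors at all, so keep the original ordering
        # (stand-in for the weighted polling strategy).
        return list(chunk_ids[:top_k])
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cid for _, cid in scored[:top_k]]
```

The sketch silently drops chunks whose vectors are missing; the later commit a11e8d77eb indicates that the real code distinguishes and warns about the no-vector and partial-vector cases.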