Matt23-star
6a7e3092ea
feat: optimize node and edge queries in PostgreSQL by querying tables directly
2025-08-16 22:37:48 +08:00
Matt23-star
a7da48e05c
feat: add batch size parameter to node and edge retrieval methods
2025-08-16 22:35:22 +08:00
yangdx
dc7a6e1c5b
Update README
2025-08-16 06:15:27 +08:00
yangdx
2a781dfb91
Update Neo4j database naming in env.example
2025-08-15 19:14:38 +08:00
yangdx
3a227e37b8
Add get_vectors_by_ids method to MongoVectorDBStorage
2025-08-15 16:53:14 +08:00
yangdx
7a7385a200
Add efficient vector retrieval by IDs to PGVectorStorage
2025-08-15 16:51:41 +08:00
yangdx
8f7031b882
Add get_vectors_by_ids method to QdrantVectorDBStorage
2025-08-15 16:46:52 +08:00
yangdx
a71499a180
Add get_vectors_by_ids method to MilvusVectorDBStorage
2025-08-15 16:36:50 +08:00
yangdx
1e2d5252d7
Add get_vectors_by_ids method and filter out vector data from query results
2025-08-15 16:32:26 +08:00
yangdx
6cab68bb47
Improve KG chunk selection documentation and configuration clarity
2025-08-15 10:09:44 +08:00
yangdx
3acb32f547
Add comments explaining chunk deduplication behavior in query context
2025-08-15 02:19:01 +08:00
yangdx
f733ac829c
Remove debug logging statements from query context building
2025-08-14 23:44:34 +08:00
yangdx
4a19d0de25
Add chunk tracking system to monitor chunk sources and frequencies
• Track chunk sources (E/R/C types)
• Log frequency and order metadata
• Preserve chunk_id through processing
• Add debug logging for chunk tracking
• Handle rerank and truncation operations
2025-08-14 22:58:26 +08:00
yangdx
a8b7890470
Rename chunk selection functions for better clarity
2025-08-14 16:01:13 +08:00
yangdx
a11e8d77eb
Improve missing-vector warning logic in vector similarity
- Check for any missing vectors
- Separate no-vector vs partial-vector warnings
- Ensure early return on empty vectors
2025-08-14 14:24:15 +08:00
yangdx
5c7ae8721b
Merge branch 'main' into pick-trunk-by-vector
2025-08-14 13:11:14 +08:00
yangdx
3bba5fc506
Fix linting
2025-08-14 13:03:23 +08:00
yangdx
772f981e7e
fix: check and process queued docs even when upload directory is empty
2025-08-14 12:35:39 +08:00
yangdx
65a4437f78
Fix: Persist document data immediately after index update
2025-08-14 12:33:36 +08:00
yangdx
28fc075c59
Simplify inconsistency logging and cleanup messages
2025-08-14 11:49:58 +08:00
yangdx
17faeb2fb8
refactor: integrate document consistency validation into pipeline processing
This ensures data consistency validation is part of the main processing pipeline and provides better monitoring of inconsistent document cleanup operations.
2025-08-14 11:38:36 +08:00
yangdx
a3f7bc5b7e
Merge branch 'main' into pick-trunk-by-vector
2025-08-14 06:19:57 +08:00
yangdx
b5ae84fac6
fix: Add data consistency validation to document processing pipeline
- Add _validate_and_fix_document_consistency() method to detect and fix documents with missing content in full_docs storage
- Integrate consistency check into apipeline_process_enqueue_documents() to automatically mark inconsistent documents as FAILED before processing
- Prevent processing errors caused by documents having status records but missing actual content data
2025-08-14 06:18:34 +08:00
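The consistency check described in b5ae84fac6 can be sketched roughly as follows. This is a minimal illustration, not the project's code: the function name `mark_inconsistent_docs`, the plain-dict storages, and the `"status"` / `"content"` keys are all assumptions made for the example.

```python
from typing import Dict, List, Optional


def mark_inconsistent_docs(
    doc_status: Dict[str, dict],
    full_docs: Dict[str, Optional[dict]],
) -> List[str]:
    """Mark documents that have a status record but no stored content as FAILED.

    Both arguments are hypothetical in-memory stand-ins for the real storages.
    Returns the IDs that were marked, in iteration order.
    """
    inconsistent: List[str] = []
    for doc_id, record in doc_status.items():
        entry = full_docs.get(doc_id)
        # A missing entry or an empty "content" field both count as missing content.
        if not entry or not entry.get("content"):
            record["status"] = "FAILED"
            inconsistent.append(doc_id)
    return inconsistent
```

Running a check like this before the queue is processed means later stages only see documents whose content is actually available.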
yangdx
cb122c63e4
Merge branch 'main' into pick-trunk-by-vector
2025-08-14 05:34:15 +08:00
yangdx
fd0ae4646f
Fixes crash when processing files with UTF-8 encoding error
- Fix TypeError "cannot unpack non-iterable bool object" in document processing
- Change all error returns from `False` to `(False, "")` for consistency
- Ensure pipeline_enqueue_file always returns tuple (bool, str)
- Add missing return statement for no-content-extracted case
- Improve error handling for UTF-8 encoding issues and unsupported file types
2025-08-14 05:31:38 +08:00
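The return-type fix in fd0ae4646f can be illustrated with a small sketch. Everything here is hypothetical except the `(bool, str)` shape and the `(False, "")` error value named in the commit: the function `enqueue_file_sketch`, the suffix set, and the use of the decoded text as the second element are stand-ins, since the log does not say what the string carries.

```python
from typing import Tuple

# Hypothetical set of supported file types, for illustration only.
SUPPORTED_SUFFIXES = {".txt", ".md"}


def enqueue_file_sketch(raw: bytes, suffix: str) -> Tuple[bool, str]:
    """Always return a (success, text) tuple, never a bare bool."""
    if suffix not in SUPPORTED_SUFFIXES:
        return False, ""  # unsupported file type
    try:
        content = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False, ""  # UTF-8 encoding error
    if not content.strip():
        return False, ""  # no content extracted
    return True, content


# Callers can unpack unconditionally. With a bare `return False`, this line
# would raise: TypeError: cannot unpack non-iterable bool object
ok, text = enqueue_file_sketch(b"\xff\xfe", ".txt")
```

Keeping every exit path on the same tuple shape is what lets the caller unpack without first checking the type.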
yangdx
042637d6a3
Merge branch 'main' into pick-trunk-by-vector
2025-08-14 05:09:14 +08:00
yangdx
3ccd10f1e4
Update webui assets
2025-08-14 05:03:43 +08:00
yangdx
160a40dc04
Bump api version to 0201
2025-08-14 05:02:20 +08:00
yangdx
ae517181ad
Bump api version to 0200
2025-08-14 05:01:13 +08:00
yangdx
f85e2aa4bf
Merge branch 'main' into pick-trunk-by-vector
2025-08-14 03:54:26 +08:00
yangdx
0b22ffb252
Refac: uniformly protect all storage initializations with get_data_init_lock
2025-08-14 03:46:19 +08:00
yangdx
2e5487305e
Merge branch 'main' into pick-trunk-by-vector
2025-08-14 03:12:38 +08:00
yangdx
7fb11193b0
Fix linting
2025-08-14 03:07:29 +08:00
yangdx
331dcf0509
Remove query params from cache key generation for keyword extraction
2025-08-14 02:57:39 +08:00
yangdx
3343833571
Remove query params from cache key generation for keyword extraction
2025-08-14 02:36:01 +08:00
yangdx
bac09118d5
Simplify embedding func extraction
2025-08-14 01:09:18 +08:00
yangdx
ac3b5605a1
Refactor logging for relation chunk discovery with dedup info
2025-08-14 00:41:58 +08:00
yangdx
edac10906c
fix: Add total_relation_chunks statistics and improve logging in _find_related_text_unit_from_relations
2025-08-13 23:45:31 +08:00
yangdx
5a40ff654e
Change KG chunk selection default to VECTOR
- Set KG_CHUNK_PICK_METHOD default to VECTOR
- Update env.example with new config option
2025-08-13 23:10:42 +08:00
yangdx
947e826e61
Bump api version to 0200
2025-08-13 18:29:07 +08:00
yangdx
f1dafa0d01
feat: KG related chunks selection by vector similarity
- Add env switch to toggle weighted polling vs vector-similarity strategy
- Implement similarity-based sorting with fallback to weighted
- Introduce batch vector read API for vector storage
- Implement vector store and retrieve function for Nanovector DB
- Preserve default behavior (weighted polling selection method)
2025-08-13 18:16:42 +08:00
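The similarity-based selection with fallback described in f1dafa0d01 might look roughly like the sketch below. It is an assumption-laden illustration: `pick_chunks`, its parameters, and the use of the incoming chunk order as a stand-in for the weighted polling fallback are all hypothetical, and the real implementation may differ.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_chunks(
    query_vec: Sequence[float],
    chunk_ids: Sequence[str],
    vectors: Dict[str, Sequence[float]],
    top_k: int,
) -> List[str]:
    """Rank chunks by similarity to the query, falling back when no vectors exist."""
    if not vectors:
        # Fallback: keep the existing order (stand-in for weighted polling).
        return list(chunk_ids)[:top_k]
    # Chunks without a stored vector are skipped in this sketch.
    scored = [(cosine(query_vec, vectors[cid]), cid) for cid in chunk_ids if cid in vectors]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cid for _, cid in scored[:top_k]]
```

The `vectors` mapping plays the role of the batch vector read: all candidate vectors are fetched once by ID, then ranked in memory.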
Daniel.y
5b0e26d9da
Merge pull request #1941 from HKUDS/add-final-namespace
Fix: Resolve workspace isolation issues across multiple storage implementations
2025-08-12 20:17:53 +08:00
Daniel.y
203e420b51
Merge pull request #1931 from danielaskdd/fix-first-stage-tasks-missing
Fix: Initialize first_stage_tasks and entity_relation_task to prevent empty-task cancel errors
2025-08-12 19:19:04 +08:00
yangdx
578bdaa410
Pin pymilvus version to 2.5.2 to avoid Protobuf version warning
2025-08-12 18:22:00 +08:00
yangdx
5d1bc8b49d
Relocate client creation to the initialize method to prevent race conditions in multi-process mode.
2025-08-12 18:20:56 +08:00
yangdx
74783d7781
Remove redundant debug logging for Qdrant operations
2025-08-12 17:29:05 +08:00
zrguo
f1c7233763
Avoid UTF-8 BOM
2025-08-12 17:06:54 +08:00
yangdx
41f8ef05b9
Restore thread safety to MongoDB client manager
- Protected client creation with lock
- Protected client release with lock
2025-08-12 16:42:53 +08:00
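The lock-protected creation and release mentioned in 41f8ef05b9 follow a common pattern, sketched below. This is not the MongoDB client manager itself: `ClientManagerSketch`, the reference counting, and the `factory` callable are hypothetical, and a `threading.Lock` is assumed here although the project may use a different lock type.

```python
import threading
from typing import Any, Callable, Optional


class ClientManagerSketch:
    """Share one client across callers; create and release it under a lock."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory  # hypothetical: builds the real client
        self._lock = threading.Lock()
        self._client: Optional[Any] = None
        self._refs = 0

    def acquire(self) -> Any:
        with self._lock:
            # Only the first caller creates the client; the rest reuse it.
            if self._client is None:
                self._client = self._factory()
            self._refs += 1
            return self._client

    def release(self) -> None:
        with self._lock:
            if self._refs == 0:
                return  # nothing to release
            self._refs -= 1
            if self._refs == 0:
                self._client = None  # last user gone; drop the client
```

Without the lock, two callers could both see `_client is None` and each create a client, which is the kind of race the commit guards against.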
yangdx
0b2c3d06c7
Remove redundant collection listing check
2025-08-12 15:24:06 +08:00
yangdx
fc8ca1a706
Fix: add multi-process lock for initialize and drop methods for all storage
2025-08-12 04:25:09 +08:00