3070 Commits

Author SHA1 Message Date
yangdx
5a40ff654e Change KG chunk selection default to VECTOR
- Set KG_CHUNK_PICK_METHOD default to VECTOR
- Update env.example with new config option
2025-08-13 23:10:42 +08:00
yangdx
947e826e61 Bump api version to 0200 2025-08-13 18:29:07 +08:00
yangdx
f1dafa0d01 feat: KG related chunks selection by vector similarity
- Add env switch to toggle weighted polling vs vector-similarity strategy
- Implement similarity-based sorting with fallback to weighted
- Introduce batch vector read API for vector storage
- Implement vector store and retrive funtion for Nanovector DB
- Preserve default behavior (weighted polling selection method)
2025-08-13 18:16:42 +08:00
SJ
99643f01de
Enhancement: support aws bedrock as an LLm binding #1733 2025-08-13 02:08:13 -05:00
Daniel.y
5b0e26d9da
Merge pull request #1941 from HKUDS/add-final-namespace
Fix: Resolve workspace isolation issues across multiple storage implementations
2025-08-12 20:17:53 +08:00
Daniel.y
203e420b51
Merge pull request #1931 from danielaskdd/fix-first-stage-tasks-missing
Fix: Initialize first_stage_tasks and entity_relation_task to prevent empty-task cancel errors
2025-08-12 19:19:04 +08:00
yangdx
578bdaa410 Pin pymilvus version to 2.5.2 to avoid Protobuf version warning 2025-08-12 18:22:00 +08:00
yangdx
5d1bc8b49d Relocate client creation to the initialize method to prevent race conditions in multi-process mode. 2025-08-12 18:20:56 +08:00
yangdx
74783d7781 Remove redundant debug logging for Qdrant operations 2025-08-12 17:29:05 +08:00
zrguo
f1c7233763 Avoid UTF-8 BOM 2025-08-12 17:06:54 +08:00
yangdx
41f8ef05b9 Restore thread safety to MongoDB client manager
- Protected client creation with lock
- Protected client release with lock
2025-08-12 16:42:53 +08:00
yangdx
0b2c3d06c7 - Remove redundant collection listing check 2025-08-12 15:24:06 +08:00
yangdx
fc8ca1a706 Fix: add muti-process lock for initialize and drop method for all storage 2025-08-12 04:25:09 +08:00
yangdx
ca00b9c8ee Fix: Resolve workspace isolation problem for PostgreSQL with multiple LightRAG instances 2025-08-12 01:27:05 +08:00
yangdx
d9c1f935f5 Fix: Resolve workspace isolation issues in in-memory database with multiple LightRAG instances 2025-08-12 01:26:09 +08:00
yangdx
095e0cbfa2 Refac: Add workspace infomation to all logger output for all storage type 2025-08-12 01:19:09 +08:00
yangdx
44204abef7 Fix linting 2025-08-10 10:59:32 +08:00
yangdx
eb2320e556 Fix: Initialize first_stage_tasks and entity_relation_task to prevent empty-task cancel errors
- Initialize first_stage_tasks = [] and entity_relation_task = None at coroutine start
- Ensure cancel block safely handles no-op when tasks lists are empty
2025-08-10 10:45:41 +08:00
yangdx
ffb642a5ce Fix linting 2025-08-09 08:41:41 +08:00
yangdx
ecd7777e61 Update OpenAI embedding handling for both list and base64 embeddings
- Fix OpenAI embedding array parsing
- Improve embedding data type safety
2025-08-09 08:40:33 +08:00
yangdx
cf064579ce Remove deprecated keyword extraction query methods
- Delete query_with_keywords function
- Remove kg_query_with_keywords helper
- Drop separate keyword extraction methods
2025-08-08 14:59:39 +08:00
yangdx
16c9a81f4c feat: support config.ini for PostgreSQL vector index settings
- Add support for reading vector_index_type, hnsw_m, hnsw_ef, and ivfflat_lists from config.ini
- Maintain backward compatibility with environment variables
- Update config.ini.example with new PostgreSQL vector index options
- Follow existing configuration priority: env vars > config.ini > defaults
2025-08-08 02:55:49 +08:00
yangdx
dec4148075 Merge branch 'main' into Matt23-star/main 2025-08-08 02:24:34 +08:00
yangdx
f38e10559e Update PostgreSQL vector index configuration
- Remove FLAT index support
- Standardize on HNSW as default
- Add dimension validation
- Improve error logging
- Clean up index creation code
2025-08-08 02:21:06 +08:00
yangdx
f4ef254de2 fix(neo4j): enhance connection lifecycle management to prevent timeout errors
- Add max_connection_lifetime, liveness_check_timeout, keep_alive parameters
- Extend retry mechanisms for connection reset scenarios
- Update config examples with new Neo4j connection options
- Resolves ClientTimeoutException during data insertion operations
2025-08-08 01:07:45 +08:00
yangdx
eded6d1187 Unify document chunks context format in only_need_context query
- Update Document Chunks label to include (DC) abbreviation
2025-08-08 00:02:53 +08:00
Matt23-star
727ca43d3c feat: add vector index creation functionality for PostgreSQL 2025-08-07 23:07:18 +08:00
yangdx
2dab4e321d Bump api version to 0199 2025-08-06 01:03:35 +08:00
yangdx
a04c11a598 Remove deprecated storage 2025-08-06 00:02:50 +08:00
yangdx
c22315ea6d refactor: remove selective LLM cache clearing functionality
- Remove optional 'modes' parameter from aclear_cache() and clear_cache() methods
- Replace deprecated drop_cache_by_modes() with drop() method for complete cache clearing
- Update API endpoint to ignore mode-specific parameters and clear all cache
- Simplify frontend clearCache() function to send empty request body

This change ensures all LLM cache is cleared together.
2025-08-05 23:51:51 +08:00
yangdx
cc1f7118e7 Remove deprecated cache_by_modes functionality from all storage 2025-08-05 23:20:26 +08:00
yangdx
8294d6d1b7 Remove deprecated mode field from LLM cache schema
- Drop mode column from LLM cache table
- Update primary key to exclude mode
- Remove mode from all SQL queries
- Deprecate mode-related methods
- Update schema migration logic
2025-08-05 23:18:54 +08:00
yangdx
0b5c708660 Update storage implementation documentation
- Add detailed storage type descriptions
- Remove Chroma from vector storage options
- Include recommended PostgreSQL version
- Add Memgraph to graph storage options
- Update performance comparison notes
2025-08-05 18:03:51 +08:00
yangdx
0463963520 fix: include all query parameters in LLM cache hash key generation
- Add missing query parameters (top_k, enable_rerank, max_tokens, etc.) to cache key generation in kg_query, naive_query, and extract_keywords_only functions
- Add queryparam field to CacheData structure and PostgreSQL storage for debugging
- Update PostgreSQL schema with automatic migration for queryparam JSONB column
- Prevent incorrect cache hits between queries with different parameters

Fixes issue where different query parameters incorrectly shared the same cached results.
2025-08-05 18:03:10 +08:00
yangdx
cb75e6631e Remove quantized embedding info from LLM cache
- Delete quantize_embedding function
- Delete dequantize_embedding function
- Remove embedding fields from CacheData
- Update save_to_cache to exclude embedding data
- Clean up unused quantization-related code
2025-08-05 17:58:34 +08:00
yangdx
01bce8c26e feat: add warning logs for deleting non-completed documents 2025-08-05 12:21:08 +08:00
yangdx
6ff25210ea feat: improve Jina API error handling to show clean messages instead of HTML 2025-08-05 11:46:02 +08:00
yangdx
c5babf61d7 Feat: Change embedding formats from float to base64 for efficiency
- Add base64 support for Jina embeddings
- Add base64 support for OpenAI embeddings
- Update env.example with new embedding options
2025-08-05 11:38:40 +08:00
yangdx
4d492abf41 feat: implement temperature priority cascade for LLM bindings
- Add global --temperature command line argument with env fallback
- Implement temperature priority for Ollama LLM binding:
  1. --ollama-llm-temperature (highest)
  2. OLLAMA_LLM_TEMPERATURE env var
  3. --temperature command arg
  4. TEMPERATURE env var (lowest)
- Implement same priority logic for OpenAI/Azure OpenAI LLM binding
- Ensure command line args always override environment variables
- Maintain backward compatibility with existing configurations
2025-08-05 04:53:55 +08:00
yangdx
adf7ec8e35 feat: Add OpenAI LLM Options support with BindingOptions framework
- Add OpenAILLMOptions dataclass with full OpenAI API parameter support
- Integrate OpenAI options in config.py for automatic binding detection
- Update server functions to inject OpenAI options for openai/azure_openai bindings
2025-08-05 03:47:26 +08:00
yangdx
3099748668 Add temperature fallback for Ollama LLM binding
- Implement OLLAMA_LLM_TEMPERATURE env var
- Fallback to global TEMPERATURE if unset
- Remove redundant OllamaLLMOptions logic
- Update env.example with new setting
2025-08-05 01:50:09 +08:00
yangdx
e5e3f0f878 Fix(Ollama option): change stop option from string to list and add fallback global temperature setting 2025-08-04 19:43:14 +08:00
yangdx
f8a880ac66 Improved binding options testing and documentation 2025-08-04 18:21:55 +08:00
yangdx
7b3a9c09ca Fix: add missing colume to LLM cache of PostgreSQL implementation 2025-08-04 11:12:59 +08:00
yangdx
223f4cdf62 Mark deprecated fields with TODO comments 2025-08-04 11:11:57 +08:00
yangdx
3bac19df5a Bump core version to 1.4.7 and api version to 0198 2025-08-04 10:55:41 +08:00
yangdx
63496698a1 Fix: ensure data migration is handled by single-process
- Wrap migration logic with get_data_init_lock() to ensure single-process execution
- Prevent race conditions when multiple processes start simultaneously
2025-08-04 01:47:20 +08:00
yangdx
e04d8ed8a7 Improved storage drop logging with namespace details
- Added namespace and workspace to drop logs
2025-08-04 00:56:39 +08:00
yangdx
5513155808 Fix namespace tablename translate error
- Reorder namespace table map for PostgreSQL
- Ensure specific namespaces come first
2025-08-04 00:21:20 +08:00
yangdx
daf2633dc2 Bump api version to 0198 2025-08-03 23:04:42 +08:00