4753 Commits

Author SHA1 Message Date
yangdx
331dcf0509 Remove query params from cache key generation for keyword extraction 2025-08-14 02:57:39 +08:00
yangdx
9a62101e9d Add OpenAI frequency penalty sample env params 2025-08-14 02:57:23 +08:00
yangdx
3343833571 Remove query params from cache key generation for keyword extraction 2025-08-14 02:36:01 +08:00
yangdx
2a46667ac9 Add OpenAI frequency penalty sample env params 2025-08-14 01:50:27 +08:00
yangdx
bac09118d5 Simplify embedding func extraction 2025-08-14 01:09:18 +08:00
yangdx
ac3b5605a1 Refactor logging for relation chunk discovery with dedup info 2025-08-14 00:41:58 +08:00
yangdx
edac10906c fix: Add total_relation_chunks statistics and improve logging in _find_related_text_unit_from_relations 2025-08-13 23:45:31 +08:00
yangdx
5a40ff654e Change KG chunk selection default to VECTOR
- Set KG_CHUNK_PICK_METHOD default to VECTOR
- Update env.example with new config option
2025-08-13 23:10:42 +08:00
yangdx
947e826e61 Bump api version to 0200 2025-08-13 18:29:07 +08:00
yangdx
f1dafa0d01 feat: KG related chunks selection by vector similarity
- Add env switch to toggle weighted polling vs vector-similarity strategy
- Implement similarity-based sorting with fallback to weighted
- Introduce batch vector read API for vector storage
- Implement vector store and retrieve functions for Nanovector DB
- Preserve default behavior (weighted polling selection method)
2025-08-13 18:16:42 +08:00
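The selection strategy described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `pick_chunks`, `cosine_similarity`, and `first_n` are hypothetical names, and the fallback here is a simple placeholder standing in for the weighted polling method, whose details are not given in this log.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pick_chunks(
    chunk_ids: List[str],
    query_vec: Optional[Sequence[float]],
    chunk_vecs: Dict[str, Sequence[float]],
    top_n: int,
    fallback: Callable[[List[str], int], List[str]],
) -> List[str]:
    """Rank chunks by similarity to the query; use the fallback if vectors are missing."""
    if query_vec is None or any(cid not in chunk_vecs for cid in chunk_ids):
        # Hypothetical fallback hook: the real project falls back to weighted polling.
        return fallback(chunk_ids, top_n)
    ranked = sorted(
        chunk_ids,
        key=lambda cid: cosine_similarity(query_vec, chunk_vecs[cid]),
        reverse=True,
    )
    return ranked[:top_n]


def first_n(chunk_ids: List[str], top_n: int) -> List[str]:
    """Placeholder fallback that keeps the original order."""
    return list(chunk_ids)[:top_n]
```

Passing the fallback as a callable keeps the similarity path independent of whatever the default strategy actually does.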
SJ
99643f01de
Enhancement: support AWS Bedrock as an LLM binding #1733 2025-08-13 02:08:13 -05:00
Daniel.y
5b0e26d9da
Merge pull request #1941 from HKUDS/add-final-namespace
Fix: Resolve workspace isolation issues across multiple storage implementations
2025-08-12 20:17:53 +08:00
Daniel.y
203e420b51
Merge pull request #1931 from danielaskdd/fix-first-stage-tasks-missing
Fix: Initialize first_stage_tasks and entity_relation_task to prevent empty-task cancel errors
2025-08-12 19:19:04 +08:00
yangdx
578bdaa410 Pin pymilvus version to 2.5.2 to avoid Protobuf version warning 2025-08-12 18:22:00 +08:00
yangdx
5d1bc8b49d Relocate client creation to the initialize method to prevent race conditions in multi-process mode. 2025-08-12 18:20:56 +08:00
yangdx
74783d7781 Remove redundant debug logging for Qdrant operations 2025-08-12 17:29:05 +08:00
zrguo
f1c7233763 Avoid UTF-8 BOM 2025-08-12 17:06:54 +08:00
yangdx
41f8ef05b9 Restore thread safety to MongoDB client manager
- Protected client creation with lock
- Protected client release with lock
2025-08-12 16:42:53 +08:00
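A lock-protected client manager of the kind the commit above describes might look like the sketch below. The class name, the reference counting, and the `factory` argument are assumptions for illustration only; the actual MongoDB client manager may use a different lock type and structure.

```python
import threading
from typing import Any, Callable, Optional


class ClientManager:
    """Hypothetical shared-client manager: creation and release are both done under one lock."""

    _lock = threading.Lock()
    _client: Optional[Any] = None
    _ref_count = 0

    @classmethod
    def get_client(cls, factory: Callable[[], Any]) -> Any:
        with cls._lock:
            # Only the first caller creates the client; later callers reuse it.
            if cls._client is None:
                cls._client = factory()
            cls._ref_count += 1
            return cls._client

    @classmethod
    def release_client(cls) -> None:
        with cls._lock:
            if cls._ref_count == 0:
                return  # nothing to release
            cls._ref_count -= 1
            if cls._ref_count == 0:
                # Last user gone: drop the shared client.
                cls._client = None
```

Without the lock, two threads could both see `_client is None` and each create a client, which is the kind of race the commit guards against.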
yangdx
0b2c3d06c7 Remove redundant collection listing check 2025-08-12 15:24:06 +08:00
yangdx
fc8ca1a706 Fix: add multi-process lock for initialize and drop methods for all storage 2025-08-12 04:25:09 +08:00
yangdx
ca00b9c8ee Fix: Resolve workspace isolation problem for PostgreSQL with multiple LightRAG instances 2025-08-12 01:27:05 +08:00
yangdx
d9c1f935f5 Fix: Resolve workspace isolation issues in in-memory database with multiple LightRAG instances 2025-08-12 01:26:09 +08:00
yangdx
095e0cbfa2 Refac: Add workspace information to all logger output for all storage types 2025-08-12 01:19:09 +08:00
yangdx
44204abef7 Fix linting 2025-08-10 10:59:32 +08:00
yangdx
eb2320e556 Fix: Initialize first_stage_tasks and entity_relation_task to prevent empty-task cancel errors
- Initialize first_stage_tasks = [] and entity_relation_task = None at coroutine start
- Ensure cancel block safely handles no-op when tasks lists are empty
2025-08-10 10:45:41 +08:00
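The pattern in the commit above, defining the task variables before the `try` block so the cancel path never touches an undefined name, can be illustrated like this. Only `first_stage_tasks` and `entity_relation_task` come from the commit message; `process`, `work`, and the `fail_early` flag are hypothetical stand-ins for the real pipeline.

```python
import asyncio
from typing import List, Optional


async def work(value: int) -> int:
    """Stand-in for a real pipeline stage."""
    await asyncio.sleep(0)
    return value * 2


async def process(items: List[int], fail_early: bool = False) -> int:
    # Initialized up front so the except block is safe even if nothing was started.
    first_stage_tasks: List[asyncio.Task] = []
    entity_relation_task: Optional[asyncio.Task] = None
    try:
        if fail_early:
            raise RuntimeError("failed before any task was created")
        first_stage_tasks = [asyncio.create_task(work(i)) for i in items]
        results = await asyncio.gather(*first_stage_tasks)
        entity_relation_task = asyncio.create_task(work(sum(results)))
        return await entity_relation_task
    except Exception:
        # Collect whatever is still running; this is a no-op when the lists are empty.
        pending = [t for t in first_stage_tasks if not t.done()]
        if entity_relation_task is not None and not entity_relation_task.done():
            pending.append(entity_relation_task)
        for task in pending:
            task.cancel()
        if pending:
            await asyncio.gather(*pending, return_exceptions=True)
        raise
```

If the variables were first assigned inside the `try`, an early failure would surface as a `NameError` in the handler and hide the original exception.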
Daniel.y
f1c6a4ed94
Merge pull request #1928 from danielaskdd/main
Fix: Update OpenAI embedding handling for both list and base64 embeddings
2025-08-09 08:44:21 +08:00
yangdx
ffb642a5ce Fix linting 2025-08-09 08:41:41 +08:00
yangdx
ecd7777e61 Update OpenAI embedding handling for both list and base64 embeddings
- Fix OpenAI embedding array parsing
- Improve embedding data type safety
2025-08-09 08:40:33 +08:00
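Handling both embedding shapes mentioned in the commit above could be sketched as follows. `decode_embedding` is a hypothetical helper, and the sketch assumes the base64 form is a string of packed little-endian float32 values; it does not reproduce the project's actual parsing code.

```python
import base64
import struct
from typing import List, Sequence, Union


def decode_embedding(raw: Union[str, Sequence[float]]) -> List[float]:
    """Return a list of floats from either a base64 string or a plain list."""
    if isinstance(raw, str):
        # Assumption: the payload is little-endian float32, 4 bytes per value.
        data = base64.b64decode(raw)
        count = len(data) // 4
        return list(struct.unpack(f"<{count}f", data[: count * 4]))
    # Already a sequence of numbers: normalize element types to float.
    return [float(x) for x in raw]
```

Normalizing both inputs to `List[float]` means downstream code does not need to know which encoding the API response used.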
yangdx
cf064579ce Remove deprecated keyword extraction query methods
- Delete query_with_keywords function
- Remove kg_query_with_keywords helper
- Drop separate keyword extraction methods
2025-08-08 14:59:39 +08:00
yangdx
f5ac6a9f4b Add default Ollama embedding context length
- Set default context length to 8192
- Override the default context length for LLM in binding_options.py
2025-08-08 13:51:25 +08:00
yangdx
c2eefec707 Merge branch 'postgres-vector-index' 2025-08-08 03:01:34 +08:00
yangdx
16c9a81f4c feat: support config.ini for PostgreSQL vector index settings
- Add support for reading vector_index_type, hnsw_m, hnsw_ef, and ivfflat_lists from config.ini
- Maintain backward compatibility with environment variables
- Update config.ini.example with new PostgreSQL vector index options
- Follow existing configuration priority: env vars > config.ini > defaults
2025-08-08 02:55:49 +08:00
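The priority order named in the commit above (env vars > config.ini > defaults) can be sketched with the standard library. The option names `hnsw_m` and `vector_index_type` come from the commit message; the `[postgres]` section name, the `POSTGRES_*` environment variable names, and `resolve_setting` itself are assumptions made for this example.

```python
import configparser
import os


def resolve_setting(
    env_name: str,
    section: str,
    option: str,
    default: str,
    config: configparser.ConfigParser,
) -> str:
    """Resolve one setting: environment variable first, then config.ini, then the default."""
    env_value = os.environ.get(env_name)
    if env_value is not None:
        return env_value
    # fallback= also covers a missing section, so no exception is raised.
    return config.get(section, option, fallback=default)


# Stand-in for reading config.ini from disk.
config = configparser.ConfigParser()
config.read_string("[postgres]\nhnsw_m = 32\n")

# Hypothetical variable name; uses the file value unless the variable is set.
hnsw_m = resolve_setting("POSTGRES_HNSW_M", "postgres", "hnsw_m", "16", config)
```

Values come back as strings in this sketch; a real loader would still need to convert and validate them.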
yangdx
dec4148075 Merge branch 'main' into Matt23-star/main 2025-08-08 02:24:34 +08:00
yangdx
f38e10559e Update PostgreSQL vector index configuration
- Remove FLAT index support
- Standardize on HNSW as default
- Add dimension validation
- Improve error logging
- Clean up index creation code
2025-08-08 02:21:06 +08:00
Daniel.y
2f289f6e25
Merge pull request #1924 from danielaskdd/neo4j-connection-lifetime
Refact: Enhanced Neo4j Connection Lifecycle Management
2025-08-08 01:16:42 +08:00
yangdx
f4ef254de2 fix(neo4j): enhance connection lifecycle management to prevent timeout errors
- Add max_connection_lifetime, liveness_check_timeout, keep_alive parameters
- Extend retry mechanisms for connection reset scenarios
- Update config examples with new Neo4j connection options
- Resolves ClientTimeoutException during data insertion operations
2025-08-08 01:07:45 +08:00
Daniel.y
c8a44f5657
Merge pull request #1923 from danielaskdd/fix-context-format
Fix: Unify document chunks context format in only_need_context query
2025-08-08 00:05:26 +08:00
yangdx
eded6d1187 Unify document chunks context format in only_need_context query
- Update Document Chunks label to include (DC) abbreviation
2025-08-08 00:02:53 +08:00
Matt23-star
727ca43d3c feat: add vector index creation functionality for PostgreSQL 2025-08-07 23:07:18 +08:00
yangdx
7780776af6 Update env.example 2025-08-06 18:50:58 +08:00
Daniel.y
a6ef29cef6
Merge pull request #1915 from danielaskdd/optimize-llm-cache
Refact: Optimized LLM Cache Hash Key Generation by Including All Query Parameters
2025-08-06 01:04:02 +08:00
yangdx
2dab4e321d Bump api version to 0199 2025-08-06 01:03:35 +08:00
yangdx
a04c11a598 Remove deprecated storage 2025-08-06 00:02:50 +08:00
yangdx
c22315ea6d refactor: remove selective LLM cache clearing functionality
- Remove optional 'modes' parameter from aclear_cache() and clear_cache() methods
- Replace deprecated drop_cache_by_modes() with drop() method for complete cache clearing
- Update API endpoint to ignore mode-specific parameters and clear all cache
- Simplify frontend clearCache() function to send empty request body

This change ensures all LLM cache is cleared together.
2025-08-05 23:51:51 +08:00
yangdx
cc1f7118e7 Remove deprecated cache_by_modes functionality from all storage 2025-08-05 23:20:26 +08:00
yangdx
8294d6d1b7 Remove deprecated mode field from LLM cache schema
- Drop mode column from LLM cache table
- Update primary key to exclude mode
- Remove mode from all SQL queries
- Deprecate mode-related methods
- Update schema migration logic
2025-08-05 23:18:54 +08:00
yangdx
0b5c708660 Update storage implementation documentation
- Add detailed storage type descriptions
- Remove Chroma from vector storage options
- Include recommended PostgreSQL version
- Add Memgraph to graph storage options
- Update performance comparison notes
2025-08-05 18:03:51 +08:00
yangdx
0463963520 fix: include all query parameters in LLM cache hash key generation
- Add missing query parameters (top_k, enable_rerank, max_tokens, etc.) to cache key generation in kg_query, naive_query, and extract_keywords_only functions
- Add queryparam field to CacheData structure and PostgreSQL storage for debugging
- Update PostgreSQL schema with automatic migration for queryparam JSONB column
- Prevent incorrect cache hits between queries with different parameters

Fixes issue where different query parameters incorrectly shared the same cached results.
2025-08-05 18:03:10 +08:00
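The idea behind the fix above, that every parameter affecting the result must be part of the cache key, can be shown with a small sketch. `compute_cache_key` and the choice of SHA-256 over a JSON payload are assumptions for illustration, not the hashing the project actually uses; `top_k` and `enable_rerank` are taken from the commit message.

```python
import hashlib
import json
from typing import Any, Dict


def compute_cache_key(mode: str, query: str, params: Dict[str, Any]) -> str:
    """Hash the query together with the parameters that influence the answer."""
    # sort_keys makes the key independent of dict insertion order.
    payload = json.dumps(
        {"mode": mode, "query": query, "params": params},
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key_a = compute_cache_key("local", "What is LightRAG?", {"top_k": 40, "enable_rerank": True})
key_b = compute_cache_key("local", "What is LightRAG?", {"top_k": 60, "enable_rerank": True})
key_c = compute_cache_key("local", "What is LightRAG?", {"enable_rerank": True, "top_k": 40})
```

Here `key_a` and `key_b` differ because `top_k` differs, so the two queries no longer share a cached result, while `key_c` matches `key_a` because only the argument order changed.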
yangdx
cb75e6631e Remove quantized embedding info from LLM cache
- Delete quantize_embedding function
- Delete dequantize_embedding function
- Remove embedding fields from CacheData
- Update save_to_cache to exclude embedding data
- Clean up unused quantization-related code
2025-08-05 17:58:34 +08:00
Daniel.y
c7d17f13c1
Merge pull request #1914 from danielaskdd/feat-tiktoken-cache
feat: add tiktoken cache directory support for offline deployment
2025-08-05 14:25:10 +08:00