LightRAG

mirror of https://github.com/HKUDS/LightRAG.git synced 2025-11-22 13:06:10 +00:00

Author	SHA1	Message	Date
yangdx	bf43e1b8c1	fix: Resolve default rerank config problem when env var missing - Read config from selected_rerank_func when env var missing - Make api_key optional for rerank function - Add response format validation with proper error handling - Update Cohere rerank default to official API endpoint	2025-08-23 01:07:59 +08:00
yangdx	580cb7906c	feat: Add multiple rerank provider support to LightRAG Server by adding new env vars and cli params - Add --enable-rerank CLI argument and ENABLE_RERANK env var - Simplify rerank configuration logic to only check enable flag and binding - Update health endpoint to show enable_rerank and rerank_configured status - Improve logging messages for rerank enable/disable states - Maintain backward compatibility with default value True	2025-08-22 19:29:45 +08:00
yangdx	b5c230abdd	optimize: avoid duplicate embedding calls in _build_query_context Reduces API costs and improves query performance while maintaining backward compatibility.	2025-08-21 16:49:24 +08:00
yangdx	ced3aef7cb	refactor: simplify text encoding by removing redundant safe_encode_for_llm	2025-08-19 19:37:46 +08:00
yangdx	806081645f	Refactor text cleaning to use sanitize_text_for_encoding consistently • Replace clean_text with sanitize_text • Remove deprecated clean_text function • Add whitespace trimming to sanitizer • Improve UTF-8 encoding safety • Consolidate text cleaning logic	2025-08-19 19:20:01 +08:00
yangdx	f9cf544805	Add text sanitization to prevent UTF-8 encoding errors in LLM calls • Remove surrogate characters • Clean control characters • Sanitize input and history messages • Add comprehensive error handling • Log sanitization activities	2025-08-19 18:50:52 +08:00
yangdx	64015548df	Refactor MD5 hash functions and consolidate Unicode error handling	2025-08-19 17:49:23 +08:00
yangdx	64058c771f	Refactor: Harden `compute_args_hash` against Unicode errors	2025-08-19 17:19:39 +08:00
yangdx	d3fde60938	refactor: remove file_path and created_at from context, improve token truncation - Remove file_path and created_at fields from entity and relationship contexts - Update token truncation to include full JSON serialization instead of content only	2025-08-18 18:30:09 +08:00
yangdx	453efeb924	Fix file path length checking to use UTF-8 byte length instead of char count	2025-08-18 13:59:27 +08:00
yangdx	14e083a1a6	fix: replace pyuca with pypinyin for Chinese pinyin sorting and add file_path sort	2025-08-17 15:21:24 +08:00
yangdx	61469c0a56	Add Chinese pinyin sorting support across document operations • Replace pyuca with centralized utils function • Add pinyin sort keys for file paths • Update MongoDB indexes with zh collation • Migrate existing indexes for compatibility • Support Chinese chars in Redis/JSON storage • Keep PostgreSQL sorting order controlled by Database Collate order	2025-08-17 12:45:48 +08:00
yangdx	4a19d0de25	Add chunk tracking system to monitor chunk sources and frequencies • Track chunk sources (E/R/C types) • Log frequency and order metadata • Preserve chunk_id through processing • Add debug logging for chunk tracking • Handle rerank and truncation operations	2025-08-14 22:58:26 +08:00
yangdx	a8b7890470	Rename chunk selection functions for better clarity	2025-08-14 16:01:13 +08:00
yangdx	a11e8d77eb	Improve missing-vector warning logic in vector similarity - Check for any missing vectors - Separate no-vector vs partial-vector warnings - Ensure early return on empty vectors	2025-08-14 14:24:15 +08:00
yangdx	2e5487305e	Merge branch 'main' into pick-trunk-by-vector	2025-08-14 03:12:38 +08:00
yangdx	7fb11193b0	Fix linting	2025-08-14 03:07:29 +08:00
yangdx	331dcf0509	Remove query params from cache key generation for keyword extraction	2025-08-14 02:57:39 +08:00
yangdx	3343833571	Remove query params from cache key generation for keyword extraction	2025-08-14 02:36:01 +08:00
yangdx	f1dafa0d01	feat: KG related chunks selection by vector similarity - Add env switch to toggle weighted polling vs vector-similarity strategy - Implement similarity-based sorting with fallback to weighted - Introduce batch vector read API for vector storage - Implement vector store and retrieve function for Nanovector DB - Preserve default behavior (weighted polling selection method)	2025-08-13 18:16:42 +08:00
zrguo	f1c7233763	Avoid UTF-8 BOM	2025-08-12 17:06:54 +08:00
yangdx	0463963520	fix: include all query parameters in LLM cache hash key generation - Add missing query parameters (top_k, enable_rerank, max_tokens, etc.) to cache key generation in kg_query, naive_query, and extract_keywords_only functions - Add queryparam field to CacheData structure and PostgreSQL storage for debugging - Update PostgreSQL schema with automatic migration for queryparam JSONB column - Prevent incorrect cache hits between queries with different parameters Fixes issue where different query parameters incorrectly shared the same cached results.	2025-08-05 18:03:10 +08:00
yangdx	cb75e6631e	Remove quantized embedding info from LLM cache - Delete quantize_embedding function - Delete dequantize_embedding function - Remove embedding fields from CacheData - Update save_to_cache to exclude embedding data - Clean up unused quantization-related code	2025-08-05 17:58:34 +08:00
yangdx	32af45ff46	refactor: improve JSON parsing reliability with json-repair library Replace regex-based JSON extraction with json-repair for better handling of malformed LLM responses. Remove deprecated JSON parsing utilities and clean up keyword_extraction parameter across LLM providers. - Remove locate_json_string_body_from_string() and convert_response_to_json() - Use json-repair.loads() in extract_keywords_only() for robust parsing - Clean up LLM interfaces and remove unused parameters - Add json-repair dependency	2025-08-01 19:36:20 +08:00
yangdx	2af8a93dc7	fix: resolve _sort_key error in Redis get_docs_paginated function	2025-07-31 02:16:56 +08:00
yangdx	d0bc5e7c4a	Extend path filter to also cover POST requests	2025-07-31 02:06:56 +08:00
yangdx	3e5efd0b27	Add /documents/paginated to filtered logging paths	2025-07-31 02:00:00 +08:00
yangdx	6014b9bf73	feat: add track_id support for document processing progress monitoring - Add get_docs_by_track_id() method to all storage backends (MongoDB, PostgreSQL, Redis, JSON) - Implement automatic track_id generation with upload_/insert_ prefixes - Add /track_status/{track_id} API endpoint for frontend progress queries - Create database indexes for efficient track_id lookups - Enable real-time document processing status tracking across all storage types	2025-07-29 22:24:21 +08:00
yangdx	9923821d75	refactor: Remove deprecated `max_token_size` from embedding configuration This parameter is no longer used. Its removal simplifies the API and clarifies that token length management is handled by upstream text chunking logic rather than the embedding wrapper.	2025-07-29 10:49:35 +08:00
yangdx	e09929b42e	Refine rerank filtering log message for clarity	2025-07-27 16:57:38 +08:00
yangdx	f4bca7bfb2	Fix linting	2025-07-27 16:50:45 +08:00
yangdx	a9565d7379	feat: Skip rerank filtering when `min_rerank_score` is 0.0	2025-07-27 16:50:12 +08:00
yangdx	ebaff228aa	feat: Add rerank score filtering with configurable threshold - Add DEFAULT_MIN_RERANK_SCORE constant (default: 0.0) - Add MIN_RERANK_SCORE environment variable support - Filter chunks with rerank scores below threshold in process_chunks_unified - Add info-level logging for filtering operations - Handle empty results gracefully after filtering - Maintain backward compatibility with non-reranked chunks	2025-07-27 16:37:44 +08:00
yangdx	a67f93acc9	Replace hardcoded max tokens with DEFAULT_MAX_TOTAL_TOKENS constant - Use constant in process_chunks_unified - Update WebUI default to match (32000)	2025-07-26 11:23:54 +08:00
yangdx	7b915b34f6	Refactor: move build_file_path function from operate.py to utils.py	2025-07-26 10:52:59 +08:00
yangdx	d78fda1d89	Optimize logger message	2025-07-24 04:31:06 +08:00
yangdx	d97913873b	Update logger message	2025-07-24 03:44:02 +08:00
yangdx	3075691f72	Refactor: move reranking utilities from operate.py to utils.py • Move apply_rerank_if_enabled to utils • Move process_chunks_unified to utils	2025-07-24 03:33:38 +08:00
yangdx	5a5d32dc32	Optimize logger message	2025-07-24 02:13:39 +08:00
yangdx	02f79508e0	Optimize context building with weighted polling and round-robin data selection	2025-07-24 01:18:21 +08:00
zrguo	1541034816	Add DEFAULT_RELATED_CHUNK_NUMBER	2025-07-15 21:35:12 +08:00
SLKun	5f330ec11a	remove <think> tag for entities and keywords extraction	2025-07-08 14:59:15 +08:00
yangdx	e56734cb8b	Refac: Optimize document deletion performance - Adding chunks_list to doc_status - Adding llm_cache_list to text_chunks - Implemented storage types: JsonKV and Redis	2025-07-03 04:18:25 +08:00
yangdx	271722405f	feat: Flatten LLM cache structure for improved recall efficiency Refactored the LLM cache to a flat Key-Value (KV) structure, replacing the previous nested format. The old structure used the 'mode' as a key and stored specific cache content as JSON nested under it. This change significantly enhances cache recall efficiency.	2025-07-02 16:11:53 +08:00
zrguo	ead82a8dbd	update delete_by_doc_id	2025-06-09 18:52:34 +08:00
yangdx	38b862e993	Remove unused functions	2025-05-18 07:16:52 +08:00
sa9arr	36b606d0db	Fix: Correct GraphML to JSON mapping in xml_to_json function	2025-05-17 19:32:25 +05:45
yangdx	2845e268e4	Ensure priority_limit_async_func_call decorator receives a callable	2025-05-13 02:00:01 +08:00
yangdx	4d57370c94	Refactor: Move get_env_value from api.config to utils Relocates the `get_env_value` utility function from `lightrag.api.config` to `lightrag.utils` to decouple LightRAG core from API Server	2025-05-10 08:58:18 +08:00
yangdx	3eb3b170ab	Remove list_of_list_to_dict function	2025-05-07 18:01:23 +08:00

196 Commits