174 Commits

Author SHA1 Message Date
yangdx
cb75e6631e Remove quantized embedding info from LLM cache
- Delete quantize_embedding function
- Delete dequantize_embedding function
- Remove embedding fields from CacheData
- Update save_to_cache to exclude embedding data
- Clean up unused quantization-related code
2025-08-05 17:58:34 +08:00
yangdx
32af45ff46 refactor: improve JSON parsing reliability with json-repair library
Replace regex-based JSON extraction with json-repair for better handling of malformed LLM responses. Remove deprecated JSON parsing utilities and clean up keyword_extraction parameter across LLM providers.

- Remove locate_json_string_body_from_string() and convert_response_to_json()
- Use json_repair.loads() in extract_keywords_only() for robust parsing
- Clean up LLM interfaces and remove unused parameters
- Add json-repair dependency
2025-08-01 19:36:20 +08:00
yangdx
2af8a93dc7 fix: resolve _sort_key error in Redis get_docs_paginated function 2025-07-31 02:16:56 +08:00
yangdx
d0bc5e7c4a Extend path filter to also cover POST requests 2025-07-31 02:06:56 +08:00
yangdx
3e5efd0b27 Add /documents/paginated to filtered logging paths 2025-07-31 02:00:00 +08:00
yangdx
6014b9bf73 feat: add track_id support for document processing progress monitoring
- Add get_docs_by_track_id() method to all storage backends (MongoDB, PostgreSQL, Redis, JSON)
- Implement automatic track_id generation with upload_/insert_ prefixes
- Add /track_status/{track_id} API endpoint for frontend progress queries
- Create database indexes for efficient track_id lookups
- Enable real-time document processing status tracking across all storage types
2025-07-29 22:24:21 +08:00
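The prefixed track_id scheme described in this commit could look roughly like the sketch below. The exact ID layout (timestamp plus a short random suffix) and the helper names `generate_track_id` and `docs_by_track_id` are assumptions for illustration, not the repository's actual code.

```python
import uuid
from datetime import datetime, timezone


def generate_track_id(prefix: str = "upload") -> str:
    """Build an ID such as 'upload_20250729_142421_1a2b3c4d' (layout is assumed)."""
    if prefix not in ("upload", "insert"):
        raise ValueError(f"unsupported prefix: {prefix!r}")
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    # A short random suffix keeps IDs unique within the same second.
    return f"{prefix}_{timestamp}_{uuid.uuid4().hex[:8]}"


def docs_by_track_id(doc_status: dict[str, dict], track_id: str) -> dict[str, dict]:
    """In-memory stand-in for a storage lookup: keep documents tagged with track_id."""
    return {
        doc_id: doc
        for doc_id, doc in doc_status.items()
        if doc.get("track_id") == track_id
    }
```

A database backend would replace the dictionary scan with an indexed query on the track_id field, which is what the index creation in this commit is for.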
yangdx
9923821d75 refactor: Remove deprecated max_token_size from embedding configuration
This parameter is no longer used. Its removal simplifies the API and clarifies that token length management is handled by upstream text chunking logic rather than the embedding wrapper.
2025-07-29 10:49:35 +08:00
yangdx
e09929b42e Refine rerank filtering log message for clarity 2025-07-27 16:57:38 +08:00
yangdx
f4bca7bfb2 Fix linting 2025-07-27 16:50:45 +08:00
yangdx
a9565d7379 feat: Skip rerank filtering when min_rerank_score is 0.0 2025-07-27 16:50:12 +08:00
yangdx
ebaff228aa feat: Add rerank score filtering with configurable threshold
- Add DEFAULT_MIN_RERANK_SCORE constant (default: 0.0)
- Add MIN_RERANK_SCORE environment variable support
- Filter chunks with rerank scores below threshold in process_chunks_unified
- Add info-level logging for filtering operations
- Handle empty results gracefully after filtering
- Maintain backward compatibility with non-reranked chunks
2025-07-27 16:37:44 +08:00
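A minimal sketch of the threshold filter this commit describes, including the later rule that a threshold of 0.0 skips filtering. The function name `filter_by_rerank_score` and the `rerank_score` key are assumptions; the real logic lives in process_chunks_unified.

```python
from typing import Any


def filter_by_rerank_score(
    chunks: list[dict[str, Any]], min_rerank_score: float = 0.0
) -> list[dict[str, Any]]:
    """Drop chunks whose rerank score is below the threshold."""
    if min_rerank_score <= 0.0:
        # Threshold disabled: return the input untouched.
        return chunks
    return [
        chunk
        for chunk in chunks
        # Chunks without a score were never reranked; keep them for
        # backward compatibility.
        if chunk.get("rerank_score") is None
        or chunk["rerank_score"] >= min_rerank_score
    ]
```

The result may be an empty list when every scored chunk falls below the threshold, so callers need to handle that case.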
yangdx
a67f93acc9 Replace hardcoded max tokens with DEFAULT_MAX_TOTAL_TOKENS constant
- Use constant in process_chunks_unified
- Update WebUI default to match (32000)
2025-07-26 11:23:54 +08:00
yangdx
7b915b34f6 Refactor: move build_file_path function from operate.py to utils.py 2025-07-26 10:52:59 +08:00
yangdx
d78fda1d89 Optimize logger message 2025-07-24 04:31:06 +08:00
yangdx
d97913873b Update logger message 2025-07-24 03:44:02 +08:00
yangdx
3075691f72 Refactor: move reranking utilities from operate.py to utils.py
- Move apply_rerank_if_enabled to utils
- Move process_chunks_unified to utils
2025-07-24 03:33:38 +08:00
yangdx
5a5d32dc32 Optimize logger message 2025-07-24 02:13:39 +08:00
yangdx
02f79508e0 Optimize context building with weighted polling and round-robin data selection 2025-07-24 01:18:21 +08:00
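The round-robin part of this selection can be illustrated with the sketch below, which interleaves items from several ranked lists so that no single source dominates the context. It does not attempt the weighting step, and the helper name `round_robin` is hypothetical rather than taken from the repository.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel marking an exhausted source


def round_robin(*sources: list) -> list:
    """Take one item from each source in turn until all are exhausted."""
    merged = []
    for group in zip_longest(*sources, fillvalue=_MISSING):
        merged.extend(item for item in group if item is not _MISSING)
    return merged


print(round_robin(["e1", "e2", "e3"], ["r1"], ["c1", "c2"]))
# ['e1', 'r1', 'c1', 'e2', 'c2', 'e3']
```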
zrguo
1541034816 Add DEFAULT_RELATED_CHUNK_NUMBER 2025-07-15 21:35:12 +08:00
SLKun
5f330ec11a Remove <think> tags from entity and keyword extraction 2025-07-08 14:59:15 +08:00
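One common way to strip reasoning blocks from a model response before parsing it is a non-greedy regular expression, sketched here. The helper name `strip_think` is an assumption, and the commit itself may implement the removal differently.

```python
import re

# DOTALL lets '.' match newlines, so multi-line reasoning blocks are removed.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think(text: str) -> str:
    """Remove every <think>...</think> block and trim surrounding whitespace."""
    return _THINK_BLOCK.sub("", text).strip()


print(strip_think('<think>step 1\nstep 2</think>\n{"high_level_keywords": []}'))
# {"high_level_keywords": []}
```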
yangdx
e56734cb8b Refactor: Optimize document deletion performance
- Add chunks_list to doc_status
- Add llm_cache_list to text_chunks
- Implemented for storage types: JsonKV and Redis
2025-07-03 04:18:25 +08:00
yangdx
271722405f feat: Flatten LLM cache structure for improved recall efficiency
Refactored the LLM cache to a flat Key-Value (KV) structure, replacing the previous nested format. The old structure used the 'mode' as a key and stored specific cache content as JSON nested under it. This change significantly enhances cache recall efficiency.
2025-07-02 16:11:53 +08:00
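The move from a nested to a flat layout can be sketched as follows. The `mode:cache_id` key format and the function name `flatten_cache` are assumptions chosen for illustration; the commit does not state the actual key scheme.

```python
def flatten_cache(nested: dict[str, dict[str, dict]]) -> dict[str, dict]:
    """Convert {mode: {cache_id: entry}} into {"mode:cache_id": entry}."""
    flat: dict[str, dict] = {}
    for mode, entries in nested.items():
        for cache_id, entry in entries.items():
            flat[f"{mode}:{cache_id}"] = entry
    return flat


old_layout = {
    "default": {"abc123": {"return": "entity list"}},
    "hybrid": {"def456": {"return": "answer text"}},
}
print(flatten_cache(old_layout))
# {'default:abc123': {'return': 'entity list'}, 'hybrid:def456': {'return': 'answer text'}}
```

With flat keys, a single entry can be fetched by its full key instead of loading and parsing the whole per-mode JSON blob first.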
zrguo
ead82a8dbd update delete_by_doc_id 2025-06-09 18:52:34 +08:00
yangdx
38b862e993 Remove unused functions 2025-05-18 07:16:52 +08:00
sa9arr
36b606d0db Fix: Correct GraphML to JSON mapping in xml_to_json function 2025-05-17 19:32:25 +05:45
yangdx
2845e268e4 Ensure priority_limit_async_func_call decorator receives a callable 2025-05-13 02:00:01 +08:00
yangdx
4d57370c94 Refactor: Move get_env_value from api.config to utils
Relocates the `get_env_value` utility function
from `lightrag.api.config` to `lightrag.utils` to decouple
LightRAG core from API Server
2025-05-10 08:58:18 +08:00
yangdx
3eb3b170ab Remove list_of_list_to_dict function 2025-05-07 18:01:23 +08:00
yangdx
156244e260 Refactor: Unify naive context to JSON format
- Merges 'mix' mode query handling into 'hybrid' mode, simplifying query logic by removing the dedicated `mix_kg_vector_query` function
- Standardizes vector search result by using JSON string format to build context
- Fixes a bug in `query_with_keywords` ensuring `hl_keywords` and `ll_keywords` are correctly passed to `kg_query_with_keywords`
2025-05-07 17:42:14 +08:00
yangdx
3146309fde Change function name from list_of_list_to_json to list_of_list_to_dict 2025-05-07 10:52:26 +08:00
yangdx
dbfcf30801 Fix linting 2025-05-06 22:03:40 +08:00
yangdx
c8ecfa2d68 feat: Centralize configuration and update defaults
This commit introduces `lightrag/constants.py` to centralize default values for various configurations across the API and core components.

Key changes:
- Added `constants.py` to centralize default values
- Improved the `get_env_value` function in `api/config.py` to correctly handle string "None" as a None value and to catch `TypeError` during value conversion.
- Updated the default `SUMMARY_LANGUAGE` to "English"
- Set default `WORKERS` to 2
2025-05-06 22:00:43 +08:00
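The two `get_env_value` behaviors mentioned above (treating the string "None" as None and catching `TypeError` during conversion) could be sketched as below. The function is named `read_env_value` to mark it as a hypothetical stand-in; its signature and the boolean handling are assumptions.

```python
import os
from typing import Any, Callable


def read_env_value(name: str, default: Any, value_type: Callable[[str], Any] = str) -> Any:
    """Read an environment variable and convert it, falling back to default."""
    raw = os.getenv(name)
    if raw is None:
        return default
    if raw == "None":
        # The literal string "None" is an explicit request for a None value.
        return None
    if value_type is bool:
        return raw.lower() in ("true", "1", "yes", "on")
    try:
        return value_type(raw)
    except (ValueError, TypeError):
        # Unconvertible values fall back to the default instead of crashing.
        return default
```

For example, with `WORKERS=abc` in the environment, `read_env_value("WORKERS", 2, int)` returns the default of 2.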
yangdx
a36abce8d6 Update comments 2025-05-05 11:26:31 +08:00
yangdx
62fd4a0540 Optimize log messages 2025-04-30 13:53:03 +08:00
yangdx
81953e6d46 Enhance the robustness of concurrency control and scheduling logic 2025-04-29 13:38:11 +08:00
yangdx
1afcbcbfb5 Fix race condition for health_check and ensure_workers 2025-04-29 00:08:52 +08:00
yangdx
1fc26127d5 Fix linting 2025-04-28 23:21:34 +08:00
yangdx
0ecae90002 Enhance the function's robustness 2025-04-28 22:52:31 +08:00
yangdx
e30afe8686 fix(utils): Fix TypeError in priority_limit_async_func_call when comparing Future objects 2025-04-28 21:07:01 +08:00
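A TypeError of this kind typically arises when a priority queue holds tuples such as (priority, future): on equal priorities, Python falls through to comparing the Future objects, which do not support ordering. A standard remedy, sketched here, is a monotonically increasing counter as a tie-breaker. This illustrates the general technique and is not necessarily the fix applied in the commit; `demo` is a hypothetical name.

```python
import asyncio
import itertools


async def demo() -> list[str]:
    queue: asyncio.PriorityQueue = asyncio.PriorityQueue()
    counter = itertools.count()  # unique sequence number per queued item
    loop = asyncio.get_running_loop()

    for priority, label in [(5, "a"), (5, "b"), (1, "c")]:
        future = loop.create_future()
        future.set_result(label)
        # The counter settles ties, so the Future itself is never compared.
        await queue.put((priority, next(counter), future))

    order = []
    while not queue.empty():
        _, _, future = await queue.get()
        order.append(future.result())
    return order


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ['c', 'a', 'b']
```

Lower numbers are served first, and items with the same priority keep their insertion order.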
yangdx
2d59ac1ecb Remove deprecated embedding cache logic 2025-04-28 18:51:43 +08:00
yangdx
5a393e563e remove duplicate priority setting for merge summarization 2025-04-28 18:37:51 +08:00
yangdx
140b1b3cbb Add priority control for limited async decorator 2025-04-28 18:12:29 +08:00
yangdx
02e9055f9d Fix linting 2025-04-24 20:04:42 +08:00
yangdx
f6129857a1 Improve quantize and dequantize handling of embedding 2025-04-24 20:03:01 +08:00
yangdx
6977db3dd1 Remove the single quotation marks that enclose the names of the entities 2025-04-23 21:30:07 +08:00
yangdx
21c0bb7abf Merge branch 'context_format_csv_to_json' 2025-04-22 12:25:50 +08:00
yangdx
e7063b5f1e Remove embedding_cache_config 2025-04-22 00:28:17 +08:00
yangdx
85684164f0 Fix linting 2025-04-21 20:18:05 +08:00
yangdx
17f5439952 Remove space between Chinese chars and English symbols 2025-04-21 19:21:30 +08:00
孟超
8064a2339f change process_combine_contexts params type to list[dict[str, str]] 2025-04-21 12:08:12 +08:00