yangdx
cb75e6631e
Remove quantized embedding info from LLM cache
- Delete quantize_embedding function
- Delete dequantize_embedding function
- Remove embedding fields from CacheData
- Update save_to_cache to exclude embedding data
- Clean up unused quantization-related code
2025-08-05 17:58:34 +08:00
yangdx
32af45ff46
refactor: improve JSON parsing reliability with json-repair library
Replace regex-based JSON extraction with json-repair for better handling of malformed LLM responses. Remove deprecated JSON parsing utilities and clean up keyword_extraction parameter across LLM providers.
- Remove locate_json_string_body_from_string() and convert_response_to_json()
- Use json_repair.loads() in extract_keywords_only() for robust parsing
- Clean up LLM interfaces and remove unused parameters
- Add json-repair dependency
2025-08-01 19:36:20 +08:00
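A minimal sketch of the parsing step this commit describes. It assumes the LLM is asked to return an object with `high_level_keywords` and `low_level_keywords` arrays (the key names are an assumption), and `parse_keywords` is a hypothetical helper, not LightRAG's `extract_keywords_only()`. It uses `loads()` from the json-repair package (imported as `json_repair`) when installed, and falls back to the standard `json.loads()`, which performs no repair, otherwise.

```python
import json

try:
    # json-repair (pip install json-repair) repairs malformed JSON before parsing
    from json_repair import loads as _loads
except ImportError:
    # Fallback only: the standard library parser rejects malformed input
    _loads = json.loads


def parse_keywords(llm_response: str) -> tuple[list[str], list[str]]:
    """Hypothetical helper: return (high_level, low_level) keyword lists."""
    data = _loads(llm_response)
    if not isinstance(data, dict):
        # Output that is not a JSON object yields no keywords
        return [], []
    return data.get("high_level_keywords", []), data.get("low_level_keywords", [])
```

With json-repair installed, a truncated response such as `{"high_level_keywords": ["graph"` can typically still be recovered, whereas the stdlib fallback raises `json.JSONDecodeError`.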
yangdx
2af8a93dc7
fix: resolve _sort_key error in Redis get_docs_paginated function
2025-07-31 02:16:56 +08:00
yangdx
d0bc5e7c4a
Extend path filter to also cover POST requests
2025-07-31 02:06:56 +08:00
yangdx
3e5efd0b27
Add /documents/paginated to filtered logging paths
2025-07-31 02:00:00 +08:00
yangdx
6014b9bf73
feat: add track_id support for document processing progress monitoring
- Add get_docs_by_track_id() method to all storage backends (MongoDB, PostgreSQL, Redis, JSON)
- Implement automatic track_id generation with upload_/insert_ prefixes
- Add /track_status/{track_id} API endpoint for frontend progress queries
- Create database indexes for efficient track_id lookups
- Enable real-time document processing status tracking across all storage types
2025-07-29 22:24:21 +08:00
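A minimal sketch of the two ideas in this commit, with a plain dict standing in for the storage backends. Only the `upload_`/`insert_` prefixes come from the commit message; the rest of the ID layout (`<prefix>_<UTC date>_<UTC time>_<8 hex chars>`) and the `track_id` field name on each status record are assumptions, and both functions are hypothetical.

```python
import uuid
from datetime import datetime, timezone


def generate_track_id(prefix: str) -> str:
    """Hypothetical ID format: <prefix>_<YYYYmmdd>_<HHMMSS>_<8 hex chars>."""
    ts = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    return f"{prefix}_{ts}_{uuid.uuid4().hex[:8]}"


def get_docs_by_track_id(doc_status: dict[str, dict], track_id: str) -> dict[str, dict]:
    """Return every status record tagged with track_id.
    This is a linear scan; a database index on track_id avoids it."""
    return {doc_id: s for doc_id, s in doc_status.items() if s.get("track_id") == track_id}
```

A caller would create one ID per upload or insert batch, store it on every document status record in that batch, and let the frontend poll by that ID.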
yangdx
9923821d75
refactor: Remove deprecated max_token_size from embedding configuration
This parameter is no longer used. Its removal simplifies the API and clarifies that token length management is handled by upstream text chunking logic rather than the embedding wrapper.
2025-07-29 10:49:35 +08:00
yangdx
e09929b42e
Refine rerank filtering log message for clarity
2025-07-27 16:57:38 +08:00
yangdx
f4bca7bfb2
Fix linting
2025-07-27 16:50:45 +08:00
yangdx
a9565d7379
feat: Skip rerank filtering when min_rerank_score is 0.0
2025-07-27 16:50:12 +08:00
yangdx
ebaff228aa
feat: Add rerank score filtering with configurable threshold
- Add DEFAULT_MIN_RERANK_SCORE constant (default: 0.0)
- Add MIN_RERANK_SCORE environment variable support
- Filter chunks with rerank scores below threshold in process_chunks_unified
- Add info-level logging for filtering operations
- Handle empty results gracefully after filtering
- Maintain backward compatibility with non-reranked chunks
2025-07-27 16:37:44 +08:00
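A minimal sketch of the threshold filter described here, combined with the later change in a9565d7379 that skips filtering when the threshold is 0.0 (this sketch treats any threshold at or below 0.0 as disabled). Chunks are plain dicts and the `rerank_score` key name is an assumption; chunks without a score are kept, mirroring the backward-compatibility note. `filter_by_rerank_score` is a hypothetical name, not the code inside `process_chunks_unified`.

```python
import logging
import os
from typing import Optional

logger = logging.getLogger(__name__)

DEFAULT_MIN_RERANK_SCORE = 0.0  # default named in the commit message


def filter_by_rerank_score(chunks: list[dict], min_score: Optional[float] = None) -> list[dict]:
    if min_score is None:
        min_score = float(os.getenv("MIN_RERANK_SCORE", DEFAULT_MIN_RERANK_SCORE))
    if min_score <= 0.0:
        return chunks  # threshold disabled: nothing is filtered
    # Keep chunks that were never reranked, plus those at or above the threshold
    kept = [c for c in chunks if "rerank_score" not in c or c["rerank_score"] >= min_score]
    if len(kept) < len(chunks):
        logger.info(
            "Rerank filtering: kept %d chunks, dropped %d below %.2f",
            len(kept), len(chunks) - len(kept), min_score,
        )
    return kept  # may be empty; callers must handle that case
```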
yangdx
a67f93acc9
Replace hardcoded max tokens with DEFAULT_MAX_TOTAL_TOKENS constant
- Use constant in process_chunks_unified
- Update WebUI default to match (32000)
2025-07-26 11:23:54 +08:00
yangdx
7b915b34f6
Refactor: move build_file_path function from operate.py to utils.py
2025-07-26 10:52:59 +08:00
yangdx
d78fda1d89
Optimize logger message
2025-07-24 04:31:06 +08:00
yangdx
d97913873b
Update logger message
2025-07-24 03:44:02 +08:00
yangdx
3075691f72
Refactor: move reranking utilities from operate.py to utils.py
• Move apply_rerank_if_enabled to utils
• Move process_chunks_unified to utils
2025-07-24 03:33:38 +08:00
yangdx
5a5d32dc32
Optimize logger message
2025-07-24 02:13:39 +08:00
yangdx
02f79508e0
Optimize context building with weighted polling and round-robin data selection
2025-07-24 01:18:21 +08:00
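The commit subject names round-robin selection without further detail. As a generic illustration only (not the commit's code), round-robin merging of several ranked result lists takes the next item from each list in turn and skips duplicates, so no single source dominates the top of the context. Weighted polling is not modelled here, and `round_robin_merge` with its `key` parameter is hypothetical.

```python
from itertools import zip_longest
from typing import Callable, Hashable, Iterable, TypeVar

T = TypeVar("T")
_MISSING = object()  # sentinel so real None items are not mistaken for padding


def round_robin_merge(*ranked_lists: Iterable[T], key: Callable[[T], Hashable] = lambda x: x) -> list[T]:
    merged: list[T] = []
    seen: set = set()
    # Round r yields the r-th item of every list; shorter lists are padded with _MISSING
    for round_items in zip_longest(*ranked_lists, fillvalue=_MISSING):
        for item in round_items:
            if item is _MISSING:
                continue
            k = key(item)
            if k not in seen:  # first occurrence wins, keeping its earlier rank
                seen.add(k)
                merged.append(item)
    return merged
```

For example, merging `["a", "b", "c"]` with `["b", "d"]` yields `["a", "b", "d", "c"]`.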
zrguo
1541034816
Add DEFAULT_RELATED_CHUNK_NUMBER
2025-07-15 21:35:12 +08:00
SLKun
5f330ec11a
remove <think> tag for entities and keywords extraction
2025-07-08 14:59:15 +08:00
yangdx
e56734cb8b
Refactor: Optimize document deletion performance
- Add chunks_list to doc_status
- Add llm_cache_list to text_chunks
- Implemented for storage types: JsonKV and Redis
2025-07-03 04:18:25 +08:00
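A minimal sketch of why these back-references speed up deletion: a document's `chunks_list` and each chunk's `llm_cache_list` name exactly which records to remove, so the chunk and cache stores never need a full scan. Plain dicts stand in for the KV storages, and `delete_document` is a hypothetical helper, not the repository's deletion routine.

```python
def delete_document(doc_id: str, doc_status: dict, text_chunks: dict, llm_cache: dict) -> int:
    """Delete a document, its chunks and their cached LLM results.
    Returns the number of chunks removed."""
    status = doc_status.pop(doc_id, None)
    if status is None:
        return 0
    removed = 0
    for chunk_id in status.get("chunks_list", []):
        chunk = text_chunks.pop(chunk_id, None)
        if chunk is None:
            continue  # already deleted
        removed += 1
        # Each chunk lists the cache keys produced while extracting from it
        for cache_key in chunk.get("llm_cache_list", []):
            llm_cache.pop(cache_key, None)
    return removed
```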
yangdx
271722405f
feat: Flatten LLM cache structure for improved recall efficiency
Refactored the LLM cache to a flat Key-Value (KV) structure, replacing the previous nested format. The old structure used the 'mode' as a key and stored specific cache content as JSON nested under it. This change significantly enhances cache recall efficiency.
2025-07-02 16:11:53 +08:00
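A minimal sketch contrasting the two layouts, assuming a flat key of the form `mode:args_hash` (the exact key format is an assumption). In the nested layout a lookup must first load the whole per-mode JSON object; in the flat layout each entry is addressed by a single key. `flatten_llm_cache` is a hypothetical migration helper.

```python
def flatten_llm_cache(nested: dict[str, dict[str, dict]]) -> dict[str, dict]:
    """Convert {mode: {args_hash: entry}} into {"mode:args_hash": entry}."""
    flat = {}
    for mode, entries in nested.items():
        for args_hash, entry in entries.items():
            flat[f"{mode}:{args_hash}"] = entry
    return flat


# Old layout: reading one entry means fetching the whole "local" bucket
nested = {
    "local": {"h1": {"return": "answer 1"}, "h2": {"return": "answer 2"}},
    "global": {"h3": {"return": "answer 3"}},
}
flat = flatten_llm_cache(nested)
# New layout: one key, one lookup
print(flat["local:h2"]["return"])  # prints "answer 2"
```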
zrguo
ead82a8dbd
update delete_by_doc_id
2025-06-09 18:52:34 +08:00
yangdx
38b862e993
Remove unused functions
2025-05-18 07:16:52 +08:00
sa9arr
36b606d0db
Fix: Correct GraphML to JSON mapping in xml_to_json function
2025-05-17 19:32:25 +05:45
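GraphML stores attributes as `<data key="d0">` elements whose meaning is declared once in `<key id="d0" attr.name="...">` elements, so a correct conversion resolves each key id through those declarations instead of assuming fixed ids. A minimal sketch using the standard library's `xml.etree.ElementTree`; `graphml_to_dict` is hypothetical and is not the repository's `xml_to_json`, and attribute values are left as strings.

```python
import xml.etree.ElementTree as ET

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}  # standard GraphML namespace


def graphml_to_dict(xml_text: str) -> dict:
    root = ET.fromstring(xml_text)
    # Map key ids (d0, d1, ...) to their declared attribute names
    key_names = {k.get("id"): k.get("attr.name") for k in root.findall("g:key", NS)}

    def attrs(elem) -> dict:
        return {key_names.get(d.get("key"), d.get("key")): d.text for d in elem.findall("g:data", NS)}

    graph = root.find("g:graph", NS)
    nodes = [{"id": n.get("id"), **attrs(n)} for n in graph.findall("g:node", NS)]
    edges = [
        {"source": e.get("source"), "target": e.get("target"), **attrs(e)}
        for e in graph.findall("g:edge", NS)
    ]
    return {"nodes": nodes, "edges": edges}
```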
yangdx
2845e268e4
Ensure priority_limit_async_func_call decorator receives a callable
2025-05-13 02:00:01 +08:00
yangdx
4d57370c94
Refactor: Move get_env_value from api.config to utils
Relocates the `get_env_value` utility function from `lightrag.api.config` to `lightrag.utils` to decouple the LightRAG core from the API server
2025-05-10 08:58:18 +08:00
yangdx
3eb3b170ab
Remove list_of_list_to_dict function
2025-05-07 18:01:23 +08:00
yangdx
156244e260
Refactor: Unify naive context to JSON format
- Merges 'mix' mode query handling into 'hybrid' mode, simplifying query logic by removing the dedicated `mix_kg_vector_query` function
- Standardizes vector search result by using JSON string format to build context
- Fixes a bug in `query_with_keywords` ensuring `hl_keywords` and `ll_keywords` are correctly passed to `kg_query_with_keywords`
2025-05-07 17:42:14 +08:00
yangdx
3146309fde
Change function name from list_of_list_to_json to list_of_list_to_dict
2025-05-07 10:52:26 +08:00
yangdx
dbfcf30801
Fix linting
2025-05-06 22:03:40 +08:00
yangdx
c8ecfa2d68
feat: Centralize configuration and update defaults
This commit introduces `lightrag/constants.py` to centralize default values for various configurations across the API and core components.
Key changes:
- Added `constants.py` to centralize default values
- Improved the `get_env_value` function in `api/config.py` to correctly handle string "None" as a None value and to catch `TypeError` during value conversion.
- Updated the default `SUMMARY_LANGUAGE` to "English"
- Set default `WORKERS` to 2
2025-05-06 22:00:43 +08:00
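A minimal sketch of the two `get_env_value` behaviours named in this commit: the literal string "None" maps to `None`, and a failed conversion (`ValueError` or `TypeError`) falls back to the default. The signature shown is an assumption for illustration, not necessarily the one in `lightrag.utils`.

```python
import os
from typing import Any


def get_env_value(env_key: str, default: Any, value_type: type = str) -> Any:
    raw = os.getenv(env_key)
    if raw is None:
        return default  # variable not set
    if raw == "None":
        return None  # explicit "None" string means no value
    try:
        return value_type(raw)
    except (ValueError, TypeError):
        return default  # unconvertible value: keep the default
```

For example, `get_env_value("WORKERS", 2, int)` returns 2 when `WORKERS` is unset or not an integer.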
yangdx
a36abce8d6
Update comments
2025-05-05 11:26:31 +08:00
yangdx
62fd4a0540
Optimize log messages
2025-04-30 13:53:03 +08:00
yangdx
81953e6d46
Enhance the robustness of concurrency control and scheduling logic
2025-04-29 13:38:11 +08:00
yangdx
1afcbcbfb5
Fix race condition for health_check and ensure_workers
2025-04-29 00:08:52 +08:00
yangdx
1fc26127d5
Fix linting
2025-04-28 23:21:34 +08:00
yangdx
0ecae90002
Enhance the function's robustness
2025-04-28 22:52:31 +08:00
yangdx
e30afe8686
fix(utils): Fix TypeError in priority_limit_async_func_call when comparing Future objects
2025-04-28 21:07:01 +08:00
yangdx
2d59ac1ecb
Remove deprecated embedding cache logic
2025-04-28 18:51:43 +08:00
yangdx
5a393e563e
remove duplicate priority setting for merge summarization
2025-04-28 18:37:51 +08:00
yangdx
140b1b3cbb
Add priority control for limited async decorator
2025-04-28 18:12:29 +08:00
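A minimal sketch of a concurrency-limited async decorator with priority ordering, built on `asyncio.PriorityQueue` and a fixed pool of worker tasks. It is not `priority_limit_async_func_call`; the decorator name and the `_priority` keyword are hypothetical. Queue entries carry a monotonic counter after the priority, so equal priorities never fall through to comparing `Future` objects, which is the kind of `TypeError` addressed in e30afe8686.

```python
import asyncio
import functools
import itertools


def priority_limited(max_concurrency: int):
    """Hypothetical decorator: run at most max_concurrency calls at once;
    queued calls start in ascending _priority order (lower runs sooner)."""

    def decorator(func):
        state = {"queue": None, "workers": []}  # queue is created inside the running loop
        counter = itertools.count()  # FIFO tie-breaker for equal priorities

        async def worker():
            queue = state["queue"]
            while True:
                _prio, _seq, fut, args, kwargs = await queue.get()
                try:
                    if not fut.cancelled():
                        result = await func(*args, **kwargs)
                        if not fut.cancelled():
                            fut.set_result(result)
                except Exception as exc:
                    if not fut.cancelled():
                        fut.set_exception(exc)
                finally:
                    queue.task_done()

        @functools.wraps(func)
        async def wrapper(*args, _priority: int = 10, **kwargs):
            if state["queue"] is None:
                state["queue"] = asyncio.PriorityQueue()
                # Keep references so the worker tasks are not garbage collected
                state["workers"] = [asyncio.create_task(worker()) for _ in range(max_concurrency)]
            fut = asyncio.get_running_loop().create_future()
            # Tuples compare priority, then seq; seq is unique, so fut is never compared
            await state["queue"].put((_priority, next(counter), fut, args, kwargs))
            return await fut

        return wrapper

    return decorator
```

This sketch binds its queue to the first event loop that calls it, which is enough for a single `asyncio.run()` but not for reuse across loops.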
yangdx
02e9055f9d
Fix linting
2025-04-24 20:04:42 +08:00
yangdx
f6129857a1
Improve quantize and dequantize handling of embedding
2025-04-24 20:03:01 +08:00
yangdx
6977db3dd1
Remove the single quotation marks that enclose the names of the entities
2025-04-23 21:30:07 +08:00
yangdx
21c0bb7abf
Merge branch 'context_format_csv_to_json'
2025-04-22 12:25:50 +08:00
yangdx
e7063b5f1e
Remove embedding_cache_config
2025-04-22 00:28:17 +08:00
yangdx
85684164f0
Fix linting
2025-04-21 20:18:05 +08:00
yangdx
17f5439952
Remove spaces between Chinese characters and English symbols
2025-04-21 19:21:30 +08:00
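A minimal regex sketch of this normalisation, assuming "English symbols" means ASCII punctuation such as brackets, commas and periods, and covering only the CJK Unified Ideographs block (U+4E00 to U+9FFF). Both assumptions and the helper name `strip_cjk_punct_spaces` are illustrative, not taken from the commit.

```python
import re

_CJK = r"[\u4e00-\u9fff]"  # CJK Unified Ideographs (assumed scope)
_PUNCT = r"[()\[\],.;:!?]"  # assumed set of "English symbols"


def strip_cjk_punct_spaces(text: str) -> str:
    # Drop whitespace between a Chinese character and a following ASCII symbol
    text = re.sub(rf"({_CJK})\s+({_PUNCT})", r"\1\2", text)
    # Drop whitespace between an ASCII symbol and a following Chinese character
    return re.sub(rf"({_PUNCT})\s+({_CJK})", r"\1\2", text)
```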
孟超
8064a2339f
change process_combine_contexts params type to list[dict[str, str]]
2025-04-21 12:08:12 +08:00