LightRAG

mirror of https://github.com/HKUDS/LightRAG.git synced 2025-11-22 13:06:10 +00:00

Author	SHA1	Message	Date
yangdx	ff0a18e08c	Unify SUMMARY_LANGUAGE and ENTITY_TYPES implementation method	2025-08-27 12:23:22 +08:00
Thibo Rosemplatt	c3aabfc251	Merge branch 'main' into entityTypesServerSupport	2025-08-26 21:48:20 +02:00
yangdx	d3623cc9ae	fix: resolve infinite loop risk in _handle_entity_relation_summary - Ensure oversized descriptions are force-merged with subsequent ones - Add len(current_list) <= 2 termination condition to guarantee convergence - Implement token-based truncation in _summarize_descriptions to prevent overflow	2025-08-26 21:58:31 +08:00
yangdx	6bcfe696ee	feat: add output length recommendation and description type to LLM summary - Add SUMMARY_LENGTH_RECOMMENDED parameter (600 tokens) - Optimize prompt template for LLM summary	2025-08-26 14:41:12 +08:00
yangdx	cb0fe38b9a	Fix linting	2025-08-26 02:22:34 +08:00
yangdx	de2daf6565	refac: Rename summary_max_tokens to summary_context_size, comprehensive parameter validation for summary configuration - Update algorithm logic in operate.py for better token management - Fix health endpoint to use correct parameter names	2025-08-26 01:35:50 +08:00
yangdx	0b1b264a5d	refactor: optimize graph lock scope in document deletion - Move dependency analysis outside graph database lock - Add persistence call before lock release to prevent dirty reads	2025-08-25 17:46:32 +08:00
Thibo Rosemplatt	d054ec5d00	Added entity_types as a user defined variable (via .env)	2025-08-23 20:16:11 +02:00
yangdx	bf43e1b8c1	fix: Resolve default rerank config problem when env var missing - Read config from selected_rerank_func when env var missing - Make api_key optional for rerank function - Add response format validation with proper error handling - Update Cohere rerank default to official API endpoint	2025-08-23 01:07:59 +08:00
yangdx	0e67ead8fa	Rename MAX_TOKENS to SUMMARY_MAX_TOKENS for clarity	2025-08-21 10:15:20 +08:00
yangdx	9b7ed84e05	Improve document deletion error handling and message consistency - Standardize deletion log messages - Add try-catch for file operations - Improve enqueued file error handling	2025-08-20 11:01:24 +08:00
yangdx	485c4b7de7	Change document deletion warnings to info level logging	2025-08-20 03:28:42 +08:00
yangdx	806081645f	Refactor text cleaning to use sanitize_text_for_encoding consistently • Replace clean_text with sanitize_text • Remove deprecated clean_text function • Add whitespace trimming to sanitizer • Improve UTF-8 encoding safety • Consolidate text cleaning logic	2025-08-19 19:20:01 +08:00
yangdx	e38df464ea	Ensure front-end file type uploads are synchronized with back-end	2025-08-19 15:10:13 +08:00
yangdx	1c4d6fde58	Change log level from info to debug for document storage message	2025-08-18 20:04:29 +08:00
yangdx	377f1a022e	fix: reset PROCESSING/FAILED docs to PENDING at the beginning of document processing pipeline - Reset documents with PROCESSING/FAILED status to PENDING when they pass consistency checks - Update doc_status storage and clear error messages/metadata on reset	2025-08-18 00:49:52 +08:00
yangdx	add8b07a21	Improve logging messages for document processing clarity	2025-08-18 00:22:04 +08:00
yangdx	1941df9cf6	Simplify warning message format for document deletion	2025-08-17 13:30:55 +08:00
yangdx	3e4214cef3	Standardize document deletion warning messages for consistency	2025-08-17 09:35:46 +08:00
yangdx	cceb46b320	fix: subdirectories are no longer processed during file scans • Change rglob to glob for file scanning • Simplify error logging messages	2025-08-16 23:46:33 +08:00
yangdx	f5b0c3d38c	feat: Recording file extraction error status to document pipeline - Add apipeline_enqueue_error_documents function to LightRAG class for recording file processing errors in doc_status storage - Enhance pipeline_enqueue_file with detailed error handling for all file processing stages: * File access errors (permissions, not found) * UTF-8 encoding errors * Format-specific processing errors (PDF, DOCX, PPTX, XLSX) * Content validation errors * Unsupported file type errors This implementation ensures all file extraction failures are properly tracked and recorded in the doc_status storage system, providing better visibility into document processing issues and enabling improved error monitoring and debugging capabilities.	2025-08-16 23:08:52 +08:00
yangdx	ca4c18baaa	Preserve failed documents during data consistency validation for manual review	2025-08-16 22:29:46 +08:00
yangdx	e1310c5262	Optimize document processing pipeline by removing duplicate step	2025-08-16 17:23:01 +08:00
yangdx	5591ef3ac8	Fix document filtering logic and improve logging for ignored docs	2025-08-16 17:22:08 +08:00
yangdx	5c7ae8721b	Merge branch 'main' into pick-trunk-by-vector	2025-08-14 13:11:14 +08:00
yangdx	3bba5fc506	Fix linting	2025-08-14 13:03:23 +08:00
yangdx	65a4437f78	Fix: Persist document data immediately after index update	2025-08-14 12:33:36 +08:00
yangdx	28fc075c59	Simplify inconsistency logging and cleanup messages	2025-08-14 11:49:58 +08:00
yangdx	17faeb2fb8	refactor: integrate document consistency validation into pipeline processing This ensures data consistency validation is part of the main processing pipeline and provides better monitoring of inconsistent document cleanup operations.	2025-08-14 11:38:36 +08:00
yangdx	a3f7bc5b7e	Merge branch 'main' into pick-trunk-by-vector	2025-08-14 06:19:57 +08:00
yangdx	b5ae84fac6	fix: Add data consistency validation to document processing pipeline - Add _validate_and_fix_document_consistency() method to detect and fix documents with missing content in full_docs storage - Integrate consistency check into apipeline_process_enqueue_documents() to automatically mark inconsistent documents as FAILED before processing - Prevent processing errors caused by documents having status records but missing actual content data	2025-08-14 06:18:34 +08:00
yangdx	f1dafa0d01	feat: KG related chunks selection by vector similarity - Add env switch to toggle weighted polling vs vector-similarity strategy - Implement similarity-based sorting with fallback to weighted - Introduce batch vector read API for vector storage - Implement vector store and retrieve function for Nanovector DB - Preserve default behavior (weighted polling selection method)	2025-08-13 18:16:42 +08:00
yangdx	0b2c3d06c7	- Remove redundant collection listing check	2025-08-12 15:24:06 +08:00
yangdx	fc8ca1a706	Fix: add multi-process lock for initialize and drop methods of all storage	2025-08-12 04:25:09 +08:00
yangdx	44204abef7	Fix linting	2025-08-10 10:59:32 +08:00
yangdx	eb2320e556	Fix: Initialize first_stage_tasks and entity_relation_task to prevent empty-task cancel errors - Initialize first_stage_tasks = [] and entity_relation_task = None at coroutine start - Ensure cancel block safely handles no-op when tasks lists are empty	2025-08-10 10:45:41 +08:00
yangdx	cf064579ce	Remove deprecated keyword extraction query methods - Delete query_with_keywords function - Remove kg_query_with_keywords helper - Drop separate keyword extraction methods	2025-08-08 14:59:39 +08:00
yangdx	c22315ea6d	refactor: remove selective LLM cache clearing functionality - Remove optional 'modes' parameter from aclear_cache() and clear_cache() methods - Replace deprecated drop_cache_by_modes() with drop() method for complete cache clearing - Update API endpoint to ignore mode-specific parameters and clear all cache - Simplify frontend clearCache() function to send empty request body This change ensures all LLM cache is cleared together.	2025-08-05 23:51:51 +08:00
yangdx	01bce8c26e	feat: add warning logs for deleting non-completed documents	2025-08-05 12:21:08 +08:00
yangdx	63496698a1	Fix: ensure data migration is handled by single-process - Wrap migration logic with get_data_init_lock() to ensure single-process execution - Prevent race conditions when multiple processes start simultaneously	2025-08-04 01:47:20 +08:00
yangdx	bf9a6d699b	Fix(lightrag): Handle undirected edges in data migration The `_migrate_entity_relation_data` function previously processed directed edges from `get_all_edges`, which could lead to duplicates (e.g., (A,B) and (B,A)) and an incorrect relation count. This commit normalizes edges by sorting their source and target nodes before adding them to the relation set. This ensures all edges are treated as undirected and are properly deduplicated.	2025-08-03 22:14:24 +08:00
yangdx	e8d8afa846	Removed auto storage management from LightRAG instance creation - The `initialize_storages` method must be explicitly called after LightRAG creation. The `finalize_storages` method should be called before LightRAG is destroyed. - Added explicit data migration check	2025-08-03 12:42:57 +08:00
yangdx	06efab4af2	Revert "Remove auto_manage_storages_states option" This reverts commit bfe6657b316f7e50bc9c5f0cc71d9fbb2b605ddd.	2025-08-03 12:12:13 +08:00
yangdx	bfe6657b31	Remove auto_manage_storages_states option - Always manage storage states by LightRAG - Remove rag.initialize_storages() from all examples	2025-08-03 10:29:36 +08:00
yangdx	091f2b42c3	feat(performance): Optimize document deletion with entity/relation index - Introduces an index mapping documents to their corresponding entities and relations. This significantly speeds up `adelete_by_doc_id` by replacing slow graph traversal with a fast key-value lookup. - Refactors the ingestion pipeline (`merge_nodes_and_edges`) to populate this new index. Adds a one-time data migration script to backfill the index for existing data.	2025-08-03 09:19:02 +08:00
yangdx	32af45ff46	refactor: improve JSON parsing reliability with json-repair library Replace regex-based JSON extraction with json-repair for better handling of malformed LLM responses. Remove deprecated JSON parsing utilities and clean up keyword_extraction parameter across LLM providers. - Remove locate_json_string_body_from_string() and convert_response_to_json() - Use json-repair.loads() in extract_keywords_only() for robust parsing - Clean up LLM interfaces and remove unused parameters - Add json-repair dependency	2025-08-01 19:36:20 +08:00
yangdx	8271e1f6f1	Move OllamaServerInfos class to base module - Eliminate dependency of the core module on the API module.	2025-07-31 23:24:49 +08:00
yangdx	9d5603d35e	Set the default LLM temperature to 1.0 and centralize constant management	2025-07-31 17:15:10 +08:00
yangdx	c7bc4fc42c	Add track_id return to document processing pipeline	2025-07-30 10:27:12 +08:00
yangdx	cbaede8455	Add ScanResponse type for scan endpoint in webui	2025-07-30 03:11:09 +08:00

561 Commits
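
Note on commit bf9a6d699b: its message describes normalizing edges by sorting source and target nodes so that (A,B) and (B,A) count as a single undirected relation. A minimal sketch of that idea follows. The function name `dedupe_undirected_edges` and the plain tuple input are assumptions made for illustration, not code taken from the repository.

```python
from typing import Iterable, Set, Tuple


def dedupe_undirected_edges(edges: Iterable[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Collapse directed (src, tgt) pairs into a set of undirected relations.

    Hypothetical helper: sorting each pair gives (A, B) and (B, A) the same
    key, so the set keeps only one entry per undirected edge.
    """
    relations: Set[Tuple[str, str]] = set()
    for src, tgt in edges:
        a, b = sorted((src, tgt))  # normalize the edge direction
        relations.add((a, b))
    return relations


if __name__ == "__main__":
    directed = [("A", "B"), ("B", "A"), ("B", "C")]
    print(len(dedupe_undirected_edges(directed)))
```

With the sample input above, the two A/B pairs collapse into a single relation, so the relation count reflects undirected edges rather than directed ones.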