LightRAG

mirror of https://github.com/HKUDS/LightRAG.git synced 2025-07-19 15:01:42 +00:00

Author	SHA1	Message	Date
yangdx	05bc5cfb64	Improve task execution with early failure detection - Add early failure detection for async tasks - Cancel pending tasks on first exception	2025-07-19 10:14:22 +08:00
yangdx	5f7cb437e8	Centralize query parameters into LightRAG class This commit refactors query parameter management by consolidating settings like `top_k`, token limits, and thresholds into the `LightRAG` class, and consistently sourcing parameters from a single location.	2025-07-15 23:56:49 +08:00
yangdx	47341d3a71	Merge branch 'main' into rerank	2025-07-15 16:12:33 +08:00
yangdx	ccc2a20071	feat: remove deprecated MAX_TOKEN_SUMMARY parameter to prevent LLM output truncation - Remove MAX_TOKEN_SUMMARY parameter and related configurations - Eliminate forced token-based truncation in entity/relationship descriptions - Switch to fragment-count based summarization logic using FORCE_LLM_SUMMARY_ON_MERGE - Update FORCE_LLM_SUMMARY_ON_MERGE default from 6 to 4 for better summarization - Clean up documentation, environment examples, and API display code - Preserve backward compatibility by graceful parameter removal This change resolves issues where LLMs were forcibly truncating entity relationship descriptions mid-sentence, leading to incomplete and potentially inaccurate knowledge graph content. The new approach allows LLMs to generate complete descriptions while still providing summarization when multiple fragments need to be merged. Breaking Change: None - parameter removal is backward compatible Fixes: Entity relationship description truncation issues	2025-07-15 12:26:33 +08:00
zrguo	7c882313bb	remove chunk_rerank_top_k	2025-07-15 11:52:34 +08:00
yangdx	b03bb48e24	feat: Refine summary logic and add dedicated Ollama num_ctx config - Refactor the trigger condition for LLM-based summarization of entities and relations. Instead of relying on character length, the summary is now triggered when the number of merged description fragments exceeds a configured threshold. This provides a more robust and logical condition for consolidation. - Introduce the `OLLAMA_NUM_CTX` environment variable to explicitly configure the context window size (`num_ctx`) for Ollama models. This decouples the model's context length from the `MAX_TOKENS` parameter, which is now specifically used to limit input for summary generation, making the configuration clearer and more flexible. - Updated `README` files, `env.example`, and default values to reflect these changes.	2025-07-14 01:55:04 +08:00
yangdx	03b40937f7	Reduce embedding concurrency limit from 16 to 8	2025-07-13 03:13:52 +08:00
yangdx	39965d7ded	Move merging stage back to being controlled by the max parallel insert semaphore	2025-07-12 03:32:08 +08:00
yangdx	3afdd1b67c	Fix initial count error for multi-process lock with key	2025-07-11 20:39:08 +08:00
yangdx	c47747da9e	Merge branch 'main' into merge_lock_with_key	2025-07-11 16:37:10 +08:00
yangdx	ef4870fda5	Combined entity and edge processing tasks and optimize merging with semaphore	2025-07-11 16:34:54 +08:00
yangdx	9aa2ed0837	Merge branch 'main' into rerank	2025-07-09 15:33:39 +08:00
yangdx	207f0a7f2a	Merge branch 'main' into merge_lock_with_key	2025-07-09 09:25:28 +08:00
yangdx	cb3bfc0e5b	Release semaphore before merge stage	2025-07-09 09:24:44 +08:00
Anton Vice	b192f8c9a3	Fix: Handle NoneType error when processing documents without a file path The document processing pipeline would crash with a TypeError when a document was submitted as raw text via the API, as the file_path attribute would be None. This change adds a check to handle the None case gracefully, preventing the crash and allowing text-based documents to be indexed correctly.	2025-07-08 19:35:22 -03:00
zrguo	71cb3adb4f	Merge branch 'main' into rerank	2025-07-08 15:10:23 +08:00
yangdx	56d43de58a	Merge branch 'main' into merge_lock_with_key	2025-07-08 12:46:31 +08:00
zrguo	f5c80d7cde	Simplify Configuration	2025-07-08 11:16:34 +08:00
yangdx	9b7b2a9b0f	Reduce default embedding batch size from 32 to 10	2025-07-08 11:00:09 +08:00
zrguo	75dd4f3498	add rerank model	2025-07-07 22:44:59 +08:00
yangdx	ef79088f60	Move max_graph_nodes to global config	2025-07-07 21:53:57 +08:00
yangdx	033098c1bc	Feat: Add WORKSPACE support to all storage types	2025-07-07 00:57:21 +08:00
yangdx	1b2d295a4f	Remove namespace_prefix	2025-07-06 00:16:47 +08:00
yangdx	98150e80b8	Improved empty/whitespace file handling - Better detection of whitespace-only files - Changed error to warning for empty chunks	2025-07-05 23:16:39 +08:00
xuewei	648a87653f	Text chunk is blank	2025-07-05 14:28:42 +08:00
yangdx	a9e10ae810	Update logger messages	2025-07-03 14:08:19 +08:00
yangdx	e56734cb8b	Refac: Optimize document deletion performance - Adding chunks_list to doc_status - Adding llm_cache_list to text_chunks - Implemented storage types: JsonKV and Redis	2025-07-03 04:18:25 +08:00
zrguo	479865a271	Add max_gleaning to env	2025-07-01 17:13:33 +08:00
yangdx	e70f5a35e5	Refac: Add KG rebuild logging with pipeline status - Logs detailed progress, including warnings and failures, to the pipeline status. - Adds counters to report the total number of successfully rebuilt entities and relationships upon completion.	2025-06-29 21:27:12 +08:00
yangdx	3a8a99b73d	feat(postgres): Implement text_chunks upsert for PGKVStorage	2025-06-28 14:37:35 +08:00
yangdx	8fb1c09b08	Refac: pipeline message	2025-06-26 01:00:54 +08:00
yangdx	bdcd55a871	Feat: Add delete upload file option to document deletion	2025-06-25 19:02:46 +08:00
yangdx	51bb0471cd	Change the API for deleting documents to support deleting multiple documents at once.	2025-06-25 16:19:49 +08:00
yangdx	495d6c8cce	Improve the pipeline status message for document deletion	2025-06-25 15:46:58 +08:00
yangdx	2aaa6d5f7d	Fix linting	2025-06-25 14:59:45 +08:00
yangdx	8a365533d7	Add comprehensive error handling for document deletion	2025-06-25 14:58:41 +08:00
yangdx	da46b341dc	feat: Optimize document deletion performance - To enhance performance during document deletion, new batch-get methods, `get_nodes_by_chunk_ids` and `get_edges_by_chunk_ids`, have been added to the graph storage layer (`BaseGraphStorage` and its implementations). The [`adelete_by_doc_id`](lightrag/lightrag.py:1681) function now leverages these methods to avoid unnecessary iteration over the entire knowledge graph, significantly improving efficiency. - Graph storage updated: Networkx, Neo4j, Postgres AGE	2025-06-25 12:37:57 +08:00
yangdx	2946bbdb71	Add TODO: There is a performance issue when iterating get_all_labels	2025-06-24 11:32:28 +08:00
yangdx	e6baffe10c	Add return status to entity and relation delete operations	2025-06-23 21:39:45 +08:00
yangdx	bd487dd252	Unify document APIs' return status string	2025-06-23 21:38:47 +08:00
yangdx	ce50135efb	Improved docstring for document deletion method	2025-06-23 21:08:51 +08:00
yangdx	ebcabe29ca	Remove duplicated graph db lock	2025-06-23 18:46:01 +08:00
yangdx	5099ac8213	Fix linting	2025-06-23 18:41:30 +08:00
yangdx	a215939c41	Refac: Avoid duplicate edge processing in adelete_by_doc_id	2025-06-23 18:39:36 +08:00
yangdx	a0be65d5d9	Refac: Return status and messages for delete by doc id operation	2025-06-23 17:59:27 +08:00
yangdx	9fae0eadff	feat: Ensure thread safety for graph write operations Add a lock to delete, adelete_by_entity, and adelete_by_relation methods to prevent race conditions and ensure data consistency during concurrent modifications to the knowledge graph.	2025-06-23 09:57:56 +08:00
zrguo	4937de8809	Update	2025-06-22 15:12:09 +08:00
zrguo	3abdc42549	Merge branch 'main' into delete_doc	2025-06-16 17:02:21 +08:00
kwilt	09cbcc4572	fix typo: "extrat" -> extract	2025-06-09 08:28:14 -05:00
zrguo	ead82a8dbd	update delete_by_doc_id	2025-06-09 18:52:34 +08:00
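
Commit 05bc5cfb64 describes early failure detection for async tasks: stop waiting on the first exception and cancel whatever is still pending. The sketch below illustrates that general pattern using only the standard `asyncio` module. It is not taken from the LightRAG source; the helper name `run_with_early_failure` is hypothetical and chosen only for illustration.

```python
import asyncio


async def run_with_early_failure(coros):
    """Run coroutines concurrently; on the first exception, cancel the rest.

    Hypothetical helper for illustration only, not LightRAG's implementation.
    Returns results in input order if every coroutine succeeds.
    """
    tasks = [asyncio.ensure_future(c) for c in coros]
    if not tasks:
        return []  # asyncio.wait() rejects an empty collection

    # Wake up as soon as any task raises, or when all tasks have finished.
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)

    # Look for a finished task that ended with an exception.
    failed = next(
        (t for t in done if not t.cancelled() and t.exception() is not None),
        None,
    )
    if failed is not None:
        for t in pending:
            t.cancel()
        if pending:
            # Let the cancelled tasks unwind before propagating the error.
            await asyncio.gather(*pending, return_exceptions=True)
        raise failed.exception()

    return [t.result() for t in tasks]
```

Compared with a plain `asyncio.gather(...)`, which propagates the first exception but leaves the other tasks running, this approach avoids spending further work (for example, additional LLM or embedding calls) on a batch that has already failed.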

501 Commits