Refactor keyword extraction rules and remove overlap constraint

• Require content in both keyword categories
• Remove no-overlap rule between lists
• Simplify edge case handling
• Clarify source of truth requirement
2025-12-03 02:16:42 +00:00 · 2025-08-19 15:07:40 +08:00 · ac33cf693d
commit ac33cf693d
parent 9ed5b93467
1 changed file with 2 additions and 3 deletions
--- a/lightrag/prompt.py
+++ b/lightrag/prompt.py
@@ -246,10 +246,9 @@ Given a user query, your task is to extract two distinct types of keywords:

 ---Instructions & Constraints---
 1. **Output Format**: Your output MUST be a valid JSON object and nothing else. Do not include any explanatory text, markdown code fences (like ```json), or any other text before or after the JSON. It will be parsed directly by a JSON parser.
-2. **Source of Truth**: All keywords must be derived directly from or be a direct interpretation of the user query.
+2. **Source of Truth**: All keywords must be explicitly derived from the user query, with both high-level and low-level keyword categories required to contain content.
 3. **Concise & Meaningful**: Keywords should be concise words or meaningful phrases. Prioritize multi-word phrases when they represent a single concept. For example, from "latest financial report of Apple Inc.", you should extract "latest financial report" and "Apple Inc." rather than "latest", "financial", "report", and "Apple".
-4. **No Overlap**: A keyword or its core concept should not appear in both the high-level and low-level lists.
-5. **Handle Edge Cases**: For queries that are too simple, vague, or nonsensical (e.g., "hello", "ok", "asdfghjkl"), you must return a JSON object with empty lists for both keyword types.
+4. **Handle Edge Cases**: For queries that are too simple, vague, or nonsensical (e.g., "hello", "ok", "asdfghjkl"), you must return a JSON object with empty lists for both keyword types.

 ---Examples---
 {examples}
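
A caller could check a model reply against the revised constraints with something like the sketch below. It is illustrative only and not part of this commit. The key names `high_level_keywords` and `low_level_keywords` and the helper `check_keyword_response` are assumptions, since the hunk does not show the JSON schema. The sketch also assumes one reading of how rules 2 and 4 combine: both lists populated for a normal query, both empty for an edge case. Overlap between the lists is deliberately not checked, because the no-overlap rule was removed.

```python
import json

# Assumed key names; the diff hunk does not show the actual JSON schema.
HIGH_KEY = "high_level_keywords"
LOW_KEY = "low_level_keywords"


def check_keyword_response(raw: str) -> dict:
    """Parse a raw model reply and check it against the revised prompt rules."""
    # Rule 1: the reply must be bare JSON. json.loads raises JSONDecodeError
    # (a ValueError subclass) on code fences or surrounding text.
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    lists = []
    for key in (HIGH_KEY, LOW_KEY):
        value = data.get(key)
        if not isinstance(value, list) or not all(isinstance(k, str) for k in value):
            raise ValueError(f"{key} must be a list of strings")
        lists.append(value)

    high, low = lists
    # Rule 2 (assumed reading): a normal query fills both categories.
    # Rule 4: an edge-case query returns empty lists for both.
    # A reply with exactly one empty list satisfies neither rule.
    if bool(high) != bool(low):
        raise ValueError("both keyword lists must be populated, or both empty")

    # No overlap check: the same keyword may now appear in both lists.
    return {HIGH_KEY: high, LOW_KEY: low}
```

Because `json.JSONDecodeError` subclasses `ValueError`, a caller can catch `ValueError` alone to handle both malformed JSON and rule violations, for example to trigger a retry.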