Refactor keyword extraction rules and remove overlap constraint

• Require content in both keyword categories
• Remove no-overlap rule between lists
• Simplify edge case handling
• Clarify source of truth requirement
yangdx 2025-08-19 15:07:40 +08:00
parent 9ed5b93467
commit ac33cf693d

@@ -246,10 +246,9 @@ Given a user query, your task is to extract two distinct types of keywords:
 ---Instructions & Constraints---
 1. **Output Format**: Your output MUST be a valid JSON object and nothing else. Do not include any explanatory text, markdown code fences (like ```json), or any other text before or after the JSON. It will be parsed directly by a JSON parser.
-2. **Source of Truth**: All keywords must be derived directly from or be a direct interpretation of the user query.
+2. **Source of Truth**: All keywords must be explicitly derived from the user query, with both high-level and low-level keyword categories required to contain content.
 3. **Concise & Meaningful**: Keywords should be concise words or meaningful phrases. Prioritize multi-word phrases when they represent a single concept. For example, from "latest financial report of Apple Inc.", you should extract "latest financial report" and "Apple Inc." rather than "latest", "financial", "report", and "Apple".
-4. **No Overlap**: A keyword or its core concept should not appear in both the high-level and low-level lists.
-5. **Handle Edge Cases**: For queries that are too simple, vague, or nonsensical (e.g., "hello", "ok", "asdfghjkl"), you must return a JSON object with empty lists for both keyword types.
+4. **Handle Edge Cases**: For queries that are too simple, vague, or nonsensical (e.g., "hello", "ok", "asdfghjkl"), you must return a JSON object with empty lists for both keyword types.
 ---Examples---
 {examples}
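
For reviewers, here is a minimal sketch of how a caller might check a response against the revised rules. It is not code from this repository. The key names `high_level_keywords` and `low_level_keywords` and the helper `parse_keywords` are assumptions made for illustration, since the hunk does not show the JSON schema. The sketch treats "both lists filled" (rule 2) and "both lists empty" (rule 4) as the only valid shapes.

```python
import json

# Hypothetical key names; the hunk above does not show the actual schema.
HIGH_KEY = "high_level_keywords"
LOW_KEY = "low_level_keywords"


def parse_keywords(raw: str) -> tuple[list[str], list[str]]:
    """Parse a response that is expected to be a bare JSON object."""
    # json.loads raises json.JSONDecodeError (a ValueError subclass) if the
    # response is wrapped in code fences or carries extra prose (rule 1).
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    lists = []
    for key in (HIGH_KEY, LOW_KEY):
        value = data.get(key)
        if not isinstance(value, list) or not all(isinstance(k, str) for k in value):
            raise ValueError(f"{key!r} must be a list of strings")
        # Drop blank entries and surrounding whitespace.
        lists.append([k.strip() for k in value if k.strip()])

    high, low = lists
    # Rule 2 requires content in both lists; rule 4 allows both to be empty.
    # One filled list next to one empty list matches neither rule.
    if bool(high) != bool(low):
        raise ValueError("exactly one keyword list is empty")
    return high, low
```

Because the no-overlap rule is removed, the sketch deliberately does not reject a keyword that appears in both lists.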