mirror of https://github.com/OpenSPG/KAG.git synced 2025-12-27 15:14:07 +00:00

feat(solver): support kag thinker (#640 )

2025-07-08 17:44:32 +08:00

_static/images

feat(mcp): implement mcp server for kag (#594 )

2025-06-19 16:27:13 +08:00

.github

refactor(all): kag v0.6 (#174 )

2025-01-03 17:10:51 +08:00

docs

feat(kag) update docs for kag (#123 )

2024-12-12 22:19:13 +08:00

kag

feat(solver): support kag thinker (#640 )

2025-07-08 17:44:32 +08:00

knext

feat(solver): support kag thinker (#640 )

2025-07-08 17:44:32 +08:00

tests

feat(solver): support kag thinker (#640 )

2025-07-08 17:44:32 +08:00

.gitignore

feat(builder): add Azure Open AI Compatibility (#269 )

2025-01-14 12:57:43 +08:00

.pre-commit-config.yaml

refactor(all): kag v0.6 (#174 )

2025-01-03 17:10:51 +08:00

.scanignore

1、add scanignore

2025-05-29 19:19:36 +08:00

build.sh

refactor(all): kag v0.6 (#174 )

2025-01-03 17:10:51 +08:00

CITATION.cff

core_team #andy (#389 )

2025-03-03 19:13:44 +08:00

KAG_VERSION

feat(solver): support kag thinker (#640 )

2025-07-08 17:44:32 +08:00

LEGAL.md

Remove sensitive information

2024-10-24 11:46:15 +08:00

LICENSE

Initial commit

2024-09-21 21:56:45 +08:00

MANIFEST.in

feat(kag): update to v0.7 (#456 )

2025-04-17 17:23:52 +08:00

pytest.ini

refactor(all): kag v0.6 (#174 )

2025-01-03 17:10:51 +08:00

README_cn.md

feat(readme): add v0.8.0 Release Note (#618 )

2025-06-30 15:08:03 +08:00

README_ja.md

fix(kag): update examples to work under branch 0.8.0 (#593 )

2025-06-19 11:39:29 +08:00

README.md

feat(readme): add v0.8.0 Release Note (#618 )

2025-06-30 15:08:03 +08:00

requirements.txt

feat(solver): support kag thinker (#640 )

2025-07-08 17:44:32 +08:00

setup.cfg

refactor(all): kag v0.6 (#174 )

2025-01-03 17:10:51 +08:00

setup.py

refactor(all): kag v0.6 (#174 )

2025-01-03 17:10:51 +08:00

upload_dev.sh

Remove sensitive information

2024-10-24 11:46:15 +08:00

README.md

KAG: Knowledge Augmented Generation

English | 简体中文 | 日本語版ドキュメント

1. What is KAG?

KAG is a logical reasoning and Q&A framework based on the OpenSPG engine and large language models (LLMs). It is used to build logical reasoning and Q&A solutions for vertical domain knowledge bases. KAG effectively overcomes the ambiguity of vector similarity calculation in traditional RAG and the noise that OpenIE introduces into GraphRAG. KAG supports logical reasoning and multi-hop factual Q&A, and significantly outperforms current SOTA methods.

The goal of KAG is to build a knowledge-enhanced LLM service framework in professional domains, supporting logical reasoning, factual Q&A, etc. KAG fully integrates the logical and factual characteristics of the KGs. Its core features include:

- Knowledge and Chunk Mutual Indexing structure to integrate more complete contextual text information
- Knowledge alignment using conceptual semantic reasoning to alleviate the noise problem caused by OpenIE
- Schema-constrained knowledge construction to support the representation and construction of domain expert knowledge
- Logical form-guided hybrid reasoning and retrieval to support logical reasoning and multi-hop reasoning Q&A

⭐️ Star our repository to stay up-to-date with exciting new features and improvements! Get instant notifications for new releases! 🌟

2. Core Features

2.1 Knowledge Representation

In the context of private knowledge bases, unstructured data, structured information, and business expert experience often coexist. KAG references the DIKW hierarchy to upgrade SPG to a version that is friendly to LLMs.

For unstructured data such as news, events, logs, and books, as well as structured data like transactions, statistics, and approvals, along with business experience and domain knowledge rules, KAG employs techniques such as layout analysis, knowledge extraction, property normalization, and semantic alignment to integrate raw business data and expert rules into a unified business knowledge graph.

This makes KAG compatible with both schema-free information extraction and schema-constrained expert knowledge construction on the same knowledge type (e.g., entity type, event type), and supports a mutual index representation between the graph structure and the original text chunks.

This mutual index representation helps build an inverted index based on the graph structure, and promotes the unified representation of, and reasoning over, logical forms.
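To make the mutual indexing idea concrete, here is a minimal sketch in plain Python. It is not KAG's storage layer: the `MutualIndex` class and its method names are hypothetical, and the sketch only shows how graph entities and text chunks can point at each other so that a lookup can start from either side.

```python
from collections import defaultdict


class MutualIndex:
    """Toy two-way index between graph entities and text chunks (hypothetical)."""

    def __init__(self):
        self.chunks = {}                           # chunk id -> chunk text
        self.entity_to_chunks = defaultdict(set)   # inverted index: entity -> chunk ids
        self.chunk_to_entities = defaultdict(set)  # forward index: chunk id -> entities

    def add_chunk(self, chunk_id, text, entities):
        """Register a chunk and link it to every entity extracted from it."""
        self.chunks[chunk_id] = text
        for entity in entities:
            self.entity_to_chunks[entity].add(chunk_id)
            self.chunk_to_entities[chunk_id].add(entity)

    def chunks_for(self, entity):
        """Return the ids of chunks that mention the entity, sorted for stable output."""
        return sorted(self.entity_to_chunks.get(entity, set()))

    def entities_for(self, chunk_id):
        """Return the entities linked to a chunk, sorted for stable output."""
        return sorted(self.chunk_to_entities.get(chunk_id, set()))


index = MutualIndex()
index.add_chunk("c1", "OpenSPG is a knowledge graph engine.", ["OpenSPG"])
index.add_chunk("c2", "KAG is built on OpenSPG and LLMs.", ["KAG", "OpenSPG"])
print(index.chunks_for("OpenSPG"))  # ['c1', 'c2']
print(index.entities_for("c2"))     # ['KAG', 'OpenSPG']
```

Starting from an entity gives the supporting text, and starting from a chunk gives the graph nodes it contributes to, which is the property the section describes.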

2.2 Mixed Reasoning Guided by Logic Forms

KAG proposes a hybrid solving and reasoning engine guided by logical forms.

The engine includes three types of operators: planning, reasoning, and retrieval. Together they transform natural language problems into problem-solving processes that combine language and symbols.

In this process, each step can use a different operator, such as exact match retrieval, text retrieval, numerical calculation, or semantic reasoning. This integrates four different problem-solving processes: retrieval, knowledge graph reasoning, language reasoning, and numerical calculation.
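The step-by-step use of different operators can be sketched as a small dispatcher. This is an illustration under stated assumptions, not the KAG solver API: the plan format, the operator names in `OPERATORS`, and the in-memory `GRAPH` and `CHUNKS` stores are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory "knowledge graph" and text store used only for this sketch.
GRAPH = {("KAG", "basedOn"): "OpenSPG"}
CHUNKS = [
    "KAG 0.7 was released on 2025.04.17.",
    "KAG 0.8.0 was released on 2025.06.27.",
]


def exact_match(args: dict, memory: dict) -> str:
    """Graph lookup by (subject, predicate)."""
    return GRAPH[(args["subject"], args["predicate"])]


def text_retrieval(args: dict, memory: dict) -> List[str]:
    """Keyword retrieval over raw text chunks."""
    return [chunk for chunk in CHUNKS if args["keyword"] in chunk]


def numeric(args: dict, memory: dict) -> int:
    """Numerical calculation over the result of an earlier step."""
    return len(memory[args["count_of"]])


OPERATORS: Dict[str, Callable[[dict, dict], object]] = {
    "exact_match": exact_match,
    "text_retrieval": text_retrieval,
    "numeric": numeric,
}


def solve(plan: List[dict]) -> dict:
    """Run each planned step with the operator it names, storing results by step id."""
    memory: dict = {}
    for step in plan:
        memory[step["id"]] = OPERATORS[step["op"]](step["args"], memory)
    return memory


plan = [
    {"id": "s1", "op": "exact_match", "args": {"subject": "KAG", "predicate": "basedOn"}},
    {"id": "s2", "op": "text_retrieval", "args": {"keyword": "released"}},
    {"id": "s3", "op": "numeric", "args": {"count_of": "s2"}},
]
result = solve(plan)
print(result["s1"], result["s3"])  # OpenSPG 2
```

Each step names its operator, and later steps can read earlier results from `memory`, which is how one plan can mix graph lookup, text retrieval, and calculation.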

3. Release Notes

3.1 Latest Updates

2025.06.27 : Released KAG 0.8.0 Version
- Expanded two modes: Private Knowledge Base (including structured & unstructured data) and Public Network Knowledge Base, supporting integration of LBS, WebSearch, and other public data sources via MCP protocol.
- Enhanced Private Knowledge Base indexing capabilities, with built-in fundamental index types such as Outline, Summary, KnowledgeUnit, AtomicQuery, Chunk, and Table.
- Decoupled knowledge bases from applications: Knowledge Bases manage private data (structured & unstructured) and public data; Applications can associate with multiple knowledge bases and automatically adapt corresponding retrievers for data recall based on index types established during knowledge base construction.
- Fully embraced MCP, enabling KAG-powered inference QA (via MCP protocol) within agent workflows.
- Completed adaptation for the KAG-Thinker model. Optimizations in breadth-wise problem decomposition, depth-wise solution derivation, knowledge boundary determination, and noise-resistant handling of retrieval results improve the stability and logical rigor of the framework's reasoning paradigm under a multi-round iterative thinking framework.
2025.04.17 : Released KAG 0.7 Version
- First, we refactored the KAG-Solver framework, adding support for two task planning modes (static and iterative) and implementing a more rigorous knowledge layering mechanism for the reasoning phase.
- Second, we optimized the product experience by introducing two modes during the reasoning phase, "Simple Mode" and "Deep Reasoning", along with support for streaming inference output, automatic rendering of graph indexes, and linking generated content to original references.
- Added an open_benchmark directory to the top level of the KAG repository, comparing various RAG methods under the same base, where KAG achieves state-of-the-art (SOTA) results.
- Introduced a "Lightweight Build" mode, reducing knowledge construction token costs by 89%.
2025.01.07 : Added support for domain knowledge injection, domain schema customization, QFS tasks, visual query analysis, and a schema-constrained mode for extraction.
2024.11.21 : Added support for Word document upload and model invocation concurrency settings, along with user experience optimizations.
2024.10.25 : KAG initial release
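
The static and iterative planning modes mentioned in the 0.7 notes above can be contrasted with a toy sketch. The function names, step strings, and the `toy_execute` stand-in are hypothetical and do not reflect the KAG-Solver implementation. The only point is the control flow: a static planner fixes all steps up front, while an iterative planner chooses each step after seeing earlier results.

```python
from typing import Callable, List


def static_plan(question: str) -> List[str]:
    """Static mode: decide every step before anything is executed."""
    return [
        f"retrieve facts for: {question}",
        "reason over retrieved facts",
        "generate answer",
    ]


def iterative_plan(question: str, execute: Callable[[str], str], max_steps: int = 5) -> List[str]:
    """Iterative mode: choose the next step only after seeing earlier results."""
    history: List[str] = []
    for _ in range(max_steps):
        if history and history[-1] == "answer found":
            break  # stop as soon as a step reports completion
        step = f"step {len(history) + 1} for: {question}"
        history.append(execute(step))
    return history


def toy_execute(step: str) -> str:
    """Stand-in executor: pretends the second step finds the answer."""
    return "answer found" if step.startswith("step 2") else "need more evidence"


print(len(static_plan("Who founded X?")))              # 3
print(iterative_plan("Who founded X?", toy_execute))   # ['need more evidence', 'answer found']
```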

3.2 Future Plans

We will continue to focus on enhancing large models' ability to leverage external knowledge bases. Our goal is to achieve bidirectional enhancement and seamless integration between large models and symbolic knowledge, improving the factuality, rigor, and consistency of reasoning and Q&A in professional scenarios. We will also keep releasing updates to push the boundaries of capability and drive adoption in vertical domains.

4. Quick Start

4.1 product-based (for ordinary users)

4.1.1 Engine & Dependent Image Installation

Recommended system versions:

macOS users: macOS Monterey 12.6 or later
Linux users: CentOS 7 / Ubuntu 20.04 or later
Windows users: Windows 10 LTSC 2021 or later

Software requirements:

macOS / Linux users: Docker, Docker Compose
Windows users: WSL 2 / Hyper-V, Docker, Docker Compose

Use the following commands to download the docker-compose.yml file and launch the services with Docker Compose.

# set the HOME environment variable (only Windows users need to execute this command)
# set HOME=%USERPROFILE%

curl -sSL https://raw.githubusercontent.com/OpenSPG/openspg/refs/heads/master/dev/release/docker-compose-west.yml -o docker-compose-west.yml
docker compose -f docker-compose-west.yml up -d

4.1.2 Use the product

Open the default URL of the KAG product in your browser: http://127.0.0.1:8887

Default Username: openspg
Default password: openspg@kag

See the KAG usage (product mode) guide for a detailed introduction.
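
If the page does not load, it can help to check whether anything is answering on the default URL before debugging further. The following is a small standard-library sketch. The helper name `is_reachable` is an assumption made for this example and is not part of KAG; it only tests for an HTTP response and says nothing about whether the services are fully initialized.

```python
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP request to `url` gets any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True   # the server answered, even if with an error status
    except (urllib.error.URLError, OSError):
        return False  # connection refused, timeout, DNS failure, and similar


if __name__ == "__main__":
    url = "http://127.0.0.1:8887"  # default product URL from this README
    print("KAG product reachable:", is_reachable(url))
```

A `False` result right after `docker compose ... up -d` may simply mean the containers are still starting.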

4.2 toolkit-based (for developers)

4.2.1 Engine & Dependent Image Installation

Refer to section 4.1.1 to complete the installation of the engine and dependent images.

4.2.2 Installation of KAG

macOS / Linux developers

# Create and activate a conda environment
conda create -n kag-demo python=3.10 && conda activate kag-demo

# Clone the code
git clone https://github.com/OpenSPG/KAG.git

# Install KAG in editable mode
cd KAG && pip install -e .

Windows developers

# Prerequisites: install the official Python 3.10 or later, and install Git.

# Create and activate a Python virtual environment
py -m venv kag-demo && kag-demo\Scripts\activate

# Clone the code
git clone https://github.com/OpenSPG/KAG.git

# Install KAG in editable mode
cd KAG && pip install -e .
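
After installing, a quick sanity check can confirm that the active interpreter meets the Python 3.10 requirement and can find the package. This sketch assumes the package is importable as `kag`, matching the top-level `kag` directory in the repository. The helper `check_environment` is hypothetical and uses only the standard library.

```python
import importlib.util
import sys


def check_environment(package: str = "kag", min_version: tuple = (3, 10)) -> dict:
    """Report whether the interpreter is new enough and the package is importable."""
    return {
        "python_ok": sys.version_info[:2] >= min_version,
        "package_found": importlib.util.find_spec(package) is not None,
    }


print(check_environment())
```

If `package_found` is false, the most likely cause is that the conda environment or venv created above is not the one currently activated.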

4.2.3 Use the toolkit

Please refer to the KAG usage (developer mode) guide for a detailed introduction to the toolkit. You can then use the built-in components to reproduce the performance results on the built-in datasets, and apply those components to new business scenarios.

5. Technical Architecture

The KAG framework includes three parts: kg-builder, kg-solver, and kag-model. This release only involves the first two parts; kag-model will be gradually open-sourced in the future.

kg-builder implements a knowledge representation that is friendly to large language models (LLMs). Based on the DIKW hierarchy (data, information, knowledge, and wisdom), it upgrades the knowledge representation ability of SPG. It is compatible with schema-free information extraction and schema-constrained expert knowledge construction on the same knowledge type (such as entity type and event type). It also supports a mutual index representation between the graph structure and the original text chunks, which enables efficient retrieval in the reasoning and Q&A stage.
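
The coexistence of schema-free and schema-constrained construction on one knowledge type can be illustrated with a toy normalizer. This is a sketch under assumptions, not kg-builder code: the `SCHEMA` layout, the record format, and the `normalize` function are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical schema: allowed properties per entity type.
SCHEMA: Dict[str, List[str]] = {"Person": ["name", "birthDate"]}


def normalize(record: dict, schema: Optional[Dict[str, List[str]]] = None) -> dict:
    """Keep a record as-is in schema-free mode; filter it in schema-constrained mode."""
    if schema is None:
        return dict(record)  # schema-free: accept whatever was extracted
    entity_type = record["type"]
    if entity_type not in schema:
        raise ValueError(f"unknown entity type: {entity_type}")
    allowed = schema[entity_type]
    props = {k: v for k, v in record["properties"].items() if k in allowed}
    return {"type": entity_type, "properties": props}


extracted = {"type": "Person", "properties": {"name": "Ada", "nickname": "A."}}
print(normalize(extracted))          # unchanged copy of the extracted record
print(normalize(extracted, SCHEMA))  # {'type': 'Person', 'properties': {'name': 'Ada'}}
```

The same `Person` record passes through both paths; only the constrained path drops the property the schema does not declare.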

kg-solver uses a hybrid solving and reasoning engine guided by logical forms. The engine includes three types of operators (planning, reasoning, and retrieval) that transform natural language problems into a problem-solving process combining language and symbols. In this process, each step can use a different operator, such as exact match retrieval, text retrieval, numerical calculation, or semantic reasoning. This integrates four different problem-solving processes: retrieval, knowledge graph reasoning, language reasoning, and numerical calculation.

6. Community & Support

GitHub: https://github.com/OpenSPG/KAG

Website: https://openspg.github.io/v2/docs_en

Discord

Join our Discord community.

WeChat

Follow OpenSPG Official Account to get technical articles and product updates about OpenSPG and KAG.

Scan the QR code below to join our WeChat group.

7. Differences between KAG, RAG, and GraphRAG

KAG introduction and applications: https://github.com/orgs/OpenSPG/discussions/52

8. Citation

If you use this software, please cite it as below:

KAG: Boosting LLMs in Professional Domains via Knowledge Augmented Generation
KGFabric: A Scalable Knowledge Graph Warehouse for Enterprise Data Interconnection

@article{liang2024kag,
  title={KAG: Boosting LLMs in Professional Domains via Knowledge Augmented Generation},
  author={Liang, Lei and Sun, Mengshu and Gui, Zhengke and Zhu, Zhongshu and Jiang, Zhouyu and Zhong, Ling and Zhao, Peilong and Bo, Zhongpu and Yang, Jin and others},
  journal={arXiv preprint arXiv:2409.13731},
  year={2024}
}

@article{yikgfabric,
  title={KGFabric: A Scalable Knowledge Graph Warehouse for Enterprise Data Interconnection},
  author={Yi, Peng and Liang, Lei and Da Zhang, Yong Chen and Zhu, Jinye and Liu, Xiangyu and Tang, Kun and Chen, Jialin and Lin, Hao and Qiu, Leijie and Zhou, Jun}
}

License

Apache License 2.0

KAG Core Team

Lei Liang, Mengshu Sun, Zhengke Gui, Zhongshu Zhu, Zhouyu Jiang, Ling Zhong, Peilong Zhao, Zhongpu Bo, Jin Yang, Huaidong Xiong, Lin Yuan, Jun Xu, Zaoyang Wang, Zhiqiang Zhang, Wen Zhang, Huajun Chen, Wenguang Chen, Jun Zhou, Haofen Wang

Description

KAG is a logical form-guided reasoning and retrieval framework based on OpenSPG engine and LLMs. It is used to build logical reasoning and factual Q&A solutions for professional domain knowledge bases. It can effectively overcome the shortcomings of the traditional RAG vector similarity calculation model.

knowledge-graph large-language-model logical-reasoning multi-hop-question-answering trustfulness

