mirror of
https://github.com/OpenSPG/KAG.git
synced 2025-11-22 21:30:16 +00:00
* add think cost * update csv scanner * add final rerank * add reasoner * add iterative planner * fix dpr search * fix dpr search * add reference data * move odps import * update requirement.txt * update 2wiki * add missing file * fix markdown reader * add iterative planning * update version * update runner * update 2wiki example * update bridge * merge solver and solver_new * add cur day * writer delete * update multi process * add missing files * fix report * add chunk retrieved executor * update try in stream runner result * add path * add math executor * update hotpotqa example * remove log * fix python coder solver * update hotpotqa example * fix python coder solver * update config * fix bad * add log * remove unused code * commit with task thought * move kag model to common * add default chat llm * fix * use static planner * support chunk graph node * add args * support naive rag * llm client support tool calls * add default async * add openai * fix result * fix markdown reader * fix thinker * update asyncio interface * feat(solver): add mcp support (#444) * upload mcp client related code * 1. complete an end-to-end mcp client invocation, from pipeline to planner and executor 2. allow multiple mcp_server entries in the json, invoked and selected by the LLM 3. get baidu_map_mcp working * 1. schema * bugfix: remove redundant code --------- Co-authored-by: wanxingyu.wxy <wanxingyu.wxy@antgroup.com> * fix affairqa after solver refactor * fix affairqa after solver refactor * fix readme * add params * update version * update mcp executor * update mcp executor * solver add mcp executor * add missing file * add mpc executor * add executor * x * update * fix requirement * fix main llm config * fix solver * bugfix: fix invoke function call logic * chg eva * update example * add kag layer * add step task * support dot refresh * support dot refresh * support dot refresh * support dot refresh * add retrieved num * add retrieved num * add pipelineconf * update ppr * update musique prompts * update * add to_dict for BuilderComponentData * async build * add deduce prompt * add deduce prompt * add deduce prompt * fix reader * add deduce prompt * add page thinker report * modify prmpt * add step status * add self cognition * add self cognition * add memory graph storage * add now time * update memory config * add now time * chg graph loader * add prqa dataset and code * bugfix: fix prqa call logic * optimize: improve code logic, normalize generated answers * add retry py code * update memory graph * update memory graph * fix * fix ner * add with_out_refer generator prompt * fix * close ckpt * fix query * fix query * update version * add llm checker * add llm checker * 1. upload evalutor.py and change gold_answer.json format 2. improve code logic 3. update README.md * update exp * update exp * rerank support * add static rewrite query * recall more chunks * fix graph load * add static rewrite query * fix bugs * add finish check * add finish check * add finish check * add finish check * 1. upload evalutor.py results 2. improve code logic and readme * add lf retry * add memory graph api * fix reader api * add ner * add metrics * fix bug * remove ner * add reraise fo retry * add edge prop to memory graph * add memory graph * 1. correct evaluation dataset results 2. improve evaluator.py code 3. remove questions that have an answer in gold_answer but no result * delete evaluation result files * fix knext host addr * async eva * add lf prompt * add lf prompt * add config * add retry * add unknown check * add rc result * add rc result * add rc result * add rc result * adapt code logic to the kag pipeline format and pass tests * bugfix: remove redundant code * fix report prompt * bugfix: trigger retry mechanism * bugfix: Chinese punctuation error * fix rethinker prompt * update version to 0.6.2b78 * update version * 1. update evaluator.py to compute accuracy with an LLM, matching the latest call logic 2. update prompt so that unanswered results are retested * update affairqa for evaluate * update affairqa for evaluate * bugfix: correct dataset * bugfix: correct dataset * bugfix: correct dataset * fix name conflict * bugfix: remove incorrect questions * bugfix: wrong file name caused evaluator failure * update for affairqa eval * bugfix: update code to keep evaluate logic consistent * x * update for affairqa readme * remove temp eval scripts * bugfix for math deduce * merge 0.6.2_dev * merge 0.6.2_dev * fix * update client addr * updated version * update for affairqa eval * evaUtils supports Chinese * fix affairqa eval: * remove unused example * update kag config * fix default value * update readme * fix init * update comments and add descriptions for some classes * update example config * Tc 0.7.0 (#459) * submit affairQA code * fix affairqa eval --------- Co-authored-by: zhengke.gzk <zhengke.gzk@antgroup.com> * fix all examples * reformat --------- Co-authored-by: peilong <peilong.zpl@antgroup.com> Co-authored-by: 锦呈 <zhangxinhong.zxh@antgroup.com> Co-authored-by: wanxingyu.wxy <wanxingyu.wxy@antgroup.com> Co-authored-by: zhengke.gzk <zhengke.gzk@antgroup.com>
281 lines
8.8 KiB
Python
# -*- coding: utf-8 -*-
# Copyright 2023 OpenSPG Authors
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
# in compliance with the License. You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software distributed under the License
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
# or implied.
from collections import OrderedDict
import logging
import re
import json
import os
import sys
from configparser import ConfigParser
from pathlib import Path
from ruamel.yaml import YAML
from typing import Optional

import click

from knext.common.utils import copytree, copyfile
from knext.project.client import ProjectClient

from knext.common.env import env, DEFAULT_HOST_ADDR

from kag.common.llm.llm_config_checker import LLMConfigChecker
from kag.common.vectorize_model.vectorize_model_config_checker import (
    VectorizeModelConfigChecker,
)
from shutil import copy2

try:
    import kag_ant
except ImportError:
    pass

yaml = YAML()
yaml.default_flow_style = False
yaml.indent(mapping=2, sequence=4, offset=2)

logger = logging.getLogger(__name__)


def _render_template(namespace: str, tmpl: str, **kwargs):
    """
    Render the project and schema templates into a directory named after the
    namespace, then write kag_config.yaml with the project id filled in.
    """
    config_path = kwargs.get("config_path", None)
    project_dir = Path(namespace)
    if not project_dir.exists():
        project_dir.mkdir()

    import kag.templates.project

    src = Path(kag.templates.project.__path__[0])
    copytree(
        src,
        project_dir.resolve(),
        namespace=namespace,
        root=namespace,
        tmpl=tmpl,
        **kwargs,
    )

    import kag.templates.schema

    # The doubled braces are literal: the file is named "{{<tmpl>}}.schema.tmpl".
    src = Path(kag.templates.schema.__path__[0]) / f"{{{{{tmpl}}}}}.schema.tmpl"
    if not src.exists():
        click.secho(
            f"ERROR: No such schema template: {tmpl}.schema.tmpl",
            fg="bright_red",
        )
    dst = project_dir.resolve() / "schema" / f"{{{{{tmpl}}}}}.schema.tmpl"
    copyfile(src, dst, namespace=namespace, **{tmpl: namespace})

    tmpls = [tmpl, "default"] if tmpl != "default" else [tmpl]
    # load the user-supplied config, record the project id, and write it
    # to kag_config.yaml in the project dir
    config = yaml.load(Path(config_path).read_text() or "{}")
    project_id = kwargs.get("id", None)
    config["project"]["id"] = project_id
    config_file_path = project_dir.resolve() / "kag_config.yaml"
    with open(config_file_path, "w") as config_file:
        yaml.dump(config, config_file)
    return project_dir


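The schema template file name in `_render_template` relies on f-string brace escaping: each doubled brace renders as one literal brace, so the name ends up wrapped in `{{ }}`. A minimal sketch of just that string handling, where `schema_template_name` is a hypothetical helper introduced for illustration and not part of knext:

```python
def schema_template_name(tmpl: str) -> str:
    # "{{" and "}}" each render as one literal brace; "{tmpl}" is substituted.
    # schema_template_name is a hypothetical helper, not a knext function.
    return f"{{{{{tmpl}}}}}.schema.tmpl"


if __name__ == "__main__":
    print(schema_template_name("default"))  # prints {{default}}.schema.tmpl
```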
def _recover_project(prj_path: str):
    """
    Recover project by a project dir path.
    """
    if not Path(prj_path).exists():
        click.secho(f"ERROR: No such directory: {prj_path}", fg="bright_red")
        sys.exit()

    project_name = env.project_config.get("namespace", None)
    namespace = env.project_config.get("namespace", None)
    desc = env.project_config.get("description", None)
    if not namespace:
        click.secho(
            f"ERROR: No project namespace found in {env.config_path}.",
            fg="bright_red",
        )
        sys.exit()

    client = ProjectClient()
    project = client.get(namespace=namespace) or client.create(
        name=project_name, desc=desc, namespace=namespace, config=json.dumps(env._config)
    )

    env._config["project"]["id"] = project.id
    env.dump()

    click.secho(
        f"Project [{project_name}] with namespace [{namespace}] was successfully recovered from [{prj_path}].",
        fg="bright_green",
    )


@click.option("--config_path", help="Path of config.", required=True)
@click.option(
    "--tmpl",
    help="Template of project, use default if not specified.",
    default="default",
    type=click.Choice(["default", "medical"], case_sensitive=False),
)
@click.option(
    "--delete_cfg",
    help="Whether to delete your defined .yaml file.",
    default=False,
    hidden=True,
)
def create_project(
    config_path: str, tmpl: Optional[str] = None, delete_cfg: bool = False
):
    """
    Create new project with a demo case.
    """

    config = yaml.load(Path(config_path).read_text() or "{}")
    project_config = config.get("project", {})
    namespace = project_config.get("namespace", None)
    name = project_config.get("namespace", None)
    host_addr = project_config.get("host_addr", None)

    if not namespace:
        click.secho("ERROR: namespace is required.", fg="bright_red")
        sys.exit()

    if not re.match(r"^[A-Z][A-Za-z0-9]{0,15}$", namespace):
        raise click.BadParameter(
            f"Invalid namespace: {namespace}."
            f" Must start with an uppercase letter, only contain letters and numbers, and have a maximum length of 16."
        )

    if not tmpl:
        tmpl = "default"

    project_id = None

    # validate the LLM and vectorizer settings before contacting the server
    llm_config_checker = LLMConfigChecker()
    vectorize_model_config_checker = VectorizeModelConfigChecker()
    llm_config = config.get("chat_llm", {})
    vectorize_model_config = config.get("vectorizer", {})
    try:
        llm_config_checker.check(json.dumps(llm_config))
        dim = vectorize_model_config_checker.check(json.dumps(vectorize_model_config))
        config["vectorizer"]["vector_dimensions"] = dim
    except Exception as e:
        click.secho(f"Error: {e}", fg="bright_red")
        sys.exit()

    if host_addr:
        client = ProjectClient(host_addr=host_addr)
        project = client.create(name=name, namespace=namespace, config=json.dumps(config))

        if project and project.id:
            project_id = project.id
    else:
        click.secho("ERROR: host_addr is required.", fg="bright_red")
        sys.exit()

    project_dir = _render_template(
        namespace=namespace,
        tmpl=tmpl,
        id=project_id,
        with_server=(host_addr is not None),
        host_addr=host_addr,
        name=name,
        config_path=config_path,
        delete_cfg=delete_cfg,
    )

    current_dir = os.getcwd()
    os.chdir(project_dir)
    update_project(project_dir)
    os.chdir(current_dir)

    if delete_cfg and os.path.exists(config_path):
        os.remove(config_path)

    click.secho(
        f"Project with namespace [{namespace}] was successfully created in {project_dir.resolve()} \n"
        + "You can checkout your project with: \n"
        + f"  cd {project_dir}",
        fg="bright_green",
    )


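The namespace rule enforced in `create_project` (an uppercase first letter, then only letters and digits, at most 16 characters in total) can be tried in isolation. The sketch below reuses the same regular expression; `NAMESPACE_PATTERN` and `is_valid_namespace` are hypothetical names introduced for illustration, not part of knext:

```python
import re

# Same pattern as in create_project: one uppercase letter followed by up to
# 15 further letters or digits. The names below are hypothetical helpers.
NAMESPACE_PATTERN = re.compile(r"^[A-Z][A-Za-z0-9]{0,15}$")


def is_valid_namespace(namespace: str) -> bool:
    """Return True when the namespace satisfies the naming rule."""
    return NAMESPACE_PATTERN.match(namespace) is not None


if __name__ == "__main__":
    for candidate in ["TwoWiki", "twoWiki", "My_Project", "A" * 17]:
        print(candidate, is_valid_namespace(candidate))
```

Only the first candidate passes: the others fail on the lowercase first letter, the underscore, and the length limit respectively.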
@click.option("--host_addr", help="Address of spg server.", default=None)
@click.option("--proj_path", help="Path of project.", default=None)
def restore_project(host_addr, proj_path):
    if host_addr is None:
        host_addr = env.host_addr
    if proj_path is None:
        proj_path = env.project_path
    proj_client = ProjectClient(host_addr=host_addr)

    project_wanted = proj_client.get_by_namespace(namespace=env.namespace)
    if not project_wanted:
        if host_addr:
            client = ProjectClient(host_addr=host_addr)
            project = client.create(name=env.name, namespace=env.namespace, config=json.dumps(env._config))
            project_id = project.id
        else:
            # without a server address no project can be created, and
            # project_id would otherwise be undefined below
            click.secho("ERROR: host_addr is required.", fg="bright_red")
            sys.exit()
    else:
        project_id = project_wanted.id
    # write project id and host addr to kag_config.yaml
    env._config["project"]["id"] = project_id
    env._config["project"]["host_addr"] = host_addr
    env.dump()
    if proj_path:
        _recover_project(proj_path)
        update_project(proj_path)


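`restore_project` follows a get-or-create pattern: look the project up by namespace and create it only when the lookup returns nothing. The sketch below shows that control flow against an in-memory stand-in. `InMemoryProjectClient` and `get_or_create_project_id` are hypothetical and do not reflect the real `ProjectClient` API or its return types:

```python
from typing import Dict, Optional


class InMemoryProjectClient:
    """Hypothetical stand-in for a project client; not the knext API."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}

    def get_by_namespace(self, namespace: str) -> Optional[int]:
        # Return the stored id, or None when the namespace is unknown.
        return self._ids.get(namespace)

    def create(self, namespace: str) -> int:
        # Assign the next sequential id and remember it.
        project_id = len(self._ids) + 1
        self._ids[namespace] = project_id
        return project_id


def get_or_create_project_id(client: InMemoryProjectClient, namespace: str) -> int:
    """Reuse the existing project id if there is one, otherwise create it."""
    existing = client.get_by_namespace(namespace)
    if existing is not None:
        return existing
    return client.create(namespace)


if __name__ == "__main__":
    client = InMemoryProjectClient()
    print(get_or_create_project_id(client, "Demo"))  # creates the project
    print(get_or_create_project_id(client, "Demo"))  # reuses the same id
```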
@click.option("--proj_path", help="Path of project.", default=None)
def update_project(proj_path):
    if not proj_path:
        proj_path = env.project_path
    client = ProjectClient(host_addr=env.host_addr)

    # validate the LLM and vectorizer settings before pushing the config
    llm_config_checker = LLMConfigChecker()
    vectorize_model_config_checker = VectorizeModelConfigChecker()
    llm_config = env.config.get("chat_llm", {})
    vectorize_model_config = env.config.get("vectorizer", {})
    try:
        llm_config_checker.check(json.dumps(llm_config))
        dim = vectorize_model_config_checker.check(json.dumps(vectorize_model_config))
        env._config["vectorizer"]["vector_dimensions"] = dim
    except Exception as e:
        click.secho(f"Error: {e}", fg="bright_red")
        sys.exit()

    logger.info(f"project id: {env.id}")
    client.update(id=env.id, config=json.dumps(env._config))
    click.secho(
        f"Project [{env.name}] with namespace [{env.namespace}] was successfully updated from [{proj_path}].",
        fg="bright_green",
    )


@click.option("--host_addr", help="Address of spg server.", default=DEFAULT_HOST_ADDR)
def list_project(host_addr):
    client = ProjectClient(
        host_addr=host_addr
    )
    projects = client.get_all()

    headers = ["Project Name", "Project ID"]

    click.echo(click.style(f"{' | '.join(headers)}", fg="bright_green", bold=True))
    click.echo(
        click.style(
            f"{'-' * (len(headers[0]) + len(headers[1]) + 3)}", fg="bright_green"
        )
    )

    for project_name, project_id in projects.items():
        click.echo(
            click.style(f"{project_name:<20} | {project_id:<10}", fg="bright_green")
        )
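The table layout printed by `list_project` can be reproduced without click or a server. The sketch below builds the same header, separator, and padded rows as plain strings; `format_project_table` and the sample mapping are hypothetical, and the assumption that projects arrive as a name-to-id mapping is taken from the `projects.items()` loop above:

```python
from typing import Dict, List


def format_project_table(projects: Dict[str, int]) -> List[str]:
    """Build the header, separator and rows as plain strings (no colors)."""
    headers = ["Project Name", "Project ID"]
    lines = [" | ".join(headers)]
    # The separator is as wide as the header line: both titles plus " | ".
    lines.append("-" * (len(headers[0]) + len(headers[1]) + 3))
    for name, project_id in projects.items():
        # Names are left-aligned to 20 characters, ids to 10.
        lines.append(f"{name:<20} | {project_id:<10}")
    return lines


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    for line in format_project_table({"Demo": 1, "TwoWiki": 2}):
        print(line)
```

Because rows pad the name to 20 characters while the header title is 12 characters wide, the column separator in the rows sits further right than in the header.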