mirror of
https://github.com/OpenSPG/KAG.git
synced 2025-11-21 04:48:29 +00:00
396 lines
13 KiB
Python
# coding: utf-8
# Copyright 2023 OpenSPG Authors
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
# in compliance with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software distributed under the License
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
# or implied.


"""
    knext

    No description provided (generated by Openapi Generator https://github.com/openapitools/openapi-generator)  # noqa: E501

    The version of the OpenAPI document: 1.0.0
    Generated by: https://openapi-generator.tech
"""


from __future__ import absolute_import

import copy
import logging
import multiprocessing
import os
from pathlib import Path
import sys
import yaml

import six
import urllib3
from six.moves import http_client as httplib
from knext.common.env import env


class Configuration(object):
    """NOTE: This class is auto generated by OpenAPI Generator

    Ref: https://openapi-generator.tech
    Do not edit the class manually.

    :param host: Base url
    :param api_key: Dict to store API key(s).
      Each entry in the dict specifies an API key.
      The dict key is the name of the security scheme in the OAS specification.
      The dict value is the API key secret.
    :param api_key_prefix: Dict to store API prefix (e.g. Bearer)
      The dict key is the name of the security scheme in the OAS specification.
      The dict value is an API key prefix when generating the auth data.
    :param username: Username for HTTP basic authentication
    :param password: Password for HTTP basic authentication
    :param discard_unknown_keys: Boolean value indicating whether to discard
      unknown properties. A server may send a response that includes additional
      properties that are not known by the client in the following scenarios:
      1. The OpenAPI document is incomplete, i.e. it does not match the server
         implementation.
      2. The client was generated using an older version of the OpenAPI document
         and the server has been upgraded since then.
      If a schema in the OpenAPI document defines the additionalProperties attribute,
      then all undeclared properties received by the server are injected into the
      additional properties map. In that case, there are undeclared properties, and
      nothing to discard.

    """

    _default = None

    def __init__(
        self,
        host=None,
        api_key=None,
        api_key_prefix=None,
        username=None,
        password=None,
        discard_unknown_keys=False,
    ):
        """Constructor"""
        self.host = host or os.getenv("KAG_PROJECT_HOST_ADDR") or env.host_addr
        """Default Base url
        """
        self.temp_folder_path = None
        """Temp file folder for downloading files
        """
        # Authentication Settings
        self.api_key = {}
        if api_key:
            self.api_key = api_key
        """dict to store API key(s)
        """
        self.api_key_prefix = {}
        if api_key_prefix:
            self.api_key_prefix = api_key_prefix
        """dict to store API prefix (e.g. Bearer)
        """
        self.refresh_api_key_hook = None
        """function hook to refresh API key if expired
        """
        self.username = username
        """Username for HTTP basic authentication
        """
        self.password = password
        """Password for HTTP basic authentication
        """
        self.discard_unknown_keys = discard_unknown_keys
        self.logger = {}
        """Logging Settings
        """
        self.logger["package_logger"] = logging.getLogger("rest")
        self.logger["urllib3_logger"] = logging.getLogger("urllib3")
        self.logger_format = "%(asctime)s %(levelname)s %(message)s"
        """Log format
        """
        self.logger_stream_handler = None
        """Log stream handler
        """
        self.logger_file_handler = None
        """Log file handler
        """
        self.logger_file = None
        """Debug file location
        """
        self.debug = False
        """Debug switch
        """

        self.verify_ssl = True
        """SSL/TLS verification
           Set this to false to skip verifying SSL certificate when calling API
           from https server.
        """
        self.ssl_ca_cert = None
        """Set this to customize the certificate file to verify the peer.
        """
        self.cert_file = None
        """client certificate file
        """
        self.key_file = None
        """client key file
        """
        self.assert_hostname = None
        """Set this to True/False to enable/disable SSL hostname verification.
        """

        self.connection_pool_maxsize = multiprocessing.cpu_count() * 5
        """urllib3 connection pool's maximum number of connections saved
           per pool. urllib3 uses 1 connection as default value, but this is
           not the best value when you are making a lot of possibly parallel
           requests to the same host, which is often the case here.
           cpu_count * 5 is used as default value to increase performance.
        """

        self.proxy = None
        """Proxy URL
        """
        self.proxy_headers = None
        """Proxy headers
        """
        self.safe_chars_for_path_param = ""
        """Safe chars for path_param
        """
        self.retries = None
        """Adding retries to override urllib3 default value 3
        """
        # Client-side validation is enabled by default; set to False to disable it
        self.client_side_validation = True

    def __deepcopy__(self, memo):
        cls = self.__class__
        result = cls.__new__(cls)
        memo[id(self)] = result
        for k, v in self.__dict__.items():
            if k not in ("logger", "logger_file_handler"):
                setattr(result, k, copy.deepcopy(v, memo))
        # shallow copy of loggers
        result.logger = copy.copy(self.logger)
        # use setters to configure loggers
        result.logger_file = self.logger_file
        result.debug = self.debug
        return result

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)

    @classmethod
    def set_default(cls, default):
        """Set default instance of configuration.

        It stores default configuration, which can be
        returned by get_default_copy method.

        :param default: object of Configuration
        """
        cls._default = copy.deepcopy(default)

    @classmethod
    def get_default_copy(cls):
        """Return new instance of configuration.

        This method returns newly created, based on default constructor,
        object of Configuration class or returns a copy of default
        configuration passed by the set_default method.

        :return: The configuration object.
        """
        if cls._default is not None:
            return copy.deepcopy(cls._default)
        return Configuration()

    @property
    def logger_file(self):
        """The logger file.

        If the logger_file is None, then add stream handler and remove file
        handler. Otherwise, add file handler and remove stream handler.

        :param value: The logger_file path.
        :type: str
        """
        return self.__logger_file

    @logger_file.setter
    def logger_file(self, value):
        """The logger file.

        If the logger_file is None, then add stream handler and remove file
        handler. Otherwise, add file handler and remove stream handler.

        :param value: The logger_file path.
        :type: str
        """
        self.__logger_file = value
        if self.__logger_file:
            # If set logging file,
            # then add file handler and remove stream handler.
            self.logger_file_handler = logging.FileHandler(self.__logger_file)
            self.logger_file_handler.setFormatter(self.logger_formatter)
            for _, logger in six.iteritems(self.logger):
                logger.addHandler(self.logger_file_handler)

    @property
    def debug(self):
        """Debug status

        :param value: The debug status, True or False.
        :type: bool
        """
        return self.__debug

    @debug.setter
    def debug(self, value):
        """Debug status

        :param value: The debug status, True or False.
        :type: bool
        """
        self.__debug = value
        if self.__debug:
            # if debug status is True, turn on debug logging
            for _, logger in six.iteritems(self.logger):
                logger.setLevel(logging.DEBUG)
            # turn on httplib debug
            httplib.HTTPConnection.debuglevel = 1
        else:
            # if debug status is False, turn off debug logging,
            # setting log level to default `logging.WARNING`
            for _, logger in six.iteritems(self.logger):
                logger.setLevel(logging.WARNING)
            # turn off httplib debug
            httplib.HTTPConnection.debuglevel = 0

    @property
    def logger_format(self):
        """The logger format.

        The logger_formatter will be updated when sets logger_format.

        :param value: The format string.
        :type: str
        """
        return self.__logger_format

    @logger_format.setter
    def logger_format(self, value):
        """The logger format.

        The logger_formatter will be updated when sets logger_format.

        :param value: The format string.
        :type: str
        """
        self.__logger_format = value
        self.logger_formatter = logging.Formatter(self.__logger_format)

    def get_api_key_with_prefix(self, identifier):
        """Gets API key (with prefix if set).

        :param identifier: The identifier of apiKey.
        :return: The token for api key authentication.
        """
        if self.refresh_api_key_hook is not None:
            self.refresh_api_key_hook(self)
        key = self.api_key.get(identifier)
        if key:
            prefix = self.api_key_prefix.get(identifier)
            if prefix:
                return "%s %s" % (prefix, key)
            else:
                return key

    def get_basic_auth_token(self):
        """Gets HTTP basic authentication header (string).

        :return: The token for basic HTTP authentication.
        """
        username = ""
        if self.username is not None:
            username = self.username
        password = ""
        if self.password is not None:
            password = self.password
        return urllib3.util.make_headers(basic_auth=username + ":" + password).get(
            "authorization"
        )

    def auth_settings(self):
        """Gets Auth Settings dict for api client.

        :return: The Auth Settings information dict.
        """
        auth = {}
        return auth

    def to_debug_report(self):
        """Gets the essential information for debugging.

        :return: The report for debugging.
        """
        return (
            "Python SDK Debug Report:\n"
            "OS: {env}\n"
            "Python Version: {pyversion}\n"
            "Version of the API: 1.0.0\n"
            "SDK Package Version: 1.0.0".format(env=sys.platform, pyversion=sys.version)
        )

    def get_host_settings(self):
        """Gets an array of host settings

        :return: An array of host settings
        """
        return [
            {
                "url": "/",
                "description": "No description provided",
            }
        ]

    def get_host_from_settings(self, index, variables=None):
        """Gets host URL based on the index and variables
        :param index: array index of the host settings
        :param variables: hash of variable and the corresponding value
        :return: URL based on host settings
        """
        variables = {} if variables is None else variables
        servers = self.get_host_settings()

        try:
            server = servers[index]
        except IndexError:
            raise ValueError(
                "Invalid index {0} when selecting the host settings. "
                "Must be less than {1}".format(index, len(servers))
            )

        url = server["url"]

        # go through variables and replace placeholders;
        # a host entry may define no "variables" key at all (see get_host_settings)
        for variable_name, variable in server.get("variables", {}).items():
            used_value = variables.get(variable_name, variable["default_value"])

            if "enum_values" in variable and used_value not in variable["enum_values"]:
                # report the value actually used, which may be the default
                raise ValueError(
                    "The variable `{0}` in the host URL has invalid value "
                    "{1}. Must be {2}.".format(
                        variable_name, used_value, variable["enum_values"]
                    )
                )

            url = url.replace("{" + variable_name + "}", used_value)

        return url
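
# Illustration only, not part of the generated module: a minimal standalone
# sketch of the placeholder substitution that get_host_from_settings performs.
# The function name `resolve_host_url`, the `example_server` entry, and its
# `region` and `version` variables are hypothetical; the real host settings
# above define a single "/" URL with no variables.

```python
def resolve_host_url(server, variables=None):
    """Fill `{name}` placeholders in server["url"].

    Each variable falls back to its "default_value" and, when "enum_values"
    is present, must be one of the listed values.
    """
    variables = {} if variables is None else variables
    url = server["url"]
    # A server entry without a "variables" key simply has nothing to substitute.
    for name, spec in server.get("variables", {}).items():
        value = variables.get(name, spec["default_value"])
        allowed = spec.get("enum_values")
        if allowed is not None and value not in allowed:
            raise ValueError(
                "The variable `{0}` in the host URL has invalid value "
                "{1}. Must be {2}.".format(name, value, allowed)
            )
        url = url.replace("{" + name + "}", value)
    return url


# Hypothetical host entry, used only to exercise the sketch.
example_server = {
    "url": "https://{region}.example.com/{version}",
    "variables": {
        "region": {"default_value": "eu", "enum_values": ["eu", "us"]},
        "version": {"default_value": "v1"},
    },
}

default_url = resolve_host_url(example_server)
us_url = resolve_host_url(example_server, {"region": "us"})
```

# Passing a value outside "enum_values" (for example {"region": "asia"}) raises
# ValueError in this sketch, mirroring the check in the method above.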