KAG/knext/common/rest/configuration.py

396 lines
13 KiB
Python
Raw Normal View History

refactor(all): kag v0.6 (#174) * add path find * fix find path * spg guided relation extraction * fix dict parse with same key * rename graphalgoclient to graphclient * rename graphalgoclient to graphclient * file reader supports http url * add checkpointer class * parser supports checkpoint * add build * remove incorrect logs * remove logs * update examples * update chain checkpointer * vectorizer batch size set to 32 * add a zodb backended checkpointer * add a zodb backended checkpointer * fix zodb based checkpointer * add thread for zodb IO * fix(common): resolve mutlithread conflict in zodb IO * fix(common): load existing zodb checkpoints * update examples * update examples * fix zodb writer * add docstring * fix jieba version mismatch * commit kag_config-tc.yaml 1、rename type to register_name 2、put a uniqe & specific name to register_name 3、rename reader to scanner 4、rename parser to reader 5、rename num_parallel to num_parallel_file, rename chain_level_num_paralle to num_parallel_chain_of_file 6、rename kag_extractor to schema_free_extractor, schema_base_extractor to schema_constraint_extractor 7、pre-define llm & vectorize_model and refer them in the yaml file Issues to be resolved: 1、examples of event extract & spg extract 2、statistic of indexer, such as nums of nodes & edges extracted, ratio of llm invoke. 3、Exceptions such as Debt, account does not exist should be thrown in llm invoke. 4、conf of solver need to be re-examined. * commit kag_config-tc.yaml 1、rename type to register_name 2、put a uniqe & specific name to register_name 3、rename reader to scanner 4、rename parser to reader 5、rename num_parallel to num_parallel_file, rename chain_level_num_paralle to num_parallel_chain_of_file 6、rename kag_extractor to schema_free_extractor, schema_base_extractor to schema_constraint_extractor 7、pre-define llm & vectorize_model and refer them in the yaml file Issues to be resolved: 1、examples of event extract & spg extract 2、statistic of indexer, such as nums of nodes & edges extracted, ratio of llm invoke. 3、Exceptions such as Debt, account does not exist should be thrown in llm invoke. 4、conf of solver need to be re-examined. * 1、fix bug in base_table_splitter * 1、fix bug in base_table_splitter * 1、fix bug in default_chain * 增加solver * add kag * update outline splitter * add main test * add op * code refactor * add tools * fix outline splitter * fix outline prompt * graph api pass * commit with page rank * add search api and graph api * add markdown report * fix vectorizer num batch compute * add retry for vectorize model call * update markdown reader * update markdown reader * update pdf reader * raise extractor failure * add default expr * add log * merge jc reader features * rm import * add build * fix zodb based checkpointer * add thread for zodb IO * fix(common): resolve mutlithread conflict in zodb IO * fix(common): load existing zodb checkpoints * update examples * update examples * fix zodb writer * add docstring * fix jieba version mismatch * commit kag_config-tc.yaml 1、rename type to register_name 2、put a uniqe & specific name to register_name 3、rename reader to scanner 4、rename parser to reader 5、rename num_parallel to num_parallel_file, rename chain_level_num_paralle to num_parallel_chain_of_file 6、rename kag_extractor to schema_free_extractor, schema_base_extractor to schema_constraint_extractor 7、pre-define llm & vectorize_model and refer them in the yaml file Issues to be resolved: 1、examples of event extract & spg extract 2、statistic of indexer, such as nums of nodes & edges extracted, ratio of llm invoke. 3、Exceptions such as Debt, account does not exist should be thrown in llm invoke. 4、conf of solver need to be re-examined. * commit kag_config-tc.yaml 1、rename type to register_name 2、put a uniqe & specific name to register_name 3、rename reader to scanner 4、rename parser to reader 5、rename num_parallel to num_parallel_file, rename chain_level_num_paralle to num_parallel_chain_of_file 6、rename kag_extractor to schema_free_extractor, schema_base_extractor to schema_constraint_extractor 7、pre-define llm & vectorize_model and refer them in the yaml file Issues to be resolved: 1、examples of event extract & spg extract 2、statistic of indexer, such as nums of nodes & edges extracted, ratio of llm invoke. 3、Exceptions such as Debt, account does not exist should be thrown in llm invoke. 4、conf of solver need to be re-examined. * 1、fix bug in base_table_splitter * 1、fix bug in base_table_splitter * 1、fix bug in default_chain * update outline splitter * add main test * add markdown report * code refactor * fix outline splitter * fix outline prompt * update markdown reader * fix vectorizer num batch compute * add retry for vectorize model call * update markdown reader * raise extractor failure * rm parser * run pipeline * add config option of whether to perform llm config check, default to false * fix * recover pdf reader * several components can be null for default chain * 支持完整qa运行 * add if * remove unused code * 使用chunk兜底 * excluded source relation to choose * add generate * default recall 10 * add local memory * 排除相似边 * 增加保护 * 修复并发问题 * add debug logger * 支持topk参数化 * 支持chunk截断和调整spo select 的prompt * 增加查询请求保护 * 增加force_chunk配置 * fix entity linker algorithm * 增加sub query改写 * fix md reader dup in test * fix * merge knext to kag parallel * fix package * 修复指标下跌问题 * scanner update * scanner update * add doc and update example scripts * fix * add bridge to spg server * add format * fix bridge * update conf for baike * disable ckpt for spg server runner * llm invoke error default raise exceptions * chore(version): bump version to X.Y.Z * update default response generation prompt * add method getSummarizationMetrics * fix(common): fix project conf empty error * fix typo * 增加上报信息 * 修改main solver * postprocessor support spg server * 修改solver支持名 * fix language * 修改chunker接口,增加openapi * rename vectorizer to vectorize_model in spg server config * generate_random_string start with gen * add knext llm vector checker * add knext llm vector checker * add knext llm vector checker * solver移除默认值 * udpate yaml and register_name for baike * udpate yaml and register_name for baike * remove config key check * 修复llmmodule * fix knext project * udpate yaml and register_name for examples * udpate yaml and register_name for examples * Revert "udpate yaml and register_name for examples" This reverts commit b3fa5ca9ba749e501133ac67bd8746027ab839d9. * update register name * fix * fix * support multiple resigter names * update component * update reader register names (#183) * fix markdown reader * fix llm client for retry * feat(common): add processed chunk id checkpoint (#185) * update reader register names * add processed chunk id checkpoint * feat(example): add example config (#186) * update reader register names * add processed chunk id checkpoint * add example config file * add max_workers parameter for getSummarizationMetrics to make it faster * add csqa data generation script generate_data.py * commit generated csqa builder and solver data * add csqa basic project files * adjust split_length and num_threads_per_chain to match lightrag settings * ignore ckpt dirs * add csqa evaluation script eval.py * save evaluation scripts summarization_metrics.py and factual_correctness.py * save LightRAG output csqa_lightrag_answers.json * ignore KAG output csqa_kag_answers.json * add README.md for CSQA * fix(solver): fix solver pipeline conf (#191) * update reader register names * add processed chunk id checkpoint * add example config file * update solver pipeline config * fix project create * update links and file paths * reformat csqa kag_config.yaml * reformat csqa python files * reformat getSummarizationMetrics and compare_summarization_answers * fix(solver): fix solver config (#192) * update reader register names * add processed chunk id checkpoint * add example config file * update solver pipeline config * fix project create * fix main solver conf * add except * fix typo in csqa README.md * feat(conf): support reinitialize config for call from java side (#199) * update reader register names * add processed chunk id checkpoint * add example config file * update solver pipeline config * fix project create * fix main solver conf * support reinitialize config for java call * revert default response generation prompt * update project list * add README.md for the hotpotqa, 2wiki and musique examples * 增加spo检索 * turn off kag config dump by default * turn off knext schema dump by default * add .gitignore and fix kag_config.yaml * add README.md for the medicine example * add README.md for the supplychain example * bugfix for risk mining * use exact out * refactor(solver): format solver code (#205) * update reader register names * add processed chunk id checkpoint * add example config file * update solver pipeline config * fix project create * fix main solver conf * support reinitialize config for java call * black format --------- Co-authored-by: peilong <peilong.zpl@antgroup.com> Co-authored-by: 锦呈 <zhangxinhong.zxh@antgroup.com> Co-authored-by: zhengke.gzk <zhengke.gzk@antgroup.com> Co-authored-by: huaidong.xhd <huaidong.xhd@antgroup.com>
2025-01-03 17:10:51 +08:00
# coding: utf-8
# Copyright 2023 OpenSPG Authors
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
# in compliance with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software distributed under the License
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
# or implied.
"""
knext
No description provided (generated by Openapi Generator https://github.com/openapitools/openapi-generator) # noqa: E501
The version of the OpenAPI document: 1.0.0
Generated by: https://openapi-generator.tech
"""
from __future__ import absolute_import
import copy
import logging
import multiprocessing
import os
from pathlib import Path
import sys
import yaml
import six
import urllib3
from six.moves import http_client as httplib
from knext.common.env import env
class Configuration(object):
"""NOTE: This class is auto generated by OpenAPI Generator
Ref: https://openapi-generator.tech
Do not edit the class manually.
:param host: Base url
:param api_key: Dict to store API key(s).
Each entry in the dict specifies an API key.
The dict key is the name of the security scheme in the OAS specification.
The dict value is the API key secret.
:param api_key_prefix: Dict to store API prefix (e.g. Bearer)
The dict key is the name of the security scheme in the OAS specification.
The dict value is an API key prefix when generating the auth data.
:param username: Username for HTTP basic authentication
:param password: Password for HTTP basic authentication
:param discard_unknown_keys: Boolean value indicating whether to discard
unknown properties. A server may send a response that includes additional
properties that are not known by the client in the following scenarios:
1. The OpenAPI document is incomplete, i.e. it does not match the server
implementation.
2. The client was generated using an older version of the OpenAPI document
and the server has been upgraded since then.
If a schema in the OpenAPI document defines the additionalProperties attribute,
then all undeclared properties received by the server are injected into the
additional properties map. In that case, there are undeclared properties, and
nothing to discard.
"""
_default = None
def __init__(
self,
host=None,
api_key=None,
api_key_prefix=None,
username=None,
password=None,
discard_unknown_keys=False,
):
"""Constructor"""
self.host = host or os.getenv("KAG_PROJECT_HOST_ADDR") or env.host_addr
"""Default Base url
"""
self.temp_folder_path = None
"""Temp file folder for downloading files
"""
# Authentication Settings
self.api_key = {}
if api_key:
self.api_key = api_key
"""dict to store API key(s)
"""
self.api_key_prefix = {}
if api_key_prefix:
self.api_key_prefix = api_key_prefix
"""dict to store API prefix (e.g. Bearer)
"""
self.refresh_api_key_hook = None
"""function hook to refresh API key if expired
"""
self.username = username
"""Username for HTTP basic authentication
"""
self.password = password
"""Password for HTTP basic authentication
"""
self.discard_unknown_keys = discard_unknown_keys
self.logger = {}
"""Logging Settings
"""
self.logger["package_logger"] = logging.getLogger("rest")
self.logger["urllib3_logger"] = logging.getLogger("urllib3")
self.logger_format = "%(asctime)s %(levelname)s %(message)s"
"""Log format
"""
self.logger_stream_handler = None
"""Log stream handler
"""
self.logger_file_handler = None
"""Log file handler
"""
self.logger_file = None
"""Debug file location
"""
self.debug = False
"""Debug switch
"""
self.verify_ssl = True
"""SSL/TLS verification
Set this to false to skip verifying SSL certificate when calling API
from https server.
"""
self.ssl_ca_cert = None
"""Set this to customize the certificate file to verify the peer.
"""
self.cert_file = None
"""client certificate file
"""
self.key_file = None
"""client key file
"""
self.assert_hostname = None
"""Set this to True/False to enable/disable SSL hostname verification.
"""
self.connection_pool_maxsize = multiprocessing.cpu_count() * 5
"""urllib3 connection pool's maximum number of connections saved
per pool. urllib3 uses 1 connection as default value, but this is
not the best value when you are making a lot of possibly parallel
requests to the same host, which is often the case here.
cpu_count * 5 is used as default value to increase performance.
"""
self.proxy = None
"""Proxy URL
"""
self.proxy_headers = None
"""Proxy headers
"""
self.safe_chars_for_path_param = ""
"""Safe chars for path_param
"""
self.retries = None
"""Adding retries to override urllib3 default value 3
"""
# Disable client side validation
self.client_side_validation = True
def __deepcopy__(self, memo):
cls = self.__class__
result = cls.__new__(cls)
memo[id(self)] = result
for k, v in self.__dict__.items():
if k not in ("logger", "logger_file_handler"):
setattr(result, k, copy.deepcopy(v, memo))
# shallow copy of loggers
result.logger = copy.copy(self.logger)
# use setters to configure loggers
result.logger_file = self.logger_file
result.debug = self.debug
return result
def __setattr__(self, name, value):
object.__setattr__(self, name, value)
@classmethod
def set_default(cls, default):
"""Set default instance of configuration.
It stores default configuration, which can be
returned by get_default_copy method.
:param default: object of Configuration
"""
cls._default = copy.deepcopy(default)
@classmethod
def get_default_copy(cls):
"""Return new instance of configuration.
This method returns newly created, based on default constructor,
object of Configuration class or returns a copy of default
configuration passed by the set_default method.
:return: The configuration object.
"""
if cls._default is not None:
return copy.deepcopy(cls._default)
return Configuration()
@property
def logger_file(self):
"""The logger file.
If the logger_file is None, then add stream handler and remove file
handler. Otherwise, add file handler and remove stream handler.
:param value: The logger_file path.
:type: str
"""
return self.__logger_file
@logger_file.setter
def logger_file(self, value):
"""The logger file.
If the logger_file is None, then add stream handler and remove file
handler. Otherwise, add file handler and remove stream handler.
:param value: The logger_file path.
:type: str
"""
self.__logger_file = value
if self.__logger_file:
# If set logging file,
# then add file handler and remove stream handler.
self.logger_file_handler = logging.FileHandler(self.__logger_file)
self.logger_file_handler.setFormatter(self.logger_formatter)
for _, logger in six.iteritems(self.logger):
logger.addHandler(self.logger_file_handler)
@property
def debug(self):
"""Debug status
:param value: The debug status, True or False.
:type: bool
"""
return self.__debug
@debug.setter
def debug(self, value):
"""Debug status
:param value: The debug status, True or False.
:type: bool
"""
self.__debug = value
if self.__debug:
# if debug status is True, turn on debug logging
for _, logger in six.iteritems(self.logger):
logger.setLevel(logging.DEBUG)
# turn on httplib debug
httplib.HTTPConnection.debuglevel = 1
else:
# if debug status is False, turn off debug logging,
# setting log level to default `logging.WARNING`
for _, logger in six.iteritems(self.logger):
logger.setLevel(logging.WARNING)
# turn off httplib debug
httplib.HTTPConnection.debuglevel = 0
@property
def logger_format(self):
"""The logger format.
The logger_formatter will be updated when sets logger_format.
:param value: The format string.
:type: str
"""
return self.__logger_format
@logger_format.setter
def logger_format(self, value):
"""The logger format.
The logger_formatter will be updated when sets logger_format.
:param value: The format string.
:type: str
"""
self.__logger_format = value
self.logger_formatter = logging.Formatter(self.__logger_format)
def get_api_key_with_prefix(self, identifier):
"""Gets API key (with prefix if set).
:param identifier: The identifier of apiKey.
:return: The token for api key authentication.
"""
if self.refresh_api_key_hook is not None:
self.refresh_api_key_hook(self)
key = self.api_key.get(identifier)
if key:
prefix = self.api_key_prefix.get(identifier)
if prefix:
return "%s %s" % (prefix, key)
else:
return key
def get_basic_auth_token(self):
"""Gets HTTP basic authentication header (string).
:return: The token for basic HTTP authentication.
"""
username = ""
if self.username is not None:
username = self.username
password = ""
if self.password is not None:
password = self.password
return urllib3.util.make_headers(basic_auth=username + ":" + password).get(
"authorization"
)
def auth_settings(self):
"""Gets Auth Settings dict for api client.
:return: The Auth Settings information dict.
"""
auth = {}
return auth
def to_debug_report(self):
"""Gets the essential information for debugging.
:return: The report for debugging.
"""
return (
"Python SDK Debug Report:\n"
"OS: {env}\n"
"Python Version: {pyversion}\n"
"Version of the API: 1.0.0\n"
"SDK Package Version: 1.0.0".format(env=sys.platform, pyversion=sys.version)
)
def get_host_settings(self):
"""Gets an array of host settings
:return: An array of host settings
"""
return [
{
"url": "/",
"description": "No description provided",
}
]
def get_host_from_settings(self, index, variables=None):
"""Gets host URL based on the index and variables
:param index: array index of the host settings
:param variables: hash of variable and the corresponding value
:return: URL based on host settings
"""
variables = {} if variables is None else variables
servers = self.get_host_settings()
try:
server = servers[index]
except IndexError:
raise ValueError(
"Invalid index {0} when selecting the host settings. "
"Must be less than {1}".format(index, len(servers))
)
url = server["url"]
# go through variables and replace placeholders
for variable_name, variable in server["variables"].items():
used_value = variables.get(variable_name, variable["default_value"])
if "enum_values" in variable and used_value not in variable["enum_values"]:
raise ValueError(
"The variable `{0}` in the host URL has invalid value "
"{1}. Must be {2}.".format(
variable_name, variables[variable_name], variable["enum_values"]
)
)
url = url.replace("{" + variable_name + "}", used_value)
return url