mirror of
https://github.com/OpenSPG/KAG.git
synced 2025-11-21 04:48:29 +00:00
Co-authored-by: peilong <peilong.zpl@antgroup.com>
Co-authored-by: 锦呈 <zhangxinhong.zxh@antgroup.com>
Co-authored-by: zhengke.gzk <zhengke.gzk@antgroup.com>
Co-authored-by: huaidong.xhd <huaidong.xhd@antgroup.com>
397 lines
13 KiB
Python
# coding: utf-8
# Copyright 2023 OpenSPG Authors
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
# in compliance with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software distributed under the License
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
# or implied.


"""
    knext

    No description provided (generated by Openapi Generator https://github.com/openapitools/openapi-generator)  # noqa: E501

    The version of the OpenAPI document: 1.0.0
    Generated by: https://openapi-generator.tech
"""


from __future__ import absolute_import

import io
import json
import logging
import re
import ssl

import certifi

# python 2 and python 3 compatibility library
import six
import urllib3
from six.moves.urllib.parse import urlencode

from knext.common.rest.exceptions import ApiException, ApiValueError

logger = logging.getLogger(__name__)


class RESTResponse(io.IOBase):
    def __init__(self, resp):
        self.urllib3_response = resp
        self.status = resp.status
        self.reason = resp.reason
        self.data = resp.data

    def getheaders(self):
        """Returns a dictionary of the response headers."""
        return self.urllib3_response.getheaders()

    def getheader(self, name, default=None):
        """Returns a given response header."""
        return self.urllib3_response.getheader(name, default)


class RESTClientObject(object):
    def __init__(self, configuration, pools_size=4, maxsize=None):
        # urllib3.PoolManager will pass all kw parameters to connectionpool
        # https://github.com/shazow/urllib3/blob/f9409436f83aeb79fbaf090181cd81b784f1b8ce/urllib3/poolmanager.py#L75  # noqa: E501
        # https://github.com/shazow/urllib3/blob/f9409436f83aeb79fbaf090181cd81b784f1b8ce/urllib3/connectionpool.py#L680  # noqa: E501
        # maxsize is the number of requests to host that are allowed in parallel  # noqa: E501
        # Custom SSL certificates and client certificates: http://urllib3.readthedocs.io/en/latest/advanced-usage.html  # noqa: E501

        # cert_reqs
        if configuration.verify_ssl:
            cert_reqs = ssl.CERT_REQUIRED
        else:
            cert_reqs = ssl.CERT_NONE

        # ca_certs
        if configuration.ssl_ca_cert:
            ca_certs = configuration.ssl_ca_cert
        else:
            # if not set certificate file, use Mozilla's root certificates.
            ca_certs = certifi.where()

        addition_pool_args = {}
        if configuration.assert_hostname is not None:
            addition_pool_args[
                "assert_hostname"
            ] = configuration.assert_hostname  # noqa: E501

        if configuration.retries is not None:
            addition_pool_args["retries"] = configuration.retries

        if maxsize is None:
            if configuration.connection_pool_maxsize is not None:
                maxsize = configuration.connection_pool_maxsize
            else:
                maxsize = 4

        # https pool manager
        if configuration.proxy:
            self.pool_manager = urllib3.ProxyManager(
                num_pools=pools_size,
                maxsize=maxsize,
                cert_reqs=cert_reqs,
                ca_certs=ca_certs,
                cert_file=configuration.cert_file,
                key_file=configuration.key_file,
                proxy_url=configuration.proxy,
                proxy_headers=configuration.proxy_headers,
                **addition_pool_args
            )
        else:
            self.pool_manager = urllib3.PoolManager(
                num_pools=pools_size,
                maxsize=maxsize,
                cert_reqs=cert_reqs,
                ca_certs=ca_certs,
                cert_file=configuration.cert_file,
                key_file=configuration.key_file,
                **addition_pool_args
            )

    def request(
        self,
        method,
        url,
        query_params=None,
        headers=None,
        body=None,
        post_params=None,
        _preload_content=True,
        _request_timeout=None,
    ):
        """Perform requests.

        :param method: http request method
        :param url: http request url
        :param query_params: query parameters in the url
        :param headers: http request headers
        :param body: request json body, for `application/json`
        :param post_params: request post parameters,
                            `application/x-www-form-urlencoded`
                            and `multipart/form-data`
        :param _preload_content: if False, the urllib3.HTTPResponse object will
                                 be returned without reading/decoding response
                                 data. Default is True.
        :param _request_timeout: timeout setting for this request. If one
                                 number provided, it will be total request
                                 timeout. It can also be a pair (tuple) of
                                 (connection, read) timeouts.
        """
        method = method.upper()
        assert method in ["GET", "HEAD", "DELETE", "POST", "PUT", "PATCH", "OPTIONS"]

        if post_params and body:
            raise ApiValueError(
                "body parameter cannot be used with post_params parameter."
            )

        post_params = post_params or {}
        headers = headers or {}

        timeout = None
        if _request_timeout:
            # Accept floats as well as integers: the docstring promises that
            # any single number is treated as the total request timeout.
            if isinstance(_request_timeout, six.integer_types + (float,)):
                timeout = urllib3.Timeout(total=_request_timeout)
            elif isinstance(_request_timeout, tuple) and len(_request_timeout) == 2:
                timeout = urllib3.Timeout(
                    connect=_request_timeout[0], read=_request_timeout[1]
                )

        if "Content-Type" not in headers:
            headers["Content-Type"] = "application/json"

        try:
            # For `POST`, `PUT`, `PATCH`, `OPTIONS`, `DELETE`
            if method in ["POST", "PUT", "PATCH", "OPTIONS", "DELETE"]:
                if query_params:
                    url += "?" + urlencode(query_params)
                if re.search("json", headers["Content-Type"], re.IGNORECASE):
                    request_body = None
                    if body is not None:
                        request_body = json.dumps(body)
                    r = self.pool_manager.request(
                        method,
                        url,
                        body=request_body,
                        preload_content=_preload_content,
                        timeout=timeout,
                        headers=headers,
                    )
                elif (
                    headers["Content-Type"] == "application/x-www-form-urlencoded"
                ):  # noqa: E501
                    r = self.pool_manager.request(
                        method,
                        url,
                        fields=post_params,
                        encode_multipart=False,
                        preload_content=_preload_content,
                        timeout=timeout,
                        headers=headers,
                    )
                elif headers["Content-Type"] == "multipart/form-data":
                    # must del headers['Content-Type'], or the correct
                    # Content-Type which generated by urllib3 will be
                    # overwritten.
                    del headers["Content-Type"]
                    r = self.pool_manager.request(
                        method,
                        url,
                        fields=post_params,
                        encode_multipart=True,
                        preload_content=_preload_content,
                        timeout=timeout,
                        headers=headers,
                    )
                # Pass a `string` parameter directly in the body to support
                # other content types than Json when `body` argument is
                # provided in serialized form
                elif isinstance(body, str) or isinstance(body, bytes):
                    request_body = body
                    r = self.pool_manager.request(
                        method,
                        url,
                        body=request_body,
                        preload_content=_preload_content,
                        timeout=timeout,
                        headers=headers,
                    )
                else:
                    # Cannot generate the request from given parameters
                    msg = """Cannot prepare a request message for provided
                             arguments. Please check that your arguments match
                             declared content type."""
                    raise ApiException(status=0, reason=msg)
            # For `GET`, `HEAD`
            else:
                r = self.pool_manager.request(
                    method,
                    url,
                    fields=query_params,
                    preload_content=_preload_content,
                    timeout=timeout,
                    headers=headers,
                )
        except urllib3.exceptions.SSLError as e:
            msg = "{0}\n{1}".format(type(e).__name__, str(e))
            raise ApiException(status=0, reason=msg)

        if _preload_content:
            r = RESTResponse(r)

            # log response body
            logger.debug("response body: %s", r.data)

        if not 200 <= r.status <= 299:
            raise ApiException(http_resp=r)

        return r

    def GET(
        self,
        url,
        headers=None,
        query_params=None,
        _preload_content=True,
        _request_timeout=None,
    ):
        return self.request(
            "GET",
            url,
            headers=headers,
            _preload_content=_preload_content,
            _request_timeout=_request_timeout,
            query_params=query_params,
        )

    def HEAD(
        self,
        url,
        headers=None,
        query_params=None,
        _preload_content=True,
        _request_timeout=None,
    ):
        return self.request(
            "HEAD",
            url,
            headers=headers,
            _preload_content=_preload_content,
            _request_timeout=_request_timeout,
            query_params=query_params,
        )

    def OPTIONS(
        self,
        url,
        headers=None,
        query_params=None,
        post_params=None,
        body=None,
        _preload_content=True,
        _request_timeout=None,
    ):
        return self.request(
            "OPTIONS",
            url,
            headers=headers,
            query_params=query_params,
            post_params=post_params,
            _preload_content=_preload_content,
            _request_timeout=_request_timeout,
            body=body,
        )

    def DELETE(
        self,
        url,
        headers=None,
        query_params=None,
        body=None,
        _preload_content=True,
        _request_timeout=None,
    ):
        return self.request(
            "DELETE",
            url,
            headers=headers,
            query_params=query_params,
            _preload_content=_preload_content,
            _request_timeout=_request_timeout,
            body=body,
        )

    def POST(
        self,
        url,
        headers=None,
        query_params=None,
        post_params=None,
        body=None,
        _preload_content=True,
        _request_timeout=None,
    ):
        return self.request(
            "POST",
            url,
            headers=headers,
            query_params=query_params,
            post_params=post_params,
            _preload_content=_preload_content,
            _request_timeout=_request_timeout,
            body=body,
        )

    def PUT(
        self,
        url,
        headers=None,
        query_params=None,
        post_params=None,
        body=None,
        _preload_content=True,
        _request_timeout=None,
    ):
        return self.request(
            "PUT",
            url,
            headers=headers,
            query_params=query_params,
            post_params=post_params,
            _preload_content=_preload_content,
            _request_timeout=_request_timeout,
            body=body,
        )

    def PATCH(
        self,
        url,
        headers=None,
        query_params=None,
        post_params=None,
        body=None,
        _preload_content=True,
        _request_timeout=None,
    ):
        return self.request(
            "PATCH",
            url,
            headers=headers,
            query_params=query_params,
            post_params=post_params,
            _preload_content=_preload_content,
            _request_timeout=_request_timeout,
            body=body,
        )
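The branch order inside `RESTClientObject.request` decides how a call is encoded: GET and HEAD send `query_params` as URL fields, while the other verbs are dispatched on the `Content-Type` header, with a pre-serialized `str` or `bytes` body as the last fallback. The following is a minimal standalone sketch of that decision only. The helper `select_body_encoding` and the branch labels it returns are hypothetical names introduced for illustration; they are not part of knext or urllib3, and the sketch performs no network I/O.

```python
import re


def select_body_encoding(method, content_type="application/json", body=None):
    """Hypothetical helper (not part of knext): name the branch that
    RESTClientObject.request would take for the given arguments."""
    method = method.upper()
    if method not in ("POST", "PUT", "PATCH", "OPTIONS", "DELETE"):
        # GET and HEAD pass query parameters as URL fields and send no body.
        return "query-fields"
    if re.search("json", content_type, re.IGNORECASE):
        # Any content type containing "json" gets a json.dumps-encoded body.
        return "json-body"
    if content_type == "application/x-www-form-urlencoded":
        return "urlencoded-fields"
    if content_type == "multipart/form-data":
        return "multipart-fields"
    if isinstance(body, (str, bytes)):
        # An already serialized body is forwarded unchanged.
        return "raw-body"
    # Corresponds to the ApiException(status=0, ...) branch in request().
    return "unsupported"


print(select_body_encoding("get"))
print(select_body_encoding("POST", "application/vnd.api+JSON", {"a": 1}))
print(select_body_encoding("PUT", "text/plain", "hello"))
print(select_body_encoding("PATCH", "text/plain", {"a": 1}))
```

Note that the JSON check is a case-insensitive substring match, so vendor media types such as `application/vnd.api+json` take the JSON branch, whereas the form and multipart checks require an exact header value.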