refactor(all): kag v0.6 (#174)
* add path find
* fix find path
* spg guided relation extraction
* fix dict parse with same key
* rename graphalgoclient to graphclient
* rename graphalgoclient to graphclient
* file reader supports http url
* add checkpointer class
* parser supports checkpoint
* add build
* remove incorrect logs
* remove logs
* update examples
* update chain checkpointer
* vectorizer batch size set to 32
* add a zodb backended checkpointer
* add a zodb backended checkpointer
* fix zodb based checkpointer
* add thread for zodb IO
* fix(common): resolve mutlithread conflict in zodb IO
* fix(common): load existing zodb checkpoints
* update examples
* update examples
* fix zodb writer
* add docstring
* fix jieba version mismatch
* commit kag_config-tc.yaml
1、rename type to register_name
2、put a uniqe & specific name to register_name
3、rename reader to scanner
4、rename parser to reader
5、rename num_parallel to num_parallel_file, rename chain_level_num_paralle to num_parallel_chain_of_file
6、rename kag_extractor to schema_free_extractor, schema_base_extractor to schema_constraint_extractor
7、pre-define llm & vectorize_model and refer them in the yaml file
Issues to be resolved:
1、examples of event extract & spg extract
2、statistic of indexer, such as nums of nodes & edges extracted, ratio of llm invoke.
3、Exceptions such as Debt, account does not exist should be thrown in llm invoke.
4、conf of solver need to be re-examined.
* commit kag_config-tc.yaml
1、rename type to register_name
2、put a uniqe & specific name to register_name
3、rename reader to scanner
4、rename parser to reader
5、rename num_parallel to num_parallel_file, rename chain_level_num_paralle to num_parallel_chain_of_file
6、rename kag_extractor to schema_free_extractor, schema_base_extractor to schema_constraint_extractor
7、pre-define llm & vectorize_model and refer them in the yaml file
Issues to be resolved:
1、examples of event extract & spg extract
2、statistic of indexer, such as nums of nodes & edges extracted, ratio of llm invoke.
3、Exceptions such as Debt, account does not exist should be thrown in llm invoke.
4、conf of solver need to be re-examined.
* 1、fix bug in base_table_splitter
* 1、fix bug in base_table_splitter
* 1、fix bug in default_chain
* 增加solver
* add kag
* update outline splitter
* add main test
* add op
* code refactor
* add tools
* fix outline splitter
* fix outline prompt
* graph api pass
* commit with page rank
* add search api and graph api
* add markdown report
* fix vectorizer num batch compute
* add retry for vectorize model call
* update markdown reader
* update markdown reader
* update pdf reader
* raise extractor failure
* add default expr
* add log
* merge jc reader features
* rm import
* add build
* fix zodb based checkpointer
* add thread for zodb IO
* fix(common): resolve mutlithread conflict in zodb IO
* fix(common): load existing zodb checkpoints
* update examples
* update examples
* fix zodb writer
* add docstring
* fix jieba version mismatch
* commit kag_config-tc.yaml
1、rename type to register_name
2、put a uniqe & specific name to register_name
3、rename reader to scanner
4、rename parser to reader
5、rename num_parallel to num_parallel_file, rename chain_level_num_paralle to num_parallel_chain_of_file
6、rename kag_extractor to schema_free_extractor, schema_base_extractor to schema_constraint_extractor
7、pre-define llm & vectorize_model and refer them in the yaml file
Issues to be resolved:
1、examples of event extract & spg extract
2、statistic of indexer, such as nums of nodes & edges extracted, ratio of llm invoke.
3、Exceptions such as Debt, account does not exist should be thrown in llm invoke.
4、conf of solver need to be re-examined.
* commit kag_config-tc.yaml
1、rename type to register_name
2、put a uniqe & specific name to register_name
3、rename reader to scanner
4、rename parser to reader
5、rename num_parallel to num_parallel_file, rename chain_level_num_paralle to num_parallel_chain_of_file
6、rename kag_extractor to schema_free_extractor, schema_base_extractor to schema_constraint_extractor
7、pre-define llm & vectorize_model and refer them in the yaml file
Issues to be resolved:
1、examples of event extract & spg extract
2、statistic of indexer, such as nums of nodes & edges extracted, ratio of llm invoke.
3、Exceptions such as Debt, account does not exist should be thrown in llm invoke.
4、conf of solver need to be re-examined.
* 1、fix bug in base_table_splitter
* 1、fix bug in base_table_splitter
* 1、fix bug in default_chain
* update outline splitter
* add main test
* add markdown report
* code refactor
* fix outline splitter
* fix outline prompt
* update markdown reader
* fix vectorizer num batch compute
* add retry for vectorize model call
* update markdown reader
* raise extractor failure
* rm parser
* run pipeline
* add config option of whether to perform llm config check, default to false
* fix
* recover pdf reader
* several components can be null for default chain
* 支持完整qa运行
* add if
* remove unused code
* 使用chunk兜底
* excluded source relation to choose
* add generate
* default recall 10
* add local memory
* 排除相似边
* 增加保护
* 修复并发问题
* add debug logger
* 支持topk参数化
* 支持chunk截断和调整spo select 的prompt
* 增加查询请求保护
* 增加force_chunk配置
* fix entity linker algorithm
* 增加sub query改写
* fix md reader dup in test
* fix
* merge knext to kag parallel
* fix package
* 修复指标下跌问题
* scanner update
* scanner update
* add doc and update example scripts
* fix
* add bridge to spg server
* add format
* fix bridge
* update conf for baike
* disable ckpt for spg server runner
* llm invoke error default raise exceptions
* chore(version): bump version to X.Y.Z
* update default response generation prompt
* add method getSummarizationMetrics
* fix(common): fix project conf empty error
* fix typo
* 增加上报信息
* 修改main solver
* postprocessor support spg server
* 修改solver支持名
* fix language
* 修改chunker接口,增加openapi
* rename vectorizer to vectorize_model in spg server config
* generate_random_string start with gen
* add knext llm vector checker
* add knext llm vector checker
* add knext llm vector checker
* solver移除默认值
* udpate yaml and register_name for baike
* udpate yaml and register_name for baike
* remove config key check
* 修复llmmodule
* fix knext project
* udpate yaml and register_name for examples
* udpate yaml and register_name for examples
* Revert "udpate yaml and register_name for examples"
This reverts commit b3fa5ca9ba749e501133ac67bd8746027ab839d9.
* update register name
* fix
* fix
* support multiple resigter names
* update component
* update reader register names (#183)
* fix markdown reader
* fix llm client for retry
* feat(common): add processed chunk id checkpoint (#185)
* update reader register names
* add processed chunk id checkpoint
* feat(example): add example config (#186)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* add max_workers parameter for getSummarizationMetrics to make it faster
* add csqa data generation script generate_data.py
* commit generated csqa builder and solver data
* add csqa basic project files
* adjust split_length and num_threads_per_chain to match lightrag settings
* ignore ckpt dirs
* add csqa evaluation script eval.py
* save evaluation scripts summarization_metrics.py and factual_correctness.py
* save LightRAG output csqa_lightrag_answers.json
* ignore KAG output csqa_kag_answers.json
* add README.md for CSQA
* fix(solver): fix solver pipeline conf (#191)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* update solver pipeline config
* fix project create
* update links and file paths
* reformat csqa kag_config.yaml
* reformat csqa python files
* reformat getSummarizationMetrics and compare_summarization_answers
* fix(solver): fix solver config (#192)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* update solver pipeline config
* fix project create
* fix main solver conf
* add except
* fix typo in csqa README.md
* feat(conf): support reinitialize config for call from java side (#199)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* update solver pipeline config
* fix project create
* fix main solver conf
* support reinitialize config for java call
* revert default response generation prompt
* update project list
* add README.md for the hotpotqa, 2wiki and musique examples
* 增加spo检索
* turn off kag config dump by default
* turn off knext schema dump by default
* add .gitignore and fix kag_config.yaml
* add README.md for the medicine example
* add README.md for the supplychain example
* bugfix for risk mining
* use exact out
* refactor(solver): format solver code (#205)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* update solver pipeline config
* fix project create
* fix main solver conf
* support reinitialize config for java call
* black format
---------
Co-authored-by: peilong <peilong.zpl@antgroup.com>
Co-authored-by: 锦呈 <zhangxinhong.zxh@antgroup.com>
Co-authored-by: zhengke.gzk <zhengke.gzk@antgroup.com>
Co-authored-by: huaidong.xhd <huaidong.xhd@antgroup.com>
2025-01-03 17:10:51 +08:00
|
|
|
# -*- coding: utf-8 -*-
|
|
|
|
# Copyright 2023 OpenSPG Authors
|
|
|
|
#
|
|
|
|
# Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
|
|
|
|
# in compliance with the License. You may obtain a copy of the License at
|
|
|
|
#
|
|
|
|
# http://www.apache.org/licenses/LICENSE-2.0
|
|
|
|
#
|
|
|
|
# Unless required by applicable law or agreed to in writing, software distributed under the License
|
|
|
|
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
|
|
|
|
# or implied.
|
|
|
|
import os
|
|
|
|
from typing import List, Dict
|
|
|
|
|
|
|
|
import knext.common.cache
|
|
|
|
from knext.common.base.client import Client
|
|
|
|
from knext.common.rest import ApiClient, Configuration
|
|
|
|
from knext.schema import rest
|
|
|
|
from knext.schema.model.base import BaseSpgType, AlterOperationEnum, SpgTypeEnum
|
|
|
|
from knext.schema.model.relation import Relation
|
2025-06-16 19:22:30 +08:00
|
|
|
from knext.schema.model.spg_type import IndexType
|
|
|
|
from knext.schema.rest import BasicType
|
refactor(all): kag v0.6 (#174)
* add path find
* fix find path
* spg guided relation extraction
* fix dict parse with same key
* rename graphalgoclient to graphclient
* rename graphalgoclient to graphclient
* file reader supports http url
* add checkpointer class
* parser supports checkpoint
* add build
* remove incorrect logs
* remove logs
* update examples
* update chain checkpointer
* vectorizer batch size set to 32
* add a zodb backended checkpointer
* add a zodb backended checkpointer
* fix zodb based checkpointer
* add thread for zodb IO
* fix(common): resolve mutlithread conflict in zodb IO
* fix(common): load existing zodb checkpoints
* update examples
* update examples
* fix zodb writer
* add docstring
* fix jieba version mismatch
* commit kag_config-tc.yaml
1、rename type to register_name
2、put a uniqe & specific name to register_name
3、rename reader to scanner
4、rename parser to reader
5、rename num_parallel to num_parallel_file, rename chain_level_num_paralle to num_parallel_chain_of_file
6、rename kag_extractor to schema_free_extractor, schema_base_extractor to schema_constraint_extractor
7、pre-define llm & vectorize_model and refer them in the yaml file
Issues to be resolved:
1、examples of event extract & spg extract
2、statistic of indexer, such as nums of nodes & edges extracted, ratio of llm invoke.
3、Exceptions such as Debt, account does not exist should be thrown in llm invoke.
4、conf of solver need to be re-examined.
* commit kag_config-tc.yaml
1、rename type to register_name
2、put a uniqe & specific name to register_name
3、rename reader to scanner
4、rename parser to reader
5、rename num_parallel to num_parallel_file, rename chain_level_num_paralle to num_parallel_chain_of_file
6、rename kag_extractor to schema_free_extractor, schema_base_extractor to schema_constraint_extractor
7、pre-define llm & vectorize_model and refer them in the yaml file
Issues to be resolved:
1、examples of event extract & spg extract
2、statistic of indexer, such as nums of nodes & edges extracted, ratio of llm invoke.
3、Exceptions such as Debt, account does not exist should be thrown in llm invoke.
4、conf of solver need to be re-examined.
* 1、fix bug in base_table_splitter
* 1、fix bug in base_table_splitter
* 1、fix bug in default_chain
* 增加solver
* add kag
* update outline splitter
* add main test
* add op
* code refactor
* add tools
* fix outline splitter
* fix outline prompt
* graph api pass
* commit with page rank
* add search api and graph api
* add markdown report
* fix vectorizer num batch compute
* add retry for vectorize model call
* update markdown reader
* update markdown reader
* update pdf reader
* raise extractor failure
* add default expr
* add log
* merge jc reader features
* rm import
* add build
* fix zodb based checkpointer
* add thread for zodb IO
* fix(common): resolve mutlithread conflict in zodb IO
* fix(common): load existing zodb checkpoints
* update examples
* update examples
* fix zodb writer
* add docstring
* fix jieba version mismatch
* commit kag_config-tc.yaml
1、rename type to register_name
2、put a uniqe & specific name to register_name
3、rename reader to scanner
4、rename parser to reader
5、rename num_parallel to num_parallel_file, rename chain_level_num_paralle to num_parallel_chain_of_file
6、rename kag_extractor to schema_free_extractor, schema_base_extractor to schema_constraint_extractor
7、pre-define llm & vectorize_model and refer them in the yaml file
Issues to be resolved:
1、examples of event extract & spg extract
2、statistic of indexer, such as nums of nodes & edges extracted, ratio of llm invoke.
3、Exceptions such as Debt, account does not exist should be thrown in llm invoke.
4、conf of solver need to be re-examined.
* commit kag_config-tc.yaml
1、rename type to register_name
2、put a uniqe & specific name to register_name
3、rename reader to scanner
4、rename parser to reader
5、rename num_parallel to num_parallel_file, rename chain_level_num_paralle to num_parallel_chain_of_file
6、rename kag_extractor to schema_free_extractor, schema_base_extractor to schema_constraint_extractor
7、pre-define llm & vectorize_model and refer them in the yaml file
Issues to be resolved:
1、examples of event extract & spg extract
2、statistic of indexer, such as nums of nodes & edges extracted, ratio of llm invoke.
3、Exceptions such as Debt, account does not exist should be thrown in llm invoke.
4、conf of solver need to be re-examined.
* 1、fix bug in base_table_splitter
* 1、fix bug in base_table_splitter
* 1、fix bug in default_chain
* update outline splitter
* add main test
* add markdown report
* code refactor
* fix outline splitter
* fix outline prompt
* update markdown reader
* fix vectorizer num batch compute
* add retry for vectorize model call
* update markdown reader
* raise extractor failure
* rm parser
* run pipeline
* add config option of whether to perform llm config check, default to false
* fix
* recover pdf reader
* several components can be null for default chain
* 支持完整qa运行
* add if
* remove unused code
* 使用chunk兜底
* excluded source relation to choose
* add generate
* default recall 10
* add local memory
* 排除相似边
* 增加保护
* 修复并发问题
* add debug logger
* 支持topk参数化
* 支持chunk截断和调整spo select 的prompt
* 增加查询请求保护
* 增加force_chunk配置
* fix entity linker algorithm
* 增加sub query改写
* fix md reader dup in test
* fix
* merge knext to kag parallel
* fix package
* 修复指标下跌问题
* scanner update
* scanner update
* add doc and update example scripts
* fix
* add bridge to spg server
* add format
* fix bridge
* update conf for baike
* disable ckpt for spg server runner
* llm invoke error default raise exceptions
* chore(version): bump version to X.Y.Z
* update default response generation prompt
* add method getSummarizationMetrics
* fix(common): fix project conf empty error
* fix typo
* 增加上报信息
* 修改main solver
* postprocessor support spg server
* 修改solver支持名
* fix language
* 修改chunker接口,增加openapi
* rename vectorizer to vectorize_model in spg server config
* generate_random_string start with gen
* add knext llm vector checker
* add knext llm vector checker
* add knext llm vector checker
* solver移除默认值
* udpate yaml and register_name for baike
* udpate yaml and register_name for baike
* remove config key check
* 修复llmmodule
* fix knext project
* udpate yaml and register_name for examples
* udpate yaml and register_name for examples
* Revert "udpate yaml and register_name for examples"
This reverts commit b3fa5ca9ba749e501133ac67bd8746027ab839d9.
* update register name
* fix
* fix
* support multiple resigter names
* update component
* update reader register names (#183)
* fix markdown reader
* fix llm client for retry
* feat(common): add processed chunk id checkpoint (#185)
* update reader register names
* add processed chunk id checkpoint
* feat(example): add example config (#186)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* add max_workers parameter for getSummarizationMetrics to make it faster
* add csqa data generation script generate_data.py
* commit generated csqa builder and solver data
* add csqa basic project files
* adjust split_length and num_threads_per_chain to match lightrag settings
* ignore ckpt dirs
* add csqa evaluation script eval.py
* save evaluation scripts summarization_metrics.py and factual_correctness.py
* save LightRAG output csqa_lightrag_answers.json
* ignore KAG output csqa_kag_answers.json
* add README.md for CSQA
* fix(solver): fix solver pipeline conf (#191)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* update solver pipeline config
* fix project create
* update links and file paths
* reformat csqa kag_config.yaml
* reformat csqa python files
* reformat getSummarizationMetrics and compare_summarization_answers
* fix(solver): fix solver config (#192)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* update solver pipeline config
* fix project create
* fix main solver conf
* add except
* fix typo in csqa README.md
* feat(conf): support reinitialize config for call from java side (#199)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* update solver pipeline config
* fix project create
* fix main solver conf
* support reinitialize config for java call
* revert default response generation prompt
* update project list
* add README.md for the hotpotqa, 2wiki and musique examples
* 增加spo检索
* turn off kag config dump by default
* turn off knext schema dump by default
* add .gitignore and fix kag_config.yaml
* add README.md for the medicine example
* add README.md for the supplychain example
* bugfix for risk mining
* use exact out
* refactor(solver): format solver code (#205)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* update solver pipeline config
* fix project create
* fix main solver conf
* support reinitialize config for java call
* black format
---------
Co-authored-by: peilong <peilong.zpl@antgroup.com>
Co-authored-by: 锦呈 <zhangxinhong.zxh@antgroup.com>
Co-authored-by: zhengke.gzk <zhengke.gzk@antgroup.com>
Co-authored-by: huaidong.xhd <huaidong.xhd@antgroup.com>
2025-01-03 17:10:51 +08:00
|
|
|
|
|
|
|
cache = knext.common.cache.SchemaCache()
|
|
|
|
|
|
|
|
|
|
|
|
CHUNK_TYPE = "Chunk"
|
2025-06-06 16:26:14 +08:00
|
|
|
TABLE_TYPE = "Table"
|
2025-04-17 17:23:52 +08:00
|
|
|
TITLE_TYPE = "Title"
|
refactor(all): kag v0.6 (#174)
* add path find
* fix find path
* spg guided relation extraction
* fix dict parse with same key
* rename graphalgoclient to graphclient
* rename graphalgoclient to graphclient
* file reader supports http url
* add checkpointer class
* parser supports checkpoint
* add build
* remove incorrect logs
* remove logs
* update examples
* update chain checkpointer
* vectorizer batch size set to 32
* add a zodb backended checkpointer
* add a zodb backended checkpointer
* fix zodb based checkpointer
* add thread for zodb IO
* fix(common): resolve mutlithread conflict in zodb IO
* fix(common): load existing zodb checkpoints
* update examples
* update examples
* fix zodb writer
* add docstring
* fix jieba version mismatch
* commit kag_config-tc.yaml
1、rename type to register_name
2、put a uniqe & specific name to register_name
3、rename reader to scanner
4、rename parser to reader
5、rename num_parallel to num_parallel_file, rename chain_level_num_paralle to num_parallel_chain_of_file
6、rename kag_extractor to schema_free_extractor, schema_base_extractor to schema_constraint_extractor
7、pre-define llm & vectorize_model and refer them in the yaml file
Issues to be resolved:
1、examples of event extract & spg extract
2、statistic of indexer, such as nums of nodes & edges extracted, ratio of llm invoke.
3、Exceptions such as Debt, account does not exist should be thrown in llm invoke.
4、conf of solver need to be re-examined.
* commit kag_config-tc.yaml
1、rename type to register_name
2、put a uniqe & specific name to register_name
3、rename reader to scanner
4、rename parser to reader
5、rename num_parallel to num_parallel_file, rename chain_level_num_paralle to num_parallel_chain_of_file
6、rename kag_extractor to schema_free_extractor, schema_base_extractor to schema_constraint_extractor
7、pre-define llm & vectorize_model and refer them in the yaml file
Issues to be resolved:
1、examples of event extract & spg extract
2、statistic of indexer, such as nums of nodes & edges extracted, ratio of llm invoke.
3、Exceptions such as Debt, account does not exist should be thrown in llm invoke.
4、conf of solver need to be re-examined.
* 1、fix bug in base_table_splitter
* 1、fix bug in base_table_splitter
* 1、fix bug in default_chain
* 增加solver
* add kag
* update outline splitter
* add main test
* add op
* code refactor
* add tools
* fix outline splitter
* fix outline prompt
* graph api pass
* commit with page rank
* add search api and graph api
* add markdown report
* fix vectorizer num batch compute
* add retry for vectorize model call
* update markdown reader
* update markdown reader
* update pdf reader
* raise extractor failure
* add default expr
* add log
* merge jc reader features
* rm import
* add build
* fix zodb based checkpointer
* add thread for zodb IO
* fix(common): resolve mutlithread conflict in zodb IO
* fix(common): load existing zodb checkpoints
* update examples
* update examples
* fix zodb writer
* add docstring
* fix jieba version mismatch
* commit kag_config-tc.yaml
1、rename type to register_name
2、put a uniqe & specific name to register_name
3、rename reader to scanner
4、rename parser to reader
5、rename num_parallel to num_parallel_file, rename chain_level_num_paralle to num_parallel_chain_of_file
6、rename kag_extractor to schema_free_extractor, schema_base_extractor to schema_constraint_extractor
7、pre-define llm & vectorize_model and refer them in the yaml file
Issues to be resolved:
1、examples of event extract & spg extract
2、statistic of indexer, such as nums of nodes & edges extracted, ratio of llm invoke.
3、Exceptions such as Debt, account does not exist should be thrown in llm invoke.
4、conf of solver need to be re-examined.
* commit kag_config-tc.yaml
1、rename type to register_name
2、put a uniqe & specific name to register_name
3、rename reader to scanner
4、rename parser to reader
5、rename num_parallel to num_parallel_file, rename chain_level_num_paralle to num_parallel_chain_of_file
6、rename kag_extractor to schema_free_extractor, schema_base_extractor to schema_constraint_extractor
7、pre-define llm & vectorize_model and refer them in the yaml file
Issues to be resolved:
1、examples of event extract & spg extract
2、statistic of indexer, such as nums of nodes & edges extracted, ratio of llm invoke.
3、Exceptions such as Debt, account does not exist should be thrown in llm invoke.
4、conf of solver need to be re-examined.
* 1、fix bug in base_table_splitter
* 1、fix bug in base_table_splitter
* 1、fix bug in default_chain
* update outline splitter
* add main test
* add markdown report
* code refactor
* fix outline splitter
* fix outline prompt
* update markdown reader
* fix vectorizer num batch compute
* add retry for vectorize model call
* update markdown reader
* raise extractor failure
* rm parser
* run pipeline
* add config option of whether to perform llm config check, default to false
* fix
* recover pdf reader
* several components can be null for default chain
* 支持完整qa运行
* add if
* remove unused code
* 使用chunk兜底
* excluded source relation to choose
* add generate
* default recall 10
* add local memory
* 排除相似边
* 增加保护
* 修复并发问题
* add debug logger
* 支持topk参数化
* 支持chunk截断和调整spo select 的prompt
* 增加查询请求保护
* 增加force_chunk配置
* fix entity linker algorithm
* 增加sub query改写
* fix md reader dup in test
* fix
* merge knext to kag parallel
* fix package
* 修复指标下跌问题
* scanner update
* scanner update
* add doc and update example scripts
* fix
* add bridge to spg server
* add format
* fix bridge
* update conf for baike
* disable ckpt for spg server runner
* llm invoke error default raise exceptions
* chore(version): bump version to X.Y.Z
* update default response generation prompt
* add method getSummarizationMetrics
* fix(common): fix project conf empty error
* fix typo
* 增加上报信息
* 修改main solver
* postprocessor support spg server
* 修改solver支持名
* fix language
* 修改chunker接口,增加openapi
* rename vectorizer to vectorize_model in spg server config
* generate_random_string start with gen
* add knext llm vector checker
* add knext llm vector checker
* add knext llm vector checker
* solver移除默认值
* udpate yaml and register_name for baike
* udpate yaml and register_name for baike
* remove config key check
* 修复llmmodule
* fix knext project
* udpate yaml and register_name for examples
* udpate yaml and register_name for examples
* Revert "udpate yaml and register_name for examples"
This reverts commit b3fa5ca9ba749e501133ac67bd8746027ab839d9.
* update register name
* fix
* fix
* support multiple resigter names
* update component
* update reader register names (#183)
* fix markdown reader
* fix llm client for retry
* feat(common): add processed chunk id checkpoint (#185)
* update reader register names
* add processed chunk id checkpoint
* feat(example): add example config (#186)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* add max_workers parameter for getSummarizationMetrics to make it faster
* add csqa data generation script generate_data.py
* commit generated csqa builder and solver data
* add csqa basic project files
* adjust split_length and num_threads_per_chain to match lightrag settings
* ignore ckpt dirs
* add csqa evaluation script eval.py
* save evaluation scripts summarization_metrics.py and factual_correctness.py
* save LightRAG output csqa_lightrag_answers.json
* ignore KAG output csqa_kag_answers.json
* add README.md for CSQA
* fix(solver): fix solver pipeline conf (#191)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* update solver pipeline config
* fix project create
* update links and file paths
* reformat csqa kag_config.yaml
* reformat csqa python files
* reformat getSummarizationMetrics and compare_summarization_answers
* fix(solver): fix solver config (#192)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* update solver pipeline config
* fix project create
* fix main solver conf
* add except
* fix typo in csqa README.md
* feat(conf): support reinitialize config for call from java side (#199)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* update solver pipeline config
* fix project create
* fix main solver conf
* support reinitialize config for java call
* revert default response generation prompt
* update project list
* add README.md for the hotpotqa, 2wiki and musique examples
* 增加spo检索
* turn off kag config dump by default
* turn off knext schema dump by default
* add .gitignore and fix kag_config.yaml
* add README.md for the medicine example
* add README.md for the supplychain example
* bugfix for risk mining
* use exact out
* refactor(solver): format solver code (#205)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* update solver pipeline config
* fix project create
* fix main solver conf
* support reinitialize config for java call
* black format
---------
Co-authored-by: peilong <peilong.zpl@antgroup.com>
Co-authored-by: 锦呈 <zhangxinhong.zxh@antgroup.com>
Co-authored-by: zhengke.gzk <zhengke.gzk@antgroup.com>
Co-authored-by: huaidong.xhd <huaidong.xhd@antgroup.com>
2025-01-03 17:10:51 +08:00
|
|
|
OTHER_TYPE = "Others"
|
|
|
|
TEXT_TYPE = "Text"
|
|
|
|
INTEGER_TYPE = "Integer"
|
|
|
|
FLOAT_TYPE = "Float"
|
|
|
|
BASIC_TYPES = [TEXT_TYPE, INTEGER_TYPE, FLOAT_TYPE]
|
|
|
|
|
|
|
|
|
|
|
|
class SchemaSession:
|
|
|
|
def __init__(self, client, project_id):
|
|
|
|
self._alter_spg_types: List[BaseSpgType] = []
|
|
|
|
self._rest_client = client
|
|
|
|
self._project_id = project_id
|
|
|
|
|
|
|
|
self._spg_types = {}
|
|
|
|
self.__spg_types = {}
|
|
|
|
self._init_spg_types()
|
|
|
|
|
|
|
|
def _init_spg_types(self):
|
|
|
|
"""Query project schema and init SPG types in session."""
|
|
|
|
project_schema = self._rest_client.schema_query_project_schema_get(
|
|
|
|
self._project_id
|
|
|
|
)
|
|
|
|
for spg_type in project_schema.spg_types:
|
|
|
|
spg_type_name = spg_type.basic_info.name.name
|
|
|
|
type_class = BaseSpgType.by_type_enum(spg_type.spg_type_enum)
|
|
|
|
if spg_type.spg_type_enum == SpgTypeEnum.Concept:
|
|
|
|
self._spg_types[spg_type_name] = type_class(
|
|
|
|
name=spg_type_name,
|
|
|
|
hypernym_predicate=spg_type.concept_layer_config.hypernym_predicate,
|
|
|
|
rest_model=spg_type,
|
|
|
|
)
|
|
|
|
else:
|
|
|
|
self._spg_types[spg_type_name] = type_class(
|
|
|
|
name=spg_type_name, rest_model=spg_type
|
|
|
|
)
|
|
|
|
|
|
|
|
@property
|
|
|
|
def spg_types(self) -> Dict[str, BaseSpgType]:
|
|
|
|
return self._spg_types
|
|
|
|
|
|
|
|
def get(self, spg_type_name) -> BaseSpgType:
|
|
|
|
"""Get SPG type by name from project schema."""
|
|
|
|
spg_type = self._spg_types.get(spg_type_name)
|
|
|
|
if spg_type is None:
|
|
|
|
spg_type = self.__spg_types.get(spg_type_name)
|
|
|
|
if spg_type is None:
|
|
|
|
raise ValueError(f"{spg_type_name} is not existed")
|
|
|
|
else:
|
|
|
|
return self.__spg_types.get(spg_type_name)
|
|
|
|
return self._spg_types.get(spg_type_name)
|
|
|
|
|
|
|
|
def create_type(self, spg_type: BaseSpgType):
|
|
|
|
"""Add an SPG type in session with `CREATE` operation."""
|
|
|
|
spg_type.alter_operation = AlterOperationEnum.Create
|
|
|
|
self.__spg_types[spg_type.name] = spg_type
|
|
|
|
self._alter_spg_types.append(spg_type)
|
|
|
|
return self
|
|
|
|
|
|
|
|
def update_type(self, spg_type: BaseSpgType):
|
|
|
|
"""Add an SPG type in session with `UPDATE` operation."""
|
|
|
|
spg_type.alter_operation = AlterOperationEnum.Update
|
|
|
|
self._alter_spg_types.append(spg_type)
|
|
|
|
return self
|
|
|
|
|
|
|
|
def delete_type(self, spg_type: BaseSpgType):
|
|
|
|
"""Add an SPG type in session with `DELETE` operation."""
|
|
|
|
spg_type.alter_operation = AlterOperationEnum.Delete
|
|
|
|
self._alter_spg_types.append(spg_type)
|
|
|
|
return self
|
|
|
|
|
|
|
|
def commit(self):
|
|
|
|
"""Commit all altered schemas to server."""
|
|
|
|
schema_draft = []
|
|
|
|
for spg_type in self._alter_spg_types:
|
|
|
|
for prop in spg_type.properties.values():
|
|
|
|
if prop.object_spg_type is None:
|
|
|
|
object_spg_type = self.get(prop.object_type_name)
|
|
|
|
prop.object_spg_type = object_spg_type.spg_type_enum
|
|
|
|
for sub_prop in prop.sub_properties.values():
|
|
|
|
if sub_prop.object_spg_type is None:
|
|
|
|
object_spg_type = self.get(sub_prop.object_type_name)
|
|
|
|
sub_prop.object_spg_type = object_spg_type.spg_type_enum
|
|
|
|
for rel in spg_type.relations.values():
|
|
|
|
if rel.is_dynamic is None:
|
|
|
|
rel.is_dynamic = False
|
|
|
|
if rel.object_spg_type is None:
|
|
|
|
object_spg_type = self.get(rel.object_type_name)
|
|
|
|
rel.object_spg_type = object_spg_type.spg_type_enum
|
|
|
|
for sub_prop in rel.sub_properties.values():
|
|
|
|
if sub_prop.object_spg_type is None:
|
|
|
|
object_spg_type = self.get(sub_prop.object_type_name)
|
|
|
|
sub_prop.object_spg_type = object_spg_type.spg_type_enum
|
|
|
|
schema_draft.append(spg_type.to_rest())
|
|
|
|
if len(schema_draft) == 0:
|
|
|
|
return
|
|
|
|
|
|
|
|
request = rest.SchemaAlterRequest(
|
|
|
|
project_id=self._project_id, schema_draft=rest.SchemaDraft(schema_draft)
|
|
|
|
)
|
|
|
|
key = "KNEXT_DEBUG_DUMP_SCHEMA"
|
|
|
|
dump_flag = os.getenv(key)
|
|
|
|
if dump_flag is not None and dump_flag.strip() == "1":
|
|
|
|
print(request)
|
|
|
|
else:
|
|
|
|
print(f"Committing schema: set {key}=1 to dump the schema")
|
|
|
|
self._rest_client.schema_alter_schema_post(schema_alter_request=request)
|
|
|
|
|
|
|
|
|
|
|
|
class SchemaClient(Client):
|
|
|
|
""" """
|
|
|
|
|
|
|
|
def __init__(self, host_addr: str = None, project_id: str = None):
|
|
|
|
super().__init__(host_addr, project_id)
|
|
|
|
self._session = None
|
|
|
|
self._rest_client: rest.SchemaApi = rest.SchemaApi(
|
|
|
|
api_client=ApiClient(configuration=Configuration(host=host_addr))
|
|
|
|
)
|
2025-06-27 19:31:06 +08:00
|
|
|
|
refactor(all): kag v0.6 (#174)
* add path find
* fix find path
* spg guided relation extraction
* fix dict parse with same key
* rename graphalgoclient to graphclient
* rename graphalgoclient to graphclient
* file reader supports http url
* add checkpointer class
* parser supports checkpoint
* add build
* remove incorrect logs
* remove logs
* update examples
* update chain checkpointer
* vectorizer batch size set to 32
* add a zodb backended checkpointer
* add a zodb backended checkpointer
* fix zodb based checkpointer
* add thread for zodb IO
* fix(common): resolve mutlithread conflict in zodb IO
* fix(common): load existing zodb checkpoints
* update examples
* update examples
* fix zodb writer
* add docstring
* fix jieba version mismatch
* commit kag_config-tc.yaml
1、rename type to register_name
2、put a uniqe & specific name to register_name
3、rename reader to scanner
4、rename parser to reader
5、rename num_parallel to num_parallel_file, rename chain_level_num_paralle to num_parallel_chain_of_file
6、rename kag_extractor to schema_free_extractor, schema_base_extractor to schema_constraint_extractor
7、pre-define llm & vectorize_model and refer them in the yaml file
Issues to be resolved:
1、examples of event extract & spg extract
2、statistic of indexer, such as nums of nodes & edges extracted, ratio of llm invoke.
3、Exceptions such as Debt, account does not exist should be thrown in llm invoke.
4、conf of solver need to be re-examined.
* commit kag_config-tc.yaml
1、rename type to register_name
2、put a uniqe & specific name to register_name
3、rename reader to scanner
4、rename parser to reader
5、rename num_parallel to num_parallel_file, rename chain_level_num_paralle to num_parallel_chain_of_file
6、rename kag_extractor to schema_free_extractor, schema_base_extractor to schema_constraint_extractor
7、pre-define llm & vectorize_model and refer them in the yaml file
Issues to be resolved:
1、examples of event extract & spg extract
2、statistic of indexer, such as nums of nodes & edges extracted, ratio of llm invoke.
3、Exceptions such as Debt, account does not exist should be thrown in llm invoke.
4、conf of solver need to be re-examined.
* 1、fix bug in base_table_splitter
* 1、fix bug in base_table_splitter
* 1、fix bug in default_chain
* 增加solver
* add kag
* update outline splitter
* add main test
* add op
* code refactor
* add tools
* fix outline splitter
* fix outline prompt
* graph api pass
* commit with page rank
* add search api and graph api
* add markdown report
* fix vectorizer num batch compute
* add retry for vectorize model call
* update markdown reader
* update markdown reader
* update pdf reader
* raise extractor failure
* add default expr
* add log
* merge jc reader features
* rm import
* add build
* fix zodb based checkpointer
* add thread for zodb IO
* fix(common): resolve mutlithread conflict in zodb IO
* fix(common): load existing zodb checkpoints
* update examples
* update examples
* fix zodb writer
* add docstring
* fix jieba version mismatch
* commit kag_config-tc.yaml
1、rename type to register_name
2、put a uniqe & specific name to register_name
3、rename reader to scanner
4、rename parser to reader
5、rename num_parallel to num_parallel_file, rename chain_level_num_paralle to num_parallel_chain_of_file
6、rename kag_extractor to schema_free_extractor, schema_base_extractor to schema_constraint_extractor
7、pre-define llm & vectorize_model and refer them in the yaml file
Issues to be resolved:
1、examples of event extract & spg extract
2、statistic of indexer, such as nums of nodes & edges extracted, ratio of llm invoke.
3、Exceptions such as Debt, account does not exist should be thrown in llm invoke.
4、conf of solver need to be re-examined.
* commit kag_config-tc.yaml
1、rename type to register_name
2、put a uniqe & specific name to register_name
3、rename reader to scanner
4、rename parser to reader
5、rename num_parallel to num_parallel_file, rename chain_level_num_paralle to num_parallel_chain_of_file
6、rename kag_extractor to schema_free_extractor, schema_base_extractor to schema_constraint_extractor
7、pre-define llm & vectorize_model and refer them in the yaml file
Issues to be resolved:
1、examples of event extract & spg extract
2、statistic of indexer, such as nums of nodes & edges extracted, ratio of llm invoke.
3、Exceptions such as Debt, account does not exist should be thrown in llm invoke.
4、conf of solver need to be re-examined.
* 1、fix bug in base_table_splitter
* 1、fix bug in base_table_splitter
* 1、fix bug in default_chain
* update outline splitter
* add main test
* add markdown report
* code refactor
* fix outline splitter
* fix outline prompt
* update markdown reader
* fix vectorizer num batch compute
* add retry for vectorize model call
* update markdown reader
* raise extractor failure
* rm parser
* run pipeline
* add config option of whether to perform llm config check, default to false
* fix
* recover pdf reader
* several components can be null for default chain
* 支持完整qa运行
* add if
* remove unused code
* 使用chunk兜底
* excluded source relation to choose
* add generate
* default recall 10
* add local memory
* 排除相似边
* 增加保护
* 修复并发问题
* add debug logger
* 支持topk参数化
* 支持chunk截断和调整spo select 的prompt
* 增加查询请求保护
* 增加force_chunk配置
* fix entity linker algorithm
* 增加sub query改写
* fix md reader dup in test
* fix
* merge knext to kag parallel
* fix package
* 修复指标下跌问题
* scanner update
* scanner update
* add doc and update example scripts
* fix
* add bridge to spg server
* add format
* fix bridge
* update conf for baike
* disable ckpt for spg server runner
* llm invoke error default raise exceptions
* chore(version): bump version to X.Y.Z
* update default response generation prompt
* add method getSummarizationMetrics
* fix(common): fix project conf empty error
* fix typo
* 增加上报信息
* 修改main solver
* postprocessor support spg server
* 修改solver支持名
* fix language
* 修改chunker接口,增加openapi
* rename vectorizer to vectorize_model in spg server config
* generate_random_string start with gen
* add knext llm vector checker
* add knext llm vector checker
* add knext llm vector checker
* solver移除默认值
* udpate yaml and register_name for baike
* udpate yaml and register_name for baike
* remove config key check
* 修复llmmodule
* fix knext project
* udpate yaml and register_name for examples
* udpate yaml and register_name for examples
* Revert "udpate yaml and register_name for examples"
This reverts commit b3fa5ca9ba749e501133ac67bd8746027ab839d9.
* update register name
* fix
* fix
* support multiple resigter names
* update component
* update reader register names (#183)
* fix markdown reader
* fix llm client for retry
* feat(common): add processed chunk id checkpoint (#185)
* update reader register names
* add processed chunk id checkpoint
* feat(example): add example config (#186)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* add max_workers parameter for getSummarizationMetrics to make it faster
* add csqa data generation script generate_data.py
* commit generated csqa builder and solver data
* add csqa basic project files
* adjust split_length and num_threads_per_chain to match lightrag settings
* ignore ckpt dirs
* add csqa evaluation script eval.py
* save evaluation scripts summarization_metrics.py and factual_correctness.py
* save LightRAG output csqa_lightrag_answers.json
* ignore KAG output csqa_kag_answers.json
* add README.md for CSQA
* fix(solver): fix solver pipeline conf (#191)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* update solver pipeline config
* fix project create
* update links and file paths
* reformat csqa kag_config.yaml
* reformat csqa python files
* reformat getSummarizationMetrics and compare_summarization_answers
* fix(solver): fix solver config (#192)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* update solver pipeline config
* fix project create
* fix main solver conf
* add except
* fix typo in csqa README.md
* feat(conf): support reinitialize config for call from java side (#199)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* update solver pipeline config
* fix project create
* fix main solver conf
* support reinitialize config for java call
* revert default response generation prompt
* update project list
* add README.md for the hotpotqa, 2wiki and musique examples
* 增加spo检索
* turn off kag config dump by default
* turn off knext schema dump by default
* add .gitignore and fix kag_config.yaml
* add README.md for the medicine example
* add README.md for the supplychain example
* bugfix for risk mining
* use exact out
* refactor(solver): format solver code (#205)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* update solver pipeline config
* fix project create
* fix main solver conf
* support reinitialize config for java call
* black format
---------
Co-authored-by: peilong <peilong.zpl@antgroup.com>
Co-authored-by: 锦呈 <zhangxinhong.zxh@antgroup.com>
Co-authored-by: zhengke.gzk <zhengke.gzk@antgroup.com>
Co-authored-by: huaidong.xhd <huaidong.xhd@antgroup.com>
2025-01-03 17:10:51 +08:00
|
|
|
def query_spg_type(self, spg_type_name: str) -> BaseSpgType:
|
|
|
|
"""Query SPG type by name."""
|
|
|
|
rest_model = self._rest_client.schema_query_spg_type_get(spg_type_name)
|
|
|
|
type_class = BaseSpgType.by_type_enum(f"{rest_model.spg_type_enum}")
|
|
|
|
|
|
|
|
if rest_model.spg_type_enum == SpgTypeEnum.Concept:
|
|
|
|
return type_class(
|
|
|
|
name=spg_type_name,
|
|
|
|
hypernym_predicate=rest_model.concept_layer_config.hypernym_predicate,
|
|
|
|
rest_model=rest_model,
|
|
|
|
)
|
|
|
|
else:
|
|
|
|
return type_class(name=spg_type_name, rest_model=rest_model)
|
|
|
|
|
|
|
|
def query_relation(
|
|
|
|
self, subject_name: str, predicate_name: str, object_name: str
|
|
|
|
) -> Relation:
|
|
|
|
"""Query relation type by s_p_o name."""
|
|
|
|
rest_model = self._rest_client.schema_query_relation_get(
|
|
|
|
subject_name, predicate_name, object_name
|
|
|
|
)
|
|
|
|
return Relation(
|
|
|
|
name=predicate_name, object_type_name=object_name, rest_model=rest_model
|
|
|
|
)
|
|
|
|
|
|
|
|
def create_session(self):
|
|
|
|
"""Create session for altering schema."""
|
|
|
|
schema_session = cache.get(self._project_id)
|
|
|
|
if not schema_session:
|
|
|
|
schema_session = SchemaSession(self._rest_client, self._project_id)
|
|
|
|
cache.put(self._project_id, schema_session)
|
|
|
|
return schema_session
|
|
|
|
|
|
|
|
def load(self):
|
|
|
|
schema_session = self.create_session()
|
|
|
|
schema = {
|
|
|
|
k.split(".")[-1]: v
|
|
|
|
for k, v in schema_session.spg_types.items()
|
|
|
|
if v.spg_type_enum
|
2025-06-27 19:31:06 +08:00
|
|
|
in [
|
|
|
|
SpgTypeEnum.Concept,
|
|
|
|
SpgTypeEnum.Entity,
|
|
|
|
SpgTypeEnum.Event,
|
|
|
|
SpgTypeEnum.Index,
|
|
|
|
]
|
refactor(all): kag v0.6 (#174)
* add path find
* fix find path
* spg guided relation extraction
* fix dict parse with same key
* rename graphalgoclient to graphclient
* rename graphalgoclient to graphclient
* file reader supports http url
* add checkpointer class
* parser supports checkpoint
* add build
* remove incorrect logs
* remove logs
* update examples
* update chain checkpointer
* vectorizer batch size set to 32
* add a zodb backended checkpointer
* add a zodb backended checkpointer
* fix zodb based checkpointer
* add thread for zodb IO
* fix(common): resolve mutlithread conflict in zodb IO
* fix(common): load existing zodb checkpoints
* update examples
* update examples
* fix zodb writer
* add docstring
* fix jieba version mismatch
* commit kag_config-tc.yaml
1、rename type to register_name
2、put a uniqe & specific name to register_name
3、rename reader to scanner
4、rename parser to reader
5、rename num_parallel to num_parallel_file, rename chain_level_num_paralle to num_parallel_chain_of_file
6、rename kag_extractor to schema_free_extractor, schema_base_extractor to schema_constraint_extractor
7、pre-define llm & vectorize_model and refer them in the yaml file
Issues to be resolved:
1、examples of event extract & spg extract
2、statistic of indexer, such as nums of nodes & edges extracted, ratio of llm invoke.
3、Exceptions such as Debt, account does not exist should be thrown in llm invoke.
4、conf of solver need to be re-examined.
* commit kag_config-tc.yaml
1、rename type to register_name
2、put a uniqe & specific name to register_name
3、rename reader to scanner
4、rename parser to reader
5、rename num_parallel to num_parallel_file, rename chain_level_num_paralle to num_parallel_chain_of_file
6、rename kag_extractor to schema_free_extractor, schema_base_extractor to schema_constraint_extractor
7、pre-define llm & vectorize_model and refer them in the yaml file
Issues to be resolved:
1、examples of event extract & spg extract
2、statistic of indexer, such as nums of nodes & edges extracted, ratio of llm invoke.
3、Exceptions such as Debt, account does not exist should be thrown in llm invoke.
4、conf of solver need to be re-examined.
* 1、fix bug in base_table_splitter
* 1、fix bug in base_table_splitter
* 1、fix bug in default_chain
* 增加solver
* add kag
* update outline splitter
* add main test
* add op
* code refactor
* add tools
* fix outline splitter
* fix outline prompt
* graph api pass
* commit with page rank
* add search api and graph api
* add markdown report
* fix vectorizer num batch compute
* add retry for vectorize model call
* update markdown reader
* update markdown reader
* update pdf reader
* raise extractor failure
* add default expr
* add log
* merge jc reader features
* rm import
* add build
* fix zodb based checkpointer
* add thread for zodb IO
* fix(common): resolve mutlithread conflict in zodb IO
* fix(common): load existing zodb checkpoints
* update examples
* update examples
* fix zodb writer
* add docstring
* fix jieba version mismatch
* commit kag_config-tc.yaml
1、rename type to register_name
2、put a uniqe & specific name to register_name
3、rename reader to scanner
4、rename parser to reader
5、rename num_parallel to num_parallel_file, rename chain_level_num_paralle to num_parallel_chain_of_file
6、rename kag_extractor to schema_free_extractor, schema_base_extractor to schema_constraint_extractor
7、pre-define llm & vectorize_model and refer them in the yaml file
Issues to be resolved:
1、examples of event extract & spg extract
2、statistic of indexer, such as nums of nodes & edges extracted, ratio of llm invoke.
3、Exceptions such as Debt, account does not exist should be thrown in llm invoke.
4、conf of solver need to be re-examined.
* commit kag_config-tc.yaml
1、rename type to register_name
2、put a uniqe & specific name to register_name
3、rename reader to scanner
4、rename parser to reader
5、rename num_parallel to num_parallel_file, rename chain_level_num_paralle to num_parallel_chain_of_file
6、rename kag_extractor to schema_free_extractor, schema_base_extractor to schema_constraint_extractor
7、pre-define llm & vectorize_model and refer them in the yaml file
Issues to be resolved:
1、examples of event extract & spg extract
2、statistic of indexer, such as nums of nodes & edges extracted, ratio of llm invoke.
3、Exceptions such as Debt, account does not exist should be thrown in llm invoke.
4、conf of solver need to be re-examined.
* 1、fix bug in base_table_splitter
* 1、fix bug in base_table_splitter
* 1、fix bug in default_chain
* update outline splitter
* add main test
* add markdown report
* code refactor
* fix outline splitter
* fix outline prompt
* update markdown reader
* fix vectorizer num batch compute
* add retry for vectorize model call
* update markdown reader
* raise extractor failure
* rm parser
* run pipeline
* add config option of whether to perform llm config check, default to false
* fix
* recover pdf reader
* several components can be null for default chain
* 支持完整qa运行
* add if
* remove unused code
* 使用chunk兜底
* excluded source relation to choose
* add generate
* default recall 10
* add local memory
* 排除相似边
* 增加保护
* 修复并发问题
* add debug logger
* 支持topk参数化
* 支持chunk截断和调整spo select 的prompt
* 增加查询请求保护
* 增加force_chunk配置
* fix entity linker algorithm
* 增加sub query改写
* fix md reader dup in test
* fix
* merge knext to kag parallel
* fix package
* 修复指标下跌问题
* scanner update
* scanner update
* add doc and update example scripts
* fix
* add bridge to spg server
* add format
* fix bridge
* update conf for baike
* disable ckpt for spg server runner
* llm invoke error default raise exceptions
* chore(version): bump version to X.Y.Z
* update default response generation prompt
* add method getSummarizationMetrics
* fix(common): fix project conf empty error
* fix typo
* 增加上报信息
* 修改main solver
* postprocessor support spg server
* 修改solver支持名
* fix language
* 修改chunker接口,增加openapi
* rename vectorizer to vectorize_model in spg server config
* generate_random_string start with gen
* add knext llm vector checker
* add knext llm vector checker
* add knext llm vector checker
* solver移除默认值
* udpate yaml and register_name for baike
* udpate yaml and register_name for baike
* remove config key check
* 修复llmmodule
* fix knext project
* udpate yaml and register_name for examples
* udpate yaml and register_name for examples
* Revert "udpate yaml and register_name for examples"
This reverts commit b3fa5ca9ba749e501133ac67bd8746027ab839d9.
* update register name
* fix
* fix
* support multiple resigter names
* update component
* update reader register names (#183)
* fix markdown reader
* fix llm client for retry
* feat(common): add processed chunk id checkpoint (#185)
* update reader register names
* add processed chunk id checkpoint
* feat(example): add example config (#186)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* add max_workers parameter for getSummarizationMetrics to make it faster
* add csqa data generation script generate_data.py
* commit generated csqa builder and solver data
* add csqa basic project files
* adjust split_length and num_threads_per_chain to match lightrag settings
* ignore ckpt dirs
* add csqa evaluation script eval.py
* save evaluation scripts summarization_metrics.py and factual_correctness.py
* save LightRAG output csqa_lightrag_answers.json
* ignore KAG output csqa_kag_answers.json
* add README.md for CSQA
* fix(solver): fix solver pipeline conf (#191)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* update solver pipeline config
* fix project create
* update links and file paths
* reformat csqa kag_config.yaml
* reformat csqa python files
* reformat getSummarizationMetrics and compare_summarization_answers
* fix(solver): fix solver config (#192)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* update solver pipeline config
* fix project create
* fix main solver conf
* add except
* fix typo in csqa README.md
* feat(conf): support reinitialize config for call from java side (#199)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* update solver pipeline config
* fix project create
* fix main solver conf
* support reinitialize config for java call
* revert default response generation prompt
* update project list
* add README.md for the hotpotqa, 2wiki and musique examples
* 增加spo检索
* turn off kag config dump by default
* turn off knext schema dump by default
* add .gitignore and fix kag_config.yaml
* add README.md for the medicine example
* add README.md for the supplychain example
* bugfix for risk mining
* use exact out
* refactor(solver): format solver code (#205)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* update solver pipeline config
* fix project create
* fix main solver conf
* support reinitialize config for java call
* black format
---------
Co-authored-by: peilong <peilong.zpl@antgroup.com>
Co-authored-by: 锦呈 <zhangxinhong.zxh@antgroup.com>
Co-authored-by: zhengke.gzk <zhengke.gzk@antgroup.com>
Co-authored-by: huaidong.xhd <huaidong.xhd@antgroup.com>
2025-01-03 17:10:51 +08:00
|
|
|
}
|
|
|
|
return schema
|
|
|
|
|
2025-06-16 19:22:30 +08:00
|
|
|
def extract_types(self, language):
|
refactor(all): kag v0.6 (#174)
* add path find
* fix find path
* spg guided relation extraction
* fix dict parse with same key
* rename graphalgoclient to graphclient
* rename graphalgoclient to graphclient
* file reader supports http url
* add checkpointer class
* parser supports checkpoint
* add build
* remove incorrect logs
* remove logs
* update examples
* update chain checkpointer
* vectorizer batch size set to 32
* add a zodb backended checkpointer
* add a zodb backended checkpointer
* fix zodb based checkpointer
* add thread for zodb IO
* fix(common): resolve mutlithread conflict in zodb IO
* fix(common): load existing zodb checkpoints
* update examples
* update examples
* fix zodb writer
* add docstring
* fix jieba version mismatch
* commit kag_config-tc.yaml
1、rename type to register_name
2、put a uniqe & specific name to register_name
3、rename reader to scanner
4、rename parser to reader
5、rename num_parallel to num_parallel_file, rename chain_level_num_paralle to num_parallel_chain_of_file
6、rename kag_extractor to schema_free_extractor, schema_base_extractor to schema_constraint_extractor
7、pre-define llm & vectorize_model and refer them in the yaml file
Issues to be resolved:
1、examples of event extract & spg extract
2、statistic of indexer, such as nums of nodes & edges extracted, ratio of llm invoke.
3、Exceptions such as Debt, account does not exist should be thrown in llm invoke.
4、conf of solver need to be re-examined.
* commit kag_config-tc.yaml
1、rename type to register_name
2、put a uniqe & specific name to register_name
3、rename reader to scanner
4、rename parser to reader
5、rename num_parallel to num_parallel_file, rename chain_level_num_paralle to num_parallel_chain_of_file
6、rename kag_extractor to schema_free_extractor, schema_base_extractor to schema_constraint_extractor
7、pre-define llm & vectorize_model and refer them in the yaml file
Issues to be resolved:
1、examples of event extract & spg extract
2、statistic of indexer, such as nums of nodes & edges extracted, ratio of llm invoke.
3、Exceptions such as Debt, account does not exist should be thrown in llm invoke.
4、conf of solver need to be re-examined.
* 1、fix bug in base_table_splitter
* 1、fix bug in base_table_splitter
* 1、fix bug in default_chain
* 增加solver
* add kag
* update outline splitter
* add main test
* add op
* code refactor
* add tools
* fix outline splitter
* fix outline prompt
* graph api pass
* commit with page rank
* add search api and graph api
* add markdown report
* fix vectorizer num batch compute
* add retry for vectorize model call
* update markdown reader
* update markdown reader
* update pdf reader
* raise extractor failure
* add default expr
* add log
* merge jc reader features
* rm import
* add build
* fix zodb based checkpointer
* add thread for zodb IO
* fix(common): resolve mutlithread conflict in zodb IO
* fix(common): load existing zodb checkpoints
* update examples
* update examples
* fix zodb writer
* add docstring
* fix jieba version mismatch
* commit kag_config-tc.yaml
1、rename type to register_name
2、put a uniqe & specific name to register_name
3、rename reader to scanner
4、rename parser to reader
5、rename num_parallel to num_parallel_file, rename chain_level_num_paralle to num_parallel_chain_of_file
6、rename kag_extractor to schema_free_extractor, schema_base_extractor to schema_constraint_extractor
7、pre-define llm & vectorize_model and refer them in the yaml file
Issues to be resolved:
1、examples of event extract & spg extract
2、statistic of indexer, such as nums of nodes & edges extracted, ratio of llm invoke.
3、Exceptions such as Debt, account does not exist should be thrown in llm invoke.
4、conf of solver need to be re-examined.
* commit kag_config-tc.yaml
1、rename type to register_name
2、put a uniqe & specific name to register_name
3、rename reader to scanner
4、rename parser to reader
5、rename num_parallel to num_parallel_file, rename chain_level_num_paralle to num_parallel_chain_of_file
6、rename kag_extractor to schema_free_extractor, schema_base_extractor to schema_constraint_extractor
7、pre-define llm & vectorize_model and refer them in the yaml file
Issues to be resolved:
1、examples of event extract & spg extract
2、statistic of indexer, such as nums of nodes & edges extracted, ratio of llm invoke.
3、Exceptions such as Debt, account does not exist should be thrown in llm invoke.
4、conf of solver need to be re-examined.
* 1、fix bug in base_table_splitter
* 1、fix bug in base_table_splitter
* 1、fix bug in default_chain
* update outline splitter
* add main test
* add markdown report
* code refactor
* fix outline splitter
* fix outline prompt
* update markdown reader
* fix vectorizer num batch compute
* add retry for vectorize model call
* update markdown reader
* raise extractor failure
* rm parser
* run pipeline
* add config option of whether to perform llm config check, default to false
* fix
* recover pdf reader
* several components can be null for default chain
* 支持完整qa运行
* add if
* remove unused code
* 使用chunk兜底
* excluded source relation to choose
* add generate
* default recall 10
* add local memory
* 排除相似边
* 增加保护
* 修复并发问题
* add debug logger
* 支持topk参数化
* 支持chunk截断和调整spo select 的prompt
* 增加查询请求保护
* 增加force_chunk配置
* fix entity linker algorithm
* 增加sub query改写
* fix md reader dup in test
* fix
* merge knext to kag parallel
* fix package
* 修复指标下跌问题
* scanner update
* scanner update
* add doc and update example scripts
* fix
* add bridge to spg server
* add format
* fix bridge
* update conf for baike
* disable ckpt for spg server runner
* llm invoke error default raise exceptions
* chore(version): bump version to X.Y.Z
* update default response generation prompt
* add method getSummarizationMetrics
* fix(common): fix project conf empty error
* fix typo
* 增加上报信息
* 修改main solver
* postprocessor support spg server
* 修改solver支持名
* fix language
* 修改chunker接口,增加openapi
* rename vectorizer to vectorize_model in spg server config
* generate_random_string start with gen
* add knext llm vector checker
* add knext llm vector checker
* add knext llm vector checker
* solver移除默认值
* udpate yaml and register_name for baike
* udpate yaml and register_name for baike
* remove config key check
* 修复llmmodule
* fix knext project
* udpate yaml and register_name for examples
* udpate yaml and register_name for examples
* Revert "udpate yaml and register_name for examples"
This reverts commit b3fa5ca9ba749e501133ac67bd8746027ab839d9.
* update register name
* fix
* fix
* support multiple resigter names
* update component
* update reader register names (#183)
* fix markdown reader
* fix llm client for retry
* feat(common): add processed chunk id checkpoint (#185)
* update reader register names
* add processed chunk id checkpoint
* feat(example): add example config (#186)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* add max_workers parameter for getSummarizationMetrics to make it faster
* add csqa data generation script generate_data.py
* commit generated csqa builder and solver data
* add csqa basic project files
* adjust split_length and num_threads_per_chain to match lightrag settings
* ignore ckpt dirs
* add csqa evaluation script eval.py
* save evaluation scripts summarization_metrics.py and factual_correctness.py
* save LightRAG output csqa_lightrag_answers.json
* ignore KAG output csqa_kag_answers.json
* add README.md for CSQA
* fix(solver): fix solver pipeline conf (#191)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* update solver pipeline config
* fix project create
* update links and file paths
* reformat csqa kag_config.yaml
* reformat csqa python files
* reformat getSummarizationMetrics and compare_summarization_answers
* fix(solver): fix solver config (#192)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* update solver pipeline config
* fix project create
* fix main solver conf
* add except
* fix typo in csqa README.md
* feat(conf): support reinitialize config for call from java side (#199)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* update solver pipeline config
* fix project create
* fix main solver conf
* support reinitialize config for java call
* revert default response generation prompt
* update project list
* add README.md for the hotpotqa, 2wiki and musique examples
* 增加spo检索
* turn off kag config dump by default
* turn off knext schema dump by default
* add .gitignore and fix kag_config.yaml
* add README.md for the medicine example
* add README.md for the supplychain example
* bugfix for risk mining
* use exact out
* refactor(solver): format solver code (#205)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* update solver pipeline config
* fix project create
* fix main solver conf
* support reinitialize config for java call
* black format
---------
Co-authored-by: peilong <peilong.zpl@antgroup.com>
Co-authored-by: 锦呈 <zhangxinhong.zxh@antgroup.com>
Co-authored-by: zhengke.gzk <zhengke.gzk@antgroup.com>
Co-authored-by: huaidong.xhd <huaidong.xhd@antgroup.com>
2025-01-03 17:10:51 +08:00
|
|
|
schema = self.load()
|
2025-06-16 19:22:30 +08:00
|
|
|
types = []
|
|
|
|
for t in schema.keys():
|
|
|
|
schema_type = schema[t]
|
|
|
|
if isinstance(schema_type, IndexType) or isinstance(schema_type, BasicType):
|
|
|
|
continue
|
|
|
|
if t in [CHUNK_TYPE] + BASIC_TYPES:
|
|
|
|
continue
|
2025-06-27 19:31:06 +08:00
|
|
|
types.append(
|
|
|
|
schema_type.name_zh if language == "zh" else schema_type.name_en
|
|
|
|
)
|
refactor(all): kag v0.6 (#174)
* add path find
* fix find path
* spg guided relation extraction
* fix dict parse with same key
* rename graphalgoclient to graphclient
* rename graphalgoclient to graphclient
* file reader supports http url
* add checkpointer class
* parser supports checkpoint
* add build
* remove incorrect logs
* remove logs
* update examples
* update chain checkpointer
* vectorizer batch size set to 32
* add a zodb backended checkpointer
* add a zodb backended checkpointer
* fix zodb based checkpointer
* add thread for zodb IO
* fix(common): resolve mutlithread conflict in zodb IO
* fix(common): load existing zodb checkpoints
* update examples
* update examples
* fix zodb writer
* add docstring
* fix jieba version mismatch
* commit kag_config-tc.yaml
1、rename type to register_name
2、put a uniqe & specific name to register_name
3、rename reader to scanner
4、rename parser to reader
5、rename num_parallel to num_parallel_file, rename chain_level_num_paralle to num_parallel_chain_of_file
6、rename kag_extractor to schema_free_extractor, schema_base_extractor to schema_constraint_extractor
7、pre-define llm & vectorize_model and refer them in the yaml file
Issues to be resolved:
1、examples of event extract & spg extract
2、statistic of indexer, such as nums of nodes & edges extracted, ratio of llm invoke.
3、Exceptions such as Debt, account does not exist should be thrown in llm invoke.
4、conf of solver need to be re-examined.
* commit kag_config-tc.yaml
1、rename type to register_name
2、put a uniqe & specific name to register_name
3、rename reader to scanner
4、rename parser to reader
5、rename num_parallel to num_parallel_file, rename chain_level_num_paralle to num_parallel_chain_of_file
6、rename kag_extractor to schema_free_extractor, schema_base_extractor to schema_constraint_extractor
7、pre-define llm & vectorize_model and refer them in the yaml file
Issues to be resolved:
1、examples of event extract & spg extract
2、statistic of indexer, such as nums of nodes & edges extracted, ratio of llm invoke.
3、Exceptions such as Debt, account does not exist should be thrown in llm invoke.
4、conf of solver need to be re-examined.
* 1、fix bug in base_table_splitter
* 1、fix bug in base_table_splitter
* 1、fix bug in default_chain
* 增加solver
* add kag
* update outline splitter
* add main test
* add op
* code refactor
* add tools
* fix outline splitter
* fix outline prompt
* graph api pass
* commit with page rank
* add search api and graph api
* add markdown report
* fix vectorizer num batch compute
* add retry for vectorize model call
* update markdown reader
* update markdown reader
* update pdf reader
* raise extractor failure
* add default expr
* add log
* merge jc reader features
* rm import
* add build
* fix zodb based checkpointer
* add thread for zodb IO
* fix(common): resolve mutlithread conflict in zodb IO
* fix(common): load existing zodb checkpoints
* update examples
* update examples
* fix zodb writer
* add docstring
* fix jieba version mismatch
* commit kag_config-tc.yaml
1、rename type to register_name
2、put a uniqe & specific name to register_name
3、rename reader to scanner
4、rename parser to reader
5、rename num_parallel to num_parallel_file, rename chain_level_num_paralle to num_parallel_chain_of_file
6、rename kag_extractor to schema_free_extractor, schema_base_extractor to schema_constraint_extractor
7、pre-define llm & vectorize_model and refer them in the yaml file
Issues to be resolved:
1、examples of event extract & spg extract
2、statistic of indexer, such as nums of nodes & edges extracted, ratio of llm invoke.
3、Exceptions such as Debt, account does not exist should be thrown in llm invoke.
4、conf of solver need to be re-examined.
* commit kag_config-tc.yaml
1、rename type to register_name
2、put a uniqe & specific name to register_name
3、rename reader to scanner
4、rename parser to reader
5、rename num_parallel to num_parallel_file, rename chain_level_num_paralle to num_parallel_chain_of_file
6、rename kag_extractor to schema_free_extractor, schema_base_extractor to schema_constraint_extractor
7、pre-define llm & vectorize_model and refer them in the yaml file
Issues to be resolved:
1、examples of event extract & spg extract
2、statistic of indexer, such as nums of nodes & edges extracted, ratio of llm invoke.
3、Exceptions such as Debt, account does not exist should be thrown in llm invoke.
4、conf of solver need to be re-examined.
* 1、fix bug in base_table_splitter
* 1、fix bug in base_table_splitter
* 1、fix bug in default_chain
* update outline splitter
* add main test
* add markdown report
* code refactor
* fix outline splitter
* fix outline prompt
* update markdown reader
* fix vectorizer num batch compute
* add retry for vectorize model call
* update markdown reader
* raise extractor failure
* rm parser
* run pipeline
* add config option of whether to perform llm config check, default to false
* fix
* recover pdf reader
* several components can be null for default chain
* 支持完整qa运行
* add if
* remove unused code
* 使用chunk兜底
* excluded source relation to choose
* add generate
* default recall 10
* add local memory
* 排除相似边
* 增加保护
* 修复并发问题
* add debug logger
* 支持topk参数化
* 支持chunk截断和调整spo select 的prompt
* 增加查询请求保护
* 增加force_chunk配置
* fix entity linker algorithm
* 增加sub query改写
* fix md reader dup in test
* fix
* merge knext to kag parallel
* fix package
* 修复指标下跌问题
* scanner update
* scanner update
* add doc and update example scripts
* fix
* add bridge to spg server
* add format
* fix bridge
* update conf for baike
* disable ckpt for spg server runner
* llm invoke error default raise exceptions
* chore(version): bump version to X.Y.Z
* update default response generation prompt
* add method getSummarizationMetrics
* fix(common): fix project conf empty error
* fix typo
* 增加上报信息
* 修改main solver
* postprocessor support spg server
* 修改solver支持名
* fix language
* 修改chunker接口,增加openapi
* rename vectorizer to vectorize_model in spg server config
* generate_random_string start with gen
* add knext llm vector checker
* add knext llm vector checker
* add knext llm vector checker
* solver移除默认值
* udpate yaml and register_name for baike
* udpate yaml and register_name for baike
* remove config key check
* 修复llmmodule
* fix knext project
* udpate yaml and register_name for examples
* udpate yaml and register_name for examples
* Revert "udpate yaml and register_name for examples"
This reverts commit b3fa5ca9ba749e501133ac67bd8746027ab839d9.
* update register name
* fix
* fix
* support multiple resigter names
* update component
* update reader register names (#183)
* fix markdown reader
* fix llm client for retry
* feat(common): add processed chunk id checkpoint (#185)
* update reader register names
* add processed chunk id checkpoint
* feat(example): add example config (#186)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* add max_workers parameter for getSummarizationMetrics to make it faster
* add csqa data generation script generate_data.py
* commit generated csqa builder and solver data
* add csqa basic project files
* adjust split_length and num_threads_per_chain to match lightrag settings
* ignore ckpt dirs
* add csqa evaluation script eval.py
* save evaluation scripts summarization_metrics.py and factual_correctness.py
* save LightRAG output csqa_lightrag_answers.json
* ignore KAG output csqa_kag_answers.json
* add README.md for CSQA
* fix(solver): fix solver pipeline conf (#191)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* update solver pipeline config
* fix project create
* update links and file paths
* reformat csqa kag_config.yaml
* reformat csqa python files
* reformat getSummarizationMetrics and compare_summarization_answers
* fix(solver): fix solver config (#192)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* update solver pipeline config
* fix project create
* fix main solver conf
* add except
* fix typo in csqa README.md
* feat(conf): support reinitialize config for call from java side (#199)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* update solver pipeline config
* fix project create
* fix main solver conf
* support reinitialize config for java call
* revert default response generation prompt
* update project list
* add README.md for the hotpotqa, 2wiki and musique examples
* 增加spo检索
* turn off kag config dump by default
* turn off knext schema dump by default
* add .gitignore and fix kag_config.yaml
* add README.md for the medicine example
* add README.md for the supplychain example
* bugfix for risk mining
* use exact out
* refactor(solver): format solver code (#205)
* update reader register names
* add processed chunk id checkpoint
* add example config file
* update solver pipeline config
* fix project create
* fix main solver conf
* support reinitialize config for java call
* black format
---------
Co-authored-by: peilong <peilong.zpl@antgroup.com>
Co-authored-by: 锦呈 <zhangxinhong.zxh@antgroup.com>
Co-authored-by: zhengke.gzk <zhengke.gzk@antgroup.com>
Co-authored-by: huaidong.xhd <huaidong.xhd@antgroup.com>
2025-01-03 17:10:51 +08:00
|
|
|
return types
|
2025-04-17 17:23:52 +08:00
|
|
|
|
|
|
|
def extract_types_zh_mapping(self):
|
|
|
|
schema = self.load()
|
|
|
|
schema_mapping = {
|
2025-06-27 19:31:06 +08:00
|
|
|
t: schema[t].name_zh
|
|
|
|
for t in schema.keys()
|
|
|
|
if t not in [CHUNK_TYPE] + BASIC_TYPES
|
2025-04-17 17:23:52 +08:00
|
|
|
}
|
|
|
|
return schema_mapping
|