Mirror of https://github.com/FlagOpen/FlagEmbedding.git

Commit 7dbb8e350e (parent fd919fcd4f): update README
@@ -164,17 +164,10 @@ print(similarity)
# [0.8462, 0.9091]]
```

### Using `langchain`

```
HuggingfaceInstructorEmbedding
```
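
A minimal sketch of wiring a `bge` model into `langchain` (the class name above refers to the instructor-style wrapper; `HuggingFaceBgeEmbeddings`, its import path, and the arguments below are assumptions that should be checked against your installed langchain version):

```python
# Hedged sketch: load a BGE embedding model through langchain.
# HuggingFaceBgeEmbeddings and its argument names are assumptions
# based on common langchain versions -- verify before use.
from langchain.embeddings import HuggingFaceBgeEmbeddings

embeddings = HuggingFaceBgeEmbeddings(
    model_name="BAAI/bge-base-en-v1.5",
    encode_kwargs={"normalize_embeddings": True},  # cosine similarity via inner product
    query_instruction="Represent this sentence for searching relevant passages: ",
)
vector = embeddings.embed_query("what is a dense retriever?")  # list[float]
```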

## Contact

If you have any questions or suggestions about this project, feel free to open an issue or pull request. You can also email Peitian Zhang (namespace.pt@gmail.com).

## Citation

If you find this repository useful, please consider giving it a star ⭐ and a citation:

```
```
@@ -1,5 +1,4 @@
# Evaluation

[TOC]

LLM-Embedder supports 6 retrieval-augmentation tasks tailored for modern LLMs, including:

- Question Answering (qa)
@@ -7,13 +6,13 @@ LLM-Embedder supports 6 retrieval-augmentation tasks tailored for modern LLMs, including:
- In-Context Learning (icl)
  - evaluate with `eval_icl`
- Long Conversation (chat)
-  - evaluate with `eval_chat`
+  - evaluate with `eval_msc`
- Long-Range Language Modeling (lrlm)
  - evaluate with `eval_lrlm`
- Tool Learning (tool)
  - evaluate with `eval_tool`
- Conversational Search (convsearch)
-  - evaluate with `eval_convsearch`
+  - evaluate with `eval_qrecc`

## Data

The data for evaluation can be downloaded [here](https://huggingface.co/datasets/namespace-Pt/llm-embedder-data/resolve/main/llm-embedder-eval.tar.gz). You should untar the file anywhere you prefer, e.g. `/data`, which results in a folder `/data/llm-embedder`:

@@ -21,11 +20,11 @@ The data for evaluation can be downloaded [here](https://huggingface.co/datasets
tar -xzvf llm-embedder-eval.tar.gz -C /data
```

-**Currently, the QReCC dataset for conversational search has not been included in the tar.gz file because it's too large. You can refer to [this repository](https://github.com/apple/ml-qrecc) to download it.**
+**Currently, the QReCC dataset for conversational search has not been included in this tar.gz file because it's too large. You can refer to [this repository](https://github.com/apple/ml-qrecc) to download it.**

## Benchmark

### Commands

-Below are commands to run evaluation for different retrieval models. You can replace `eval_popqa` with any of `eval_mmlu`, `eval_icl`, `eval_lrlm`, `eval_chat`, `eval_tool`, and *`eval_convsearch`*.
+Below are commands to run evaluation for different retrieval models. You can replace `eval_popqa` with any of `eval_mmlu`, `eval_icl`, `eval_lrlm`, `eval_msc`, `eval_tool`, and *`eval_qrecc`*. The results will be logged at `data/results/`.

*All our evaluations are based on `meta-llama/Llama-2-7b-chat-hf`. To use a different language model, e.g. `Qwen/Qwen-7B-Chat`, simply add `--model_name_or_path Qwen/Qwen-7B-Chat` after every command.*
@@ -46,9 +45,6 @@ torchrun --nproc_per_node 8 -m evaluation.eval_popqa --retrieval_method bm25 --d
# Contriever
torchrun --nproc_per_node 8 -m evaluation.eval_popqa --query_encoder facebook/Contriever --dense_metric ip --add_instruction False --data_root /data/llm-embedder

# E5
torchrun --nproc_per_node 8 -m evaluation.eval_popqa --query_encoder intfloat/e5-base-v2 --pooling_method mean --version e5 --data_root /data/llm-embedder

# BGE
torchrun --nproc_per_node 8 -m evaluation.eval_popqa --query_encoder BAAI/bge-base-en --version bge --data_root /data/llm-embedder

@@ -111,5 +107,5 @@ All the following results are based on `meta-llama/Llama-2-7b-chat-hf` with `torc
|AAR|0.4826|0.4792|0.5938|14.6999|6.1528|0.42|0.2877|
|LLMRetriever|0.4625|0.2506|0.6262|14.4746|6.1750|0.1321|0.0234|
|APIRetriever|0.4625|0.2488|0.5945|14.7834|6.1833|0.8017|0.1137|
-|LLM-Embedder (ours)|**0.4904**|**0.5052**|**0.6288**|**13.4832**|**6.0972**|**0.8645**|**0.5053**|
+|LLM-Embedder (ours)|**0.4903**|**0.5052**|**0.6288**|**13.4832**|**6.0972**|**0.8645**|**0.5053**|
@@ -1,7 +1,5 @@
# Fine-tuning

[TOC]

## Data

The following data format is universally used for training and evaluating retrievers and rerankers.
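
As a rough illustration, one record might look like the sketch below; this is an assumption based on the common query/pos/neg JSON-lines layout for retriever training, and the authoritative field list is the one given in the full document:

```python
# Hypothetical example of a single JSON-lines training record.
# Field names beyond query/pos/neg/answers are assumptions, not the spec.
import json

record = {
    "query": "when was the nobel prize first awarded?",
    "pos": ["The first Nobel Prizes were awarded in 1901."],  # positive keys
    "neg": ["Alfred Nobel was born in Stockholm in 1833."],   # hard negative keys
    "answers": ["1901"],  # desired outputs, required for LM scoring
}
with open("train.json", "a") as f:
    f.write(json.dumps(record) + "\n")
```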
@@ -37,7 +35,14 @@ There are several important arguments for training:

The meaning and usage of other arguments can be inspected in the [code](../src/retrieval/args.py) or by running `python run_dense.py --help` from the command line.

+### LLM-Embedder (Multi-Task Fine-Tune)
+```bash
+bash scripts/llm-embedder.sh
+```

### Single Task Fine-Tune

Below we provide commands to fine-tune a retriever on a single task.

#### QA
```bash
torchrun --nproc_per_node=8 run_dense.py \
@@ -117,11 +122,6 @@ torchrun --nproc_per_node=8 run_dense.py \
    --key_template '{text}'
```

-### Multi-Task Fine-Tune (LLM Embedder)
-```bash
-bash scripts/llm-embedder.sh
-```

### Mine Negatives
```bash
# BGE
@@ -140,57 +140,6 @@ torchrun --nproc_per_node 8 -m evaluation.eval_retrieval \
    --save_name bm25
```

## Reranker

### QA
```bash
# Collate keys for evaluating reranking performance
torchrun --nproc_per_node=8 -m evaluation.eval_retrieval \
    --eval_data llm-embedder:qa/nq/test.json \
    --corpus llm-embedder:qa/nq/corpus.json \
    --key_max_length 128 \
    --query_max_length 32 \
    --metrics nq collate_key \
    --save_name bge

# Train NQ cross-encoder
torchrun --nproc_per_node 8 run_ranker.py \
    --ranker microsoft/deberta-v3-large \
    --key_max_length 128 \
    --query_max_length 32 \
    --output_dir data/outputs/nq/bge-crossenc \
    --train_data llm-embedder:qa/nq/train.neg.bge.json \
    --eval_data llm-embedder:qa/nq/test.key.bge.json \
    --corpus llm-embedder:qa/nq/corpus.json \
    --learning_rate 5e-6 \
    --num_train_epochs 5 \
    --evaluation_strategy epoch \
    --save_strategy epoch \
    --save_total_limit 2 \
    --per_device_train_batch_size 8 \
    --metrics nq \
    --metric_for_best_model recall@10

# Score train data with the ranker
files=( data/outputs/nq/bge-crossenc/*/ranker )
torchrun --nproc_per_node 8 run_ranker.py \
    --ranker ${files[0]} \
    --key_max_length 128 \
    --query_max_length 32 \
    --eval_data llm-embedder:qa/nq/train.neg.bge.json \
    --corpus llm-embedder:qa/nq/corpus.json \
    --metrics mrr recall collate_score \
    --save_name deberta-large
```

-### 3-Iter Pipeline
-We replicate the 3-iter pipeline for enhancing the retriever's performance from the [SimLM paper](https://arxiv.org/abs/2207.02578).
-
-```bash
-bash scripts/3iter-msmarco.sh
-bash scripts/3iter-nq.sh
-```

## LM Scoring

Score positives and negatives in `eval_data` with $p(o|q,k)$, where $o$ is the desired output, $q$ is the query, and $k$ is a key (which can be positive or negative). This requires an `answers` field in `train_data`.
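
A minimal sketch of the scoring idea, assuming a HuggingFace causal LM; the prompt template, model choice, and any batching in `run_lm_score.py` are assumptions here, not the script's actual implementation:

```python
# Hedged sketch: score a (query, key, output) triple with log p(o|q,k).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-2-7b-chat-hf"  # the LM used throughout this document
tok = AutoTokenizer.from_pretrained(name)
lm = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16, device_map="auto")

def lm_score(query: str, key: str, output: str) -> float:
    """Total log-probability of the desired output o, conditioned on
    the key k and the query q (hypothetical prompt layout)."""
    ctx = tok(f"{key}\n{query}\n", return_tensors="pt").input_ids.to(lm.device)
    out = tok(output, add_special_tokens=False, return_tensors="pt").input_ids.to(lm.device)
    input_ids = torch.cat([ctx, out], dim=1)
    labels = input_ids.clone()
    labels[:, : ctx.size(1)] = -100  # mask the context: only output tokens are scored
    with torch.no_grad():
        mean_nll = lm(input_ids, labels=labels).loss  # mean NLL over output tokens
    return -mean_nll.item() * out.size(1)  # sum of log-probs = log p(o|q,k)
```

Keys with higher scores make the LLM more likely to produce the desired answer, which is what allows these scores to be collated back into the training data (cf. the `train.*.scored.*` files above).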
@@ -199,6 +148,13 @@ torchrun --nproc_per_node=8 run_lm_score.py --eval_data llm-embedder:qa/msmarco/
```
Results will be saved at `llm-embedder:qa/msmarco/train.scored.llama2-7b.json`.

+### 3-Iter Pipeline
+We replicate the 3-iter pipeline for enhancing the retriever's performance from the [SimLM paper](https://arxiv.org/abs/2207.02578).
+
+```bash
+bash scripts/3iter-msmarco.sh
+bash scripts/3iter-nq.sh
+```

## Note
- `transformers==4.30.0` raises an error when using the DeepSpeed scheduler config
@@ -7,7 +7,7 @@ torchrun --nproc_per_node=8 run_dense.py --train_data \
    llm-embedder:lrlm/arxiv/train.128tok.scored.llama2-7b-chat.json \
    llm-embedder:lrlm/books3/train.128tok.scored.llama2-7b-chat.json \
    llm-embedder:lrlm/codeparrot/train.128tok.scored.llama2-7b-chat.json \
    llm-embedder:qa/msmarco/train.wl.json \
    llm-embedder:qa/msmarco/train.hard.json \
-    llm-embedder:qa/nq/train.neg.bge.scored.deberta-large.json \
+    llm-embedder:tool/toolbench/train.hardneg.json \
    llm-embedder:tool/toolbench/train.hardneg.json \

@@ -30,12 +30,12 @@ do
    torchrun --nproc_per_node 8 -m evaluation.eval_mmlu --query_encoder /share/peitian/Code/LlamaRetriever/data/outputs/$output/$model/encoder --version $version
    torchrun --nproc_per_node 8 -m evaluation.eval_popqa --query_encoder /share/peitian/Code/LlamaRetriever/data/outputs/$output/$model/encoder --version $version
    torchrun --nproc_per_node 8 -m evaluation.eval_qa --query_encoder /share/peitian/Code/LlamaRetriever/data/outputs/$output/$model/encoder --version $version
-    torchrun --nproc_per_node 8 -m evaluation.eval_chat --query_encoder /share/peitian/Code/LlamaRetriever/data/outputs/$output/$model/encoder --version $version
+    torchrun --nproc_per_node 8 -m evaluation.eval_msc --query_encoder /share/peitian/Code/LlamaRetriever/data/outputs/$output/$model/encoder --version $version
    torchrun --nproc_per_node 8 -m evaluation.eval_tool --query_encoder /share/peitian/Code/LlamaRetriever/data/outputs/$output/$model/encoder --version $version
    torchrun --nproc_per_node 8 -m evaluation.eval_lrlm --query_encoder /share/peitian/Code/LlamaRetriever/data/outputs/$output/$model/encoder --eval_data llm-embedder:lrlm/books3/test.json --version $version
    torchrun --nproc_per_node 8 -m evaluation.eval_lrlm --query_encoder /share/peitian/Code/LlamaRetriever/data/outputs/$output/$model/encoder --eval_data llm-embedder:lrlm/arxiv/test.json --version $version
    torchrun --nproc_per_node 8 -m evaluation.eval_lrlm --query_encoder /share/peitian/Code/LlamaRetriever/data/outputs/$output/$model/encoder --eval_data llm-embedder:lrlm/codeparrot/test.json --version $version
    torchrun --nproc_per_node 8 -m evaluation.eval_lrlm --query_encoder /share/peitian/Code/LlamaRetriever/data/outputs/$output/$model/encoder --eval_data llm-embedder:lrlm/pg19/test.json --version $version
    torchrun --nproc_per_node 8 -m evaluation.eval_icl --query_encoder /share/peitian/Code/LlamaRetriever/data/outputs/$output/$model/encoder --version $version
-    torchrun --nproc_per_node 8 -m evaluation.eval_convsearch --query_encoder /share/peitian/Code/LlamaRetriever/data/outputs/$output/$model/encoder --version $version
+    torchrun --nproc_per_node 8 -m evaluation.eval_qrecc --query_encoder /share/peitian/Code/LlamaRetriever/data/outputs/$output/$model/encoder --version $version
done
README.md (12 changes)
@@ -28,7 +28,7 @@
</h4>

-[English](README.md) | [中文](https://github.com/FlagOpen/FlagEmbedding/blob/master/README_zh.md)
+[English](README.md) | [中文](./README_zh.md)

FlagEmbedding can map any text to a low-dimensional dense vector, which can be used for tasks like retrieval, classification, clustering, or semantic search. It can also be used in vector databases for LLMs.
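
For instance, a minimal sketch with this repository's `FlagModel` class (model name and sentences are illustrative; embeddings are normalized by default, so the inner product behaves as cosine similarity):

```python
# Hedged sketch: encode queries and passages with FlagEmbedding,
# then rank passages by inner-product similarity.
from FlagEmbedding import FlagModel

model = FlagModel(
    "BAAI/bge-base-en-v1.5",
    query_instruction_for_retrieval="Represent this sentence for searching relevant passages: ",
)
queries = ["how do vector databases serve LLMs?"]
passages = [
    "Vector databases store dense embeddings and support nearest-neighbor search.",
    "The Nobel Prize was first awarded in 1901.",
]
q_emb = model.encode_queries(queries)  # the instruction is prepended to queries only
p_emb = model.encode(passages)         # passages are encoded without instruction
print(q_emb @ p_emb.T)                 # similarity matrix, higher = more relevant
```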
@@ -53,11 +53,11 @@ And it also can be used in vector databases for LLMs.

`bge` is short for `BAAI general embedding`.

-| Model | Language | | Description | query instruction for retrieval\* |
+| Model | Language | | Description | query instruction for retrieval [1] |
|:-------------------------------|:--------:|:--------:|:--------:|:--------:|
| [BAAI/llm-embedder](https://huggingface.co/BAAI/llm-embedder) | English | [Inference](./FlagEmbedding/llm_embedder/README.md) [Fine-tune](./FlagEmbedding/llm_embedder/README.md) | a unified embedding model that supports the diverse retrieval-augmentation needs of LLMs | See [README](./FlagEmbedding/llm_embedder/README.md) |
-| [BAAI/bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large) | Chinese and English | [Inference](#usage-for-reranker) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker) | a cross-encoder model which is more accurate but less efficient \** | |
-| [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base) | Chinese and English | [Inference](#usage-for-reranker) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker) | a cross-encoder model which is more accurate but less efficient \** | |
+| [BAAI/bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large) | Chinese and English | [Inference](#usage-for-reranker) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker) | a cross-encoder model which is more accurate but less efficient [2] | |
+| [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base) | Chinese and English | [Inference](#usage-for-reranker) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker) | a cross-encoder model which is more accurate but less efficient [2] | |
| [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with a more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with a more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with a more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` |

@@ -72,9 +72,9 @@ And it also can be used in vector databases for LLMs.
| [BAAI/bge-small-zh](https://huggingface.co/BAAI/bge-small-zh) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a small-scale model with competitive performance | `为这个句子生成表示以用于检索相关文章:` |

-\*: If you need to search for relevant passages for a query, we suggest adding the instruction to the query; in other cases, no instruction is needed: just use the original query directly. In all cases, **no instruction** needs to be added to passages.
+[1\]: If you need to search for relevant passages for a query, we suggest adding the instruction to the query; in other cases, no instruction is needed: just use the original query directly. In all cases, **no instruction** needs to be added to passages.

-\**: Different from an embedding model, a reranker uses the question and document as input and directly outputs a similarity score instead of an embedding. To balance accuracy and time cost, a cross-encoder is widely used to re-rank the top-k documents retrieved by other, simpler models.
+[2\]: Different from an embedding model, a reranker uses the question and document as input and directly outputs a similarity score instead of an embedding. To balance accuracy and time cost, a cross-encoder is widely used to re-rank the top-k documents retrieved by other, simpler models.
For example, use the bge embedding model to retrieve the top 100 relevant documents, then use the bge reranker to re-rank those 100 documents to get the final top-3 results.
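
A minimal sketch of that retrieve-then-rerank flow with this repository's `FlagReranker` (the short candidate list below stands in for the top-100 documents an embedding retriever would return):

```python
# Hedged sketch: re-rank retrieved candidates with a bge cross-encoder.
from FlagEmbedding import FlagReranker

reranker = FlagReranker("BAAI/bge-reranker-base", use_fp16=True)  # fp16 speeds up inference

query = "when was the nobel prize first awarded?"
candidates = [  # in practice: the top-k passages from a bge embedding retriever
    "The first Nobel Prizes were awarded in 1901.",
    "Alfred Nobel invented dynamite.",
    "The Fields Medal is awarded every four years.",
]
scores = reranker.compute_score([[query, doc] for doc in candidates])
top3 = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)[:3]
for doc, score in top3:
    print(f"{score:.3f}  {doc}")
```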
All models have been uploaded to the Huggingface Hub; you can find them at https://huggingface.co/BAAI.
README_zh.md (10 changes)
@@ -48,11 +48,11 @@

## Model List

-| Model | Language | | Description | query instruction for retrieval\* |
+| Model | Language | | Description | query instruction for retrieval [1] |
|:-------------------------------|:--------:|:--------:|:--------:|:--------:|
| [BAAI/llm-embedder](https://huggingface.co/BAAI/llm-embedder) | English | [Inference](./FlagEmbedding/llm_embedder/README.md) [Fine-tune](./FlagEmbedding/llm_embedder/README.md) | a model designed for the various retrieval-augmentation tasks of large language models | see [README](./FlagEmbedding/llm_embedder/README.md) |
-| [BAAI/bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large) | Chinese and English | [Inference](#usage-for-reranker) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker) | a cross-encoder model, more accurate than the embedding model but less efficient at inference \** | |
-| [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base) | Chinese and English | [Inference](#usage-for-reranker) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker) | a cross-encoder model, more accurate than the embedding model but less efficient at inference \** | |
+| [BAAI/bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large) | Chinese and English | [Inference](#usage-for-reranker) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker) | a cross-encoder model, more accurate than the embedding model but less efficient at inference [2] | |
+| [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base) | Chinese and English | [Inference](#usage-for-reranker) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker) | a cross-encoder model, more accurate than the embedding model but less efficient at inference [2] | |
| [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with a more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with a more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` |
| [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with a more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` |

@@ -67,9 +67,9 @@
| [BAAI/bge-small-zh](https://huggingface.co/BAAI/bge-small-zh) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a small-scale model | `为这个句子生成表示以用于检索相关文章:` |

-\*: If you need to search for relevant documents for a **short query**, you should add the instruction to the query; in other cases, no instruction is needed: just use the original query directly. In any case, you **do not need to add an instruction to the candidate documents**.
+[1\]: If you need to search for relevant documents for a **short query**, you should add the instruction to the query; in other cases, no instruction is needed: just use the original query directly. In any case, you **do not need to add an instruction to the candidate documents**.

-\**: Unlike an embedding model that outputs vectors, the reranker cross-encoder takes a question and a document as input and directly outputs their similarity. To balance accuracy and time cost, cross-encoders are generally used to re-rank the top-k documents retrieved by other, simpler models. For example, use the bge embedding model to retrieve the top 100 relevant documents, then use the bge reranker to re-rank those 100 documents to get the final top-3 results.
+[2\]: Unlike an embedding model that outputs vectors, the reranker cross-encoder takes a question and a document as input and directly outputs their similarity. To balance accuracy and time cost, cross-encoders are generally used to re-rank the top-k documents retrieved by other, simpler models. For example, use the bge embedding model to retrieve the top 100 relevant documents, then use the bge reranker to re-rank those 100 documents to get the final top-3 results.

## FAQ