update readme

cfli 2024-10-29 16:34:56 +08:00
parent 9e93a8678f
commit 34e9c21654
2 changed files with 15 additions and 11 deletions


@@ -22,7 +22,7 @@ pip install -e .
# 3. Inference
We have provided the inference code for two models, the **embedder** and the **reranker**. These can be loaded using `FlagAutoModel` and `FlagAutoReranker`, respectively. For more detailed instructions on their use, please refer to the documentation for the [embedder](https://github.com/hanhainebula/FlagEmbedding/blob/new-flagembedding-v1/examples/inference/embedder) and [reranker](https://github.com/hanhainebula/FlagEmbedding/blob/new-flagembedding-v1/examples/inference/reranker).
We have provided the inference code for two types of models: the **embedder** and the **reranker**. These can be loaded using `FlagAutoModel` and `FlagAutoReranker`, respectively. For more detailed instructions on their use, please refer to the documentation for the [embedder](https://github.com/hanhainebula/FlagEmbedding/blob/new-flagembedding-v1/examples/inference/embedder) and [reranker](https://github.com/hanhainebula/FlagEmbedding/blob/new-flagembedding-v1/examples/inference/reranker).
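For quick orientation, the following is a minimal sketch of how the two auto classes can be used together, assuming the `from_finetuned` constructors and the `encode`/`compute_score` methods described in the linked examples; the checkpoint names below are placeholders, so adjust them to the models you actually use.

```shell
# Minimal sketch (not the official example): load both auto classes and run one call each.
# Checkpoint names are placeholders; see the linked embedder/reranker docs for full usage.
python - <<'EOF'
from FlagEmbedding import FlagAutoModel, FlagAutoReranker

embedder = FlagAutoModel.from_finetuned("BAAI/bge-large-en-v1.5")
reranker = FlagAutoReranker.from_finetuned("BAAI/bge-reranker-large")

query = "What is BGE?"
passage = "BGE is a family of open-source embedding models."

embeddings = embedder.encode([query, passage])        # one vector per input text
scores = reranker.compute_score([[query, passage]])   # one relevance score per (query, passage) pair
print(embeddings.shape, scores)
EOF
```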
## 1. Embedder
@@ -67,7 +67,7 @@ print(scores)
# 4. Finetune
We support the finetune of various BGE series models, including bge-large-en-v1.5, bge-m3, bge-en-icl, bge-reranker-v2-m3, bge-reranker-v2-gemma, and bge-reranker-v2-minicpm-layerwise, etc. Here, we take the basic models bge-en-large-v1.5 and bge-reranker-large as examples. For more details, please see the [embedder](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/examples/finetune/embedder) and [reranker](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/examples/finetune/reranker) sections.
We support fine-tuning a variety of BGE series models, including `bge-large-en-v1.5`, `bge-m3`, `bge-en-icl`, `bge-reranker-v2-m3`, `bge-reranker-v2-gemma`, and `bge-reranker-v2-minicpm-layerwise`, among others. As examples, we use the basic models `bge-large-en-v1.5` and `bge-reranker-large`. For more details, please refer to the [embedder](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/examples/finetune/embedder) and [reranker](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/examples/finetune/reranker) sections.
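As a rough sketch of what such a fine-tuning launch can look like (the module path `FlagEmbedding.finetune.embedder.encoder_only.base`, the data file, and the hyperparameters below are illustrative assumptions; copy the authoritative commands from the linked finetune examples):

```shell
# Sketch only: fine-tune bge-large-en-v1.5 on 2 GPUs.
# Module path, data path, and hyperparameters are assumptions for illustration.
torchrun --nproc_per_node 2 \
    -m FlagEmbedding.finetune.embedder.encoder_only.base \
    --model_name_or_path BAAI/bge-large-en-v1.5 \
    --train_data ./your_finetune_data.jsonl \
    --output_dir ./bge-large-en-v1.5-finetuned \
    --learning_rate 1e-5 \
    --num_train_epochs 2 \
    --per_device_train_batch_size 4 \
    --query_max_len 256 \
    --passage_max_len 256
```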
## 1. Embedder
@@ -136,7 +136,7 @@ torchrun --nproc_per_node 2 \
# 5. Evaluation
We support evaluations on MTEB, BEIR, MSMARCO, MIRACL, MLDR, MKQA, and AIR-Bench. Here, we provide an example of evaluating MSMARCO passages. For more details, please refer to the [evaluation examples](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/examples/evaluation).
We support evaluations on MTEB, BEIR, MSMARCO, MIRACL, MLDR, MKQA, AIR-Bench, and custom datasets. Below is an example of evaluating MSMARCO passages. For more details, please refer to the [evaluation examples](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/examples/evaluation).
```shell
export HF_HUB_CACHE="$HOME/.cache/huggingface/hub"


@@ -1,14 +1,18 @@
# Evaluation
After finetuning, the model needs to be evaluated. To facilitate this, we have provided scripts for assessing it on various datasets, including **MTEB**, **BEIR**, **MSMARCO**, **MIRACL**, **MLDR**, **MKQA**, and **AIR-Bench**. You can find the specific bash scripts in the respective folders. This document provides an overview of these evaluations.
After fine-tuning the model, it is essential to evaluate its performance. To facilitate this process, we have provided scripts for assessing the model on various datasets. These datasets include: **MTEB**, **BEIR**, **MSMARCO**, **MIRACL**, **MLDR**, **MKQA**, **AIR-Bench**, and your **custom datasets**.
First, we will introduce the commonly used parameters, followed by an introduction to the parameters for each dataset.
To evaluate the model on a specific dataset, you can find the corresponding bash scripts in the folder dedicated to that dataset. These scripts contain the commands and configurations needed to run the evaluation.
This document serves as an overview of the evaluation process and provides a brief introduction to each dataset.
In this section, we will first introduce the commonly used arguments across all datasets. Then, we will provide a more detailed explanation of the specific arguments used for each individual dataset.
## Introduction
### 1. EvalArgs
**Parameters for evaluation setup:**
**Arguments for evaluation setup:**
- **`eval_name`**: Name of the evaluation task (e.g., msmarco, beir, miracl).
@@ -51,7 +55,7 @@ First, we will introduce the commonly used parameters, followed by an introducti
### 2. ModelArgs
**Parameters for Model Configuration:**
**Arguments for Model Configuration:**
- **`embedder_name_or_path`**: The name or path to the embedder.
- **`embedder_model_class`**: Class of the model used for embedding (options include 'auto', 'encoder-only-base', etc.). Default is `auto`.
@@ -74,7 +78,7 @@ First, we will introduce the commonly used parameters, followed by an introducti
- **`reranker_query_max_length`**, **`reranker_max_length`**: Maximum length for reranker queries and maximum total input length (query plus passage) for the reranker, respectively.
- **`normalize`**: Normalize the reranking scores.
- **`prompt`**: Prompt for the reranker.
- **`cutoff_layers`**, **`compress_ratio`**, **`compress_layers`**: Parameters for configuring the output and compression of layerwise or lightweight rerankers.
- **`cutoff_layers`**, **`compress_ratio`**, **`compress_layers`**: Arguments for configuring the output and compression of layerwise or lightweight rerankers.
***Notice:*** If you evaluate your own model, please set `embedder_model_class` and `reranker_model_class`.
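To make this notice concrete, here is a hedged sketch of how those two arguments might be passed when evaluating custom models on MSMARCO; the module path `FlagEmbedding.evaluation.msmarco` is assumed by analogy with the `mteb` and `mkqa` modules shown later, and the local model paths are placeholders.

```shell
# Sketch: evaluate your own embedder and reranker on MSMARCO.
# Module path and model paths are assumptions; the point is to set the two model classes explicitly.
python -m FlagEmbedding.evaluation.msmarco \
    --eval_name msmarco \
    --embedder_name_or_path ./my-finetuned-embedder \
    --embedder_model_class encoder-only-base \
    --reranker_name_or_path ./my-finetuned-reranker \
    --reranker_model_class encoder-only-base
```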
@@ -82,7 +86,7 @@ First, we will introduce the commonly used parameters, followed by an introducti
### 1. MTEB
In the evaluation of MTEB, we primarily utilize the official [MTEB](https://github.com/embeddings-benchmark/mteb) code, which supports only the assessment of embedders. Additionally, it restricts the output format of evaluation results to JSON. The following new parameters have been introduced:
For MTEB, we primarily use the official [MTEB](https://github.com/embeddings-benchmark/mteb) code, which only supports the assessment of embedders. Moreover, it restricts the output format of the evaluation results to JSON. We have introduced the following new arguments:
- **`languages`**: Languages to evaluate. Default: eng
- **`tasks`**: Tasks to evaluate. Default: None
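For illustration (a sketch, not the repository's full MTEB example), a run restricted to English and a couple of retrieval tasks could look like the following; the task names and checkpoint are placeholders, and the exact flags should be checked against the MTEB scripts.

```shell
# Sketch: MTEB evaluation of an embedder on two English retrieval tasks.
# Task names and the checkpoint are illustrative placeholders.
python -m FlagEmbedding.evaluation.mteb \
    --eval_name mteb \
    --languages eng \
    --tasks NFCorpus SciFact \
    --embedder_name_or_path BAAI/bge-large-en-v1.5
```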
@@ -110,7 +114,7 @@ python -m FlagEmbedding.evaluation.mteb \
### 2. BEIR
[BEIR](https://github.com/beir-cellar/beir/) supports evaluations on datasets including `arguana`, `climate-fever`, `cqadupstack`, `dbpedia-entity`, `fever`, `fiqa`, `hotpotqa`, `msmarco`, `nfcorpus`, `nq`, `quora`, `scidocs`, `scifact`, `trec-covid`, `webis-touche2020`, with `msmarco` as the dev set and all others as test sets. The following new parameters have been introduced:
[BEIR](https://github.com/beir-cellar/beir/) supports evaluations on datasets including `arguana`, `climate-fever`, `cqadupstack`, `dbpedia-entity`, `fever`, `fiqa`, `hotpotqa`, `msmarco`, `nfcorpus`, `nq`, `quora`, `scidocs`, `scifact`, `trec-covid`, `webis-touche2020`, with `msmarco` as the dev set and all others as test sets. The following new arguments have been introduced:
- **`use_special_instructions`**: Whether to use specific instructions in `prompts.py` for evaluation. Default: False
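For example, a BEIR run that enables those instructions might look like this sketch; the module path `FlagEmbedding.evaluation.beir` is assumed by analogy with the other evaluation modules, and the checkpoint is a placeholder.

```shell
# Sketch: BEIR evaluation with the special instructions from prompts.py enabled.
# Module path and checkpoint are assumptions; check the scripts in the BEIR folder.
python -m FlagEmbedding.evaluation.beir \
    --eval_name beir \
    --embedder_name_or_path BAAI/bge-large-en-v1.5 \
    --use_special_instructions True
```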
@@ -266,7 +270,7 @@ python -m FlagEmbedding.evaluation.mkqa \
### 7. AIR-Bench
The AIR-Bench is mainly based on the official [AIR-Bench](https://github.com/AIR-Bench/AIR-Bench/tree/main) framework, and it necessitates the use of official evaluation metrics. Below are some important parameters:
AIR-Bench evaluation is primarily based on the official [AIR-Bench repository](https://github.com/AIR-Bench/AIR-Bench/tree/main) and requires the use of its official evaluation code. Below are some important arguments:
- **`benchmark_version`**: Benchmark version.
- **`task_types`**: Task types.
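Putting these together, an AIR-Bench run might look like the sketch below; the module path `FlagEmbedding.evaluation.air_bench`, the benchmark version, and the task types are assumptions to verify against the AIR-Bench scripts in this repository.

```shell
# Sketch: AIR-Bench evaluation of an embedder plus reranker.
# Module path, benchmark version, and task types are illustrative assumptions.
python -m FlagEmbedding.evaluation.air_bench \
    --benchmark_version AIR-Bench_24.05 \
    --task_types qa long-doc \
    --embedder_name_or_path BAAI/bge-m3 \
    --reranker_name_or_path BAAI/bge-reranker-v2-m3
```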