mirror of
https://github.com/FlagOpen/FlagEmbedding.git
synced 2025-12-29 08:02:43 +00:00
evluation local data
This commit is contained in:
parent
34d24e85e0
commit
4a7412d33f
BIN
FlagEmbedding/.DS_Store
vendored
BIN
FlagEmbedding/.DS_Store
vendored
Binary file not shown.
@ -240,7 +240,8 @@ Please refer to [C_MTEB](https://github.com/FlagOpen/FlagEmbedding/blob/master/C
|
||||
| [text2vec-base](https://huggingface.co/shibing624/text2vec-base-chinese) | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 |
|
||||
| [text2vec-large](https://huggingface.co/GanymedeNil/text2vec-large-chinese) | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 |
|
||||
|
||||
|
||||
- **Your data**
|
||||
If you want to evaluate the model on your data, you can refer to this [command]()
|
||||
|
||||
## Acknowledgement
|
||||
|
||||
|
||||
@ -26,6 +26,15 @@ class Args:
|
||||
default=False,
|
||||
metadata={'help': 'Add query-side instruction?'}
|
||||
)
|
||||
|
||||
corpus_data: str = field(
|
||||
default="namespace-Pt/msmarco",
|
||||
metadata={'help': 'candidate passages'}
|
||||
)
|
||||
query_data: str = field(
|
||||
default="namespace-Pt/msmarco-corpus",
|
||||
metadata={'help': 'queries and their positive passages for evaluation'}
|
||||
)
|
||||
|
||||
max_query_length: int = field(
|
||||
default=32,
|
||||
@ -183,9 +192,14 @@ def evaluate(preds, labels, cutoffs=[1,10,100]):
|
||||
def main():
|
||||
parser = HfArgumentParser([Args])
|
||||
args: Args = parser.parse_args_into_dataclasses()[0]
|
||||
|
||||
eval_data = datasets.load_dataset("namespace-Pt/msmarco", split="dev")
|
||||
corpus = datasets.load_dataset("namespace-Pt/msmarco-corpus", split="train")
|
||||
|
||||
if args.query_data == 'namespace-Pt/msmarco-corpus':
|
||||
assert args.corpus_data == 'namespace-Pt/msmarco'
|
||||
eval_data = datasets.load_dataset("namespace-Pt/msmarco", split="dev")
|
||||
corpus = datasets.load_dataset("namespace-Pt/msmarco-corpus", split="train")
|
||||
else:
|
||||
eval_data = datasets.load_dataset('json', data_files=args.query_data, split='train')
|
||||
corpus = datasets.load_dataset('json', data_files=args.corpus_data, split='train')
|
||||
|
||||
model = FlagModel(
|
||||
args.encoder,
|
||||
|
||||
@ -87,7 +87,7 @@ Noted that the number of negatives should not be larger than the numbers of nega
|
||||
Besides the negatives in this group, the in-batch negatives also will be used in fine-tuning.
|
||||
- `negatives_cross_device`: share the negatives across all GPUs. This argument will extend the number of negatives.
|
||||
- `learning_rate`: select a appropriate for your model. Recommend 1e-5/2e-5/3e-5 for large/base/small-scale.
|
||||
- `temperature`: It will influence the distribution of similarity scores. **Recommend set it 0.01-0.1.**
|
||||
- `temperature`: It will influence the distribution of similarity scores. **Recommended value: 0.01-0.1.**
|
||||
- `query_max_len`: max length for query. Please set it according the average length of queries in your data.
|
||||
- `passage_max_len`: max length for passage. Please set it according the average length of passages in your data.
|
||||
- `query_instruction_for_retrieval`: instruction for query, which will be added to each query. You also can set it `""` to add nothing to query.
|
||||
@ -150,16 +150,24 @@ Please replace the `query_instruction_for_retrieval` with your instruction if yo
|
||||
|
||||
|
||||
### 6. Evaluate model
|
||||
We provide [a simple script](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/baai_general_embedding/finetune/eval_msmarco.py) to evaluate the model's performance on MSMARCO, a widely used retrieval benchmark.
|
||||
We provide [a simple script](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/baai_general_embedding/finetune/eval_msmarco.py) to evaluate the model's performance.
|
||||
A brief summary of how the script works:
|
||||
1. Load the model on all available GPUs through [DataParallel](https://pytorch.org/docs/stable/generated/torch.nn.DataParallel.html).
|
||||
2. Encode the corpus and offload the embeddings in `faiss` Flat index. By default, `faiss` also dumps the index on all available GPUs.
|
||||
3. Encode the queries and search `100` nearest neighbors for each query.
|
||||
4. Compute Recall and MRR metrics.
|
||||
|
||||
First, install `faiss`, a popular approximate nearest neighbor search library:
|
||||
```bash
|
||||
conda install -c conda-forge faiss-gpu
|
||||
```
|
||||
|
||||
Next, you can check the data formats for the [msmarco corpus](https://huggingface.co/datasets/namespace-Pt/msmarco-corpus) and [evaluation queries](https://huggingface.co/datasets/namespace-Pt/msmarco).
|
||||
#### 6.1 MSMARCO dataset
|
||||
The default evaluate data is MSMARCO, a widely used retrieval benchmark.
|
||||
|
||||
Finally, run the following command:
|
||||
You can check the data formats for the [msmarco corpus](https://huggingface.co/datasets/namespace-Pt/msmarco-corpus) and [evaluation queries](https://huggingface.co/datasets/namespace-Pt/msmarco).
|
||||
|
||||
Run the following command:
|
||||
|
||||
```bash
|
||||
python -m FlagEmbedding.baai_general_embedding.finetune.eval_msmarco \
|
||||
@ -186,8 +194,33 @@ The results should be similar to
|
||||
}
|
||||
```
|
||||
|
||||
A brief summary of how the script works:
|
||||
1. Load the model on all available GPUs through [DataParallel](https://pytorch.org/docs/stable/generated/torch.nn.DataParallel.html).
|
||||
2. Encode the corpus and offload the embeddings in `faiss` Flat index. By default, `faiss` also dumps the index on all available GPUs.
|
||||
3. Encode the queries and search `100` nearest neighbors for each query.
|
||||
4. Compute Recall and MRR metrics.
|
||||
#### 6.1 Your dataset
|
||||
|
||||
You should prepare two files with jsonl format:
|
||||
- One is corpus_data, which contains the text you want to search. A toy example: [toy_corpus.json](./toy_evaluation_data/toy_corpus.json)
|
||||
```
|
||||
{"content": "A is ..."}
|
||||
{"content": "B is ..."}
|
||||
{"content": "C is ..."}
|
||||
{"content": "Panda is ..."}
|
||||
{"content": "... is A"}
|
||||
```
|
||||
- The other is query_data, which contains the queries and the ground truth. A toy example: [toy_corpus.json](./toy_evaluation_data/toy_query.json)
|
||||
```
|
||||
{"query": "What is A?", "positive": ["A is ...", "... is A"]}
|
||||
{"query": "What is B?", "positive": ["B is ..."]}
|
||||
{"query": "What is C?", "positive": ["C is ..."]}
|
||||
```
|
||||
|
||||
Then, pass the data path to evaluation script:
|
||||
```bash
|
||||
python -m FlagEmbedding.baai_general_embedding.finetune.eval_msmarco \
|
||||
--encoder BAAI/bge-base-en-v1.5 \
|
||||
--fp16 \
|
||||
--add_instruction \
|
||||
--k 100 \
|
||||
--corpus_data ./toy_evaluation_data/toy_corpus.json \
|
||||
--query_data ./toy_evaluation_data/toy_query.json
|
||||
```
|
||||
|
||||
|
||||
|
||||
5
examples/finetune/toy_evaluation_data/toy_corpus.json
Normal file
5
examples/finetune/toy_evaluation_data/toy_corpus.json
Normal file
@ -0,0 +1,5 @@
|
||||
{"content": "A is ..."}
|
||||
{"content": "B is ..."}
|
||||
{"content": "C is ..."}
|
||||
{"content": "Panda is ..."}
|
||||
{"content": "... is A"}
|
||||
3
examples/finetune/toy_evaluation_data/toy_query.json
Normal file
3
examples/finetune/toy_evaluation_data/toy_query.json
Normal file
@ -0,0 +1,3 @@
|
||||
{"query": "What is A?", "positive": ["A is ...", "... is A"]}
|
||||
{"query": "What is B?", "positive": ["B is ..."]}
|
||||
{"query": "What is C?", "positive": ["C is ..."]}
|
||||
Loading…
x
Reference in New Issue
Block a user