# Reranker

## Usage
Unlike an embedding model, a reranker takes a query and a document as input and directly outputs a similarity score instead of an embedding. You can get a relevance score by feeding a query and a passage to the reranker. The reranker is optimized with cross-entropy loss, so the relevance score is not bounded to a specific range.
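Because the raw scores are unbounded logits, a common trick when you need values in (0, 1) is to pass them through a sigmoid; this changes only the scale, not the ranking order. A minimal sketch:

```python
import math

def normalize(logit: float) -> float:
    # Map an unbounded reranker logit to (0, 1) with a sigmoid.
    # Monotonic, so the relative ordering of passages is preserved.
    return 1 / (1 + math.exp(-logit))
```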
### Using FlagEmbedding
```bash
pip install -U FlagEmbedding
```
Get relevance scores (higher scores indicate more relevance):
```python
from FlagEmbedding import FlagReranker

# Setting use_fp16 to True speeds up computation with a slight performance degradation.
reranker = FlagReranker('BAAI/bge-reranker-large', use_fp16=True)

# Score a single (query, passage) pair.
score = reranker.compute_score(['query', 'passage'])
print(score)

# Score multiple pairs in one call.
scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']])
print(scores)
```
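In a retrieval pipeline, the usual pattern is to score every (query, passage) pair and then sort the candidates by score. A minimal sketch on top of `compute_score` (the candidate passages here are illustrative):

```python
from FlagEmbedding import FlagReranker

reranker = FlagReranker('BAAI/bge-reranker-large', use_fp16=True)

query = 'what is panda?'
candidates = [
    'hi',
    'The giant panda is a bear species endemic to China.',
    'Pandas eat bamboo.',
]

# Score all pairs in one batch, then sort candidates from most to least relevant.
scores = reranker.compute_score([[query, passage] for passage in candidates])
ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
for passage, score in ranked:
    print(f'{score:.2f}\t{passage}')
```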
### Using Huggingface transformers
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-large')
model = AutoModelForSequenceClassification.from_pretrained('BAAI/bge-reranker-large')
model.eval()

pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]
with torch.no_grad():
    # Tokenize each query-passage pair into a single input sequence.
    inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
    # The model outputs one logit per pair; higher means more relevant.
    scores = model(**inputs, return_dict=True).logits.view(-1).float()
    print(scores)
```
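With a large candidate set, scoring everything in one forward pass can exhaust GPU memory. A hedged sketch that reuses the `tokenizer` and `model` from above and chunks the pairs (the batch size of 32 is an arbitrary choice, not a library default):

```python
import torch

def score_pairs(pairs, batch_size=32):
    # Score (query, passage) pairs in chunks to bound peak memory use.
    all_scores = []
    with torch.no_grad():
        for i in range(0, len(pairs), batch_size):
            batch = pairs[i:i + batch_size]
            inputs = tokenizer(batch, padding=True, truncation=True,
                               return_tensors='pt', max_length=512)
            logits = model(**inputs, return_dict=True).logits.view(-1).float()
            all_scores.extend(logits.tolist())
    return all_scores
```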
## Fine-tune
You can follow this example to fine-tune the reranker.
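As a rough sketch of what the fine-tuning data typically looks like (the `query`/`pos`/`neg` field names follow the FlagEmbedding fine-tuning examples, so treat them as an assumption rather than a guaranteed schema), each line of a JSONL file pairs a query with positive and negative passages:

```json
{"query": "what is panda?", "pos": ["The giant panda is a bear species endemic to China."], "neg": ["hi", "pandas is a Python library for data analysis."]}
```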
This reranker is initialized from xlm-roberta-base, and we train it on a mixture of multilingual datasets:
- Chinese: 788,491 text pairs from T2ranking, MMmarco, DuReader, Cmedqa-v2, and NLI-zh
- English: 933,090 text pairs from MS MARCO, NQ, HotpotQA, and NLI
- Others: 97,458 text pairs from Mr.TyDi (including Arabic, Bengali, English, Finnish, Indonesian, Japanese, Korean, Russian, Swahili, Telugu, and Thai)
To enhance cross-language retrieval ability, we construct two cross-language retrieval datasets based on MMarco. Specifically, we sample 100,000 English queries to retrieve Chinese passages, and 100,000 Chinese queries to retrieve English passages. The dataset has been released at Shitao/bge-reranker-data.
Currently, this model mainly supports Chinese and English; performance may degrade on other, lower-resource languages.
## Evaluation
You can evaluate the reranker using our C-MTEB script.
| Model | T2Reranking | T2RerankingZh2En* | T2RerankingEn2Zh* | MmarcoReranking | CMedQAv1 | CMedQAv2 | Avg |
|---|---|---|---|---|---|---|---|
| text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 |
| multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 |
| multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 |
| multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 |
| m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 |
| m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 |
| bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 |
| bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 |
| bge-reranker-base | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 |
| bge-reranker-large | 67.60 | 64.04 | 61.45 | 37.17 | 82.14 | 84.19 | 66.10 |
\* T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks.
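Outside the C-MTEB harness, you can sanity-check a reranker on a handful of labeled examples yourself. A minimal sketch computing MRR@10 with `FlagReranker` (the toy example data below is illustrative, not from any benchmark):

```python
from FlagEmbedding import FlagReranker

reranker = FlagReranker('BAAI/bge-reranker-large', use_fp16=True)

# Each example: a query, candidate passages, and the index of the relevant one.
examples = [
    ('what is panda?',
     ['hi', 'The giant panda is a bear species endemic to China.'],
     1),
]

reciprocal_ranks = []
for query, passages, gold in examples:
    scores = reranker.compute_score([[query, p] for p in passages])
    # Sort candidate indices by descending score, then find the gold passage's rank.
    order = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
    rank = order.index(gold) + 1  # 1-based rank of the relevant passage
    reciprocal_ranks.append(1.0 / rank if rank <= 10 else 0.0)

print('MRR@10:', sum(reciprocal_ranks) / len(reciprocal_ranks))
```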
## Acknowledgement
Part of the code is developed based on Reranker.
## Citation
If you find this repository useful, please consider giving it a star ⭐ and a citation:
```bibtex
@misc{bge_embedding,
  title={C-Pack: Packaged Resources To Advance General Chinese Embedding},
  author={Shitao Xiao and Zheng Liu and Peitian Zhang and Niklas Muennighoff},
  year={2023},
  eprint={2309.07597},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```