BEIR
====

`BEIR <https://github.com/beir-cellar/beir>`_ (Benchmarking-IR) is a heterogeneous evaluation benchmark for information retrieval.
It is designed for evaluating the performance of NLP-based retrieval models and is widely used in research on modern embedding models.
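
Every BEIR dataset shares the same layout: a document corpus, a set of queries, and relevance judgments (qrels) linking the two. As a quick illustration, here is a minimal sketch using the ``beir`` package itself (assuming it is installed via ``pip install beir``; the download URL follows the pattern published in the BEIR README):

.. code:: python

    from beir import util
    from beir.datasets.data_loader import GenericDataLoader

    # Download and unzip the fiqa dataset into ./beir/data.
    url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/fiqa.zip"
    data_path = util.download_and_unzip(url, "./beir/data")

    # Every BEIR dataset decomposes into corpus, queries, and qrels.
    corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")
    print(f"{len(corpus)} documents, {len(queries)} test queries")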

You can evaluate a model's performance on the BEIR benchmark by running our provided shell script:

.. code:: bash

    chmod +x ./examples/evaluation/beir/eval_beir.sh
    ./examples/evaluation/beir/eval_beir.sh

Or by running:

.. code:: bash

    python -m FlagEmbedding.evaluation.beir \
        --eval_name beir \
        --dataset_dir ./beir/data \
        --dataset_names fiqa arguana cqadupstack \
        --splits test dev \
        --corpus_embd_save_dir ./beir/corpus_embd \
        --output_dir ./beir/search_results \
        --search_top_k 1000 \
        --rerank_top_k 100 \
        --cache_path /root/.cache/huggingface/hub \
        --overwrite False \
        --k_values 10 100 \
        --eval_output_method markdown \
        --eval_output_path ./beir/beir_eval_results.md \
        --eval_metrics ndcg_at_10 recall_at_100 \
        --ignore_identical_ids True \
        --embedder_name_or_path BAAI/bge-large-en-v1.5 \
        --reranker_name_or_path BAAI/bge-reranker-v2-m3 \
        --devices cuda:0 cuda:1 \
        --reranker_max_length 1024

Change the embedder, devices, and cache directory to your preference.
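
If you prefer to drive the evaluation from Python rather than the command line, the same parse-and-run pattern can be scripted. The following is a minimal sketch based on the modules documented below; the names ``BEIREvalArgs``, ``BEIREvalModelArgs``, and ``BEIREvalRunner`` are assumed exports of ``FlagEmbedding.evaluation.beir`` (see ``beir/arguments`` and ``beir/runner``) rather than a guaranteed public API:

.. code:: python

    from transformers import HfArgumentParser

    # Assumed exports; see the beir/arguments and beir/runner pages below.
    from FlagEmbedding.evaluation.beir import (
        BEIREvalArgs,
        BEIREvalModelArgs,
        BEIREvalRunner,
    )

    # Parse the same flags the CLI accepts into two dataclasses:
    # evaluation settings and model/reranker settings.
    parser = HfArgumentParser((BEIREvalArgs, BEIREvalModelArgs))
    eval_args, model_args = parser.parse_args_into_dataclasses()

    # Run retrieval (and reranking, if a reranker is configured) end to end.
    runner = BEIREvalRunner(eval_args=eval_args, model_args=model_args)
    runner.run()

Saved as a script, this accepts the same flags as the ``python -m`` invocation above.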

.. toctree::
   :hidden:

   beir/arguments
   beir/data_loader
   beir/evaluator
   beir/runner