{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Evaluate Reranker"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Reranker usually better captures the latent semantic meanings between sentences. But comparing to using an embedding model, it will take quadratic $O(N^2)$ running time for the whole dataset. Thus the most common use cases of rerankers in information retrieval or RAG is reranking the top k answers retrieved according to the embedding similarities.\n",
"\n",
"The evaluation of reranker has the similar idea. We compare how much better the rerankers can rerank the candidates searched by a same embedder. In this tutorial, we will evaluate two rerankers' performances on BEIR benchmark, with bge-large-en-v1.5 as the base embedding model."
]
},
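{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the two-stage idea concrete, below is a minimal retrieve-then-rerank sketch. The toy corpus, query, and top-k value are assumptions for illustration only; the actual evaluation in this tutorial uses the BEIR data and the CLI pipeline shown later."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from FlagEmbedding import FlagModel, FlagReranker\n",
"\n",
"# Toy corpus and query (illustrative assumptions, not BEIR data)\n",
"corpus = [\n",
"    'The stock market fell sharply after the announcement.',\n",
"    'Diversification reduces the risk of a portfolio.',\n",
"    'The recipe calls for two cups of flour.',\n",
"]\n",
"query = 'How can I lower my investment risk?'\n",
"\n",
"# Stage 1: dense retrieval with the embedder (linear in corpus size)\n",
"embedder = FlagModel('BAAI/bge-large-en-v1.5')\n",
"corpus_emb = embedder.encode(corpus)\n",
"query_emb = embedder.encode_queries([query])\n",
"sims = (query_emb @ corpus_emb.T)[0]  # inner product similarity\n",
"top_k = sims.argsort()[::-1][:2]  # keep only the top-k candidates\n",
"\n",
"# Stage 2: rerank just the top-k (query, passage) pairs with the cross-encoder\n",
"reranker = FlagReranker('BAAI/bge-reranker-large', use_fp16=True)\n",
"scores = reranker.compute_score([[query, corpus[i]] for i in top_k])\n",
"print(sorted(zip(scores, top_k.tolist()), reverse=True))"
]
},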
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note: We highly recommend to run this notebook with GPU. The whole pipeline is very time consuming. For simplicity, we only use a single task FiQA in BEIR."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 0. Installation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First install the required dependency"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install FlagEmbedding"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. bge-reranker-large"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The first model is bge-reranker-large, a BERT like reranker with about 560M parameters.\n",
"\n",
"We can use the evaluation pipeline of FlagEmbedding to directly run the whole process:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Split 'dev' not found in the dataset. Removing it from the list.\n",
"ignore_identical_ids is set to True. This means that the search results will not contain identical ids. Note: Dataset such as MIRACL should NOT set this to True.\n",
"pre tokenize: 100%|██████████| 57/57 [00:03<00:00, 14.68it/s]\n",
"You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.\n",
"/share/project/xzy/Envs/ft/lib/python3.11/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml\n",
" warnings.warn(\n",
"Inference Embeddings: 100%|██████████| 57/57 [00:44<00:00, 1.28it/s]\n",
"pre tokenize: 100%|██████████| 1/1 [00:00<00:00, 61.59it/s]\n",
"Inference Embeddings: 100%|██████████| 1/1 [00:00<00:00, 6.22it/s]\n",
"Searching: 100%|██████████| 21/21 [00:00<00:00, 68.26it/s]\n",
"pre tokenize: 0%| | 0/64 [00:00<?, ?it/s]You're using a XLMRobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.\n",
"pre tokenize: 100%|██████████| 64/64 [00:08<00:00, 7.15it/s]\n",
"Compute Scores: 100%|██████████| 64/64 [01:39<00:00, 1.56s/it]\n"
]
}
],
"source": [
"%%bash\n",
"python -m FlagEmbedding.evaluation.beir \\\n",
"--eval_name beir \\\n",
"--dataset_dir ./beir/data \\\n",
"--dataset_names fiqa \\\n",
"--splits test dev \\\n",
"--corpus_embd_save_dir ./beir/corpus_embd \\\n",
"--output_dir ./beir/search_results \\\n",
"--search_top_k 1000 \\\n",
"--rerank_top_k 100 \\\n",
"--cache_path /root/.cache/huggingface/hub \\\n",
"--overwrite True \\\n",
"--k_values 10 100 \\\n",
"--eval_output_method markdown \\\n",
"--eval_output_path ./beir/beir_eval_results.md \\\n",
"--eval_metrics ndcg_at_10 recall_at_100 \\\n",
"--ignore_identical_ids True \\\n",
"--embedder_name_or_path BAAI/bge-large-en-v1.5 \\\n",
"--reranker_name_or_path BAAI/bge-reranker-large \\\n",
"--embedder_batch_size 1024 \\\n",
"--reranker_batch_size 1024 \\\n",
"--devices cuda:0 \\"
]
},
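{
"cell_type": "markdown",
"metadata": {},
"source": [
"The CLI above runs retrieval, reranking, and metric computation end to end. To inspect what the reranking stage does in isolation, you can also load the same model directly with `FlagReranker` and score (query, passage) pairs; the example pair below is a made-up illustration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from FlagEmbedding import FlagReranker\n",
"\n",
"reranker = FlagReranker('BAAI/bge-reranker-large', use_fp16=True)\n",
"\n",
"# A single (query, passage) pair -> one relevance score (higher = more relevant)\n",
"score = reranker.compute_score(\n",
"    ['What is a cash flow statement?',\n",
"     'A cash flow statement summarizes the cash moving in and out of a business.']\n",
")\n",
"print(score)"
]
},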
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. bge-reranker-v2-gemma"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The second model is bge-reranker-v2-m3"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Split 'dev' not found in the dataset. Removing it from the list.\n",
"ignore_identical_ids is set to True. This means that the search results will not contain identical ids. Note: Dataset such as MIRACL should NOT set this to True.\n",
"initial target device: 100%|██████████| 4/4 [01:14<00:00, 18.51s/it]\n",
"pre tokenize: 100%|██████████| 15/15 [00:01<00:00, 11.21it/s]\n",
"You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.\n",
"pre tokenize: 100%|██████████| 15/15 [00:01<00:00, 11.32it/s]\n",
"You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.\n",
"pre tokenize: 100%|██████████| 15/15 [00:01<00:00, 10.29it/s]\n",
"You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.\n",
"pre tokenize: 100%|██████████| 15/15 [00:01<00:00, 13.99it/s]\n",
"You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.\n",
"/share/project/xzy/Envs/ft/lib/python3.11/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml\n",
" warnings.warn(\n",
"/share/project/xzy/Envs/ft/lib/python3.11/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml\n",
" warnings.warn(\n",
"/share/project/xzy/Envs/ft/lib/python3.11/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml\n",
" warnings.warn(\n",
"/share/project/xzy/Envs/ft/lib/python3.11/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml\n",
" warnings.warn(\n",
"Inference Embeddings: 100%|██████████| 15/15 [00:12<00:00, 1.24it/s]\n",
"Inference Embeddings: 100%|██████████| 15/15 [00:12<00:00, 1.23it/s]\n",
"Inference Embeddings: 100%|██████████| 15/15 [00:12<00:00, 1.22it/s]\n",
"Inference Embeddings: 100%|██████████| 15/15 [00:12<00:00, 1.21it/s]\n",
"Chunks: 100%|██████████| 4/4 [00:30<00:00, 7.70s/it]\n",
"Chunks: 100%|██████████| 4/4 [00:00<00:00, 47.90it/s]\n",
"Searching: 100%|██████████| 21/21 [00:00<00:00, 128.34it/s]\n",
"initial target device: 100%|██████████| 4/4 [01:09<00:00, 17.43s/it]\n",
"pre tokenize: 0%| | 0/16 [00:00<?, ?it/s]You're using a XLMRobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.\n",
"pre tokenize: 12%|█▎ | 2/16 [00:00<00:02, 6.46it/s]You're using a XLMRobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.\n",
"pre tokenize: 12%|█▎ | 2/16 [00:00<00:03, 4.60it/s]You're using a XLMRobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.\n",
"pre tokenize: 25%|██▌ | 4/16 [00:00<00:02, 4.61it/s]You're using a XLMRobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.\n",
"pre tokenize: 100%|██████████| 16/16 [00:03<00:00, 4.12it/s]\n",
"pre tokenize: 100%|██████████| 16/16 [00:04<00:00, 3.78it/s]\n",
"pre tokenize: 100%|██████████| 16/16 [00:04<00:00, 3.95it/s]\n",
"pre tokenize: 100%|██████████| 16/16 [00:04<00:00, 3.81it/s]\n",
"Compute Scores: 100%|██████████| 67/67 [00:29<00:00, 2.30it/s]\n",
"Compute Scores: 100%|██████████| 67/67 [00:29<00:00, 2.27it/s]\n",
"Compute Scores: 100%|██████████| 67/67 [00:29<00:00, 2.27it/s]\n",
"Compute Scores: 100%|██████████| 67/67 [00:30<00:00, 2.19it/s]\n",
"Chunks: 100%|██████████| 4/4 [00:51<00:00, 12.97s/it]\n",
"/share/project/xzy/Envs/ft/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 8 leaked semaphore objects to clean up at shutdown\n",
" warnings.warn('resource_tracker: There appear to be %d '\n"
]
}
],
"source": [
"%%bash\n",
"python -m FlagEmbedding.evaluation.beir \\\n",
"--eval_name beir \\\n",
"--dataset_dir ./beir/data \\\n",
"--dataset_names fiqa \\\n",
"--splits test dev \\\n",
"--corpus_embd_save_dir ./beir/corpus_embd \\\n",
"--output_dir ./beir/search_results \\\n",
"--search_top_k 1000 \\\n",
"--rerank_top_k 100 \\\n",
"--cache_path /root/.cache/huggingface/hub \\\n",
"--overwrite True \\\n",
"--k_values 10 100 \\\n",
"--eval_output_method markdown \\\n",
"--eval_output_path ./beir/beir_eval_results.md \\\n",
"--eval_metrics ndcg_at_10 recall_at_100 \\\n",
"--ignore_identical_ids True \\\n",
"--embedder_name_or_path BAAI/bge-large-en-v1.5 \\\n",
"--reranker_name_or_path BAAI/bge-reranker-v2-m3 \\\n",
"--embedder_batch_size 1024 \\\n",
"--reranker_batch_size 1024 \\\n",
"--devices cuda:0 cuda:1 cuda:2 cuda:3 \\\n",
"--reranker_max_length 1024 \\"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Comparison"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'fiqa-test': {'ndcg_at_10': 0.40991, 'ndcg_at_100': 0.48028, 'map_at_10': 0.32127, 'map_at_100': 0.34227, 'recall_at_10': 0.50963, 'recall_at_100': 0.75987, 'precision_at_10': 0.11821, 'precision_at_100': 0.01932, 'mrr_at_10': 0.47786, 'mrr_at_100': 0.4856}}\n",
"{'fiqa-test': {'ndcg_at_10': 0.44828, 'ndcg_at_100': 0.51525, 'map_at_10': 0.36551, 'map_at_100': 0.38578, 'recall_at_10': 0.519, 'recall_at_100': 0.75987, 'precision_at_10': 0.12299, 'precision_at_100': 0.01932, 'mrr_at_10': 0.53382, 'mrr_at_100': 0.54108}}\n"
]
}
],
"source": [
"import json\n",
"\n",
"with open('beir/search_results/bge-large-en-v1.5/bge-reranker-large/EVAL/eval_results.json') as f:\n",
" results_1 = json.load(f)\n",
" print(results_1)\n",
" \n",
"with open('beir/search_results/bge-large-en-v1.5/bge-reranker-v2-m3/EVAL/eval_results.json') as f:\n",
" results_2 = json.load(f)\n",
" print(results_2)"
]
},
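{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see the gap metric by metric, we can diff the two result dicts (a small convenience snippet, assuming the cell above has been run):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Per-metric improvement of bge-reranker-v2-m3 over bge-reranker-large\n",
"for metric, v1 in results_1['fiqa-test'].items():\n",
"    v2 = results_2['fiqa-test'][metric]\n",
"    print(f'{metric}: {v1:.5f} -> {v2:.5f} ({v2 - v1:+.5f})')"
]
},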
{
"cell_type": "markdown",
"metadata": {},
"source": [
"From the above results we can see that bge-reranker-v2-m3 has advantage on almost all the metrics."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "ft",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.10"
}
},
"nbformat": 4,
"nbformat_minor": 2
}