{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# C-MTEB"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"C-MTEB is the largest benchmark for Chinese text embeddings, built in a similar fashion to MTEB. In this tutorial, we will walk through how to evaluate an embedding model's ability on Chinese tasks in C-MTEB."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 0. Installation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, install the required packages:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install FlagEmbedding mteb"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Datasets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"C-MTEB uses task splits and metrics similar to those of the English MTEB. It contains 35 datasets across 6 different tasks: Classification, Clustering, Pair Classification, Reranking, Retrieval, and Semantic Textual Similarity (STS). \n",
"\n",
"1. **Classification**: The embeddings are used to train a logistic regression classifier on the train set, which is then scored on the test set. F1 is the main metric.\n",
"2. **Clustering**: Train a mini-batch k-means model with batch size 32 and k equal to the number of different labels, then score using v-measure.\n",
"3. **Pair Classification**: A pair of text inputs is provided, and a binary label needs to be assigned. The main metric is average precision.\n",
"4. **Reranking**: Rank a list of relevant and irrelevant reference texts according to a query. Metrics are mean MRR@k and MAP.\n",
"5. **Retrieval**: Each dataset comprises a corpus, queries, and a mapping that links each query to its relevant documents within the corpus. The goal is to retrieve relevant documents for each query. The main metric is nDCG@k. MTEB directly adopts BEIR for the retrieval task.\n",
"6. **Semantic Textual Similarity (STS)**: Determine the similarity between each sentence pair. Spearman correlation based on cosine similarity serves as the main metric.\n",
"\n",
"\n",
"Check the [HF page](https://huggingface.co/C-MTEB) for the details of each dataset."
]
},
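{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of how the STS scoring above works (the `emb1`, `emb2`, and `gold` arrays here are synthetic placeholders, not real C-MTEB data), the main metric is simply the Spearman correlation between the cosine similarities of the embedding pairs and the gold similarity labels:\n",
"\n",
"```python\n",
"import numpy as np\n",
"from scipy.stats import spearmanr\n",
"\n",
"rng = np.random.default_rng(0)\n",
"emb1 = rng.standard_normal((8, 16))  # embeddings of the first sentences\n",
"emb2 = rng.standard_normal((8, 16))  # embeddings of the second sentences\n",
"gold = rng.random(8)                 # gold similarity labels\n",
"\n",
"# cosine similarity of each pair, then Spearman correlation against the labels\n",
"cos = (emb1 * emb2).sum(axis=1) / (\n",
"    np.linalg.norm(emb1, axis=1) * np.linalg.norm(emb2, axis=1)\n",
")\n",
"score, _ = spearmanr(cos, gold)\n",
"```"
]
},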
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ChineseTaskList = [\n",
" 'TNews', 'IFlyTek', 'MultilingualSentiment', 'JDReview', 'OnlineShopping', 'Waimai',\n",
" 'CLSClusteringS2S.v2', 'CLSClusteringP2P.v2', 'ThuNewsClusteringS2S.v2', 'ThuNewsClusteringP2P.v2',\n",
" 'Ocnli', 'Cmnli',\n",
" 'T2Reranking', 'MMarcoReranking', 'CMedQAv1-reranking', 'CMedQAv2-reranking',\n",
" 'T2Retrieval', 'MMarcoRetrieval', 'DuRetrieval', 'CovidRetrieval', 'CmedqaRetrieval', 'EcomRetrieval', 'MedicalRetrieval', 'VideoRetrieval',\n",
" 'ATEC', 'BQ', 'LCQMC', 'PAWSX', 'STSB', 'AFQMC', 'QBQTC'\n",
"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, load the model for evaluation. Note that the instruction here is used for retrieval tasks."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# requires the C_MTEB directory from the FlagEmbedding repo to be importable\n",
"from C_MTEB.flag_dres_model import FlagDRESModel\n",
"\n",
"instruction = \"为这个句子生成表示以用于检索相关文章:\"\n",
"model_name = \"BAAI/bge-base-zh-v1.5\"\n",
"\n",
"model = FlagDRESModel(model_name_or_path=model_name,\n",
" query_instruction_for_retrieval=instruction,\n",
" pooling_method=\"cls\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Alternatively, you can load a model using sentence_transformers:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sentence_transformers import SentenceTransformer\n",
"\n",
"model = SentenceTransformer(\"PATH_TO_MODEL\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Or implement a class following the structure below:\n",
"\n",
"```python\n",
"class MyModel():\n",
" def __init__(self):\n",
" \"\"\"initialize the tokenizer and model\"\"\"\n",
" pass\n",
"\n",
" def encode(self, sentences, batch_size=32, **kwargs):\n",
" \"\"\" Returns a list of embeddings for the given sentences.\n",
" Args:\n",
" sentences (`List[str]`): List of sentences to encode\n",
" batch_size (`int`): Batch size for the encoding\n",
"\n",
" Returns:\n",
" `List[np.ndarray]` or `List[tensor]`: List of embeddings for the given sentences\n",
" \"\"\"\n",
" pass\n",
"\n",
"model = MyModel()\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Evaluate"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After we've prepared the datasets and model, we can start the evaluation. For time efficiency, we highly recommend using a GPU for evaluation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import mteb\n",
"from mteb import MTEB\n",
"\n",
"tasks = mteb.get_tasks(tasks=ChineseTaskList)\n",
"\n",
"for task in tasks:\n",
" evaluation = MTEB(tasks=[task])\n",
" evaluation.run(model, output_folder=f\"zh_results/{model_name.split('/')[-1]}\")"
]
},
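{
"cell_type": "markdown",
"metadata": {},
"source": [
"The evaluation loop above writes one JSON result file per task. A minimal sketch for gathering those files afterwards (assuming the same `zh_results` output folder as above; the exact JSON layout may vary across mteb versions, so the loaded content is kept as-is):\n",
"\n",
"```python\n",
"import json\n",
"from pathlib import Path\n",
"\n",
"# collect every result file under the output folder, keyed by file name\n",
"scores = {}\n",
"for path in sorted(Path(\"zh_results\").glob(\"**/*.json\")):\n",
"    with open(path) as f:\n",
"        scores[path.stem] = json.load(f)\n",
"```"
]
},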
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Submit to MTEB Leaderboard"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After the evaluation is done, all the evaluation results should be stored in `zh_results/{model_name}/`.\n",
"\n",
"Then run the following shell command to create `model_card.md`. Change {model_name} and the rest of the path to wherever your results are stored."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!mteb create_meta --results_folder zh_results/{model_name}/ --output_path model_card.md"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copy and paste the contents of model_card.md to the top of the README.md of your model on the HF Hub. Then go to the [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard) and choose the Chinese leaderboard to find your model! It will appear shortly after the website's daily refresh."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}