"Now we have the embeddings of the query and the corpus. The next step is to compute the similarity between the query and each sentence in the corpus. Here we use the inner (dot) product as our similarity metric."
"According to the ranking, the sentence at index 3 is the best match for our query \"Who could be an expert of neural network?\"\n",
"\n",
"And that person is Geoffrey Hinton!"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Geoffrey Hinton, as a foundational figure in AI, received Turing Award for his contribution in deep learning.\n"
]
}
],
"source": [
"print(corpus[3])"
]
},
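{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the scoring and ranking step concrete, here is a minimal sketch on made-up 3-dimensional vectors (toy values assumed for illustration, not real BGE embeddings). Each corpus vector is scored by its inner product with the query vector, and `np.argsort` orders the indices from highest to lowest score, which is the kind of index order stored in `sorted_indices`. The toy vectors are L2-normalized first, because for unit-length vectors the inner product equals cosine similarity."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# toy embeddings (assumption: made-up values, not BGE outputs)\n",
"toy_query = np.array([1.0, 1.0, 0.0])\n",
"toy_corpus = np.array([\n",
"    [0.0, 0.0, 1.0],\n",
"    [1.0, 0.9, 0.1],\n",
"    [1.0, 0.0, 0.0],\n",
"])\n",
"\n",
"# L2-normalize so that the inner product equals cosine similarity\n",
"toy_query = toy_query / np.linalg.norm(toy_query)\n",
"toy_corpus = toy_corpus / np.linalg.norm(toy_corpus, axis=1, keepdims=True)\n",
"\n",
"# inner product of the query with every corpus vector\n",
"toy_scores = toy_corpus @ toy_query\n",
"\n",
"# corpus indices sorted from highest to lowest score\n",
"toy_ranking = np.argsort(toy_scores)[::-1]\n",
"print(toy_scores.round(3), toy_ranking)"
]
},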
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Following the order of `sorted_indices`, we can print the full ranking produced by our little retriever, together with each similarity score."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Score of 0.608: \"Geoffrey Hinton, as a foundational figure in AI, received Turing Award for his contribution in deep learning.\"\n",
"Score of 0.603: \"Fei-Fei Li is a professor in Stanford University, revolutionized computer vision with the ImageNet project.\"\n",
"Score of 0.528: \"Andrew Ng spread AI knowledge globally via public courses on Coursera and Stanford University.\"\n",
"Score of 0.463: \"Sam Altman leads OpenAI as its CEO, with astonishing works of GPT series and pursuing safe and beneficial AI.\"\n",
"Score of 0.402: \"Morgan Freeman is an acclaimed actor famous for his distinctive voice and diverse roles.\"\n",
"Score of 0.394: \"Eminem is a renowned rapper and one of the best-selling music artists of all time.\"\n",
"Score of 0.393: \"Michael Jackson was a legendary pop icon known for his record-breaking music and dance innovations.\"\n",
"Score of 0.368: \"Robert Downey Jr. is an iconic actor best known for playing Iron Man in the Marvel Cinematic Universe.\"\n",
"Score of 0.354: \"Taylor Swift is a Grammy-winning singer-songwriter known for her narrative-driven music.\"\n",
"Score of 0.327: \"Brad Pitt is a versatile actor and producer known for his roles in films like 'Fight Club' and 'Once Upon a Time in Hollywood.'\"\n"
]
}
],
"source": [
"# print each similarity score and its sentence, from highest to lowest score\n",
"\n",
"for i in sorted_indices:\n",
"    print(f\"Score of {sim_scores[i]:.3f}: \\\"{corpus[i]}\\\"\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Not surprisingly, the descriptions of Geoffrey Hinton (0.608) and Fei-Fei Li (0.603) receive the highest similarity scores, followed by those of Andrew Ng (0.528) and Sam Altman (0.463). All four AI figures score above every entertainer in the corpus, whose scores range from 0.327 to 0.402.\n",
"\n",
"Although the key phrase \"neural network\" does not appear in any of these descriptions, the BGE embedding model still captures the semantic relationship between the query and the corpus."
]
},
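{
"cell_type": "markdown",
"metadata": {},
"source": [
"For contrast, here is a sketch of a naive keyword-overlap scorer (a hypothetical baseline, not part of BGE or this tutorial). It lowercases each text, keeps only alphabetic words, and counts how many distinct query words also appear in a description. Run on a few of the descriptions printed above, it gives Geoffrey Hinton's description a score of 0, while descriptions that merely share a function word such as \"an\" or \"of\" score 1, so pure keyword matching would rank them above him."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"\n",
"# hypothetical helper: lowercase the text and keep alphabetic words only\n",
"def keyword_set(text):\n",
"    return set(re.findall(r'[a-z]+', text.lower()))\n",
"\n",
"query_text = 'Who could be an expert of neural network?'\n",
"# a few descriptions copied from the ranking printed above\n",
"descriptions = [\n",
"    'Geoffrey Hinton, as a foundational figure in AI, received Turing Award for his contribution in deep learning.',\n",
"    'Fei-Fei Li is a professor in Stanford University, revolutionized computer vision with the ImageNet project.',\n",
"    'Andrew Ng spread AI knowledge globally via public courses on Coursera and Stanford University.',\n",
"    'Sam Altman leads OpenAI as its CEO, with astonishing works of GPT series and pursuing safe and beneficial AI.',\n",
"    'Morgan Freeman is an acclaimed actor famous for his distinctive voice and diverse roles.',\n",
"    'Eminem is a renowned rapper and one of the best-selling music artists of all time.',\n",
"    'Robert Downey Jr. is an iconic actor best known for playing Iron Man in the Marvel Cinematic Universe.',\n",
"]\n",
"\n",
"# score = number of distinct query words that also occur in the description\n",
"query_words = keyword_set(query_text)\n",
"overlap = {d: len(query_words & keyword_set(d)) for d in descriptions}\n",
"\n",
"for d in sorted(descriptions, key=overlap.get, reverse=True):\n",
"    print(overlap[d], d)"
]
},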
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 4: Evaluate"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We've seen that the embedding model performed well on the \"neural network\" query. What about its quality more generally?\n",
"\n",
"Let's create a very small dataset of queries and their ground-truth answers. Each ground-truth answer is a list of the indices of the relevant sentences in the corpus."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"queries = [\n",
" \"Who could be an expert of neural network?\",\n",
" \"Who might had won Grammy?\",\n",
" \"Won Academy Awards\",\n",
" \"One of the most famous female singers.\",\n",
" \"Inventor of AlexNet\",\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"ground_truth = [\n",
" [1, 3],\n",
" [0, 4, 5],\n",
" [2, 7, 9],\n",
" [5],\n",
" [3],\n",
"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we repeat the steps above to obtain the predicted ranking for each query."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[[3, 1, 8, 6, 7, 4, 0, 9, 5, 2],\n",
" [5, 0, 3, 4, 1, 9, 7, 2, 6, 8],\n",
" [3, 2, 7, 5, 9, 0, 1, 4, 6, 8],\n",
" [5, 0, 4, 7, 1, 9, 2, 3, 6, 8],\n",
" [3, 1, 8, 6, 0, 7, 5, 9, 4, 2]]"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# use bge model to generate embeddings for all the queries\n",
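"# `scores` holds, for each query, its similarity scores against every corpus sentence\n",
"# sort corpus indices by score (highest first) to get each query's ranking\n",
"rankings = [sorted(range(len(sim_scores)), key=lambda k: sim_scores[k], reverse=True) for sim_scores in scores]\n",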
"rankings"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
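"Mean Reciprocal Rank ([MRR](https://en.wikipedia.org/wiki/Mean_reciprocal_rank)) is a widely used metric in information retrieval for evaluating ranking quality. For each query, it takes the reciprocal of the rank of the first relevant result (counting 0 if no relevant result appears within the cutoff) and averages these values over all queries. Here we use it to get a rough idea of how our system performs."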
]
},
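{
"cell_type": "markdown",
"metadata": {},
"source": [
"Written out for a query set $Q$ and a cutoff $k$, where $\\mathrm{rank}_q$ is the 1-based position of the first relevant sentence in the ranking for query $q$ (a query whose first relevant sentence falls outside the top $k$, or that has none, contributes 0):\n",
"\n",
"$$\\mathrm{MRR@}k = \\frac{1}{|Q|} \\sum_{q \\in Q} \\frac{\\mathbb{1}[\\mathrm{rank}_q \\le k]}{\\mathrm{rank}_q}$$"
]
},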
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
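"def MRR(preds, labels, cutoffs):\n",
"    # one running total of reciprocal ranks per cutoff\n",
"    mrr = [0 for _ in range(len(cutoffs))]\n",
"    for pred, label in zip(preds, labels):\n",
"        for i, c in enumerate(cutoffs):\n",
"            # only the top-c predictions count toward MRR@c\n",
"            for j, index in enumerate(pred[:c]):\n",
"                if index in label:\n",
"                    # the first relevant hit at 0-based position j adds 1/(j+1)\n",
"                    mrr[i] += 1/(j+1)\n",
"                    break\n",
"    # average over all queries\n",
"    mrr = [k/len(preds) for k in mrr]\n",
"    return mrr"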
]
},
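{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a cross-check, here is a minimal sketch that recomputes MRR@1 and MRR@5 by hand from the rankings and ground truth shown above (copied in as literals so the cell runs on its own). The helper `first_hit_rank` is hypothetical and not part of the notebook; it returns the 1-based position of the first relevant sentence for each query."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# rankings and ground truth copied from the cells above\n",
"recorded_rankings = [\n",
"    [3, 1, 8, 6, 7, 4, 0, 9, 5, 2],\n",
"    [5, 0, 3, 4, 1, 9, 7, 2, 6, 8],\n",
"    [3, 2, 7, 5, 9, 0, 1, 4, 6, 8],\n",
"    [5, 0, 4, 7, 1, 9, 2, 3, 6, 8],\n",
"    [3, 1, 8, 6, 0, 7, 5, 9, 4, 2],\n",
"]\n",
"recorded_labels = [[1, 3], [0, 4, 5], [2, 7, 9], [5], [3]]\n",
"\n",
"# hypothetical helper: 1-based position of the first relevant index, or None\n",
"def first_hit_rank(ranking, relevant):\n",
"    for pos, idx in enumerate(ranking, start=1):\n",
"        if idx in relevant:\n",
"            return pos\n",
"    return None\n",
"\n",
"ranks = [first_hit_rank(r, rel) for r, rel in zip(recorded_rankings, recorded_labels)]\n",
"print(ranks)  # [1, 1, 2, 1, 1]\n",
"\n",
"# reciprocal rank per query, zeroed when the first hit lies beyond the cutoff\n",
"mrr_check = {}\n",
"for k in (1, 5):\n",
"    rr = [1 / r if r is not None and r <= k else 0.0 for r in ranks]\n",
"    mrr_check[k] = sum(rr) / len(rr)\n",
"print(mrr_check)  # {1: 0.8, 5: 0.9}"
]
},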
{
"cell_type": "markdown",
"metadata": {},
"source": [
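"We use cutoffs of 1 and 5, which give an MRR@1 of 0.8 and an MRR@5 of 0.9. The only query whose top result is wrong is \"Won Academy Awards\": its first prediction is sentence 3, and its first relevant sentence (index 2) appears at rank 2, contributing 0 to MRR@1 and 1/2 to MRR@5."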