{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Evaluate on MLDR"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[MLDR](https://huggingface.co/datasets/Shitao/MLDR) is a Multilingual Long-Document Retrieval dataset built on Wikipeida, Wudao and mC4, covering 13 typologically diverse languages. Specifically, we sample lengthy articles from Wikipedia, Wudao and mC4 datasets and randomly choose paragraphs from them. Then we use GPT-3.5 to generate questions based on these paragraphs. The generated question and the sampled article constitute a new text pair to the dataset."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 0. Installation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First install the libraries we are using:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"% pip install FlagEmbedding pytrec_eval"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Dataset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Download the dataset of 13 different languages from [Hugging Face](https://huggingface.co/datasets/Shitao/MLDR)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"| Language Code | Language | Source | #train | #dev | #test | #corpus | Avg. Length of Docs |\n",
"| :-----------: | :--------: | :--------------: | :-----: | :---: | :---: | :-----: | :-----------------: |\n",
"| ar | Arabic | Wikipedia | 1,817 | 200 | 200 | 7,607 | 9,428 |\n",
"| de | German | Wikipedia, mC4 | 1,847 | 200 | 200 | 10,000 | 9,039 |\n",
"| en | English | Wikipedia | 10,000 | 200 | 800 | 200,000 | 3,308 |\n",
"| es | Spanish | Wikipedia, mc4 | 2,254 | 200 | 200 | 9,551 | 8,771 |\n",
"| fr | French | Wikipedia | 1,608 | 200 | 200 | 10,000 | 9,659 |\n",
"| hi | Hindi | Wikipedia | 1,618 | 200 | 200 | 3,806 | 5,555 |\n",
"| it | Italian | Wikipedia | 2,151 | 200 | 200 | 10,000 | 9,195 |\n",
"| ja | Japanese | Wikipedia | 2,262 | 200 | 200 | 10,000 | 9,297 |\n",
"| ko | Korean | Wikipedia | 2,198 | 200 | 200 | 6,176 | 7,832 |\n",
"| pt | Portuguese | Wikipedia | 1,845 | 200 | 200 | 6,569 | 7,922 |\n",
"| ru | Russian | Wikipedia | 1,864 | 200 | 200 | 10,000 | 9,723 |\n",
"| th | Thai | mC4 | 1,970 | 200 | 200 | 10,000 | 8,089 |\n",
"| zh | Chinese | Wikipedia, Wudao | 10,000 | 200 | 800 | 200,000 | 4,249 |\n",
"| Total | - | - | 41,434 | 2,600 | 3,800 | 493,709 | 4,737 |"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First download the queries and corresponding qrels:"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"from datasets import load_dataset\n",
"\n",
"lang = \"en\"\n",
"dataset = load_dataset('Shitao/MLDR', lang, trust_remote_code=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each item has four parts: `query_id`, `query`, `positive_passages`, and `negative_passages`. `query_id` and `query` correspond to the id and text content of the qeury. `positive_passages` and `negative_passages` are list of passages with their corresponding `docid` and `text`. "
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'query_id': 'q-en-1',\n",
" 'query': 'What is the syntax for the shorthand of the conditional operator in PHP 5.3?',\n",
" 'positive_passages': [{'docid': 'doc-en-8',\n",
" 'text': 'In computer programming, is a ternary operator that is part of the syntax for basic conditional expressions in several programming languages. It is commonly referred to as the conditional operator, inline if (iif), or ternary if. An expression evaluates to if the value of is true, and otherwise to . One can read it aloud as \"if a then b otherwise c\".\\n\\nIt originally comes from CPL, in which equivalent syntax for e1 ? e2 : e3 was e1 → e2, e3.\\n\\nAlthough many ternary operators are possible, the conditional operator is so common, and other ternary operators so rare, that the conditional operator is commonly referred to as the ternary operator.\\n\\nVariations\\nThe detailed semantics of \"the\" ternary operator as well as its syntax differs significantly from language to language.\\n\\nA top level distinction from one language to another is whether the expressions permit side effects (as in most procedural languages) and whether the language provides short-circuit evaluation semantics, whereby only the selected expression is evaluated (most standard operators in most languages evaluate all arguments).\\n\\nIf the language supports expressions with side effects but does not specify short-circuit evaluation, then a further distinction exists about which expression evaluates first—if the language guarantees any specific order (bear in mind that the conditional also counts as an expression).\\n\\nFurthermore, if no order is guaranteed, a distinction exists about whether the result is then classified as indeterminate (the value obtained from some order) or undefined (any value at all at the whim of the compiler in the face of side effects, or even a crash).\\n\\nIf the language does not permit side-effects in expressions (common in functional languages), then the order of evaluation has no value semantics—though it may yet bear on whether an infinite recursion terminates, or have other performance implications (in a functional language with match 
expressions, short-circuit evaluation is inherent, and natural uses for the ternary operator arise less often, so this point is of limited concern).\\n\\nFor these reasons, in some languages the statement form can have subtly different semantics than the block conditional form } (in the C language—the syntax of the example given—these are in fact equivalent).\\n\\nThe associativity of nested ternary operators can also differ from language to language. In almost all languages, the ternary operator is right associative so that evaluates intuitively as , but PHP in particular is notoriously left-associative, and evaluates as follows: , which is rarely what any programmer expects. (The given examples assume that the ternary operator has low operator precedence, which is true in all C-family languages, and many others.)\\n\\nEquivalence to map\\nThe ternary operator can also be viewed as a binary map operation.\\n\\nIn R—and other languages with literal expression tuples—one can simulate the ternary operator with something like the R expression (this idiom is slightly more natural in languages with 0-origin subscripts).\\n\\nHowever, in this idiom it is almost certain that the entire tuple expression will evaluate prior to the subscript expression, so there will be no short-circuit semantics.\\n\\nNested ternaries can be simulated as where the function returns the index of the first true value in the condition vector. Note that both of these map equivalents are binary operators, revealing that the ternary operator is ternary in syntax, rather than semantics. These constructions can be regarded as a weak form of currying based on data concatenation rather than function composition.\\n\\nIf the language provides a mechanism of futures or promises, then short-circuit evaluation can sometimes also be simulated in the context of a binary map operation.\\n\\nConditional assignment\\n is used as follows:\\n\\n condition ? 
value_if_true : value_if_false\\n\\nThe condition is evaluated true or false as a Boolean expression. On the basis of the evalu
" 'negative_passages': [{'docid': 'doc-en-9',\n",
" 'text': 'The Pirates of Penzance; or, The Slave of Duty is a comic opera in two acts, with music by Arthur Sullivan and libretto by W.\\xa0S.\\xa0Gilbert. The opera\\'s official premiere was at the Fifth Avenue Theatre in New York City on 31 December 1879, where the show was well received by both audiences and critics. Its London debut was on 3 April 1880, at the Opera Comique, where it ran for 363 performances.\\n\\nThe story concerns Frederic, who, having completed his 21st year, is released from his apprenticeship to a band of tender-hearted pirates. He meets the daughters of Major-General Stanley, including Mabel, and the two young people fall instantly in love. Frederic soon learns, however, that he was born on the 29th of February, and so, technically, he has a birthday only once each leap year. His indenture specifies that he remain apprenticed to the pirates until his \"twenty-first birthday\", meaning that he must serve for another 63 years. Bound by his own sense of duty, Frederic\\'s only solace is that Mabel agrees to wait for him faithfully.\\n\\nPirates was the fifth Gilbert and Sullivan collaboration and introduced the much-parodied \"Major-General\\'s Song\". The opera was performed for over a century by the D\\'Oyly Carte Opera Company in Britain and by many other opera companies and repertory companies worldwide. Modernized productions include Joseph Papp\\'s 1981 Broadway production, which ran for 787 performances, winning the Tony Award for Best Revival and the Drama Desk Award for Outstanding Musical, and spawning many imitations and a 1983 film adaptation. Pirates remains popular today, taking its place along with The Mikado and H.M.S. Pinafore as one of the most frequently played Gilbert and Sullivan operas.\\n\\nBackground\\n\\nThe Pirates of Penzance was the only Gilbert and Sullivan opera to have its official premiere in the United States. At the time, American law offered no copyright protection to foreigners. 
After the pair\\'s previous opera, H.M.S. Pinafore, achieved success in London in 1878, approximately 150 American companies quickly mounted unauthorised productions that often took considerable liberties with the text and paid no royalties to the creators. Gilbert and Sullivan hoped to forestall further \"copyright piracy\" by mounting the first production of their next opera in America, before others could copy it, and by delaying publication of the score and libretto. They succeeded in keeping for themselves the direct profits of the first American production of The Pirates of Penzance by opening the production themselves on Broadway, prior to the London production, and they also operated profitable US touring companies of Pirates and Pinafore. However, Gilbert, Sullivan, and their producer, Richard D\\'Oyly Carte, failed in their efforts, over the next decade, to control the American performance copyrights to Pirates and their other operas.\\n\\nFiction and plays about pirates were ubiquitous in the 19th century. Walter Scott\\'s The Pirate (1822) and James Fenimore Cooper\\'s The Red Rover were key sources for the romanticised, dashing pirate image and the idea of repentant pirates. Both Gilbert and Sullivan had parodied these ideas early in their careers. Sullivan had written a comic opera called The Contrabandista, in 1867, about a hapless British tourist who is captured by bandits and forced to become their chief. Gilbert had written several comic works that involved pirates or bandits. In Gilbert\\'s 1876 opera Princess Toto, the title character is eager to be captured by a brigand chief. Gilbert had translated Jacques Offenbach\\'s operetta Les brigands, in 1871. 
As in Les brigands, The Pirates of Penzance absurdly treats stealing as a professional career path, with apprentices and tools of the trade such as the crowbar and life preserver.\\n\\nGenesis\\nWhile Pinafore was running strongly at the Opera Comique in London, Gilbert was eager to get started on his and Sullivan\\'s next opera, and he began working on the libretto in December 1878. He re-used several elem
" {'docid': 'doc-en-10',\n",
" 'text': 'Follies is a musical with music and lyrics by Stephen Sondheim and a book by James Goldman.\\n\\nThe story concerns a reunion in a crumbling Broadway theater, scheduled for demolition, of the past performers of the \"Weismann\\'s Follies\", a musical revue (based on the Ziegfeld Follies), that played in that theater between the world wars. It focuses on two couples, Buddy and Sally Durant Plummer and Benjamin and Phyllis Rogers Stone, who are attending the reunion. Sally and Phyllis were showgirls in the Follies. Both couples are deeply unhappy with their marriages. Buddy, a traveling salesman, is having an affair with a girl on the road; Sally is still as much in love with Ben as she was years ago; and Ben is so self-absorbed that Phyllis feels emotionally abandoned. Several of the former showgirls perform their old numbers, sometimes accompanied by the ghosts of their former selves. The musical numbers in the show have been interpreted as pastiches of the styles of the leading Broadway composers of the 1920s and 1930s, and sometimes as parodies of specific songs.\\n\\nThe Broadway production opened on April 4, 1971, directed by Harold Prince and Michael Bennett, and with choreography by Bennett. The musical was nominated for 11 Tony Awards and won seven. The original production, the second-most costly performed on Broadway to that date, ran for over 500 performances but ultimately lost its entire investment. The musical has had a number of major revivals, and several of its songs have become standards, including \"Broadway Baby\", \"I\\'m Still Here\", \"Too Many Mornings\", \"Could I Leave You?\", and \"Losing My Mind\".\\n\\nBackground\\nAfter the failure of Do I Hear a Waltz? (1965), for which he had written the lyrics to Richard Rodgers\\'s music, Sondheim decided that he would henceforth work only on projects where he could write both the music and lyrics himself. 
He asked author and playwright James Goldman to join him as bookwriter for a new musical. Inspired by a New York Times article about a gathering of former showgirls from the Ziegfeld Follies, they decided upon a story about ex-showgirls.\\n\\nOriginally titled The Girls Upstairs, the musical was to be produced by David Merrick and Leland Hayward in late 1967, but the plans ultimately fell through, and Stuart Ostrow became the producer, with Joseph Hardy as director. These plans also did not work out, and finally Harold Prince, who had worked previously with Sondheim, became the producer and director. He had agreed to work on The Girls Upstairs if Sondheim agreed to work on Company; Michael Bennett, the young choreographer of Company, was also brought onto the project. It was Prince who changed the title to Follies; he was \"intrigued by the psychology of a reunion of old chorus dancers and loved the play on the word \\'follies.\\n\\nPlot\\nIn 1971, on the soon-to-be-demolished stage of the Weismann Theatre, a reunion is being held to honor the Weismann\\'s Follies shows past and the beautiful chorus girls who performed there every year between the two world wars. The once resplendent theater is now little but planks and scaffolding (\"Prologue\"/\"Overture\"). As the ghosts of the young showgirls slowly drift through the theater, a majordomo enters with his entourage of waiters and waitresses. They pass through the spectral showgirls without seeing them.\\n\\nSally Durant Plummer, \"blond, petite, sweet-faced\" and at 49 \"still remarkably like the girl she was thirty years ago\", a former Weismann girl, is the first guest to arrive, and her ghostly youthful counterpart moves towards her. Phyllis Rogers Stone, a stylish and elegant woman, arrives with her husband Ben, a renowned philanthropist and politician. As their younger counterparts approach them, Phyllis comments to Ben about their past. 
He feigns a lack of interest; there is an underlying tension in their relationship. As more guests arrive, Sally\\'s husband, Buddy, enters. He is a salesman, in his early 50s, appealing and lively, whose smiles cover inner disappointment.\\n\
" {'docid': 'doc-en-11',\n",
" 'text': 'Cleopatra in Space is an American animated television series produced by DreamWorks Animation and animated by Titmouse, Inc., based on the graphic novel series of the same name by Mike Maihack. The showrunners for the series are Doug Langdale and Fitzy Fitzmaurice.\\n\\nIn the United States, the first five episodes were released on NBCUniversal\\'s streaming service Peacock for Xfinity customers on April 15, 2020, making this the first DreamWorks Animation series to be released on a streaming device other than Netflix or Amazon Video. On July 15, 2020, the first season was officially released when the service launched nationwide. Prior to its release in the United States, the series was first broadcast in Southeast Asia on DreamWorks Channel beginning on November 25, 2019. The show is geared toward those between ages 6 and 11. Langdale, in an interview, said that he is attempting to make sure the show is \"accessible to a younger audience,\" even as he doesn\\'t give much thought to what age demographic the show is aiming towards.\\n\\nOn July 15, the show premiered on Peacock, with episodes 15 and 713 of the first season made available to viewers who subscribed to \"Peacock Premium\", and a more limited selection for those who chose a free plan. It was one of the three animated Peacock Originals streaming on the platform, with the other two being season 13 of Curious George and season 2 of Where\\'s Waldo?. The show can only be watched using the streaming service\\'s premium plan. On November 19, 2020, Season 2 premiered on Peacock. On January 14, 2021, Season 3 was released on Peacock. On July 14, 2021, all three seasons were added to Hulu.\\n\\nPlot\\nCleopatra in Space is a comedic adventure focusing on Cleopatra\\'s teenage years, as she deals with the ups and downs of being a high school teenager, after she transported 30,000 years into her future to a planet with Egyptian themes ruled by talking cats, and she is said to be the savior of a galaxy. 
Cleopatra and her newfound friends work to try and return her to her own time, in Ancient Egypt, as she gains new combat skills in the process. Showrunner Doug Langdale described the show as a \"real move-forward story\" which continues forward without interruption.\\n\\nCharacters\\n\\nMain\\n Cleopatra \"Cleo\" (voiced by Lilimar Hernandez) - The fearless and confident protagonist of the series. The 15-year-old princess of ancient Egypt, whose father is Pharaoh King Ptolemy (Sendhil Ramamurthy), she ends up sucked into a portal that sends her 30,000 years into the future where she learns she is the prophesied \"Savior of the Nile Galaxy\", destined to defeat the evil space tyrant Octavian. She ends up attending the futuristic intergalactic academy named P.Y.R.A.M.I.D. to obtain the proper training and skills to fulfill her role. She is sometimes reckless and impulsive, but has a good heart and wants peace. She has also gained strange and powerful powers from her time-travel, which manifests in pink and can be used to drain energy and project it into energy waves and beams. Lilimar called Cleo a character who is not completely mature or responsible, but a young girl who is on the road to becoming a hero, a person who is courageous and brave, seeing \"a lot of positivity in the world, no matter how dark things seem to be,\" even as she seeks adventure all the time.\\n Akila (voiced by Katie Crown) - A pink-eyed fish girl from another planet and the first of Cleopatra\\'s teammates. She is very friendly and optimistic, but over-enthusiastic. She may have a crush on Brian. She has two moms: Theoda (voiced by Cissy Jones) and Pothina (voiced by Kari Wahlgren), who are scholars at The Savior Institute, use dated social expressions and love their daughter. They are the first confirmed LGBTQ characters in the series.\\n Brian (voiced by Jorge Diaz) - A cyborg teenage boy, and Cleopatra\\'s second teammate. 
His body is mostly robotic and is also sensitive of the fact that he was transformed into a cyborg. He is rather worrisome, paranoid, nervous, and
" {'docid': 'doc-en-12',\n",
" 'text': 'Impression Products, Inc. v. Lexmark International, Inc., 581 U.S. ___ (2017), is a decision of the Supreme Court of the United States on the exhaustion doctrine in patent law in which the Court held that after the sale of a patented item, the patent holder cannot sue for patent infringement relating to further use of that item, even when in violation of a contract with a customer or imported from outside the United States. The case concerned a patent infringement lawsuit brought by Lexmark against Impression Products, Inc., which bought used ink cartridges, refilled them, replaced a microchip on the cartridge to circumvent a digital rights management scheme, and then resold them. Lexmark argued that as they own several patents related to the ink cartridges, Impression Products was violating their patent rights. The U.S. Supreme Court, reversing a 2016 decision of the Federal Circuit, held that the exhaustion doctrine prevented Lexmark\\'s patent infringement lawsuit, although Lexmark could enforce restrictions on use or resale of its contracts with direct purchasers under regular contract law (but not as a patent infringement lawsuit). Besides printer and ink manufacturers, the decision of the case could affect the markets of high tech consumer goods and prescription drugs.\\n\\nBackground\\n\\nFactual setting\\n\\nLexmark International, Inc. makes and sells printers and toner cartridges for its printers. Lexmark owns a number of patents that cover its cartridges and their use. Lexmark sold the cartridges at issue in this case—some in the United States and some abroad.\\n\\nDomestic sales \\n\\nLexmark\\'s domestic sales were in two categories. A \"Regular Cartridge\" is sold at \"list price\" and confers an absolute title and property right on the buyer. 
A \"Return Program Cartridge\" is sold at a discount of about 20 percent, and is subject to post-sale restrictions: The buyer may not reuse the cartridge after the toner runs out and may not transfer it to anybody else. The first branch of the case turns on the legal status of these post-sale restrictions.\\n\\nLexmark manufactured the toner cartridges with microchips in them, which send signals to the printers indicating toner level. When the amount of toner in a cartridge falls below a certain level, the printer will not operate with that cartridge. Also, the printer will not operate with a Return Program Cartridge that has been refilled by a third party. Thus, Lexmark\\'s technology prevented violation of the post-sale restriction against refilling the Return Program Cartridges. The Regular Cartridges do not have this anti-refill feature and can therefore be refilled and reused (but they cost 20 percent more).\\n\\n\"To circumvent this technological measure,\" however, \"third parties have \\'hacked\\' the Lexmark microchips. They created their own \"unauthorized replacement\" microchips that, when installed in a Return Program cartridge, fool the printer into allowing reuse of that cartridge. Various companies purchase used Return Program Cartridges from the customers who bought them from Lexmark. They replace the microchips with \"unauthorized replacement\" microchips, refill the cartridges with toner, and sell the \"re-manufactured\" cartridges to resellers such as Impression Products for marketing to consumers for use with Lexmark printers. Lexmark had previously argued in Lexmark International, Inc. v. Static Control Components, Inc. 
that replacing these microchips violated copyright law and the Digital Millennium Copyright Act (DMCA), but both federal and the Supreme Court have ruled against Lexmark, affirming that replacing the microchips is not in violation of copyright.\\n\\nImported cartridges\\n\\nThe second branch of the case involves cartridges that Lexmark sold outside the US. While some of the foreign-sold cartridges were Regular Cartridges and some were Return Program Cartridges, this branch of the case does not involve any distinction among the two types of imported cartridges.\\n\\nTrial court decision\\n\\nThe district court
" {'docid': 'doc-en-13',\n",
" 'text': 'The Werewolf by Night is the name applied to two fictional characters who are werewolves appearing in American comic books published by Marvel Comics. The Werewolf by Night (usually referred to by other characters simply as the Werewolf) first appeared in Marvel Spotlight #2 (February 1972).\\n\\nPublication history\\nPrior to the formation of the Comics Code Authority in 1954, Marvel\\'s predecessor Atlas Comics published a five-page short story titled \"Werewolf by Night!\" in Marvel Tales #116 (July 1953). With the relaxation of the Comics Code Authority\\'s rules in 1971, it became possible for the first time to publish code-approved comic books with werewolves. The Jack Russell version of Werewolf by Night first appeared in Marvel Spotlight #2 (February 1972) and was based on an idea by Roy Thomas. The series name was suggested by Stan Lee and the initial creative team was Gerry Conway and Mike Ploog, who worked from a plot by Roy and Jeanie Thomas for the first issue. Readers have often pointed out that the lead character\\'s name, Jack Russell, is also a breed of dog. Conway has said that while he cannot remember how he came up with the name, it is unlikely that he was making this canine reference consciously, since he did not own a dog and never lived with one growing up. After the test run in Marvel Spotlight #2-4, the character graduated to his own eponymous series in September 1972. Conway described working on the series as \"a lot of fun\" because the horror genre made a refreshing change from the superhero stories that had been the staple of mainstream comics for years. Werewolf by Night was published for 43 issues and ran through March 1977. 
During the series\\' run, the editorship could not resist the opportunity to assign one of their most popular writers, Marv Wolfman, to write some stories for the series with a playful note: \"At last -- WEREWOLF -- written by a WOLFMAN.\"\\n\\nIssue #32 (August 1975) contains the first appearance of the Moon Knight. Jack Russell co-starred with Tigra in Giant-Size Creatures #1 (July 1974), which was the first appearance of Greer Grant Nelson as Tigra instead of as the Cat. That series was retitled Giant-Size Werewolf with its second issue. Jack Russell was dormant for most of the 1980s. The character\\'s appearance was radically revamped in Moon Knight #29 (March 1983). He guest-starred in various issues of Spider-Woman, West Coast Avengers, and Doctor Strange: Sorcerer Supreme. The Werewolf by Night was later revived in the pages of Marvel Comics Presents, where he appeared irregularly from 1991-1993. He made regular appearances as a supporting cast member in the pages of Morbius: The Living Vampire from 1993-1995. A letters page in an issue of Morbius mentioned that a Werewolf by Night miniseries by Len Kaminski and James Fry was in the works, but the miniseries was never published. Werewolf by Night vol. 2 ran for six issues in 1998. The series was written by Paul Jenkins and penciled by Leonardo Manco. After the book\\'s cancellation, the story was continued in the pages of Strange Tales, which also featured the Man-Thing. That volume of Strange Tales was canceled after only two issues due to poor sales. In early 2007, Marvel published a one-shot entitled Legion of Monsters: Werewolf by Night, with art by Greg Land. In January 2009, Jack Russell was featured in the four-issue limited series Dead of Night Featuring Werewolf by Night, from Marvel\\'s mature readers MAX imprint. The series was written by Duane Swierczynski, with art by Mico Suayan. 
He was featured as a member of Morbius\\' Midnight Sons in Marvel Zombies 4 in 2009.\\n\\nA second Werewolf by Night first appeared in the third volume of Werewolf by Night and was created by Taboo of the Black-Eyed Peas, Benjamin Jackendoff, and Scot Eaton.\\n\\nFictional character biography\\n\\nJack Russell\\n\\nWhile reports of lycanthropy (shapeshifting into a werewolf) in the Russoff line stretch back many centuries, the first confirmed manifestation is Grigori Russoff in 1795. Dracula slew Grigo
" {'docid': 'doc-en-14',\n",
" 'text': 'The 2021 NHL Entry Draft was the 59th NHL Entry Draft. The draft was held on July 2324, 2021, delayed by one month from its normally scheduled time of June due to the COVID-19 pandemic and the later-than-normal finish of the 202021 NHL season. It was thus the first draft held in July since 2005. For the second year in a row, the event was held in a remote format, with teams convening via videoconferencing, and Commissioner Gary Bettman announcing the selections in the opening round and deputy commissioner Bill Daly in all subsequent rounds from the NHL Network studios in Secaucus, New Jersey.\\n\\nThe first three selections were Owen Power going to the Buffalo Sabres, Matty Beniers being selected by the Seattle Kraken, and Mason McTavish being picked by the Anaheim Ducks.\\n\\nEligibility\\nIce hockey players born between January 1, 2001, and September 15, 2003, were eligible for selection in the 2021 NHL Entry Draft. Additionally, un-drafted, non-North American players born in 2000 were eligible for the draft; and those players who were drafted in the 2019 NHL Entry Draft, but not signed by an NHL team and who were born after June 30, 2001, were also eligible to re-enter the draft.\\n\\nDraft lottery\\nFrom the 201213 NHL season up to the 202021 NHL season all teams not qualifying for the Stanley Cup playoffs have had a \"weighted\" chance at winning the first overall selection. Beginning with the 201415 NHL season, the league changed the weighting system that was used in previous years. Under the new system, the odds of winning the draft lottery for the four lowest finishing teams in the league decreased, while the odds for the other non-playoff teams increased. The draft lottery took place on June 2, 2021. After changing the number of lottery drawings earlier in the season, the first two picks overall in this draft were awarded by lottery. 
The Buffalo Sabres and Seattle Kraken won the two draft lotteries that took place on June 2, 2021, giving them the first and second picks overall. Buffalo retained the first pick, while Seattle moved up one spot and Anaheim dropped one spot to third overall.\\n\\nThe expansion Seattle Kraken had the same odds of winning the lottery as the team that finished with the third fewest points (this ended up being the New Jersey Devils). Because the Arizona Coyotes\\' 2021 first-round pick was forfeited as the result of a penalty sanction due to violations of the NHL Combine Testing Policy during the 201920 NHL season, Arizona\\'s lottery odds were instead listed as re-draws.\\n\\n{| class=\"wikitable\"\\n|+ Complete draft position odds\\n! Team\\n! 1st\\n! 2nd\\n! 3rd\\n! 4th\\n! 5th\\n! 6th\\n! 7th\\n! 8th\\n! 9th\\n! 10th\\n! 11th\\n! 12th\\n! 13th\\n! 14th\\n! 15th\\n! 16th\\n|-\\n! Buffalo\\n| style=\"background:#A9D0F5;\"| 16.6% || 15.0% || 68.4% || || || || || || || || || || || || ||\\n|-\\n! Anaheim\\n| 12.1% || 11.7% || style=\"background:#DDDDDD;\"| 26.9% || 49.3% || || || || || || || || || || || ||\\n|-\\n! Seattle\\n| 10.3% || style=\"background:#F5A9BC;\"| 10.2% || 4.7% || 39.3% || 35.6% || || || || || || || || || || ||\\n|-\\n! New Jersey\\n| 10.3% || 10.2% || || style=\"background:#DDDDDD;\"| 11.5% || 43.9% || 24.2% || || || || || || || || || ||\\n|-\\n! Columbus\\n| 8.5% || 8.6% || || || style=\"background:#DDDDDD;\"| 20.6% || 45.8% || 16.5% || || || || || || || || ||\\n|-\\n! Detroit\\n| 7.6% || 7.8% || || || || style=\"background:#DDDDDD;\"| 30.0% || 43.8% || 10.9% || || || || || || || ||\\n|-\\n! San Jose\\n| 6.7% || 6.9% || || || || || style=\"background:#DDDDDD;\"| 39.7% || 39.7% || 6.9% || || || || || || ||\\n|-\\n! Los Angeles\\n| 5.8% || 6.0% || || || || || || style=\"background:#DDDDDD;\"| 49.4% || 34.5% || 4.3% || || || || || ||\\n|-\\n! 
Vancouver\\n| 5.4% || 5.7% || || || || || || || style=\"background:#DDDDDD;\"| 58.6% || 28.0% || 2.4% || || || || ||\\n|-\\n! Ottawa\\n| 4.5% || 4.8% || || || || || || || || style=\"background:#DDDDDD;\"| 67.7% || 21.8% || 1.2% || || || ||\\n|-\\n! Arizona\\n| 3.1% || 3.3% || || || || || || ||
" {'docid': 'doc-en-15',\n",
" 'text': 'The history of Christianity in Sussex includes all aspects of the Christianity in the region that is now Sussex from its introduction to the present day. Christianity is the most commonly practised religion in Sussex.\\n\\nEarly history\\n\\nAfter the Roman conquest of AD 43, the Celtic society of Sussex became heavily Romanized.\\n\\nThe first written account of Christianity in Britain comes from the early Christian Berber author, Tertullian, writing in the third century, who said that \"Christianity could even be found in Britain.\" Emperor Constantine (AD\\xa0306-337), granted official tolerance to Christianity with the Edict of Milan in AD\\xa0313. Then, in the reign of Emperor Theodosius \"the Great\" (AD\\xa0378395), Christianity was made the official religion of the Roman Empire.\\n\\nWhen Roman rule eventually ceased, Christianity was probably confined to urban communities. At Wiggonholt, on a tributary of the River Arun, a large lead tank with repeated chi-rho motifs was discovered in 1943, the only Roman period artefact in Sussex found with a definite Christian association. It may represent a baptismal font or a container for holy water, or alternatively may have been used by pagans.\\n\\nMedieval\\n\\nSaxon\\n\\nAfter the departure of the Roman army, the Saxons arrived and founded the Kingdom of Sussex in the 5th century, bringing with them their polytheistic religion. The Saxon pagan culture probably caused a reversal of the spread of Christianity. According to Bede, Sussex was the last of the mainland Anglo Saxon kingdoms to be converted.\\n\\nÆðelwealh became Sussex\\'s first Christian king when he married Eafe, the daughter of Wulfhere, the Christian king of Mercia. In 681 St Wilfrid, the exiled Bishop of York, landed at Selsey and is credited with evangelising the local population and founding the church in Sussex. King Æðelwealh granted land to Wilfrid which became the site of Selsey Abbey. 
The seat of the Sussex bishopric was originally located here before the Normans moved it to Chichester Cathedral in 1075. According to Bede, Sussex was the last area of the country to be converted. However it is unlikely that Sussex was wholly heathen when Wilfrid arrived. Æðelwealh, Sussex\\'s king, had been baptised. Damianus, a South Saxon, was made Bishop of Rochester in the Kingdom of Kent in the 650s; this may indicate earlier missionary work in the first half of the 7th century. At the time of Wilfrid\\'s mission there was a monastery at Bosham containing a few monks led by an Irish monk named Dicul, which was probably part of the Hiberno-Scottish mission of the time. Wilfrid was a champion of Roman customs and it was these customs that were adopted by the church in Sussex rather than the Celtic customs that had taken root in Scotland and Ireland.\\n\\nShortly after Æðelwealh granted land to Wilfrid for the church, Cædwalla of Wessex killed Æðelwealh and conquered Sussex. Christianity in Sussex was put under control of the diocese of Winchester. It was not until c. 715 that Eadberht, Abbot of Selsey was consecrated the first bishop of the South Saxons.\\n\\nSt Lewinna, or St Leofwynn, was a female saint who lived around Seaford, probably at Bishopstone around the 7th century. According to the hagiography of the Secgan Manuscript, Lyminster is the burial place of St Cuthflæd of Lyminster. In the late 7th or early 8th century, St Cuthman, a shepherd who may have been born in Chidham and had been reduced to begging, set out from his home with his disabled mother using a one-wheeled cart. When he reached Steyning he saw a vision and stopped there to build a church. Cuthman was venerated as a saint and his church was in existence by 857 when King Æthelwulf of Wessex was buried there. Steyning was an important religious centre and St Cuthman\\'s grave became a place of pilgrimage in the 10th and 11th centuries. 
In 681, Bede records that an outbreak of the plague had devastated parts of England, including Sussex, and the monks at Selsey Abbey fasted and prayed for three days
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dataset['dev'][0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each passage in the corpus has two parts: `docid` and `text`. `docid` has the form of `doc-<language>-<id>`"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"corpus = load_dataset('Shitao/MLDR', f\"corpus-{lang}\", trust_remote_code=True)['corpus']"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'docid': 'doc-en-9633',\n",
" 'text': 'Mars Hill Church was a Christian megachurch, founded by Mark Driscoll, Lief Moi, and Mike Gunn. It was a multi-site church based in Seattle, Washington and grew from a home Bible study to 15 locations in 4 U.S. states. Services were offered at its 15 locations; the church also podcast content of weekend services, and of conferences, on the Internet with more than 260,000 sermon views online every week. In 2013, Mars Hill had a membership of 6,489 and average weekly attendance of 12,329. Following controversy in 2014 involving founding pastor Mark Driscoll, attendance dropped to 8,0009,000 people per week.\\n\\nAt the end of September, 2014, an investigation by the church elders found \"bullying\" and \"patterns of persistent sinful behavior\" by Driscoll. The church elders crafted a \"restoration\" plan to help Driscoll and save the church. Instead, Driscoll declined the restoration plan and resigned. On October 31, 2014, lead pastor Dave Bruskas announced plans to dissolve the church\\'s 13 remaining campuses into autonomous entities, with the option of continuing, merging with other congregations, or disbanding, effective January 1, 2015. The Mars Hill network dissolved on January 1, 2015.\\n\\nHistory\\n\\nEarly years \\nMars Hill Church was founded in spring 1996 by Mark Driscoll, Lief Moi and Mike Gunn. The church started at the rental house of Driscoll and his wife Grace with the blessing of Antioch Bible Church and the exodus of about 30 of its students. They outgrew the apartment and started meeting in the youth rooms of another church. The church had its first official service October 1996, with 160 people attending; attendance quickly fell to around 60 because of discussions about the visions and mission of the church.\\n\\nIn the spring of 1997, the church expanded to two evening services. 
The transition to two different congregations resulted in some anxiety and stir by members who didn\\'t want the church to grow bigger, but it resulted in growing attendance. Later that same year Mark Driscoll was invited to speak at a pastors\\' conference in California. Driscoll\\'s speech influenced the emerging church movement, and changed the focus from reaching Generation X to reaching the postmodern world. The speech resulted in media coverage of Mars Hill Church and Mark Driscoll, and put Driscoll in connection with Leadership Network.\\n\\nThe church continued growing. Inspired by Alan Roxburgh, Driscoll settled on an emerging and missional ecclesiology, and a complementarian view on women in ministry. The church installed the first team of elders and they took over much of the work teaching classes, counseling and training new leaders. Furthermore, the church started a course for new members, called the Gospel Class, to ensure that members were focused on the mission of the church and that they agreed with the central doctrinal statements of the church. The class had been running every quarter since. In the fall of 1999 the church had grown to 350 in attendance every week and was able to pay Driscoll full-time. Prior to 1999, Driscoll operated as an unpaid pastor for three years.\\n\\nMultisite church \\n\\nIn 2003, Mars Hill Church moved into a renovated hardware store in the Ballard neighborhood of Seattle. In 2006, in an effort to reduce the overcrowding at its services, Mars Hill opened its first satellite campus in Shoreline. This change also marked their transition to a multi-site church, using video sermons and other multimedia improvements to the church\\'s web site to connect the campuses. 
Later in 2006 Mars Hill acquired two new properties in West Seattle and Wedgwood, which became their West Seattle and Lake City campuses.\\n\\nSince then, new Mars Hill locations were added using a multi-campus \"meta-church\" structure, connecting Driscoll\\'s sermons via high-definition video to the remote campuses during weekly worship services. This format allowed each location to retain local leadership and ministries while under the leadership of the main campus. A fourth and fifth Mars Hill
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corpus[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then we process the ids and text of queries and corpus for preparation of embedding and searching."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"corpus_ids = corpus['docid']\n",
"corpus_text = corpus['text']\n",
"\n",
"queries_ids = dataset['dev']['query_id']\n",
"queries_text = dataset['dev']['query']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Evaluate from scratch"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.1 Embedding"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the demo we use bge-base-en-v1.5, feel free to change to the model you prefer."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"import os \n",
"os.environ['TRANSFORMERS_NO_ADVISORY_WARNINGS'] = 'true'\n",
"os.environ['CUDA_VISIBLE_DEVICES'] = '0'"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"pre tokenize: 100%|██████████| 1/1 [00:00<00:00, 60.08it/s]\n",
"pre tokenize: 100%|██████████| 782/782 [02:22<00:00, 5.50it/s]\n",
"Inference Embeddings: 100%|██████████| 782/782 [02:47<00:00, 4.66it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"shape of the embeddings: (200000, 768)\n",
"data type of the embeddings: float16\n"
]
}
],
"source": [
"from FlagEmbedding import FlagModel\n",
"\n",
"# get the BGE embedding model\n",
"model = FlagModel('BAAI/bge-base-en-v1.5',)\n",
" # query_instruction_for_retrieval=\"Represent this sentence for searching relevant passages:\")\n",
"\n",
"# get the embedding of the queries and corpus\n",
"queries_embeddings = model.encode_queries(queries_text)\n",
"corpus_embeddings = model.encode_corpus(corpus_text)\n",
"\n",
"print(\"shape of the embeddings:\", corpus_embeddings.shape)\n",
"print(\"data type of the embeddings: \", corpus_embeddings.dtype)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.2 Indexing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create a Faiss index to store the embeddings."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"total number of vectors: 200000\n"
]
}
],
"source": [
"import faiss\n",
"import numpy as np\n",
"\n",
"# get the length of our embedding vectors, vectors by bge-base-en-v1.5 have length 768\n",
"dim = corpus_embeddings.shape[-1]\n",
"\n",
"# create the faiss index and store the corpus embeddings into the vector space\n",
"index = faiss.index_factory(dim, 'Flat', faiss.METRIC_INNER_PRODUCT)\n",
"corpus_embeddings = corpus_embeddings.astype(np.float32)\n",
"# train and add the embeddings to the index\n",
"index.train(corpus_embeddings)\n",
"index.add(corpus_embeddings)\n",
"\n",
"print(f\"total number of vectors: {index.ntotal}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.3 Searching"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use the Faiss index to search answers for each query."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Searching: 100%|██████████| 7/7 [00:01<00:00, 5.15it/s]\n"
]
}
],
"source": [
"from tqdm import tqdm\n",
"\n",
"query_size = len(queries_embeddings)\n",
"\n",
"all_scores = []\n",
"all_indices = []\n",
"\n",
"for i in tqdm(range(0, query_size, 32), desc=\"Searching\"):\n",
" j = min(i + 32, query_size)\n",
" query_embedding = queries_embeddings[i: j]\n",
" score, indice = index.search(query_embedding.astype(np.float32), k=100)\n",
" all_scores.append(score)\n",
" all_indices.append(indice)\n",
"\n",
"all_scores = np.concatenate(all_scores, axis=0)\n",
"all_indices = np.concatenate(all_indices, axis=0)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"results = {}\n",
"for idx, (scores, indices) in enumerate(zip(all_scores, all_indices)):\n",
" results[queries_ids[idx]] = {}\n",
" for score, index in zip(scores, indices):\n",
" if index != -1:\n",
" results[queries_ids[idx]][corpus_ids[index]] = float(score)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.4 Evaluating"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Process the qrels into a dictionary with qid-docid pairs."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": [
"qrels_dict = {}\n",
"for data in dataset['dev']:\n",
" qid = str(data[\"query_id\"])\n",
" if qid not in qrels_dict:\n",
" qrels_dict[qid] = {}\n",
" for doc in data[\"positive_passages\"]:\n",
" docid = str(doc[\"docid\"])\n",
" qrels_dict[qid][docid] = 1\n",
" for doc in data[\"negative_passages\"]:\n",
" docid = str(doc[\"docid\"])\n",
" qrels_dict[qid][docid] = 0"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, use [pytrec_eval](https://github.com/cvangysel/pytrec_eval) library to help us calculate the scores of selected metrics:"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"defaultdict(<class 'list'>, {'NDCG@10': 0.35304, 'NDCG@100': 0.38694})\n",
"defaultdict(<class 'list'>, {'Recall@10': 0.465, 'Recall@100': 0.625})\n"
]
}
],
"source": [
"import pytrec_eval\n",
"from collections import defaultdict\n",
"\n",
"ndcg_string = \"ndcg_cut.\" + \",\".join([str(k) for k in [10,100]])\n",
"recall_string = \"recall.\" + \",\".join([str(k) for k in [10,100]])\n",
"\n",
"evaluator = pytrec_eval.RelevanceEvaluator(\n",
" qrels_dict, {ndcg_string, recall_string}\n",
")\n",
"scores = evaluator.evaluate(results)\n",
"\n",
"all_ndcgs, all_recalls = defaultdict(list), defaultdict(list)\n",
"for query_id in scores.keys():\n",
" for k in [10,100]:\n",
" all_ndcgs[f\"NDCG@{k}\"].append(scores[query_id][\"ndcg_cut_\" + str(k)])\n",
" all_recalls[f\"Recall@{k}\"].append(scores[query_id][\"recall_\" + str(k)])\n",
"\n",
"ndcg, recall = (\n",
" all_ndcgs.copy(),\n",
" all_recalls.copy(),\n",
")\n",
"\n",
"for k in [10,100]:\n",
" ndcg[f\"NDCG@{k}\"] = round(sum(ndcg[f\"NDCG@{k}\"]) / len(scores), 5)\n",
" recall[f\"Recall@{k}\"] = round(sum(recall[f\"Recall@{k}\"]) / len(scores), 5)\n",
"\n",
"print(ndcg)\n",
"print(recall)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Evaluate using FlagEmbedding"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We provide independent evaluation for popular datasets and benchmarks. Try the following code to run the evaluation, or run the shell script provided in [example](../../examples/evaluation/mldr/eval_mldr.sh) folder."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"arguments = \"\"\"- \\\n",
" --eval_name mldr \\\n",
" --dataset_dir ./mldr/data \\\n",
" --dataset_names en \\\n",
" --splits dev \\\n",
" --corpus_embd_save_dir ./mldr/corpus_embd \\\n",
" --output_dir ./mldr/search_results \\\n",
" --search_top_k 1000 \\\n",
" --cache_path ./cache/data \\\n",
" --overwrite False \\\n",
" --k_values 10 100 \\\n",
" --eval_output_method markdown \\\n",
" --eval_output_path ./mldr/mldr_eval_results.md \\\n",
" --eval_metrics ndcg_at_10 \\\n",
" --embedder_name_or_path BAAI/bge-base-en-v1.5 \\\n",
" --devices cuda:0 cuda:1 \\\n",
" --embedder_batch_size 1024\n",
"\"\"\".replace('\\n','')\n",
"\n",
"sys.argv = arguments.split()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/root/anaconda3/envs/dev/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
" from .autonotebook import tqdm as notebook_tqdm\n",
"initial target device: 100%|██████████| 2/2 [00:07<00:00, 3.54s/it]\n",
"pre tokenize: 100%|██████████| 98/98 [01:01<00:00, 1.58it/s]\n",
"You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.\n",
"pre tokenize: 100%|██████████| 98/98 [01:07<00:00, 1.44it/s]09it/s]\n",
"You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.\n",
"Inference Embeddings: 100%|██████████| 98/98 [01:22<00:00, 1.19it/s]\n",
"Inference Embeddings: 100%|██████████| 98/98 [01:23<00:00, 1.17it/s]\n",
"Chunks: 100%|██████████| 2/2 [02:40<00:00, 80.21s/it] \n",
"pre tokenize: 100%|██████████| 1/1 [00:00<00:00, 2.16it/s]\n",
"pre tokenize: 100%|██████████| 1/1 [00:00<00:00, 2.21it/s]\n",
"Chunks: 100%|██████████| 2/2 [00:01<00:00, 1.13it/s]\n",
"Searching: 100%|██████████| 7/7 [00:01<00:00, 6.79it/s]\n",
"Qrels not found in ./mldr/data/en/dev_qrels.jsonl. Trying to download the qrels from the remote and save it to ./mldr/data/en.\n",
"Loading and Saving qrels: 100%|██████████| 200/200 [00:00<00:00, 598.03it/s]\n"
]
}
],
"source": [
"from transformers import HfArgumentParser\n",
"\n",
"from FlagEmbedding.evaluation.mldr import (\n",
" MLDREvalArgs, MLDREvalModelArgs,\n",
" MLDREvalRunner\n",
")\n",
"\n",
"\n",
"parser = HfArgumentParser((\n",
" MLDREvalArgs,\n",
" MLDREvalModelArgs\n",
"))\n",
"\n",
"eval_args, model_args = parser.parse_args_into_dataclasses()\n",
"eval_args: MLDREvalArgs\n",
"model_args: MLDREvalModelArgs\n",
"\n",
"runner = MLDREvalRunner(\n",
" eval_args=eval_args,\n",
" model_args=model_args\n",
")\n",
"\n",
"runner.run()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\n",
" \"en-dev\": {\n",
" \"ndcg_at_10\": 0.35304,\n",
" \"ndcg_at_100\": 0.38694,\n",
" \"map_at_10\": 0.31783,\n",
" \"map_at_100\": 0.32469,\n",
" \"recall_at_10\": 0.465,\n",
" \"recall_at_100\": 0.625,\n",
" \"precision_at_10\": 0.0465,\n",
" \"precision_at_100\": 0.00625,\n",
" \"mrr_at_10\": 0.31783,\n",
" \"mrr_at_100\": 0.32469\n",
" }\n",
"}\n"
]
}
],
"source": [
"with open('mldr/search_results/bge-base-en-v1.5/NoReranker/EVAL/eval_results.json', 'r') as content_file:\n",
" print(content_file.read())"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "dev",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}