haystack/examples/web_lfqa_improved.py

import logging
import os
from typing import Dict, Any

from haystack import Pipeline
from haystack.nodes import PromptNode, PromptTemplate, TopPSampler
from haystack.nodes.ranker.diversity import DiversityRanker
from haystack.nodes.ranker.lost_in_the_middle import LostInTheMiddleRanker
from haystack.nodes.retriever.web import WebRetriever

search_key = os.environ.get("SERPERDEV_API_KEY")
if not search_key:
    raise ValueError("Please set the SERPERDEV_API_KEY environment variable")

models_config: Dict[str, Any] = {
    "openai": {"api_key": os.environ.get("OPENAI_API_KEY"), "model_name": "gpt-3.5-turbo"},
    "anthropic": {"api_key": os.environ.get("ANTHROPIC_API_KEY"), "model_name": "claude-instant-1"},
    "hf": {"api_key": os.environ.get("HF_API_KEY"), "model_name": "tiiuae/falcon-7b-instruct"},
}
prompt_text = """
Synthesize a comprehensive answer from the provided paragraphs and the given question.\n
Answer in full sentences and paragraphs, don't use bullet points or lists.\n
If the answer includes multiple chronological events, order them chronologically.\n
\n\n Paragraphs: {join(documents)} \n\n Question: {query} \n\n Answer:
"""

stream = True
model: Dict[str, str] = models_config["openai"]
prompt_node = PromptNode(
    model["model_name"],
    default_prompt_template=PromptTemplate(prompt_text),
    api_key=model["api_key"],
    max_length=768,
    model_kwargs={"stream": stream},
)

web_retriever = WebRetriever(api_key=search_key, top_search_results=5, mode="preprocessed_documents", top_k=50)

sampler = TopPSampler(top_p=0.97)
diversity_ranker = DiversityRanker()
litm_ranker = LostInTheMiddleRanker(word_count_threshold=1024)

pipeline = Pipeline()
pipeline.add_node(component=web_retriever, name="Retriever", inputs=["Query"])
pipeline.add_node(component=sampler, name="Sampler", inputs=["Retriever"])
pipeline.add_node(component=diversity_ranker, name="DiversityRanker", inputs=["Sampler"])
pipeline.add_node(component=litm_ranker, name="LostInTheMiddleRanker", inputs=["DiversityRanker"])
pipeline.add_node(component=prompt_node, name="PromptNode", inputs=["LostInTheMiddleRanker"])

logging.disable(logging.CRITICAL)


questions = [
    "What are the main reasons for long-standing animosities between Russia and Poland?",
    "What are the primary causes and effects of climate change on global and local scales?",
    "What were the key events and influences that led to Renaissance; how did these developments "
    "shape modern Western culture?",
    "How have advances in technology in the 21st century affected job markets and economies around the world?",
    "What are the main reasons behind the Israel-Palestine conflict and how have they evolved over time?",
    "How has the European Union influenced the political, economic, and social dynamics of Europe?",
]

print(f"\nRunning pipeline with {model['model_name']}\n")

for q in questions:
    print(f"\nQuestion: {q}")
    if stream:
        print("Answer:")
    response = pipeline.run(query=q)
    if not stream:
        print(f"Answer: {response['results'][0]}")
feat: Add DiversityRanker (#5398) * Introduce DiversityRanker * improve most_diverse_order speed * Compute mean for numerical stability * Add release note * Add cosine similarity * Test both dot product and cosine similarity * Add pydocs hook --------- Co-authored-by: Michel Bartels <login@michelbartels.com> 2023-08-01 12:48:34 +02:00			`import logging`
			`import os`
feat: Improve LFQA Web Example (#5504) * Improve web_lfqa example * Turn off pylint for logging setup * Another way to turn off logging 2023-08-04 14:20:06 +02:00			`from typing import Dict, Any`
feat: Add DiversityRanker (#5398) * Introduce DiversityRanker * improve most_diverse_order speed * Compute mean for numerical stability * Add release note * Add cosine similarity * Test both dot product and cosine similarity * Add pydocs hook --------- Co-authored-by: Michel Bartels <login@michelbartels.com> 2023-08-01 12:48:34 +02:00
			`from haystack import Pipeline`
feat: Add LostInTheMiddleRanker (#5457) * Add lost in the middle ranker * Add release note * Julian's feedback: more precise version of truncate * Better comments for the litm algorithm * Sebastian PR feedback * Add check for invalid values of word_count_threshold * Remove _truncate as it is not needed any more --------- Co-authored-by: Darja Fokina <daria.f93@gmail.com> 2023-08-02 17:05:13 +02:00			`from haystack.nodes import PromptNode, PromptTemplate, TopPSampler`
feat: Add DiversityRanker (#5398) * Introduce DiversityRanker * improve most_diverse_order speed * Compute mean for numerical stability * Add release note * Add cosine similarity * Test both dot product and cosine similarity * Add pydocs hook --------- Co-authored-by: Michel Bartels <login@michelbartels.com> 2023-08-01 12:48:34 +02:00			`from haystack.nodes.ranker.diversity import DiversityRanker`
feat: Add LostInTheMiddleRanker (#5457) * Add lost in the middle ranker * Add release note * Julian's feedback: more precise version of truncate * Better comments for the litm algorithm * Sebastian PR feedback * Add check for invalid values of word_count_threshold * Remove _truncate as it is not needed any more --------- Co-authored-by: Darja Fokina <daria.f93@gmail.com> 2023-08-02 17:05:13 +02:00			`from haystack.nodes.ranker.lost_in_the_middle import LostInTheMiddleRanker`
feat: Add DiversityRanker (#5398) * Introduce DiversityRanker * improve most_diverse_order speed * Compute mean for numerical stability * Add release note * Add cosine similarity * Test both dot product and cosine similarity * Add pydocs hook --------- Co-authored-by: Michel Bartels <login@michelbartels.com> 2023-08-01 12:48:34 +02:00			`from haystack.nodes.retriever.web import WebRetriever`

			`search_key = os.environ.get("SERPERDEV_API_KEY")`
			`if not search_key:`
			`raise ValueError("Please set the SERPERDEV_API_KEY environment variable")`

feat: Improve LFQA Web Example (#5504) * Improve web_lfqa example * Turn off pylint for logging setup * Another way to turn off logging 2023-08-04 14:20:06 +02:00			`models_config: Dict[str, Any] = {`
			`"openai": {"api_key": os.environ.get("OPENAI_API_KEY"), "model_name": "gpt-3.5-turbo"},`
			`"anthropic": {"api_key": os.environ.get("ANTHROPIC_API_KEY"), "model_name": "claude-instant-1"},`
			`"hf": {"api_key": os.environ.get("HF_API_KEY"), "model_name": "tiiuae/falcon-7b-instruct"},`
			`}`
feat: Add DiversityRanker (#5398) * Introduce DiversityRanker * improve most_diverse_order speed * Compute mean for numerical stability * Add release note * Add cosine similarity * Test both dot product and cosine similarity * Add pydocs hook --------- Co-authored-by: Michel Bartels <login@michelbartels.com> 2023-08-01 12:48:34 +02:00			`prompt_text = """`
feat: Improve LFQA Web Example (#5504) * Improve web_lfqa example * Turn off pylint for logging setup * Another way to turn off logging 2023-08-04 14:20:06 +02:00			`Synthesize a comprehensive answer from the provided paragraphs and the given question.\n`
			`Answer in full sentences and paragraphs, don't use bullet points or lists.\n`
			`If the answer includes multiple chronological events, order them chronologically.\n`
			`\n\n Paragraphs: {join(documents)} \n\n Question: {query} \n\n Answer:`
feat: Add DiversityRanker (#5398) * Introduce DiversityRanker * improve most_diverse_order speed * Compute mean for numerical stability * Add release note * Add cosine similarity * Test both dot product and cosine similarity * Add pydocs hook --------- Co-authored-by: Michel Bartels <login@michelbartels.com> 2023-08-01 12:48:34 +02:00			`"""`

feat: Improve LFQA Web Example (#5504) * Improve web_lfqa example * Turn off pylint for logging setup * Another way to turn off logging 2023-08-04 14:20:06 +02:00			`stream = True`
			`model: Dict[str, str] = models_config["openai"]`
feat: Add DiversityRanker (#5398) * Introduce DiversityRanker * improve most_diverse_order speed * Compute mean for numerical stability * Add release note * Add cosine similarity * Test both dot product and cosine similarity * Add pydocs hook --------- Co-authored-by: Michel Bartels <login@michelbartels.com> 2023-08-01 12:48:34 +02:00			`prompt_node = PromptNode(`
feat: Improve LFQA Web Example (#5504) * Improve web_lfqa example * Turn off pylint for logging setup * Another way to turn off logging 2023-08-04 14:20:06 +02:00			`model["model_name"],`
			`default_prompt_template=PromptTemplate(prompt_text),`
			`api_key=model["api_key"],`
			`max_length=768,`
			`model_kwargs={"stream": stream},`
feat: Add DiversityRanker (#5398) * Introduce DiversityRanker * improve most_diverse_order speed * Compute mean for numerical stability * Add release note * Add cosine similarity * Test both dot product and cosine similarity * Add pydocs hook --------- Co-authored-by: Michel Bartels <login@michelbartels.com> 2023-08-01 12:48:34 +02:00			`)`

feat: Add LostInTheMiddleRanker (#5457) * Add lost in the middle ranker * Add release note * Julian's feedback: more precise version of truncate * Better comments for the litm algorithm * Sebastian PR feedback * Add check for invalid values of word_count_threshold * Remove _truncate as it is not needed any more --------- Co-authored-by: Darja Fokina <daria.f93@gmail.com> 2023-08-02 17:05:13 +02:00			`web_retriever = WebRetriever(api_key=search_key, top_search_results=5, mode="preprocessed_documents", top_k=50)`
feat: Add DiversityRanker (#5398) * Introduce DiversityRanker * improve most_diverse_order speed * Compute mean for numerical stability * Add release note * Add cosine similarity * Test both dot product and cosine similarity * Add pydocs hook --------- Co-authored-by: Michel Bartels <login@michelbartels.com> 2023-08-01 12:48:34 +02:00
feat: Add LostInTheMiddleRanker (#5457) * Add lost in the middle ranker * Add release note * Julian's feedback: more precise version of truncate * Better comments for the litm algorithm * Sebastian PR feedback * Add check for invalid values of word_count_threshold * Remove _truncate as it is not needed any more --------- Co-authored-by: Darja Fokina <daria.f93@gmail.com> 2023-08-02 17:05:13 +02:00			`sampler = TopPSampler(top_p=0.97)`
			`diversity_ranker = DiversityRanker()`
			`litm_ranker = LostInTheMiddleRanker(word_count_threshold=1024)`
feat: Add DiversityRanker (#5398) * Introduce DiversityRanker * improve most_diverse_order speed * Compute mean for numerical stability * Add release note * Add cosine similarity * Test both dot product and cosine similarity * Add pydocs hook --------- Co-authored-by: Michel Bartels <login@michelbartels.com> 2023-08-01 12:48:34 +02:00
			`pipeline = Pipeline()`
			`pipeline.add_node(component=web_retriever, name="Retriever", inputs=["Query"])`
			`pipeline.add_node(component=sampler, name="Sampler", inputs=["Retriever"])`
feat: Add LostInTheMiddleRanker (#5457) * Add lost in the middle ranker * Add release note * Julian's feedback: more precise version of truncate * Better comments for the litm algorithm * Sebastian PR feedback * Add check for invalid values of word_count_threshold * Remove _truncate as it is not needed any more --------- Co-authored-by: Darja Fokina <daria.f93@gmail.com> 2023-08-02 17:05:13 +02:00			`pipeline.add_node(component=diversity_ranker, name="DiversityRanker", inputs=["Sampler"])`
			`pipeline.add_node(component=litm_ranker, name="LostInTheMiddleRanker", inputs=["DiversityRanker"])`
			`pipeline.add_node(component=prompt_node, name="PromptNode", inputs=["LostInTheMiddleRanker"])`
feat: Add DiversityRanker (#5398) * Introduce DiversityRanker * improve most_diverse_order speed * Compute mean for numerical stability * Add release note * Add cosine similarity * Test both dot product and cosine similarity * Add pydocs hook --------- Co-authored-by: Michel Bartels <login@michelbartels.com> 2023-08-01 12:48:34 +02:00
feat: Improve LFQA Web Example (#5504) * Improve web_lfqa example * Turn off pylint for logging setup * Another way to turn off logging 2023-08-04 14:20:06 +02:00			`logging.disable(logging.CRITICAL)`


			`questions = [`
			`"What are the main reasons for long-standing animosities between Russia and Poland?",`
			`"What are the primary causes and effects of climate change on global and local scales?",`
			`"What were the key events and influences that led to Renaissance; how did these developments "`
			`"shape modern Western culture?",`
			`"How have advances in technology in the 21st century affected job markets and economies around the world?",`
			`"What are the main reasons behind the Israel-Palestine conflict and how have they evolved over time?",`
			`"How has the European Union influenced the political, economic, and social dynamics of Europe?",`
			`]`
feat: Add DiversityRanker (#5398) * Introduce DiversityRanker * improve most_diverse_order speed * Compute mean for numerical stability * Add release note * Add cosine similarity * Test both dot product and cosine similarity * Add pydocs hook --------- Co-authored-by: Michel Bartels <login@michelbartels.com> 2023-08-01 12:48:34 +02:00
feat: Improve LFQA Web Example (#5504) * Improve web_lfqa example * Turn off pylint for logging setup * Another way to turn off logging 2023-08-04 14:20:06 +02:00			`print(f"\nRunning pipeline with {model['model_name']}\n")`
feat: Add DiversityRanker (#5398) * Introduce DiversityRanker * improve most_diverse_order speed * Compute mean for numerical stability * Add release note * Add cosine similarity * Test both dot product and cosine similarity * Add pydocs hook --------- Co-authored-by: Michel Bartels <login@michelbartels.com> 2023-08-01 12:48:34 +02:00
			`for q in questions:`
feat: Improve LFQA Web Example (#5504) * Improve web_lfqa example * Turn off pylint for logging setup * Another way to turn off logging 2023-08-04 14:20:06 +02:00			`print(f"\nQuestion: {q}")`
			`if stream:`
			`print("Answer:")`
feat: Add DiversityRanker (#5398) * Introduce DiversityRanker * improve most_diverse_order speed * Compute mean for numerical stability * Add release note * Add cosine similarity * Test both dot product and cosine similarity * Add pydocs hook --------- Co-authored-by: Michel Bartels <login@michelbartels.com> 2023-08-01 12:48:34 +02:00			`response = pipeline.run(query=q)`
feat: Improve LFQA Web Example (#5504) * Improve web_lfqa example * Turn off pylint for logging setup * Another way to turn off logging 2023-08-04 14:20:06 +02:00			`if not stream:`
			`print(f"Answer: {response['results'][0]}")`