mirror of https://github.com/deepset-ai/haystack.git, synced 2025-10-22 05:19:02 +00:00
	refactor: Adapt running benchmarks (#5007)
* Generate eval result in separate method
* Adapt benchmarking utils
* Adapt running retriever benchmarks
* Adapt error message
* Adapt running reader benchmarks
* Adapt retriever reader benchmark script
* Adapt running benchmarks script
* Adapt README.md
* Raise error if file doesn't exist
* Raise error if path doesn't exist or is a directory
* minor readme update
* Create separate methods for checking if pipeline contains reader or retriever
* Fix reader pipeline case

---------

Co-authored-by: Darja Fokina <daria.f93@gmail.com>
parent 2ede4d1d1d
commit b8ff1052d4
				| @ -1,45 +1,97 @@ | |||||||
| # Benchmarks | # Benchmarks | ||||||
| 
 | 
 | ||||||
|  | The tooling provided in this directory allows running benchmarks on reader pipelines, retriever pipelines, | ||||||
|  | and retriever-reader pipelines. | ||||||
| 
 | 
 | ||||||
|  | ## Defining configuration | ||||||
| 
 | 
 | ||||||
| To start all benchmarks (e.g. for a new Haystack release), run: | To run a benchmark, you first need to create a configuration file. This file should be a Pipeline YAML file that | ||||||
|  | contains the querying pipeline and, if the querying pipeline includes a retriever, the indexing pipeline as well. | ||||||
| 
 | 
 | ||||||
| ```` | The configuration file should also have a **`benchmark_config`** section that includes the following information: | ||||||
| python run.py --reader --retriever_index --retriever_query --update_json --save_markdown |  | ||||||
| ```` |  | ||||||
| 
 | 
 | ||||||
| For custom runs, you can specify which components and processes to benchmark with the following flags: | - **`labels_file`**: The path to a SQuAD-formatted JSON or CSV file that contains the labels to be benchmarked on. | ||||||
| ``` | - **`documents_directory`**: The path to a directory containing files intended to be indexed into the document store. | ||||||
| python run.py [--reader] [--retriever_index] [--retriever_query] [--ci] [--update_json] [--save_markdown] |                              This is only necessary for retriever and retriever-reader pipelines. | ||||||
|  | - **`data_url`**: This is optional. If provided, the benchmarking script will download data from this URL and | ||||||
|  |                   save it in the **`data/`** directory. | ||||||
| 
 | 
 | ||||||
| where | Here is an example of what a configuration file for a retriever-reader pipeline might look like: | ||||||
| 
 | 
 | ||||||
| **--reader** will trigger the speed and accuracy benchmarks for the reader. Here we simply use the SQuAD dev set. | ```yaml | ||||||
|  | components: | ||||||
|  |   - name: DocumentStore | ||||||
|  |     type: ElasticsearchDocumentStore | ||||||
|  |   - name: TextConverter | ||||||
|  |     type: TextConverter | ||||||
|  |   - name: Reader | ||||||
|  |     type: FARMReader | ||||||
|  |     params: | ||||||
|  |       model_name_or_path: deepset/roberta-base-squad2-distilled | ||||||
|  |   - name: Retriever | ||||||
|  |     type: BM25Retriever | ||||||
|  |     params: | ||||||
|  |       document_store: DocumentStore | ||||||
|  |       top_k: 10 | ||||||
| 
 | 
 | ||||||
| **--retriever_index** will trigger indexing benchmarks | pipelines: | ||||||
|  |   - name: indexing | ||||||
|  |     nodes: | ||||||
|  |       - name: TextConverter | ||||||
|  |         inputs: [File] | ||||||
|  |       - name: Retriever | ||||||
|  |         inputs: [TextConverter] | ||||||
|  |       - name: DocumentStore | ||||||
|  |         inputs: [Retriever] | ||||||
|  |   - name: querying | ||||||
|  |     nodes: | ||||||
|  |       - name: Retriever | ||||||
|  |         inputs: [Query] | ||||||
|  |       - name: Reader | ||||||
|  |         inputs: [Retriever] | ||||||
| 
 | 
 | ||||||
| **--retriever_query** will trigger querying benchmarks (embeddings will be loaded from file instead of being computed on the fly) | benchmark_config: | ||||||
| 
 |   data_url: http://example.com/data.tar.gz | ||||||
| **--ci** will cause the benchmarks to run on a smaller slice of each dataset and a smaller subset of Retriever / Reader / DocStores.  |   documents_directory: /path/to/documents | ||||||
| 
 |   labels_file: /path/to/labels.csv | ||||||
| **--update_json** will cause the script to update the json files in docs/_src/benchmarks so that the website benchmarks will be updated. |  | ||||||
|   |  | ||||||
| **--save_markdown** save results additionally to the default csv also as a markdown file |  | ||||||
| ``` | ``` | ||||||
| 
 | 
 | ||||||
| Results will be stored in this directory as | ## Running benchmarks | ||||||
| - retriever_index_results.csv and retriever_index_results.md |  | ||||||
| - retriever_query_results.csv and retriever_query_results.md |  | ||||||
| - reader_results.csv and reader_results.md |  | ||||||
| 
 | 
 | ||||||
|  | Once you have your configuration file, you can run benchmarks by using the **`run.py`** script. | ||||||
| 
 | 
 | ||||||
| # Temp. Quickfix for bigger runs | ```bash | ||||||
|  | python run.py [--output OUTPUT] config | ||||||
|  | ``` | ||||||
| 
 | 
 | ||||||
| For bigger indexing runs (500k docs) the standard elastic / opensearch container that we spawn via haystack might run OOM.  | The script takes the following arguments: | ||||||
| Therefore, start them manually before you trigger the benchmark script and assign more memory to them:  |  | ||||||
| 
 | 
 | ||||||
| `docker start opensearch > /dev/null 2>&1 || docker run -d -p 9201:9200 -p 9600:9600 -e "discovery.type=single-node" -e "OPENSEARCH_JAVA_OPTS=-Xms4096m -Xmx4096m" --name opensearch opensearchproject/opensearch:2.2.1` | - `config`: This is the path to your configuration file. | ||||||
|  | - `--output`: This is an optional path where benchmark results should be saved. If not provided, the script will write the results to a JSON file named after the specified config file, with `_results.json` appended to the config file's stem. | ||||||
| 
 | 
 | ||||||
| and | ## Metrics | ||||||
| 
 | 
 | ||||||
| `docker start elasticsearch > /dev/null 2>&1 || docker run -d -p 9200:9200 -e "discovery.type=single-node" -e "ES_JAVA_OPTS=-Xms4096m -Xmx4096m" --name elasticsearch elasticsearch:7.9.2` | The benchmarks yield the following metrics: | ||||||
|  | 
 | ||||||
|  | - Reader pipelines: | ||||||
|  |     - Exact match score | ||||||
|  |     - F1 score | ||||||
|  |     - Total querying time | ||||||
|  |     - Seconds/query | ||||||
|  | - Retriever pipelines: | ||||||
|  |     - Recall | ||||||
|  |     - Mean-average precision | ||||||
|  |     - Total querying time | ||||||
|  |     - Seconds/query | ||||||
|  |     - Queries/second | ||||||
|  |     - Total indexing time | ||||||
|  |     - Number of indexed Documents/second | ||||||
|  | - Retriever-Reader pipelines: | ||||||
|  |     - Exact match score | ||||||
|  |     - F1 score | ||||||
|  |     - Total querying time | ||||||
|  |     - Seconds/query | ||||||
|  |     - Total indexing time | ||||||
|  |     - Number of indexed Documents/second | ||||||
|  | 
 | ||||||
|  | You can find more details about the performance metrics in our [evaluation guide](https://docs.haystack.deepset.ai/docs/evaluation). | ||||||
|  | |||||||
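The `benchmark_config` requirements described in the new README can be checked before a run starts. The following is a minimal sketch and not part of this commit: `validate_benchmark_config` and its `needs_documents` flag are hypothetical names, and the section is assumed to be passed in as an already-parsed dictionary.

```python
from typing import Any, Dict, List


def validate_benchmark_config(benchmark_config: Dict[str, Any], needs_documents: bool) -> List[str]:
    """Return the problems found in a benchmark_config section (hypothetical helper)."""
    problems: List[str] = []

    # labels_file is read for every pipeline type.
    if not benchmark_config.get("labels_file"):
        problems.append("labels_file is missing")

    # documents_directory is only needed for retriever and retriever-reader pipelines.
    if needs_documents and not benchmark_config.get("documents_directory"):
        problems.append("documents_directory is missing")

    # data_url is optional, so its absence is not reported.
    return problems


if __name__ == "__main__":
    config = {"labels_file": "/path/to/labels.csv"}
    print(validate_benchmark_config(config, needs_documents=True))
```

Returning a list of problems instead of raising on the first one lets a caller report everything that is wrong with a config file at once.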
| @ -1,51 +1,77 @@ | |||||||
| # The benchmarks use | from pathlib import Path | ||||||
| # - a variant of the Natural Questions Dataset (https://ai.google.com/research/NaturalQuestions) from Google Research | from typing import Dict | ||||||
| #   licensed under CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0/) |  | ||||||
| # - the SQuAD 2.0 Dataset (https://rajpurkar.github.io/SQuAD-explorer/) from  Rajpurkar et al. |  | ||||||
| #   licensed under  CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/legalcode) |  | ||||||
| 
 |  | ||||||
| from retriever import benchmark_indexing, benchmark_querying |  | ||||||
| from reader import benchmark_reader |  | ||||||
| from utils import load_config |  | ||||||
| import argparse | import argparse | ||||||
|  | import json | ||||||
|  | 
 | ||||||
|  | from haystack import Pipeline | ||||||
|  | from haystack.nodes import BaseRetriever, BaseReader | ||||||
|  | from haystack.pipelines.config import read_pipeline_config_from_yaml | ||||||
|  | 
 | ||||||
|  | from utils import prepare_environment, contains_reader, contains_retriever | ||||||
|  | from reader import benchmark_reader | ||||||
|  | from retriever import benchmark_retriever | ||||||
|  | from retriever_reader import benchmark_retriever_reader | ||||||
| 
 | 
 | ||||||
| 
 | 
 | ||||||
| parser = argparse.ArgumentParser() | def run_benchmark(pipeline_yaml: Path) -> Dict: | ||||||
|  |     """ | ||||||
|  |     Run benchmarking on a given pipeline. Pipeline can be a retriever, reader, or retriever-reader pipeline. | ||||||
|  |     In case of retriever or retriever-reader pipelines, indexing is also benchmarked, so the config file must | ||||||
|  |     contain an indexing pipeline as well. | ||||||
| 
 | 
 | ||||||
| parser.add_argument("--reader", default=False, action="store_true", help="Perform Reader benchmarks") |     :param pipeline_yaml: Path to pipeline YAML config. The config file should contain a benchmark_config section where | ||||||
| parser.add_argument( |                           the following parameters are specified: | ||||||
|     "--retriever_index", default=False, action="store_true", help="Perform Retriever indexing benchmarks" |                             - documents_directory: Directory containing files to index. | ||||||
| ) |                             - labels_file: Path to evaluation set. | ||||||
| parser.add_argument( |                             - data_url (optional): URL to download the data from. Downloaded data will be stored in | ||||||
|     "--retriever_query", default=False, action="store_true", help="Perform Retriever querying benchmarks" |                                                    the directory `data/`. | ||||||
| ) |     """ | ||||||
| parser.add_argument( |     pipeline_config = read_pipeline_config_from_yaml(pipeline_yaml) | ||||||
|     "--ci", default=False, action="store_true", help="Perform a smaller subset of benchmarks that are quicker to run" |     benchmark_config = pipeline_config.pop("benchmark_config", {}) | ||||||
| ) |  | ||||||
| parser.add_argument( |  | ||||||
|     "--update_json", |  | ||||||
|     default=False, |  | ||||||
|     action="store_true", |  | ||||||
|     help="Update the json file with the results of this run so that the website can be updated", |  | ||||||
| ) |  | ||||||
| parser.add_argument( |  | ||||||
|     "--save_markdown", |  | ||||||
|     default=False, |  | ||||||
|     action="store_true", |  | ||||||
|     help="Save the results as a markdown file in addition to the default csv", |  | ||||||
| ) |  | ||||||
| args = parser.parse_args() |  | ||||||
| 
 | 
 | ||||||
| # load config |     # Prepare environment | ||||||
| params, filenames = load_config(config_filename="config.json", ci=args.ci) |     prepare_environment(pipeline_config, benchmark_config) | ||||||
|  |     labels_file = Path(benchmark_config["labels_file"]) | ||||||
| 
 | 
 | ||||||
| if args.retriever_index: |     querying_pipeline = Pipeline.load_from_config(pipeline_config, pipeline_name="querying") | ||||||
|     benchmark_indexing( |     pipeline_contains_reader = contains_reader(querying_pipeline) | ||||||
|         **params, **filenames, ci=args.ci, update_json=args.update_json, save_markdown=args.save_markdown |     pipeline_contains_retriever = contains_retriever(querying_pipeline) | ||||||
|     ) | 
 | ||||||
| if args.retriever_query: |     # Retriever-Reader pipeline | ||||||
|     benchmark_querying( |     if pipeline_contains_retriever and pipeline_contains_reader: | ||||||
|         **params, **filenames, ci=args.ci, update_json=args.update_json, save_markdown=args.save_markdown |         documents_dir = Path(benchmark_config["documents_directory"]) | ||||||
|     ) |         indexing_pipeline = Pipeline.load_from_config(pipeline_config, pipeline_name="indexing") | ||||||
| if args.reader: | 
 | ||||||
|     benchmark_reader(**params, **filenames, ci=args.ci, update_json=args.update_json, save_markdown=args.save_markdown) |         results = benchmark_retriever_reader(indexing_pipeline, querying_pipeline, documents_dir, labels_file) | ||||||
|  | 
 | ||||||
|  |     # Retriever pipeline | ||||||
|  |     elif pipeline_contains_retriever: | ||||||
|  |         documents_dir = Path(benchmark_config["documents_directory"]) | ||||||
|  |         indexing_pipeline = Pipeline.load_from_config(pipeline_config, pipeline_name="indexing") | ||||||
|  | 
 | ||||||
|  |         results = benchmark_retriever(indexing_pipeline, querying_pipeline, documents_dir, labels_file) | ||||||
|  | 
 | ||||||
|  |     # Reader pipeline | ||||||
|  |     elif pipeline_contains_reader: | ||||||
|  |         results = benchmark_reader(querying_pipeline, labels_file) | ||||||
|  | 
 | ||||||
|  |     # Unsupported pipeline type | ||||||
|  |     else: | ||||||
|  |         raise ValueError("Pipeline must be a retriever, reader, or retriever-reader pipeline.") | ||||||
|  | 
 | ||||||
|  |     results["config_file"] = pipeline_config | ||||||
|  |     return results | ||||||
|  | 
 | ||||||
|  | 
 | ||||||
|  | if __name__ == "__main__": | ||||||
|  |     parser = argparse.ArgumentParser() | ||||||
|  |     parser.add_argument("config", type=str, help="Path to pipeline YAML config.") | ||||||
|  |     parser.add_argument("--output", type=str, help="Path to output file.") | ||||||
|  |     args = parser.parse_args() | ||||||
|  | 
 | ||||||
|  |     config_file = Path(args.config) | ||||||
|  |     output_file = f"{config_file.stem}_results.json" if args.output is None else args.output | ||||||
|  | 
 | ||||||
|  |     results = run_benchmark(config_file) | ||||||
|  |     with open(output_file, "w") as f: | ||||||
|  |         json.dump(results, f, indent=2) | ||||||
|  | |||||||
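The default output name used in the `__main__` block of the new `run.py` can be isolated as a small function. This is a sketch for illustration only: `resolve_output_file` is a hypothetical name, while the naming rule itself is taken from the diff above.

```python
from pathlib import Path
from typing import Optional


def resolve_output_file(config: str, output: Optional[str] = None) -> str:
    """Mirror the output naming in run.py: an explicit --output wins, else <config stem>_results.json."""
    if output is not None:
        return output
    # Path.stem drops the directory and the final suffix, so the file lands in the working directory.
    return f"{Path(config).stem}_results.json"


if __name__ == "__main__":
    print(resolve_output_file("configs/retriever_reader.yml"))
    print(resolve_output_file("configs/retriever_reader.yml", "out/custom.json"))
```

Because only the stem is used, two config files with the same name in different directories would map to the same default output file.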
| @ -158,3 +158,20 @@ def get_retriever_config(pipeline: Pipeline) -> Tuple[str, Union[int, str]]: | |||||||
|     retriever_top_k = retriever.top_k |     retriever_top_k = retriever.top_k | ||||||
| 
 | 
 | ||||||
|     return retriever_type, retriever_top_k |     return retriever_type, retriever_top_k | ||||||
|  | 
 | ||||||
|  | 
 | ||||||
|  | def contains_reader(pipeline: Pipeline) -> bool: | ||||||
|  |     """ | ||||||
|  |     Check if a pipeline contains a Reader component. | ||||||
|  |     :param pipeline: Pipeline | ||||||
|  |     """ | ||||||
|  |     components = [comp for comp in pipeline.components.values()] | ||||||
|  |     return any(isinstance(comp, BaseReader) for comp in components) | ||||||
|  | 
 | ||||||
|  | 
 | ||||||
|  | def contains_retriever(pipeline: Pipeline) -> bool: | ||||||
|  |     """ | ||||||
|  |     Check if a pipeline contains a Retriever component. | ||||||
|  |     """ | ||||||
|  |     components = [comp for comp in pipeline.components.values()] | ||||||
|  |     return any(isinstance(comp, BaseRetriever) for comp in components) | ||||||
|  | |||||||
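The `isinstance` checks in `contains_reader` and `contains_retriever`, combined with the branching in `run_benchmark`, can be tried out without installing Haystack. The sketch below uses stand-in classes: `BaseReader`, `BaseRetriever`, and `StubPipeline` here are local placeholders rather than the Haystack classes, and `contains_type` and `pipeline_kind` are hypothetical names.

```python
from typing import Dict


class BaseReader:
    """Stand-in for haystack.nodes.BaseReader (placeholder, not the real class)."""


class BaseRetriever:
    """Stand-in for haystack.nodes.BaseRetriever (placeholder, not the real class)."""


class StubPipeline:
    """Minimal object exposing a `components` mapping, as the checks in the diff expect."""

    def __init__(self, components: Dict[str, object]) -> None:
        self.components = components


def contains_type(pipeline: StubPipeline, component_type: type) -> bool:
    # Same idea as contains_reader / contains_retriever: any component of the given base type.
    return any(isinstance(comp, component_type) for comp in pipeline.components.values())


def pipeline_kind(pipeline: StubPipeline) -> str:
    # Same branch order as run_benchmark: retriever-reader, then retriever, then reader.
    has_retriever = contains_type(pipeline, BaseRetriever)
    has_reader = contains_type(pipeline, BaseReader)
    if has_retriever and has_reader:
        return "retriever-reader"
    if has_retriever:
        return "retriever"
    if has_reader:
        return "reader"
    raise ValueError("Pipeline must be a retriever, reader, or retriever-reader pipeline.")


if __name__ == "__main__":
    pipeline = StubPipeline({"Retriever": BaseRetriever(), "Reader": BaseReader()})
    print(pipeline_kind(pipeline))
```

Checking against the base classes means any subclass, such as the `FARMReader` or `BM25Retriever` from the example config, is recognized without listing concrete types.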
	 bogdankostic