Mirror of https://github.com/deepset-ai/haystack.git, synced 2025-07-27 02:40:41 +00:00

* add time and perf benchmark for es
* Add retriever benchmarking
* Add Reader benchmarking
* add nq to squad conversion
* add conversion stats
* clean benchmarks
* Add link to dataset
* Update imports
* add first support for neg psgs
* Refactor test
* set max_seq_len
* cleanup benchmark
* begin retriever speed benchmarking
* Add support for retriever query index benchmarking
* improve reader eval, retriever speed benchmarking
* improve retriever speed benchmarking
* Add retriever accuracy benchmark
* Add neg doc shuffling
* Add top_n
* 3x speedup of SQL. add postgres docker run. make shuffle neg a param. add more logging
* Add models to sweep
* add option for faiss index type
* remove unneeded line
* change faiss to faiss_flat
* begin automatic benchmark script
* remove existing postgres docker for benchmarking
* Add data processing scripts
* Remove shuffle in script bc data already shuffled
* switch hnsw setup from 256 to 128
* change es similarity to dot product by default
* Error includes stack trace
* Change ES default timeout
* remove delete_docs() from timing for indexing
* Add support for website export
* update website on push to benchmarks
* add complete benchmarks results
* new json format
* removed NaN as is not a valid json token
* fix benchmarking for faiss hnsw queries. do sql calls in update_embeddings() as batches
* update benchmarks for hnsw 128,20,80
* don't delete full index in delete_all_documents()
* update texts for charts
* update recall column for retriever
* change scale and add units to desc
* add units to legend
* add axis titles. update desc
* add html tags

Co-authored-by: deepset <deepset@Crenolape.localdomain>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>
21 lines
443 B
Python
import random
import time

from tqdm import tqdm

# Fixed seed so the shuffled passage order is reproducible
random.seed(42)

# Read all passages, skipping the TSV column header
with open("psgs_w100_minus_gold_unshuffled.tsv", encoding="utf-8") as f:
    f.readline()  # Remove column header
    lines = list(tqdm(f))

# Time the in-memory shuffle
tic = time.perf_counter()
random.shuffle(lines)
toc = time.perf_counter()
print(f"Shuffle took {toc - tic:.2f} s")

# Write the shuffled passages with the header restored
with open("psgs_w100_minus_gold.tsv", "w", encoding="utf-8") as f:
    f.write("id\ttext\ttitle\n")  # tab-separated columns: id, text, title
    for line in tqdm(lines):
        f.write(line)
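The script hardcodes the output header, so a typo in that string silently corrupts the first line of the output (for example, `"\title"` is parsed as a tab followed by `itle`). A minimal sketch of a more defensive variant follows, assuming the input TSV has exactly one header line and one record per line. `shuffle_tsv` is a hypothetical helper, not part of Haystack; it copies the header from the input, uses a local `random.Random(seed)` so the global random state is left untouched, and appends a newline to any row missing one so that a final unterminated row is not glued onto its new neighbour after shuffling. It drops the `tqdm` progress bars to stay standard-library only.

```python
import random


def shuffle_tsv(src_path, dst_path, seed=42):
    """Shuffle the data rows of a TSV file, keeping its header line first.

    Hypothetical helper (not a Haystack API): assumes the first line of
    `src_path` is a column header and every following line is one record.
    Returns the number of data rows written.
    """
    with open(src_path, encoding="utf-8") as src:
        header = src.readline()
        rows = src.readlines()

    # Ensure every row ends with a newline; otherwise an unterminated last
    # row would be concatenated with whichever row follows it after shuffling.
    rows = [row if row.endswith("\n") else row + "\n" for row in rows]
    if header and not header.endswith("\n"):
        header += "\n"

    # Local RNG: the same seed gives the same order, and the module-level
    # random state used elsewhere in the program is not modified.
    random.Random(seed).shuffle(rows)

    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(header)  # reuse the input header instead of retyping it
        dst.writelines(rows)
    return len(rows)
```

With the file names from the script above, the call would be `shuffle_tsv("psgs_w100_minus_gold_unshuffled.tsv", "psgs_w100_minus_gold.tsv")`. Like the original, it holds all rows in memory, which is simple but needs enough RAM for the full passage file.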