60 Commits

Author SHA1 Message Date
Christian Clauss
91ab90a256
perf: Python performance improvements with ruff C4 and PERF fixes (#5803)
* Python performance improvements with ruff C4 and PERF

* pre-commit fixes

* Revert changes to examples/basic_qa_pipeline.py

* Revert changes to haystack/preview/testing/document_store.py

* revert releasenotes

* Upgrade to ruff v0.0.290
2023-09-16 16:26:07 +02:00
Christian Clauss
1bc03ddc73
ci: Fix all ruff pyflakes errors except unused imports (#5820)
* ci: Fix all ruff pyflakes errors except unused imports

* Delete releasenotes/notes/fix-some-pyflakes-errors-69a1106efa5d0203.yaml
2023-09-15 18:30:33 +02:00
bogdankostic
ee2745bad8
ci: Add Github workflow to automate benchmark runs (#5399)
* Add config files

* log benchmarks to stdout

* Add top-k and batch size to configs

* Add batch size to configs

* fix: don't download files if they already exist

* Add batch size to configs

* refine script

* Remove configs using 1m docs

* update run script

* update run script

* update run script

* datadog integration

* remove out folder

* gitignore benchmarks output

* test: send benchmarks to datadog

* remove uncommented lines in script

* feat: take branch/tag argument for benchmark setup script

* fix: run.sh should ignore errors

* Add GH workflow to run benchmarks periodically

* Remove unused script

* Adapt cml.yml

* Adapt cml.yml

* Rename cml.yml to benchmarks.yml

* Revert "Rename cml.yml to benchmarks.yml"

This reverts commit 897299433a71a55827124728adff5de918d46d21.

* remove benchmarks.yml

* Use same file extension for all config files

* Use checkout@v3

* Run benchmarks sequentially

* Add timeout-minutes parameter

* Remove changes unrelated to datadog

* Apply black

* use haystack-oss aws account

* Update test/benchmarks/utils.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* PR feedback

* fix aws credentials step

* Fix path

* check docker

* Allow spinning up containers from within container

* Allow spinning up containers from within container

* Separate launching doc stores from benchmarks

* Remove docker related commands

* run only retrievers

* change port

* Revert "change port"

This reverts commit 6e5bcebb1d16e03ba7672be7e8a089084c7fc3a7.

* Run opensearch benchmark only

* Run weaviate benchmark only

* Run bm25 benchmarks only

* Changes host of doc stores

* add step to get docker logs

* Revert "add step to get docker logs"

This reverts commit c10e6faa76bde5df406a027203bd775d18c93c90.

* Install docker

* Launch doc store containers from within runner container

* Remove kill command

* Change host

* dump docker logs

* change port

* Add cloud startup script

* dump docker logs

* add network param

* add network to startup.sh

* check cluster health

* move steps

* change port

* try using services

* check cluster health

* use services

* run only weaviate

* change host

* Upload benchmark results as artifacts

* Update configs

* Delete index after benchmark run

* Use correct index name

* Run only failing config

* Use smaller batch size

* Increase memory for opensearch

* Reduce batch size further

* Provide more storage

* Reduce batch size

* dump docker logs

* add java opts

* Spin up only opensearch container

* Create separate job for each doc store

* Run benchmarks sequentially

* Set working directory

* Account for reader benchmarks not doing indexing

* Change key of reader metrics

* Apply PR feedback

* Remove whitespace

* Adapt workflow to changes in datadog scripts

* Adapt workflow to changes in datadog scripts

* Increase memory for opensearch

* Reduce batch size

* Add preprocessing_batch_size to Readers

* Remove unrelated change

* Move order

* Fix path

* Manually terminate EC2 instance

* Manually terminate EC2 instance

* Manually terminate EC2 instance

* Always terminate runner

* Always terminate runner

* Remove unnecessary terminate-runner job

* Add cron schedule

* Disable telemetry

* Rename cml.yml to benchmarks.yml

---------

Co-authored-by: rjanjua <rohan.janjua@gmail.com>
Co-authored-by: Paul Steppacher <p.steppacher91@gmail.com>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-08-17 12:56:45 +02:00
bogdankostic
c26f1e9426
fix: Use correct type for points in datadog (benchmarks) (#5570) 2023-08-14 17:40:36 +02:00
Massimiliano Pippi
ac4e762422
Fix datadog client init (#5524) 2023-08-07 12:18:46 +02:00
bogdankostic
56cea8cbbd
test: Add scripts to send benchmark results to datadog (#5432)
* Add config files

* log benchmarks to stdout

* Add top-k and batch size to configs

* Add batch size to configs

* fix: don't download files if they already exist

* Add batch size to configs

* refine script

* Remove configs using 1m docs

* update run script

* update run script

* update run script

* datadog integration

* remove out folder

* gitignore benchmarks output

* test: send benchmarks to datadog

* remove uncommented lines in script

* feat: take branch/tag argument for benchmark setup script

* fix: run.sh should ignore errors

* Remove changes unrelated to datadog

* Apply black

* Update test/benchmarks/utils.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* PR feedback

* Account for reader benchmarks not doing indexing

* Change key of reader metrics

* Apply PR feedback

* Remove whitespace

---------

Co-authored-by: rjanjua <rohan.janjua@gmail.com>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-08-03 10:09:00 +02:00
bogdankostic
fd25106c88
test: Adapt batch size in retriever-reader benchmarks (#5281) 2023-07-06 10:42:34 +02:00
bogdankostic
7731713a1e
test: Add benchmark config files (#5093)
* Add config files

* Add top-k and batch size to configs

* Add batch size to configs

* Add batch size to configs

* Remove configs using 1m docs
2023-06-14 18:15:50 +02:00
bogdankostic
c3e59914da
refactor: Delete outdated benchmark files (#5008) 2023-06-01 13:59:12 +02:00
bogdankostic
6774e0ae58
fix: Use queries from aggregated labels in benchmarks (#5054)
* Include benchmark config in output

* Use queries from aggregated labels
2023-06-01 10:49:54 +02:00
bogdankostic
b8ff1052d4
refactor: Adapt running benchmarks (#5007)
* Generate eval result in separate method

* Adapt benchmarking utils

* Adapt running retriever benchmarks

* Adapt error message

* Adapt running reader benchmarks

* Adapt retriever reader benchmark script

* Adapt running benchmarks script

* Adapt README.md

* Raise error if file doesn't exist

* Raise error if path doesn't exist or is a directory

* minor readme update

* Create separate methods for checking if pipeline contains reader or retriever

* Fix reader pipeline case

---------

Co-authored-by: Darja Fokina <daria.f93@gmail.com>
2023-05-26 18:48:11 +02:00
bogdankostic
5633446173
refactor: Add reader-retriever benchmark script (#5006)
* Generate eval result in separate method

* Adapt benchmarking utils

* Adapt running retriever benchmarks

* Adapt error message

* Adapt running reader benchmarks

* Adapt retriever reader benchmark script

* Raise error if file doesn't exist

* Raise error if path doesn't exist or is a directory

* Remove unused line

* Create separate method for getting reader config

* Make use of get_reader_config

* Create separate method for retriever config
2023-05-26 13:54:52 +02:00
bogdankostic
796340e788
refactor: Adapt reader benchmarks (#5005) 2023-05-26 11:40:35 +02:00
bogdankostic
6e10fdab27
refactor: Adapt retriever benchmarks script (#5004)
* Generate eval result in separate method

* Adapt benchmarking utils

* Adapt running retriever benchmarks

* Adapt error message

* Raise error if file doesn't exist

* Raise error if path doesn't exist or is a directory
2023-05-25 15:39:02 +02:00
bogdankostic
c5f0f820cf
refactor: Adapt benchmarking utils (#5003)
* Adapt benchmarking utils

* Adapt error message

* Adapt doc store launcher registry

* Revert "Adapt doc store launcher registry"

This reverts commit e034936363dde760d393fe00cac998a54a0f5152.
2023-05-25 11:19:46 +02:00
Massimiliano Pippi
4974bf7ab3
chore: remove deprecated MilvusDocumentStore (#4951)
* remove deprecated MilvusDocumentStore

* remove leftovers

* fix pylint
2023-05-19 16:37:38 +02:00
ZanSara
13c4ff1b52
refactor: remove direct logging without a logger (#4253)
* remove direct logging without a logger

* add custom pylint checker

* add test

* pylint

* improve checker message

* mypy

* remove test

* add checker for basicConfig

* more logging missed

* ignore basicConfig

* move out logger

* move out statement

* remove logging configuration
2023-02-23 20:42:42 +01:00
Silvano Cerza
274746db07
style: Update black (#4101)
* Update black version

* Format file with new black style

* Update black pre-commit hook version
2023-02-08 15:34:43 +01:00
ZanSara
3ffdb0a9a3
chore: fix all EOF (#3852)
* fix all eof

* fix test

* fix test

* fix test

* typo

* fix sample

* fix sample

* add logs

* fix page_dynamic_result.txt
2023-01-16 12:34:50 +01:00
Massimiliano Pippi
6f9a0f2215
use 9200 as the default port in launch_opensearch (#3630) 2022-11-28 19:06:45 +05:30
Malte Pietsch
fb02b61e90
Update README.md (#3247) 2022-10-11 10:43:17 +02:00
Malte Pietsch
7e79a48540
bug: reactivate benchmarks with quick fixes (#2766)
* quick fix benchmark runs to make them work with current haystack version

* fix minor typo

* update readme. fix minor things to make benchmarks run again

* Update Documentation & Code Style

* fix typo in readme

* update result files for reader and retriever querying

* reduce batch size for update embeddings to prevent xlarge bulk_update requests that exceed elastic's limits (happening in dense 500k runs)

* change default memory allocation back to normal. add note to readme

* add first indexing results

* add memory to docker cmd

* full benchmarks results on commit c5a2651fcbbeffca06ffa9036b10e62669bcc1b0

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-09-20 10:22:08 +02:00
Sara Zan
dcb132ba59
chore: remove f-strings from logs for performance reasons (#3212)
* Use the %s syntax on all debug messages

* Use the %s syntax on some more debug messages

* Use the %s syntax on info messages

* Use the %s syntax on warning messages

* Use the %s syntax on error and exception messages

* mypy

* pylint

* trigger tutorials execution in CI

* trigger tutorials execution on CI

* black

* remove embeddings from repr

* fix Document `__repr__`

* address feedback

* mypy
2022-09-19 18:18:32 +02:00
Sara Zan
e92ea4fccb
refactor: rename master into main in documentation and links (#3063)
* master->main

* revert master rename

* Revert change to sphinx link and rename master schema
2022-08-24 19:05:12 +02:00
Tuana Celik
d49e92e21c
ElasticsearchRetriever to BM25Retriever (#2423)
* change class names to bm25

* Update Documentation & Code Style

* Update Documentation & Code Style

* Update Documentation & Code Style

* Add back all_terms_must_match

* fix syntax

* Update Documentation & Code Style

* Update Documentation & Code Style

* Creating a wrapper for old ES retriever with deprecated wrapper

* Update Documentation & Code Style

* New method for deprecating old ESRetriever

* New attempt for deprecating the ESRetriever

* Reverting to the simplest solution - warning logged

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-04-26 16:09:39 +02:00
Sara Zan
a59bca3661
Apply black formatting (#2115)
* Testing black on ui/

* Applying black on docstores

* Add latest docstring and tutorial changes

* Create a single GH action for Black and docs to reduce commit noise to the minimum, slightly refactor the OpenAPI action too

* Remove comments

* Relax constraints on pydoc-markdown

* Split temporary black from the docs. Pydoc-markdown was obsolete and needs a separate PR to upgrade

* Fix a couple of bugs

* Add a type: ignore that was missing somehow

* Give path to black

* Apply Black

* Apply Black

* Relocate a couple of type: ignore

* Update documentation

* Make Linux CI run after applying Black

* Triggering Black

* Apply Black

* Remove dependency, does not work well

* Remove manually double trailing commas

* Update documentation

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-03 13:43:18 +01:00
MichelBartels
84147edcca
Model Distillation (#1758)
* initial commit

* Add latest docstring and tutorial changes

* added comments and fixed bug

* fixed bugs, added benchmark and added documentation

* Add latest docstring and tutorial changes

* fix type: ignore comment

* fix logging in benchmark

* fixed distillation config

* Add latest docstring and tutorial changes

* added type annotations

* fixed distillation loss calculation

* added type annotations

* fixed distillation mse loss

* improved model distillation benchmark config loading

* added temperature for model distillation

* removed unnecessary imports, added comments, added named parameter calls

* Add latest docstring and tutorial changes

* added some more comments

* added distillation test

* fixed distillation test

* removed unnecessary import

* fix softmax dimension

* add grid search

* improved model distillation benchmark config

* fixed model distillation hyperparameter search

* added doc strings and type hints for model distillation

* Add latest docstring and tutorial changes

* fixed type hints

* fixed type hints

* fixed type hints

* wrote out params instead of kwargs in DistillationDataSilo initializer

* fixed type hints

* fixed typo

* fixed typo

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-26 18:49:30 +01:00
Sara Zan
eab475bb5d
Rename every occurrence of 'embed_passages' with 'embed_documents' (#1667)
* Rename every occurrence of 'embed_passages' with 'embed_documents'

* Remove aliased method embed_documents

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-28 12:17:56 +02:00
Sara Zan
13510aa753
Refactoring of the haystack package (#1624)
* Files moved, imports all broken

* Fix most imports and docstrings into

* Fix the paths to the modules in the API docs

* Add latest docstring and tutorial changes

* Add a few pipelines that were lost in the imports

* Fix a bunch of mypy warnings

* Add latest docstring and tutorial changes

* Create a file_classifier module

* Add docs for file_classifier

* Fixed most circular imports, now the REST API can start

* Add latest docstring and tutorial changes

* Tackling more mypy issues

* Reintroduce  from FARM and fix last mypy issues hopefully

* Re-enable old-style imports

* Fix some more import from the top-level  package in an attempt to sort out circular imports

* Fix some imports in tests to new-style to prevent failed class equalities from breaking tests

* Change document_store into document_stores

* Update imports in tutorials

* Add latest docstring and tutorial changes

* Probably fixes summarizer tests

* Improve the old-style import allowing module imports (should work)

* Try to fix the docs

* Remove dedicated KnowledgeGraph page from autodocs

* Remove dedicated GraphRetriever page from autodocs

* Fix generate_docstrings.sh with an updated list of yaml files to look for

* Fix some more modules in the docs

* Fix the document stores docs too

* Fix a small issue on Tutorial14

* Add latest docstring and tutorial changes

* Add deprecation warning to old-style imports

* Remove stray folder and import Dict into dense.py

* Change import path for MLFlowLogger

* Add old loggers path to the import path aliases

* Fix debug output of convert_ipynb.py

* Fix circular import on BaseRetriever

* Missed one merge block

* re-run tutorial 5

* Fix imports in tutorial 5

* Re-enable squad_to_dpr CLI from the root package and move get_batches_from_generator into document_stores.base

* Add latest docstring and tutorial changes

* Fix typo in utils __init__

* Fix a few more imports

* Fix benchmarks too

* New-style imports in test_knowledge_graph

* Rollback setup.py

* Rollback squad_to_dpr too

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-25 15:50:23 +02:00
Lalit Pagaria
5dbd899a93
Experimental changes to support Milvus 2.x (#1473)
* Experimental changes to support Milvus 2.x

* Milvus 2.0 need other containers hence adding them

* Add latest docstring and tutorial changes

* Fixing tests

* Correcting use of list collections

* correcting connection close

* Removing connection close logic

* removing flush

* using collection instead of connection

* fixing describe collection

* Fixing insert, query and search based on new signature

* Making mypy happy

* Fixing one test case

* Fixing search and embedding fetch based on newer api

* Implementing delete vector id function

* Wrapping up final changes

* Add latest docstring and tutorial changes

* Correcting requirements.txt

* removing empty line in requirements.txt

* add docstring and exception for delete

* add docstring. condition import on env var. raise exception for deletion

* fix typo

* change delete signature

* ignore typing for import

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-10-25 10:39:48 +02:00
Malte Pietsch
3a7d029fdd
Fix Opensearch field type (flattened -> nested) (#1609)
* fix field type flattened -> nested. change default port from 9201 to 9200

* change port in benchmarks
2021-10-19 14:40:53 +02:00
Malte Pietsch
4a6c9302b3
Redesign primitives - Document, Answer, Label (#1398)
* first draft / notes on new primitives

* wip label / feedback refactor

* rename doc.text -> doc.content. add doc.content_type

* add datatype for content

* remove faq_question_field from ES and weaviate. rename text_field -> content_field in docstores. update tutorials for content field

* update converters for . Add warning for empty

* rename label.question -> label.query. Allow sorting of Answers.

* WIP primitives

* update ui/reader for new Answer format

* Improve Label. First refactoring of MultiLabel. Adjust eval code

* fixed workflow conflict with introducing new one (#1472)

* Add latest docstring and tutorial changes

* make add_eval_data() work again

* fix reader formats. WIP fix _extract_docs_and_labels_from_dict

* fix test reader

* Add latest docstring and tutorial changes

* fix another test case for reader

* fix mypy in farm reader.eval()

* fix mypy in farm reader.eval()

* WIP ORM refactor

* Add latest docstring and tutorial changes

* fix mypy weaviate

* make label and multilabel dataclasses

* bump mypy env in CI to python 3.8

* WIP refactor Label ORM

* WIP refactor Label ORM

* simplify tests for individual doc stores

* WIP refactoring markers of tests

* test alternative approach for tests with existing parametrization

* WIP refactor ORMs

* fix skip logic of already parametrized tests

* fix weaviate behaviour in tests - not parametrizing it in our general test cases.

* Add latest docstring and tutorial changes

* fix some tests

* remove sql from document_store_types

* fix markers for generator and pipeline test

* remove inmemory marker

* remove unneeded elasticsearch markers

* add dataclasses-json dependency. adjust ORM to just store JSON repr

* ignore type as dataclasses_json seems to miss functionality here

* update readme and contributing.md

* update contributing

* adjust example

* fix duplicate doc handling for custom index

* Add latest docstring and tutorial changes

* fix some ORM issues. fix get_all_labels_aggregated.

* update drop flags where get_all_labels_aggregated() was used before

* Add latest docstring and tutorial changes

* add to_json(). add + fix tests

* fix no_answer handling in label / multilabel

* fix duplicate docs in memory doc store. change primary key for sql doc table

* fix mypy issues

* fix mypy issues

* haystack/retriever/base.py

* fix test_write_document_meta[elastic]

* fix test_elasticsearch_custom_fields

* fix test_labels[elastic]

* fix crawler

* fix converter

* fix docx converter

* fix preprocessor

* fix test_utils

* fix tfidf retriever. fix selection of docstore in tests with multiple fixtures / parameterizations

* Add latest docstring and tutorial changes

* fix crawler test. fix ocrconverter attribute

* fix test_elasticsearch_custom_query

* fix generator pipeline

* fix ocr converter

* fix ragenerator

* Add latest docstring and tutorial changes

* fix test_load_and_save_yaml for elasticsearch

* fixes for pipeline tests

* fix faq pipeline

* fix pipeline tests

* Add latest docstring and tutorial changes

* fix weaviate

* Add latest docstring and tutorial changes

* trigger CI

* satisfy mypy

* Add latest docstring and tutorial changes

* satisfy mypy

* Add latest docstring and tutorial changes

* trigger CI

* fix question generation test

* fix ray. fix Q-generation

* fix translator test

* satisfy mypy

* wip refactor feedback rest api

* fix rest api feedback endpoint

* fix doc classifier

* remove relation of Labels -> Docs in SQL ORM

* fix faiss/milvus tests

* fix doc classifier test

* fix eval test

* fixing eval issues

* Add latest docstring and tutorial changes

* fix mypy

* WIP replace dataclasses-json with manual serialization

* Add latest docstring and tutorial changes

* revert to dataclass-json serialization for now. remove debug prints.

* update docstrings

* fix extractor. fix Answer Span init

* fix api test

* keep meta data of answers in reader.run()

* fix meta handling

* address review feedback

* Add latest docstring and tutorial changes

* make document=None for open domain labels

* add import

* fix print utils

* fix rest api

* address review feedback

* Add latest docstring and tutorial changes

* fix mypy

Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-13 14:23:23 +02:00
Julian Risch
f9d2f786ca
Replace FARM import statements; add dependencies (#1492)
* Replace FARM import statements; add dependencies

* Add InferenceProc., TextCl.Proc., TextPairCl.Proc.

* Remove FARMRanker, add type annotations, rename max_sample

* Add sample_to_features_text for InferenceProc.

* Fix type annotations: model_name_or_path is str not Path

* Fix mypy errors: implement _create_dataset in TextCl.Proc.

* Add task_type "embeddings" in Inferencer

* Allow loading AdaptiveModel for embedding task

* Add SQuAD eval metrics; enable InferenceProc for embedding task

* Add baskets as param to log_samples and handle empty basket list in log_samples

* Remove unused dependencies

* Remove FARMClassifier (doc classifier) due to ref to TextClassificationHead

* Remove FARMRanker and Classifier from doc generation scripts

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-28 16:34:24 +02:00
MichelBartels
da2e8da561
Adding multi gpu support for DPR inference (#1414)
* Added support for Multi-GPU inference to DPR including benchmark

* fixed multi gpu

* added batch size to benchmark to better reflect multi gpu capabilities

* remove unnecessary entry in config.json

* fixed typos

* fixed config name

* update benchmark to use DEVICES constant

* changed multi gpu parameters and updated docstring

* adds silent fallback on cpu

* update doc string, warning and config

Co-authored-by: Michel Bartels <kontakt@michelbartels.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-09-10 13:25:02 +02:00
Branden Chan
363be65a78
Implement OpenSearch ANN (#1225)
* Simplify ODES init

* Add arguments to ES init and create script

* Rename similarity_fn_name and add util fn

* Create OpenSearchDocumentStore

* Specify params of Open Search HNSW

* Add better argument handling

* Update opensearch index mapping

* Edit opensearch default port

* Fix HNSW mapping

* Force small HNSW params

* Implement auto start and stopping of document store services

* Fix starting and stopping of ds service

* Restore HNSW params

* Add opensearch query benchmarks

* Add write wait time

* Revert wait time

* Add timeout

* Update benchmarks

* Update benchmarks

* Update benchmarks json

* Update documentation

* Update documentation

* Fix similarity name

* Improve argument passing

* Improve stopping and starting of service
2021-07-26 10:52:52 +02:00
Branden Chan
c513865566
Add L2 support for FAISS HNSW (#1138) 2021-06-04 11:05:18 +02:00
Branden Chan
09ba75073c
Improve Milvus HNSW Performance (#1127)
* Add simplified script

* Optimize HNSW index creation

* Adjust benchmark order

* Rename script
2021-06-02 13:17:35 +02:00
Branden Chan
9356f637d4
Update Milvus benchmarks (#1128)
* Update Milvus benchmarks

* Add sentence transformers

* Update sentence transformers index results

* Remove duplicate row
2021-06-02 13:09:45 +02:00
Branden Chan
77d4c2ca1c
Benchmark milvus (#850)
* Add milvus benchmarking support

* Add latest docstring and tutorial changes

* Edit config

* Disable docker interactive mode

* Add milvus index type support

* Adjust FAISS and Milvus node branching

* Remove duplicate in config

* Revert method for speedup

* Add latest docstring and tutorial changes

* Add latest benchmark run

* Add latest docstring and tutorial changes

* Add json files

* Revert "Add latest docstring and tutorial changes"

This reverts commit e2efa5f08aa4fb55bbeeed42aa76817d63fc8923.

* Add latest docstring and tutorial changes

* Revert "Add latest docstring and tutorial changes"

This reverts commit b085a679b9d5f175e91c2c59565e73c5dec1374b.

* Fix typo

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-13 14:54:15 +02:00
Markus Paff
b87daed62b
fixed link to dpr (#962) 2021-04-13 09:45:04 +02:00
Timo Moeller
837dea4e6d
Integrate sentence transformers into benchmarks (#843)
* Integrate sentence transformers into benchmarks

* Add doc store asserts

* switch data downloads from s3 client to https. add license info

* Fix mypy, revert config

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-09 17:24:16 +02:00
Branden Chan
f3a3b73d9b
Choose correct similarity fns during benchmark runs & re-run benchmarks (#773)
* Adapt to new dataset_from_dicts return signature

* rename fn

* Align similarity fn in benchmark doc store

* Better choice of similarity fn

* Increase postgres wait time

* Add more expected returned variables

* update benchmark results

* Fix typo

* update all benchmark runs

* multiply stats by 100

* Specify similarity fns for website

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-02-03 11:45:18 +01:00
brandenchan
5665d55ab4
Remove duplicate file 2021-02-01 15:43:53 +01:00
Pavel Soriano
16b8291091
SQuAD to DPR dataset converter (#765)
* Create squad_to_dpr.py

First commit of the squad2dpr script.

* adding review corrections/improvements

* Merge master 5bf351e

* Move script, add docstring

* Add type hints

Co-authored-by: brandenchan <brandenchan@icloud.com>
2021-02-01 15:40:43 +01:00
Malte Pietsch
149d98a0fd
Add latest benchmark run (#652)
* add latest benchmark run

* update templates and fix small json errors

* Change scale

Co-authored-by: brandenchan <brandenchan@icloud.com>
2020-12-10 16:25:51 +01:00
Malte Pietsch
216787ed34
Fix benchmarks (#648)
* disable fasttokenizer, increase ES timeout for delete requests

* add session.close()

* fix deletion of docs
2020-12-02 16:59:42 +01:00
Malte Pietsch
0acafc403a
Automate benchmarks via CML (#518)
* initial test cml

* Update cml.yaml

* WIP test workflow

* switch to general ubuntu ami

* switch to general ubuntu ami

* disable gpu for tests

* rm gpu infos

* rm gpu infos

* update token env

* switch github token

* add postgres

* test db connection

* fix typo

* remove tty

* add sleep for db

* debug runner

* debug removal postgres

* debug: reset to working commit

* debug: change github token

* switch to new bot token

* debug token

* add back postgres

* adjust network runner docker

* add elastic

* fix typo

* adjust working dir

* fix benchmark execution

* enable s3 downloads

* add query benchmark. fix path

* add saving of markdown files

* cat md files. add faiss+dpr. increase n_queries

* switch to GPU instance

* switch availability zone

* switch to public aws DL ami

* increase volume size

* rm faiss. fix error logging

* save markdown files

* add reader benchmarks

* add download of squad data

* correct reader metric normalization

* fix newlines between reports

* fix max_docs for reader eval data. remove max_docs from ci run config

* fix mypy. switch workflow trigger

* try trigger for label

* try trigger for label

* change trigger syntax

* debug machine shutdown with test workflow

* add es and postgres to test workflow

* Revert "add es and postgres to test workflow"

This reverts commit 6f038d3d7f12eea924b54529e61b192858eaa9d5.

* Revert "debug machine shutdown with test workflow"

This reverts commit db70eabae8850b88e1d61fd79b04d4f49d54990a.

* fix typo in action. set benchmark config back to original
2020-11-18 18:28:17 +01:00
Branden Chan
99e924aede
Update Documentation for Haystack 0.5.0 (#557)
* Add languages and preprocessing pages

* add content

* address review comments

* make link relative

* update api ref with latest docstrings

* move doc readme and update

* add generator API docs

* fix example code

* design and link fix

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>
2020-11-06 10:53:22 +01:00
Branden Chan
7a9f32f264
Fix template 2020-10-29 10:30:03 +01:00
Branden Chan
7c81dfdc3a
Address reviewer comments 2020-10-27 12:41:11 +01:00