3803 Commits

Author SHA1 Message Date
Massimiliano Pippi
b07fcb7185
feat: add a security policy for Haystack (#3130)
* add the security policy

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* include review feedback

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-09-02 12:00:14 +02:00
Branden Chan
d4722c2ec5
Document FARMReader.train() evaluation report log level (#3129)
* Mention evaluation report logging level

* Mention evaluation report logging level
2022-09-01 10:58:47 +02:00
Vladimir Blagojevic
356537c883
Standardize devices parameter and device initialization (#3062)
* Use devices parameter and initialize devices consistently
2022-08-31 15:30:31 +02:00
Massimiliano Pippi
ffee36c694
pin pydantic to 1.9.2 (#3126) 2022-08-31 14:36:40 +02:00
Vladimir Blagojevic
66f3f42a46
fix: Replace multiprocessing tokenization with batched fast tokenization (#3089)
* Replace multiprocessing tokenization with batched fast tokenization

* Replace deprecated tokenization method invocations
2022-08-31 07:33:39 -04:00
Stefano Fiorucci
e7771dc18e
bug: adapt UI random question for streamlit 1.12 and pin to streamlit>=1.9.0 (#3121)
* adapt for streamlit 1.12.0 and pin to streamlit>=1.9.0

* make pylint happy
2022-08-31 12:35:40 +02:00
Fernando Pereira
911a2fa7e4
feat: Add warnings to PineconeDocumentStore about indexing metadata if filters return no documents (#3086)
* black-jupyter format changes

* fix merge

* filters and documents/ids list evaluations fix (for this specific warning context)
2022-08-30 17:02:07 +02:00
Julian Risch
f010a17f04
increase version to next release candidate (#3115) 2022-08-29 17:05:44 +02:00
Vladimir Blagojevic
99efab7928
Bump transformers to v4.21.2 (#3098) 2022-08-29 11:02:13 -04:00
Sara Zan
e88f1e2577
Add custom_mapping to the list of fields that can contain string-encoded JSON (#3065) 2022-08-29 11:10:24 +02:00
Julian Risch
4e518cdddd
chore: increase version for 1.8 release (#3109)
* increase version for 1.8 release

* ignore missing-timeout for pylint
v1.8.0
2022-08-26 15:00:14 +02:00
Julian Risch
3e3ff33cdd
feat: add batch evaluation method for pipelines (#2942)
* add basic pipeline.eval_batch for qa without filters

* black formatting

* pydoc-markdown

* remove batch eval tests failing due to bugs

* remove comment

* explain commented out tests

* avoid code duplication

* black

* mypy

* pydoc markdown

* add batch option to execute_eval_run

* pydoc markdown

* Apply documentation suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply documentation suggestion from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* add documentation based on review comments

* black

* black

* schema updates

* remove duplicate tests

* add separate method for column reordering

* merge _build_eval_dataframe methods

* pylint ignore in function

* change type annotation of queries to list only

* one-liner addressing review comment on params dict

* markdown files updated

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-25 17:50:57 +02:00
bogdankostic
e2ec0d1c15
feat: FAISS in OpenSearch: check existing index (#3101)
* Add check for mapping for existing indices

* Add test

* Check if "method" field exists
2022-08-25 17:33:26 +02:00
Julian Risch
cc9d39c360
increase version to next release candidate (#3100) 2022-08-25 15:55:34 +02:00
Julian Risch
0950db5032
chore: increase version to 1.7.2 for patch release (#3097)
* schema update

* schema update audio nodes

* schema update audio param type
v1.7.2
2022-08-25 13:55:28 +02:00
Sebastian
0cf0568dd0
fix: Use use_auth_token in all cases when loading from the HF Hub (#3094)
* Making sure to pass on use_auth_token to all from_pretrained calls
2022-08-25 10:30:03 +02:00
Sara Zan
e92ea4fccb
refactor: rename master into main in documentation and links (#3063)
* master->main

* revert master rename

* Revert change to sphinx link and rename master schema
2022-08-24 19:05:12 +02:00
tstadel
92046ce5b5
feat: FAISS in OpenSearch: Support HNSW for dot product and l2 (#3029)
* support faiss hnsw

* blacken

* update docs

* improve similarity check

* add tests

* update schema

* set ef_search param correctly

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* regenerate docs

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-24 16:43:48 +02:00
James Briggs
9b1b03002f
update to PineconeDocumentStore to remove dependency on SQL db (#2749)
* update to PineconeDocumentStore to remove dependency on SQL db

* Update Documentation & Code Style

* typing fixes

* Update Documentation & Code Style

* fixed embedding generator to yield Documents

* Update Documentation & Code Style

* fixes for final typing issues

* fixes for pylint

* Update Documentation & Code Style

* uncomment pinecone tests

* added new params to docstrings

* Update Documentation & Code Style

* Update Documentation & Code Style

* Update haystack/document_stores/pinecone.py

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>

* Update haystack/document_stores/pinecone.py

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>

* Update Documentation & Code Style

* Update haystack/document_stores/pinecone.py

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>

* Update haystack/document_stores/pinecone.py

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>

* Update haystack/document_stores/pinecone.py

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>

* Update haystack/document_stores/pinecone.py

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>

* changes based on comments, updated errors and install

* Update Documentation & Code Style

* mypy

* implement simple filtering in pinecone mock

* typo

* typo in reverse

* account for missing meta key in filtering

* typo

* added metadata filtering to describe index

* added handling for users switching indexes in same doc store, and handling duplicate docs in write

* syntax tweaks

* added index option to document/embedding count calls

* labels implementation in progress

* added metadata fields to be indexed for pinecone tests

* further changes to mock

* WIP implementation of labels+multilabels

* switched to rely on labels namespace rather than filter

* simpler delete_labels

* label fixes, remove debug code

* Apply dostring fixes

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* mypy

* pylint

* docs

* temporarily un-mock Pinecone

* Small Pinecone test suite

* pylint

* Add fake test key to pass the None check

* Add again fake test key to pass the None check

* Add Pinecone to default docstores and fix filters

* Fix field name

* Change field name

* Change field value

* Remove comments

* forgot to upgrade pyproject.toml

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-24 13:27:15 +02:00
Stefano Fiorucci
891707ecaa
bug: handle Optional params in schema validation (#2980)
* not working draft

* first draft

* fix

* revert json schema

* better schema

* improvements, support different python versions

* little simplification

* improvements and more tests

* Revert "Merge branch 'handle_optional_params' into origin/main"

This reverts commit 0114cba1f72c9bab23a3ce6a24cb4b346834cf34.

* fix git mess

* handle optional params; schema

* test null values

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-08-24 10:40:19 +02:00
Ofek Lev
f6a4a14790
refactor: update package metadata (#3079)
* Update package metadata

* fix yaml

* remove Python version cap

* address review
2022-08-24 09:46:21 +02:00
Branden Chan
6d4031d8f6
Add OpenAI Answer Generator API (#3050)
* Add OpenAI Answer Generator API

* Regen tutorials

* Regen md files

* Incorporate reviewer feedback

* Incorporate reviewer feedback

* Incorporate reviewer feedback

* Incorporate reviewer feedback
2022-08-24 09:20:08 +02:00
Malte Pietsch
76af0444cc
feat: add progressbar to upload_files() of deepset Cloud client (#3069) 2022-08-23 20:51:08 +02:00
Sebastian
3ea57801ae
feat: Early stopping can be used in Reader and Retriever training (#3071)
* Add option to set early stopping in training

* Moved EarlyStopping to haystack/utils/early_stopping.py and added EarlyStopping to training Dense retrievers.
2022-08-23 14:18:12 +02:00
bogdankostic
b03de53716
Use random_sample instead of ndarray for random array (#3083) 2022-08-22 13:19:45 +02:00
Daniel Bichuetti
149224fe3a
fix: Crawler quits ChromeDriver on destruction (#3070)
* Close Chrome and Selenium WebDriver on destruction

* Fix failed pre-commit hook
2022-08-22 13:08:16 +02:00
Daniel Bichuetti
d715d0202d
fix: update ChromeDriver options on restricted environments and add ChromeDriver options as function parameter (#3043)
* Fix when env does nto exist

* Fix missed line

* Set conservative chromedriver options

* Set default options based on environment

* Fix removed line

* Updated documentation

* Generate new schemas manually

* Add arguments via iterator and helper function

* Pre-push doc format

* Use imported Option vs full namespace access

* Manually update schema

* Manually add documentation and schema

* Fix language and documentation

* Fix typo

* Auto generated docs

* Updated documentation
2022-08-22 12:59:33 +02:00
David G
e715dee17d
docs:fixed typo (or old documentation) in ipynb tutorial 3 (#3033)
* Update Tutorial3_Basic_QA_Pipeline_without_Elasticsearch.ipynb

Just fixed the key in the document dictionary format so `write_documents()` won't raise an error. By the way the `write_documents()` error is really explicative

* Run convert_notebooks_into_webpages.py

Co-authored-by: David Gervasoni <david.gervasoni@trix.ai>
2022-08-22 12:56:30 +02:00
Massimiliano Pippi
97a8d30512
feat: Allow exact list matching with field in Elasticsearch filtering (#2988)
* ES filtering - allow exact list matching with field

typing fix

Update Documentation & Code Style

remove default hit limit in filtering queries

Update Documentation & Code Style

pytest es list eq filter

Update Documentation & Code Style

* review feedback

* fixed test

Co-authored-by: Krak91 <45461739+Krak91@users.noreply.github.com>
2022-08-22 12:42:37 +02:00
Daniel Bichuetti
d5e36ce6b4
fix(translator): write translated text to output documents, while keeping input untouched (#3077)
* Set translated text on a copy of original document

* Return new translated list

* Manually generated docs

TODO: check pre-commit

* Hook generated file

* Rename variables for better maintenance

* fix(translator): prevent inputs from being changed

* fix: manual update translator docs

* style(translator): explicit type declaration on List

* docs(translator): re-run pre-commit hook

* style(translator): ignore mypy wrong type check

* docs(translator): re-run pre-commit hook
2022-08-22 04:07:05 -04:00
Julian Risch
bc6f71b5ba
chore: increase version to next release candidate (#3067)
* increase version to next release candidate

* generate schema files
2022-08-19 14:49:50 +02:00
Julian Risch
eb0f0da0fd
Prepare 1.7.1 release (#3061)
* prepare 1.7.1 release

* Fix schemas

* Update haystack/json-schemas/haystack-pipeline-1.7.1.schema.json

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

* change back main to master

* remove newline at end of file

* generate schema file with no newline

Co-authored-by: ZanSara <sarazanzo94@gmail.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
v1.7.1
2022-08-19 13:24:40 +02:00
Vladimir Blagojevic
be127e5b61
Trigger build failure Slack notify only on main repo (not forks) (#3039)
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2022-08-18 06:51:39 -04:00
Massimiliano Pippi
af24ffae55
feat: take the list of models to cache instead of hardcoding one (#3060)
* take the list of models to cache as an input

* let nltk find the cache dir on its own
2022-08-18 11:55:29 +02:00
tstadel
1027ab3624
Bump Version to 1.7.1rc (#3041)
* bump version to 1.7.1rc

* update openapi
2022-08-18 10:31:57 +02:00
James Briggs
82c9cff3d9
test: update filtering of Pinecone mock to imitate doc store (#3020)
* updated filtering of doc store to imitate pinecone

* Update test/mocks/pinecone.py
2022-08-18 09:57:08 +02:00
Sebastian
74b7c2c12a
Pin pyworld to <=0.2.12 (#3047) 2022-08-17 08:11:28 +02:00
Massimiliano Pippi
2328097ce0
rename the default branch name (#3045) 2022-08-16 20:24:58 +02:00
Tuana Celik
2298155a20
changing Slack to Discord (#3040)
* changing Slack to Discord

* Update README.md

* updating contributing
2022-08-15 15:56:16 +03:00
tstadel
baefd32b6f
Upgrade to v1.7.0 and copy docs folder (#3014)
* update version to 1.7.0

* copy docs

* update openapi

* generate schemas

* make update_json_schema() idempotent

* update docs, schema and openapi
v1.7.0
2022-08-15 14:20:30 +02:00
Julian Risch
d61755322f
chore: fix typo in API docs (#3023)
* chore: fix typo in API docs

* fix openapi

Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>
2022-08-15 13:25:20 +02:00
tstadel
0aa0c68785
Fix broken MultiLabel serialization (#3037)
* Fix MultiLabel serialization

* update docs

* better comment

* remove unused imports

* remove unused imports (2)
2022-08-15 13:09:18 +02:00
Branden Chan
ff38a20863
docs: update File Classifier Docstring (#3018)
* Update docstring

* Trigger pre-commit hook

* Trigger pre-commit hook

* Incorporate reviewer feedback

* Incorporate reviewer feedback
2022-08-15 12:37:28 +02:00
Branden Chan
7312f99584
Update Summarizer Docs (#3032)
* Change text to content

* Change text to content
2022-08-15 12:35:41 +02:00
bogdankostic
3a849d6c07
bug: Make TranslationWrapperPipeline work with QuestionAnswerGenerationPipeline (#3034)
* Overwrite output_translator's run method with run_batch

* Fix mypy

* Revert change

* Overwrite run method only with QuestionAnswerGenerationPipeline
2022-08-15 10:05:34 +02:00
Malte Pietsch
1b422ab657
feat: Enable isolated node eval for answer generator nodes (incl. OpenAI Node) (#3036)
* enable isolated node eval for answer generator nodes

* adjust comment

* remove unused import

* fix mypy

Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
2022-08-14 12:11:23 +02:00
Stefano Fiorucci
4f261a4575
docs: extend tutorial14 about query classification (#3013)
* first draft for tutorial extension

* forgotten markdown

* improved tutorial

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* add markdown

* first draft for tutorial extension

* forgotten markdown

* improved tutorial

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* add markdown

* little corrections

* little corrections and add py tutorial

* Update tutorials/Tutorial14_Query_Classifier.ipynb

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update tutorials/Tutorial14_Query_Classifier.ipynb

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update tutorials/Tutorial14_Query_Classifier.ipynb

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update tutorials/Tutorial14_Query_Classifier.ipynb

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* update tutorial webpage

* fix typo

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>
2022-08-12 17:59:47 +02:00
Igor Tarlinskiy
5b06658670
Forbid the key id from Documents to be written in WeaviateDocumentStore (#2846)
* Raise error upon duplicate document key found within meta info

* value error msg fix

* Update Documentation & Code Style

* Raise exception instead of asserting

* Update Documentation & Code Style

* add test
2022-08-12 17:50:54 +02:00
Dmitry Goryunov
da7836a931
feat: Support embedding dimensions on DeepsetCloudDocumentStore (#2995)
* Add embedding_dim to dc store

* Remove similarity from query params, it is not used

* Remove unused `return_embedding` parameter

* Remove unused param

* Update the documentation

* Update schemas

* Revert openapi changes

* Revert openapi changes

* Fix openapi

* Fix json schema

* Improve docstrings

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Improve logs

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update the docs

* Fix similarity

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-12 11:46:52 +02:00
tstadel
c0fbe45c02
feat: Add delete_all_files() to FileClient (#3025)
* add delete_all_files()

* rename `file` to `files`

* Update haystack/utils/deepsetcloud.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update haystack/utils/deepsetcloud.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update haystack/utils/deepsetcloud.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* streamline "If set to None" and "to the API call"

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-12 11:20:30 +02:00