Fernando Pereira
911a2fa7e4
feat: Add warnings to PineconeDocumentStore about indexing metadata if filters return no documents ( #3086 )
...
* black-jupyter format changes
* fix merge
* filters and documents/ids list evaluations fix (for this specific warning context)
2022-08-30 17:02:07 +02:00
Julian Risch
f010a17f04
increase version to next release candidate ( #3115 )
2022-08-29 17:05:44 +02:00
Vladimir Blagojevic
99efab7928
Bump transformers to v4.21.2 ( #3098 )
2022-08-29 11:02:13 -04:00
Sara Zan
e88f1e2577
Add custom_mapping to the list of fields that can contain string-encoded JSON ( #3065 )
2022-08-29 11:10:24 +02:00
Julian Risch
4e518cdddd
chore: increase version for 1.8 release ( #3109 )
...
* increase version for 1.8 release
* ignore missing-timeout for pylint
v1.8.0
2022-08-26 15:00:14 +02:00
Julian Risch
3e3ff33cdd
feat: add batch evaluation method for pipelines ( #2942 )
...
* add basic pipeline.eval_batch for qa without filters
* black formatting
* pydoc-markdown
* remove batch eval tests failing due to bugs
* remove comment
* explain commented out tests
* avoid code duplication
* black
* mypy
* pydoc markdown
* add batch option to execute_eval_run
* pydoc markdown
* Apply documentation suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply documentation suggestion from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* add documentation based on review comments
* black
* black
* schema updates
* remove duplicate tests
* add separate method for column reordering
* merge _build_eval_dataframe methods
* pylint ignore in function
* change type annotation of queries to list only
* one-liner addressing review comment on params dict
* markdown files updated
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-25 17:50:57 +02:00
bogdankostic
e2ec0d1c15
feat: FAISS in OpenSearch: check existing index ( #3101 )
...
* Add check for mapping for existing indices
* Add test
* Check if "method" field exists
2022-08-25 17:33:26 +02:00
Julian Risch
cc9d39c360
increase version to next release candidate ( #3100 )
2022-08-25 15:55:34 +02:00
Julian Risch
0950db5032
chore: increase version to 1.7.2 for patch release ( #3097 )
...
* schema update
* schema update audio nodes
* schema update audio param type
v1.7.2
2022-08-25 13:55:28 +02:00
Sebastian
0cf0568dd0
fix: Use use_auth_token in all cases when loading from the HF Hub ( #3094 )
...
* Making sure to pass on use_auth_token to all from_pretrained calls
2022-08-25 10:30:03 +02:00
Sara Zan
e92ea4fccb
refactor: rename `master` into `main` in documentation and links ( #3063 )
...
* master->main
* revert master rename
* Revert change to sphinx link and rename master schema
2022-08-24 19:05:12 +02:00
tstadel
92046ce5b5
feat: FAISS in OpenSearch: Support HNSW for dot product and l2 ( #3029 )
...
* support faiss hnsw
* blacken
* update docs
* improve similarity check
* add tests
* update schema
* set ef_search param correctly
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* regenerate docs
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-24 16:43:48 +02:00
James Briggs
9b1b03002f
update to PineconeDocumentStore to remove dependency on SQL db ( #2749 )
...
* update to PineconeDocumentStore to remove dependency on SQL db
* Update Documentation & Code Style
* typing fixes
* Update Documentation & Code Style
* fixed embedding generator to yield Documents
* Update Documentation & Code Style
* fixes for final typing issues
* fixes for pylint
* Update Documentation & Code Style
* uncomment pinecone tests
* added new params to docstrings
* Update Documentation & Code Style
* Update Documentation & Code Style
* Update haystack/document_stores/pinecone.py
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
* Update haystack/document_stores/pinecone.py
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
* Update Documentation & Code Style
* Update haystack/document_stores/pinecone.py
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
* Update haystack/document_stores/pinecone.py
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
* Update haystack/document_stores/pinecone.py
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
* Update haystack/document_stores/pinecone.py
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
* changes based on comments, updated errors and install
* Update Documentation & Code Style
* mypy
* implement simple filtering in pinecone mock
* typo
* typo in reverse
* account for missing meta key in filtering
* typo
* added metadata filtering to describe index
* added handling for users switching indexes in same doc store, and handling duplicate docs in write
* syntax tweaks
* added index option to document/embedding count calls
* labels implementation in progress
* added metadata fields to be indexed for pinecone tests
* further changes to mock
* WIP implementation of labels+multilabels
* switched to rely on labels namespace rather than filter
* simpler delete_labels
* label fixes, remove debug code
* Apply docstring fixes
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* mypy
* pylint
* docs
* temporarily un-mock Pinecone
* Small Pinecone test suite
* pylint
* Add fake test key to pass the None check
* Add again fake test key to pass the None check
* Add Pinecone to default docstores and fix filters
* Fix field name
* Change field name
* Change field value
* Remove comments
* forgot to upgrade pyproject.toml
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-24 13:27:15 +02:00
Stefano Fiorucci
891707ecaa
bug: handle `Optional` params in schema validation ( #2980 )
...
* not working draft
* first draft
* fix
* revert json schema
* better schema
* improvements, support different python versions
* little simplification
* improvements and more tests
* Revert "Merge branch 'handle_optional_params' into origin/main"
This reverts commit 0114cba1f72c9bab23a3ce6a24cb4b346834cf34.
* fix git mess
* handle optional params; schema
* test null values
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-08-24 10:40:19 +02:00
Ofek Lev
f6a4a14790
refactor: update package metadata ( #3079 )
...
* Update package metadata
* fix yaml
* remove Python version cap
* address review
2022-08-24 09:46:21 +02:00
Branden Chan
6d4031d8f6
Add OpenAI Answer Generator API ( #3050 )
...
* Add OpenAI Answer Generator API
* Regen tutorials
* Regen md files
* Incorporate reviewer feedback
* Incorporate reviewer feedback
* Incorporate reviewer feedback
* Incorporate reviewer feedback
2022-08-24 09:20:08 +02:00
Malte Pietsch
76af0444cc
feat: add progressbar to upload_files() of deepset Cloud client ( #3069 )
2022-08-23 20:51:08 +02:00
Sebastian
3ea57801ae
feat: Early stopping can be used in Reader and Retriever training ( #3071 )
...
* Add option to set early stopping in training
* Moved EarlyStopping to haystack/utils/early_stopping.py and added EarlyStopping to training Dense retrievers.
2022-08-23 14:18:12 +02:00
bogdankostic
b03de53716
Use `random_sample` instead of `ndarray` for random array ( #3083 )
2022-08-22 13:19:45 +02:00
Daniel Bichuetti
149224fe3a
fix: Crawler quits ChromeDriver on destruction ( #3070 )
...
* Close Chrome and Selenium WebDriver on destruction
* Fix failed pre-commit hook
2022-08-22 13:08:16 +02:00
Daniel Bichuetti
d715d0202d
fix: update ChromeDriver options on restricted environments and add ChromeDriver options as function parameter ( #3043 )
...
* Fix when env does not exist
* Fix missed line
* Set conservative chromedriver options
* Set default options based on environment
* Fix removed line
* Updated documentation
* Generate new schemas manually
* Add arguments via iterator and helper function
* Pre-push doc format
* Use imported Option vs full namespace access
* Manually update schema
* Manually add documentation and schema
* Fix language and documentation
* Fix typo
* Auto generated docs
* Updated documentation
2022-08-22 12:59:33 +02:00
David G
e715dee17d
docs:fixed typo (or old documentation) in ipynb tutorial 3 ( #3033 )
...
* Update Tutorial3_Basic_QA_Pipeline_without_Elasticsearch.ipynb
Just fixed the key in the document dictionary format so `write_documents()` won't raise an error. By the way the `write_documents()` error is really explicative
* Run convert_notebooks_into_webpages.py
Co-authored-by: David Gervasoni <david.gervasoni@trix.ai>
2022-08-22 12:56:30 +02:00
Massimiliano Pippi
97a8d30512
feat: Allow exact list matching with field in Elasticsearch filtering ( #2988 )
...
* ES filtering - allow exact list matching with field
typing fix
Update Documentation & Code Style
remove default hit limit in filtering queries
Update Documentation & Code Style
pytest es list eq filter
Update Documentation & Code Style
* review feedback
* fixed test
Co-authored-by: Krak91 <45461739+Krak91@users.noreply.github.com>
2022-08-22 12:42:37 +02:00
Daniel Bichuetti
d5e36ce6b4
fix(translator): write translated text to output documents, while keeping input untouched ( #3077 )
...
* Set translated text on a copy of original document
* Return new translated list
* Manually generated docs
TODO: check pre-commit
* Hook generated file
* Rename variables for better maintenance
* fix(translator): prevent inputs from being changed
* fix: manual update translator docs
* style(translator): explicit type declaration on List
* docs(translator): re-run pre-commit hook
* style(translator): ignore mypy wrong type check
* docs(translator): re-run pre-commit hook
2022-08-22 04:07:05 -04:00
Julian Risch
bc6f71b5ba
chore: increase version to next release candidate ( #3067 )
...
* increase version to next release candidate
* generate schema files
2022-08-19 14:49:50 +02:00
Julian Risch
eb0f0da0fd
Prepare 1.7.1 release ( #3061 )
...
* prepare 1.7.1 release
* Fix schemas
* Update haystack/json-schemas/haystack-pipeline-1.7.1.schema.json
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* change back main to master
* remove newline at end of file
* generate schema file with no newline
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
v1.7.1
2022-08-19 13:24:40 +02:00
Vladimir Blagojevic
be127e5b61
Trigger build failure Slack notify only on main repo (not forks) ( #3039 )
...
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2022-08-18 06:51:39 -04:00
Massimiliano Pippi
af24ffae55
feat: take the list of models to cache instead of hardcoding one ( #3060 )
...
* take the list of models to cache as an input
* let nltk find the cache dir on its own
2022-08-18 11:55:29 +02:00
tstadel
1027ab3624
Bump Version to 1.7.1rc ( #3041 )
...
* bump version to 1.7.1rc
* update openapi
2022-08-18 10:31:57 +02:00
James Briggs
82c9cff3d9
test: update filtering of Pinecone mock to imitate doc store ( #3020 )
...
* updated filtering of doc store to imitate pinecone
* Update test/mocks/pinecone.py
2022-08-18 09:57:08 +02:00
Sebastian
74b7c2c12a
Pin pyworld to <=0.2.12 ( #3047 )
2022-08-17 08:11:28 +02:00
Massimiliano Pippi
2328097ce0
rename the default branch name ( #3045 )
2022-08-16 20:24:58 +02:00
Tuana Celik
2298155a20
changing Slack to Discord ( #3040 )
...
* changing Slack to Discord
* Update README.md
* updating contributing
2022-08-15 15:56:16 +03:00
tstadel
baefd32b6f
Upgrade to v1.7.0 and copy docs folder ( #3014 )
...
* update version to 1.7.0
* copy docs
* update openapi
* generate schemas
* make update_json_schema() idempotent
* update docs, schema and openapi
v1.7.0
2022-08-15 14:20:30 +02:00
Julian Risch
d61755322f
chore: fix typo in API docs ( #3023 )
...
* chore: fix typo in API docs
* fix openapi
Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>
2022-08-15 13:25:20 +02:00
tstadel
0aa0c68785
Fix broken `MultiLabel` serialization ( #3037 )
...
* Fix MultiLabel serialization
* update docs
* better comment
* remove unused imports
* remove unused imports (2)
2022-08-15 13:09:18 +02:00
Branden Chan
ff38a20863
docs: update File Classifier Docstring ( #3018 )
...
* Update docstring
* Trigger pre-commit hook
* Trigger pre-commit hook
* Incorporate reviewer feedback
* Incorporate reviewer feedback
2022-08-15 12:37:28 +02:00
Branden Chan
7312f99584
Update Summarizer Docs ( #3032 )
...
* Change text to content
* Change text to content
2022-08-15 12:35:41 +02:00
bogdankostic
3a849d6c07
bug: Make `TranslationWrapperPipeline` work with `QuestionAnswerGenerationPipeline` ( #3034 )
...
* Overwrite output_translator's run method with run_batch
* Fix mypy
* Revert change
* Overwrite run method only with QuestionAnswerGenerationPipeline
2022-08-15 10:05:34 +02:00
Malte Pietsch
1b422ab657
feat: Enable isolated node eval for answer generator nodes (incl. OpenAI Node) ( #3036 )
...
* enable isolated node eval for answer generator nodes
* adjust comment
* remove unused import
* fix mypy
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
2022-08-14 12:11:23 +02:00
Stefano Fiorucci
4f261a4575
docs: extend tutorial14 about query classification ( #3013 )
...
* first draft for tutorial extension
* forgotten markdown
* improved tutorial
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* add markdown
* first draft for tutorial extension
* forgotten markdown
* improved tutorial
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* add markdown
* little corrections
* little corrections and add py tutorial
* Update tutorials/Tutorial14_Query_Classifier.ipynb
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update tutorials/Tutorial14_Query_Classifier.ipynb
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update tutorials/Tutorial14_Query_Classifier.ipynb
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update tutorials/Tutorial14_Query_Classifier.ipynb
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* update tutorial webpage
* fix typo
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>
2022-08-12 17:59:47 +02:00
Igor Tarlinskiy
5b06658670
Forbid the key `id` from `Document`s to be written in `WeaviateDocumentStore` ( #2846 )
...
* Raise error upon duplicate document key found within meta info
* value error msg fix
* Update Documentation & Code Style
* Raise exception instead of asserting
* Update Documentation & Code Style
* add test
2022-08-12 17:50:54 +02:00
Dmitry Goryunov
da7836a931
feat: Support embedding dimensions on DeepsetCloudDocumentStore ( #2995 )
...
* Add embedding_dim to dc store
* Remove similarity from query params, it is not used
* Remove unused `return_embedding` parameter
* Remove unused param
* Update the documentation
* Update schemas
* Revert openapi changes
* Revert openapi changes
* Fix openapi
* Fix json schema
* Improve docstrings
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Improve logs
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update the docs
* Fix similarity
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-12 11:46:52 +02:00
tstadel
c0fbe45c02
feat: Add `delete_all_files()` to `FileClient` ( #3025 )
...
* add delete_all_files()
* rename `file` to `files`
* Update haystack/utils/deepsetcloud.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/utils/deepsetcloud.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/utils/deepsetcloud.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* streamline "If set to None" and "to the API call"
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-12 11:20:30 +02:00
tstadel
668fd548a6
Fix `embeddings_field_supports_similarity` of `OpenSearchDocumentStore` when creating index ( #3030 )
...
* fix embeddings_field_supports_similarity when creating index
* fix test
2022-08-12 11:19:59 +02:00
James Briggs
26c938a8e6
test: add meta fields for meta_config to be used during testing ( #3021 )
...
* added meta fields for meta_config to be used during realtime testing of PineconeDocumentStore
* Add documentation on metadata filtering in docstring
* docs
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-08-12 10:27:56 +02:00
bogdankostic
81a5949103
ci: Increase Weaviate's disk usage + print docker logs ( #3026 )
2022-08-11 18:13:43 +02:00
Sebastian
44e2b1beed
Resolving issue 2853: no answer logic in FARMReader ( #2856 )
...
* Update FARMReader.eval_on_file to be consistent with FARMReader.eval
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-08-11 16:45:03 +02:00
Sara Zan
fc8ecbf20c
Move `azure-core` pin into the dev dependency list ( #3022 )
2022-08-11 15:16:43 +02:00
Zoltan Fedor
408d8e6ff5
Enable the `JoinDocuments` node to work with documents with `score=None` ( #2984 )
...
* Enable the `JoinDocuments` node to work with documents with `score=None`
This fixes #2983
As of now, the `JoinDocuments` node errors out if any of the documents has `score=None`, which is possible because some retrievers cannot provide a score, like the `TfidfRetriever` on Elasticsearch or the `BM25Retriever` on Weaviate.
The reason for the error is that `JoinDocuments` always sorts the documents by score and cannot sort when `score=None`.
There was a very similar issue for `JoinAnswers` too, which was addressed by this PR: https://github.com/deepset-ai/haystack/pull/2436
This PR applies the same solution to `JoinDocuments`, so both `JoinAnswers` and `JoinDocuments` now have the same additional argument to disable sorting when that is required.
The solution is to add an argument to `JoinDocuments` called `sort_by_score: bool`, which lets the user turn off sorting of documents by score while keeping sorting as the default.
* Fixing test bug
* Addressing PR review comments
- Extending unit tests
- Simplifying logic
* Making the sorting work even with no scores
By sorting documents that have no score as -Inf
* Forgot to commit the change in `join_docs.py`
* [EMPTY] Re-trigger CI
* Added an INFO log if the `JoinDocuments` is sorting while some of the docs have `score=None`
* Adjusting the arguments of `any()`
* [EMPTY] Re-trigger CI
2022-08-11 10:43:25 +02:00
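The sorting behaviour described in the `JoinDocuments` commit above (documents with `score=None` sorted as -Inf, plus a `sort_by_score` switch) could look roughly like the sketch below. This is a stand-alone illustration under stated assumptions, not the code from `join_docs.py`: the `Doc` class and the `join_and_sort` function are hypothetical names, and only the `sort_by_score` argument and the -Inf treatment come from the commit message.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Doc:
    # Hypothetical stand-in for a document; only the fields needed here.
    id: str
    score: Optional[float] = None


def join_and_sort(docs: List[Doc], sort_by_score: bool = True) -> List[Doc]:
    """Return the documents, optionally sorted by descending score.

    Documents with score=None are treated as -inf, so they end up last.
    """
    if not sort_by_score:
        # Sorting disabled: keep the incoming order.
        return list(docs)
    return sorted(
        docs,
        key=lambda d: d.score if d.score is not None else float("-inf"),
        reverse=True,
    )


docs = [Doc("a", 0.2), Doc("b", None), Doc("c", 0.9)]
print([d.id for d in join_and_sort(docs)])
print([d.id for d in join_and_sort(docs, sort_by_score=False)])
```

With sorting enabled, the unscored document is placed after all scored ones; with `sort_by_score=False` the input order is preserved.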