1675 Commits

Author SHA1 Message Date
Massimiliano Pippi
97a8d30512
feat: Allow exact list matching with field in Elasticsearch filtering (#2988)
* ES filtering - allow exact list matching with field

typing fix

Update Documentation & Code Style

remove default hit limit in filtering queries

Update Documentation & Code Style

pytest es list eq filter

Update Documentation & Code Style

* review feedback

* fixed test

Co-authored-by: Krak91 <45461739+Krak91@users.noreply.github.com>
2022-08-22 12:42:37 +02:00
Daniel Bichuetti
d5e36ce6b4
fix(translator): write translated text to output documents, while keeping input untouched (#3077)
* Set translated text on a copy of original document

* Return new translated list

* Manually generated docs

TODO: check pre-commit

* Hook generated file

* Rename variables for better maintenance

* fix(translator): prevent inputs from being changed

* fix: manual update translator docs

* style(translator): explicit type declaration on List

* docs(translator): re-run pre-commit hook

* style(translator): ignore mypy wrong type check

* docs(translator): re-run pre-commit hook
2022-08-22 04:07:05 -04:00
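The fix described in this commit (set the translated text on a copy and return a new list, leaving inputs untouched) can be sketched roughly as follows. This is a minimal illustration, not the translator node's actual code: `Doc`, `translate_documents`, and `translate_fn` are hypothetical names introduced here.

```python
from copy import deepcopy
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Doc:
    # Hypothetical stand-in for a document object with text and metadata.
    content: str
    meta: Dict[str, str] = field(default_factory=dict)


def translate_documents(docs: List[Doc], translate_fn: Callable[[str], str]) -> List[Doc]:
    """Return a new list of translated copies; the input documents are not modified."""
    translated: List[Doc] = []
    for doc in docs:
        new_doc = deepcopy(doc)  # copy first, so the caller's object stays as it was
        new_doc.content = translate_fn(doc.content)
        translated.append(new_doc)
    return translated


originals = [Doc("hallo", {"lang": "de"})]
outputs = translate_documents(originals, str.upper)  # str.upper stands in for a real translator
print(originals[0].content, outputs[0].content)
```

A deep copy is used so that nested metadata is not shared between the input and the output document.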
Julian Risch
bc6f71b5ba
chore: increase version to next release candidate (#3067)
* increase version to next release candidate

* generate schema files
2022-08-19 14:49:50 +02:00
Julian Risch
eb0f0da0fd
Prepare 1.7.1 release (#3061)
* prepare 1.7.1 release

* Fix schemas

* Update haystack/json-schemas/haystack-pipeline-1.7.1.schema.json

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

* change back main to master

* remove newline at end of file

* generate schema file with no newline

Co-authored-by: ZanSara <sarazanzo94@gmail.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
v1.7.1
2022-08-19 13:24:40 +02:00
Vladimir Blagojevic
be127e5b61
Trigger build failure Slack notify only on main repo (not forks) (#3039)
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2022-08-18 06:51:39 -04:00
Massimiliano Pippi
af24ffae55
feat: take the list of models to cache instead of hardcoding one (#3060)
* take the list of models to cache as an input

* let nltk find the cache dir on its own
2022-08-18 11:55:29 +02:00
tstadel
1027ab3624
Bump Version to 1.7.1rc (#3041)
* bump version to 1.7.1rc

* update openapi
2022-08-18 10:31:57 +02:00
James Briggs
82c9cff3d9
test: update filtering of Pinecone mock to imitate doc store (#3020)
* updated filtering of doc store to imitate pinecone

* Update test/mocks/pinecone.py
2022-08-18 09:57:08 +02:00
Sebastian
74b7c2c12a
Pin pyworld to <=0.2.12 (#3047) 2022-08-17 08:11:28 +02:00
Massimiliano Pippi
2328097ce0
rename the default branch name (#3045) 2022-08-16 20:24:58 +02:00
Tuana Celik
2298155a20
changing Slack to Discord (#3040)
* changing Slack to Discord

* Update README.md

* updating contributing
2022-08-15 15:56:16 +03:00
tstadel
baefd32b6f
Upgrade to v1.7.0 and copy docs folder (#3014)
* update version to 1.7.0

* copy docs

* update openapi

* generate schemas

* make update_json_schema() idempotent

* update docs, schema and openapi
v1.7.0
2022-08-15 14:20:30 +02:00
Julian Risch
d61755322f
chore: fix typo in API docs (#3023)
* chore: fix typo in API docs

* fix openapi

Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>
2022-08-15 13:25:20 +02:00
tstadel
0aa0c68785
Fix broken MultiLabel serialization (#3037)
* Fix MultiLabel serialization

* update docs

* better comment

* remove unused imports

* remove unused imports (2)
2022-08-15 13:09:18 +02:00
Branden Chan
ff38a20863
docs: update File Classifier Docstring (#3018)
* Update docstring

* Trigger pre-commit hook

* Trigger pre-commit hook

* Incorporate reviewer feedback

* Incorporate reviewer feedback
2022-08-15 12:37:28 +02:00
Branden Chan
7312f99584
Update Summarizer Docs (#3032)
* Change text to content

* Change text to content
2022-08-15 12:35:41 +02:00
bogdankostic
3a849d6c07
bug: Make TranslationWrapperPipeline work with QuestionAnswerGenerationPipeline (#3034)
* Overwrite output_translator's run method with run_batch

* Fix mypy

* Revert change

* Overwrite run method only with QuestionAnswerGenerationPipeline
2022-08-15 10:05:34 +02:00
Malte Pietsch
1b422ab657
feat: Enable isolated node eval for answer generator nodes (incl. OpenAI Node) (#3036)
* enable isolated node eval for answer generator nodes

* adjust comment

* remove unused import

* fix mypy

Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
2022-08-14 12:11:23 +02:00
Stefano Fiorucci
4f261a4575
docs: extend tutorial14 about query classification (#3013)
* first draft for tutorial extension

* forgotten markdown

* improved tutorial

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* add markdown

* little corrections

* little corrections and add py tutorial

* Update tutorials/Tutorial14_Query_Classifier.ipynb

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update tutorials/Tutorial14_Query_Classifier.ipynb

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update tutorials/Tutorial14_Query_Classifier.ipynb

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update tutorials/Tutorial14_Query_Classifier.ipynb

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* update tutorial webpage

* fix typo

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>
2022-08-12 17:59:47 +02:00
Igor Tarlinskiy
5b06658670
Forbid the key id from Documents to be written in WeaviateDocumentStore (#2846)
* Raise error upon duplicate document key found within meta info

* value error msg fix

* Update Documentation & Code Style

* Raise exception instead of asserting

* Update Documentation & Code Style

* add test
2022-08-12 17:50:54 +02:00
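The change from asserting to raising, as listed in this commit, might look like the sketch below. The function name `validate_meta` and the error text are assumptions made for illustration; they are not taken from WeaviateDocumentStore.

```python
from typing import Any, Dict


def validate_meta(meta: Dict[str, Any]) -> None:
    """Reject metadata that carries its own `id` key (hypothetical helper)."""
    # An explicit exception is still raised when Python runs with -O,
    # whereas `assert` statements are stripped in that mode.
    if "id" in meta:
        raise ValueError("The key 'id' is reserved and must not appear in document meta.")


validate_meta({"name": "report.pdf"})  # passes silently
try:
    validate_meta({"id": "123", "name": "report.pdf"})
except ValueError as err:
    print(f"rejected: {err}")
```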
Dmitry Goryunov
da7836a931
feat: Support embedding dimensions on DeepsetCloudDocumentStore (#2995)
* Add embedding_dim to dc store

* Remove similarity from query params, it is not used

* Remove unused `return_embedding` parameter

* Remove unused param

* Update the documentation

* Update schemas

* Revert openapi changes

* Revert openapi changes

* Fix openapi

* Fix json schema

* Improve docstrings

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Improve logs

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update the docs

* Fix similarity

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-12 11:46:52 +02:00
tstadel
c0fbe45c02
feat: Add delete_all_files() to FileClient (#3025)
* add delete_all_files()

* rename `file` to `files`

* Update haystack/utils/deepsetcloud.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update haystack/utils/deepsetcloud.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update haystack/utils/deepsetcloud.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* streamline "If set to None" and "to the API call"

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-12 11:20:30 +02:00
tstadel
668fd548a6
Fix embeddings_field_supports_similarity of OpenSearchDocumentStore when creating index (#3030)
* fix embeddings_field_supports_similarity when creating index

* fix test
2022-08-12 11:19:59 +02:00
James Briggs
26c938a8e6
test: add meta fields for meta_config to be used during testing (#3021)
* added meta fields for meta_config to be used during realtime testing of PineconeDocumentStore

* Add documentation on metadata filtering in docstring

* docs

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-08-12 10:27:56 +02:00
bogdankostic
81a5949103
ci: Increase Weaviate's disk usage + print docker logs (#3026) 2022-08-11 18:13:43 +02:00
Sebastian
44e2b1beed
Resolving issue 2853: no answer logic in FARMReader (#2856)
* Update FARMReader.eval_on_file to be consistent with FARMReader.eval

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-08-11 16:45:03 +02:00
Sara Zan
fc8ecbf20c
Move azure-core pin into the dev dependency list (#3022) 2022-08-11 15:16:43 +02:00
Zoltan Fedor
408d8e6ff5
Enable the JoinDocuments node to work with documents with score=None (#2984)
* Enable the `JoinDocuments` node to work with documents with `score=None`

This fixes #2983

As of now, the `JoinDocuments` node will error out if any of the documents has `score=None` - which is possible, as some retrievers are not able to provide a score, like the `TfidfRetriever` on Elasticsearch or the `BM25Retriever` on Weaviate.
The reason for the error is that `JoinDocuments` always sorts the documents by score and cannot sort when `score=None`.

There was a very similar issue for `JoinAnswers` too, which was addressed by this PR: https://github.com/deepset-ai/haystack/pull/2436
This PR applies the same solution to `JoinDocuments` - so both `JoinAnswers` and `JoinDocuments` will now have the same additional argument to disable sorting when that is required.

The solution is to add an argument to `JoinDocuments` called `sort_by_score: bool`, which allows the user to turn off the sorting of documents by score, but keeps the current functionality of sorting being performed as the default.

* Fixing test bug

* Addressing PR review comments

- Extending unit tests
- Simplifying logic

* Making the sorting work even with no scores

By sorting documents that have no score as -Inf

* Forgot to commit the change in `join_docs.py`

* [EMPTY] Re-trigger CI

* Added an INFO log if `JoinDocuments` is sorting while some of the docs have `score=None`

* Adjusting the arguments of `any()`

* [EMPTY] Re-trigger CI
2022-08-11 10:43:25 +02:00
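The approach described in this commit message (treat a missing score as -Inf so that sorting still works) can be sketched as below. This is an illustration under assumptions, not the code in `join_docs.py`: the `Doc` class and the function `sort_documents` are hypothetical, and only the `sort_by_score` flag name comes from the message.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Doc:
    # Hypothetical stand-in for a retrieved document.
    content: str
    score: Optional[float] = None


def sort_documents(docs: List[Doc], sort_by_score: bool = True) -> List[Doc]:
    """Sort best-first; documents without a score are ranked last."""
    if not sort_by_score:
        return list(docs)  # keep the incoming order untouched
    return sorted(
        docs,
        key=lambda d: d.score if d.score is not None else -math.inf,
        reverse=True,
    )


docs = [Doc("a", 0.2), Doc("b", None), Doc("c", 0.9)]
print([d.content for d in sort_documents(docs)])
```

Mapping `None` to `-math.inf` avoids the `TypeError` that Python raises when it compares `None` with a float during sorting.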
Massimiliano Pippi
2cd65e99b8
revert Remove pipes (#3006) 2022-08-11 10:42:22 +02:00
Zoltan Fedor
aafa017c17
Refactoring the Raypipeline.run method - merging it with the Pipeline.run (#2981)
* Refactoring the `Raypipeline.run` method - merging it with the `Pipeline.run`

This is to fix #2968

* Bug: variable `i` was already in use

* Removing unused imports

* Removing unused import

* [EMPTY] Re-trigger CI

* Addressing concerns raised pre-review

- Removing the attempt to make it work without `JoinDocuments` - it is okay to fail without `JoinDocuments` for certain pipelines.

* Refactoring based on reviews
2022-08-11 09:50:14 +02:00
Zoltan Fedor
f4128d3581
Adding support for additional distance/similarity metrics for Weaviate (#3001)
* Adding support for additional distance metrics for Weaviate

Fixes #3000

* Updating the docs

* Fixing error texts

* Fixing issues raised by the review

* Addressing the last issue from the reviews - removing test `test_weaviate.py::test_similarity`

* [EMPTY] Re-trigger CI

* Fixing things based on review

* [EMPTY] Re-trigger CI
2022-08-11 09:48:21 +02:00
Florian Hardow
0b39ce6431
fetch experiment run results from dc (#2960)
* feat: fetch results for DeepsetCloudExperiments

* chore: test DC fetch predictions for eval run

* chore: switch to dict iteration with .items()

* chore: update DC url to fetch predictions from

* chore: update doc strings for fetching eval run results

* chore: update DeepsetCloudExperiments description, change function names for fetching predictions of an eval run

* chore: test for DeepsetCloudExperiments.get_run_results

* chore: adjust request mock for test_get_eval_run_results

* chore: push first row of dataframe into variable for test checks

* chore: adjust mock data to correct data types

* chore: make documentation more readable with line breaks

* chore: update documentation for eval run result fetching
2022-08-10 15:02:36 +02:00
Stefano Fiorucci
5778b6f9e9
fix run_batch unbound error (#3016) 2022-08-10 12:59:15 +02:00
James Briggs
5d4e3bd7ca
convert to set so not relying on correct order (#3015) 2022-08-10 12:57:31 +02:00
James Briggs
524c9b959d
switch label variables in test_labels (#3011) 2022-08-10 12:01:57 +02:00
camille
f363b152ff
bug: make MultiLabel ids consistent across python interpreters (#2998)
* use hashlib.md5() instead of the (interpreter-dependent) hash() function to generate MultiLabel id

* add tests to assess constancy of MultiLabel.id

* make test_multilabel_id test ensure that MultiLabel ids are always the same
2022-08-10 09:43:21 +02:00
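Why this matters: Python's built-in `hash()` for strings is salted per process unless `PYTHONHASHSEED` is fixed, so the same input can produce different values in different interpreter runs. A digest from `hashlib.md5()` depends only on the bytes. The sketch below illustrates the idea; the function `stable_id` and the way the input is assembled are assumptions, not MultiLabel's actual logic.

```python
import hashlib
from typing import List


def stable_id(parts: List[str]) -> str:
    """Derive an identifier that is the same in every Python process (hypothetical helper)."""
    joined = "".join(parts)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


print(stable_id(["a", "bc"]))  # → 900150983cd24fb0d6963f7d28e17f72
```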
Julian Risch
b685409c78
chore: add topic tags to auto generation of release notes (#3008) 2022-08-09 17:12:42 +02:00
bogdankostic
5c3bfad078
feat: Add page number to Documents coming from PDFConverters and PreProcessor (#2932)
* Add page number to Documents coming from PDFConverters and PreProcessor

* Fix mypy

* Update API Docs

* Update API Docs

* Remove unused imports

* Generate JSON schema

* Generate JSON schema

* Make test variable shorter

* Make regex a separate function

* Move counting of page breaks to a function

* Generate JSON schema

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update API Documentation

* Don't create instance for testing staticmethod

* Update haystack/nodes/preprocessor/preprocessor.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-09 15:55:27 +02:00
Stefano Fiorucci
09707b576a
Make MultiLabel preserve order (#2956)
* try simple approach

* added test

* add requested test
2022-08-09 15:53:24 +02:00
Branden Chan
dfeb171686
Add API page for util functions (#2863)
* Clean OpenAIAnswerGenerator docstrings

* Incorporate reviewer feedback

* Update Documentation & Code Style

* Improve id_hash_keys description

* Simplify id_hash_keys description

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-08-09 14:53:45 +02:00
Vladimir Blagojevic
50f7d660e2
Add slack hook for test failures (#2996) 2022-08-09 08:27:52 -04:00
Massimiliano Pippi
862ac31b5c
bump streamlit version (#3002) 2022-08-09 10:52:41 +02:00
Stefano Fiorucci
4a63484916
feat: Extend TransformersQueryClassifier: clean version (#2965)
* extend query classifier in one commit

* variable number of outgoing edges

* improve tests

* fix unused import

* lightweight approach

* fix _calculate_outgoing_edges

* remove duplicate label validation

* Remove print
2022-08-09 09:43:33 +02:00
MichelBartels
c91316e862
feat: add gradient accumulation in FARMReader (#2925)
* expose gradient accumulation to train function of FARMReader

* add documentation for gradient accumulation

* Update Documentation & Code Style

* doc string improvements

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* doc string improvements

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* doc string improvements

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-08 18:42:21 +02:00
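For readers unfamiliar with gradient accumulation, here is a minimal, framework-free sketch of the general technique: gradients from several small batches are averaged and applied in a single weight update. It does not reproduce FARMReader's training loop; the one-parameter model `y = w * x` and all names are assumptions chosen to keep the example short.

```python
from typing import List, Tuple

Batch = List[Tuple[float, float]]  # (x, y) pairs


def train_accumulated(w: float, batches: List[Batch], lr: float, accumulation_steps: int) -> float:
    """Gradient descent on mean squared error, updating w every `accumulation_steps` batches."""
    grad_sum = 0.0
    for step, batch in enumerate(batches, start=1):
        # Gradient of mean((w*x - y)^2) with respect to w for this micro-batch.
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        # Scale so that the accumulated value is the average over micro-batches.
        grad_sum += grad / accumulation_steps
        if step % accumulation_steps == 0:
            w -= lr * grad_sum  # one optimizer step per accumulation window
            grad_sum = 0.0
    return w


micro_batches = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
full_batch = [micro_batches[0] + micro_batches[1]]

w_accumulated = train_accumulated(0.0, micro_batches, lr=0.01, accumulation_steps=2)
w_full = train_accumulated(0.0, full_batch, lr=0.01, accumulation_steps=1)
print(w_accumulated, w_full)
```

With equally sized micro-batches, the accumulated update matches a single update on the combined batch, which is why the technique can emulate a larger batch size when memory is limited.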
Sara Zan
82448efa4f
feat: warn users if they're calling get_all_labels on a document index and vice-versa (Elasticsearch & Opensearch only) (#2990)
* Add fix to ES

* Update haystack/document_stores/elasticsearch.py
2022-08-08 16:50:42 +02:00
Vladimir Blagojevic
d1f8b7118c
Add progress bar to batch run component ops (#2864)
* Add progress bar to batch run component ops

* Update docs

* Update schema

* PR review: thanks Bogdan
2022-08-08 09:32:44 -04:00
Massimiliano Pippi
0e8efdafa9
Add enhanced pydoc-markdown pre-hook (#2979)
* add pydoc-markdown pre-hook

* add more comments, remove debug prints
2022-08-08 12:41:21 +02:00
Sara Zan
1a0a4c8836
Remove pipes from code block (#2973)
* Remove pipes

* Generate md
2022-08-05 19:18:57 +02:00
James Briggs
4ba2444652
Update CONTRIBUTING.md (#2975) 2022-08-05 19:00:18 +02:00
Tobias Wochinger
065173fe5e
chore: add PR template (#2883)
* chore: add PR template

* ci: update PR template after latest discussions in Notion

* Apply suggestions from code review

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

* Apply suggestions from code review

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* Update .github/pull_request_template.md

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* docs: re-order and add link

* docs: add new conventions to contributor guidelines

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2022-08-05 18:14:18 +02:00