Massimiliano Pippi
b07fcb7185
feat: add a security policy for Haystack ( #3130 )
* add the security policy
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* include review feedback
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-09-02 12:00:14 +02:00
Branden Chan
d4722c2ec5
Document FARMReader.train() evaluation report log level ( #3129 )
* Mention evaluation report logging level
* Mention evaluation report logging level
2022-09-01 10:58:47 +02:00
Vladimir Blagojevic
356537c883
Standardize devices parameter and device initialization ( #3062 )
* Use devices parameter and initialize devices consistently
2022-08-31 15:30:31 +02:00
Massimiliano Pippi
ffee36c694
pin pydantic to 1.9.2 ( #3126 )
2022-08-31 14:36:40 +02:00
Vladimir Blagojevic
66f3f42a46
fix: Replace multiprocessing tokenization with batched fast tokenization ( #3089 )
* Replace multiprocessing tokenization with batched fast tokenization
* Replace deprecated tokenization method invocations
2022-08-31 07:33:39 -04:00
Stefano Fiorucci
e7771dc18e
bug: adapt UI random question for streamlit 1.12 and pin to streamlit>=1.9.0 ( #3121 )
* adapt for streamlit 1.12.0 and pin to streamlit>=1.9.0
* make pylint happy
2022-08-31 12:35:40 +02:00
Fernando Pereira
911a2fa7e4
feat: Add warnings to PineconeDocumentStore about indexing metadata if filters return no documents ( #3086 )
* black-jupyter format changes
* fix merge
* filters and documents/ids list evaluations fix (for this specific warning context)
2022-08-30 17:02:07 +02:00
Julian Risch
f010a17f04
increase version to next release candidate ( #3115 )
2022-08-29 17:05:44 +02:00
Vladimir Blagojevic
99efab7928
Bump transformers to v4.21.2 ( #3098 )
2022-08-29 11:02:13 -04:00
Sara Zan
e88f1e2577
Add custom_mapping to the list of fields that can contain string-encoded JSON ( #3065 )
2022-08-29 11:10:24 +02:00
Julian Risch
4e518cdddd
chore: increase version for 1.8 release ( #3109 )
* increase version for 1.8 release
* ignore missing-timeout for pylint
v1.8.0
2022-08-26 15:00:14 +02:00
Julian Risch
3e3ff33cdd
feat: add batch evaluation method for pipelines ( #2942 )
* add basic pipeline.eval_batch for qa without filters
* black formatting
* pydoc-markdown
* remove batch eval tests failing due to bugs
* remove comment
* explain commented out tests
* avoid code duplication
* black
* mypy
* pydoc markdown
* add batch option to execute_eval_run
* pydoc markdown
* Apply documentation suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply documentation suggestion from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* add documentation based on review comments
* black
* black
* schema updates
* remove duplicate tests
* add separate method for column reordering
* merge _build_eval_dataframe methods
* pylint ignore in function
* change type annotation of queries to list only
* one-liner addressing review comment on params dict
* markdown files updated
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-25 17:50:57 +02:00
bogdankostic
e2ec0d1c15
feat: FAISS in OpenSearch: check existing index ( #3101 )
* Add check for mapping for existing indices
* Add test
* Check if "method" field exists
2022-08-25 17:33:26 +02:00
Julian Risch
cc9d39c360
increase version to next release candidate ( #3100 )
2022-08-25 15:55:34 +02:00
Julian Risch
0950db5032
chore: increase version to 1.7.2 for patch release ( #3097 )
* schema update
* schema update audio nodes
* schema update audio param type
v1.7.2
2022-08-25 13:55:28 +02:00
Sebastian
0cf0568dd0
fix: Use use_auth_token in all cases when loading from the HF Hub ( #3094 )
* Making sure to pass on use_auth_token to all from_pretrained calls
2022-08-25 10:30:03 +02:00
Sara Zan
e92ea4fccb
refactor: rename `master` into `main` in documentation and links ( #3063 )
* master->main
* revert master rename
* Revert change to sphinx link and rename master schema
2022-08-24 19:05:12 +02:00
tstadel
92046ce5b5
feat: FAISS in OpenSearch: Support HNSW for dot product and l2 ( #3029 )
* support faiss hnsw
* blacken
* update docs
* improve similarity check
* add tests
* update schema
* set ef_search param correctly
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* regenerate docs
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-24 16:43:48 +02:00
James Briggs
9b1b03002f
update to PineconeDocumentStore to remove dependency on SQL db ( #2749 )
* update to PineconeDocumentStore to remove dependency on SQL db
* Update Documentation & Code Style
* typing fixes
* Update Documentation & Code Style
* fixed embedding generator to yield Documents
* Update Documentation & Code Style
* fixes for final typing issues
* fixes for pylint
* Update Documentation & Code Style
* uncomment pinecone tests
* added new params to docstrings
* Update Documentation & Code Style
* Update Documentation & Code Style
* Update haystack/document_stores/pinecone.py
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
* Update haystack/document_stores/pinecone.py
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
* Update Documentation & Code Style
* Update haystack/document_stores/pinecone.py
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
* Update haystack/document_stores/pinecone.py
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
* Update haystack/document_stores/pinecone.py
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
* Update haystack/document_stores/pinecone.py
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
* changes based on comments, updated errors and install
* Update Documentation & Code Style
* mypy
* implement simple filtering in pinecone mock
* typo
* typo in reverse
* account for missing meta key in filtering
* typo
* added metadata filtering to describe index
* added handling for users switching indexes in same doc store, and handling duplicate docs in write
* syntax tweaks
* added index option to document/embedding count calls
* labels implementation in progress
* added metadata fields to be indexed for pinecone tests
* further changes to mock
* WIP implementation of labels+multilabels
* switched to rely on labels namespace rather than filter
* simpler delete_labels
* label fixes, remove debug code
* Apply docstring fixes
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* mypy
* pylint
* docs
* temporarily un-mock Pinecone
* Small Pinecone test suite
* pylint
* Add fake test key to pass the None check
* Add again fake test key to pass the None check
* Add Pinecone to default docstores and fix filters
* Fix field name
* Change field name
* Change field value
* Remove comments
* forgot to upgrade pyproject.toml
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-24 13:27:15 +02:00
Stefano Fiorucci
891707ecaa
bug: handle `Optional` params in schema validation ( #2980 )
* not working draft
* first draft
* fix
* revert json schema
* better schema
* improvements, support different python versions
* little simplification
* improvements and more tests
* Revert "Merge branch 'handle_optional_params' into origin/main"
This reverts commit 0114cba1f72c9bab23a3ce6a24cb4b346834cf34.
* fix git mess
* handle optional params; schema
* test null values
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-08-24 10:40:19 +02:00
Ofek Lev
f6a4a14790
refactor: update package metadata ( #3079 )
* Update package metadata
* fix yaml
* remove Python version cap
* address review
2022-08-24 09:46:21 +02:00
Branden Chan
6d4031d8f6
Add OpenAI Answer Generator API ( #3050 )
* Add OpenAI Answer Generator API
* Regen tutorials
* Regen md files
* Incorporate reviewer feedback
* Incorporate reviewer feedback
* Incorporate reviewer feedback
* Incorporate reviewer feedback
2022-08-24 09:20:08 +02:00
Malte Pietsch
76af0444cc
feat: add progressbar to upload_files() of deepset Cloud client ( #3069 )
2022-08-23 20:51:08 +02:00
Sebastian
3ea57801ae
feat: Early stopping can be used in Reader and Retriever training ( #3071 )
* Add option to set early stopping in training
* Moved EarlyStopping to haystack/utils/early_stopping.py and added EarlyStopping to training Dense retrievers.
2022-08-23 14:18:12 +02:00
bogdankostic
b03de53716
Use `random_sample` instead of `ndarray` for random array ( #3083 )
2022-08-22 13:19:45 +02:00
Daniel Bichuetti
149224fe3a
fix: Crawler quits ChromeDriver on destruction ( #3070 )
* Close Chrome and Selenium WebDriver on destruction
* Fix failed pre-commit hook
2022-08-22 13:08:16 +02:00
Daniel Bichuetti
d715d0202d
fix: update ChromeDriver options on restricted environments and add ChromeDriver options as function parameter ( #3043 )
* Fix when env does not exist
* Fix missed line
* Set conservative chromedriver options
* Set default options based on environment
* Fix removed line
* Updated documentation
* Generate new schemas manually
* Add arguments via iterator and helper function
* Pre-push doc format
* Use imported Option vs full namespace access
* Manually update schema
* Manually add documentation and schema
* Fix language and documentation
* Fix typo
* Auto generated docs
* Updated documentation
2022-08-22 12:59:33 +02:00
David G
e715dee17d
docs: fixed typo (or old documentation) in ipynb tutorial 3 ( #3033 )
* Update Tutorial3_Basic_QA_Pipeline_without_Elasticsearch.ipynb
Just fixed the key in the document dictionary format so `write_documents()` won't raise an error. By the way, the `write_documents()` error message is really informative.
* Run convert_notebooks_into_webpages.py
Co-authored-by: David Gervasoni <david.gervasoni@trix.ai>
2022-08-22 12:56:30 +02:00
Massimiliano Pippi
97a8d30512
feat: Allow exact list matching with field in Elasticsearch filtering ( #2988 )
* ES filtering - allow exact list matching with field
typing fix
Update Documentation & Code Style
remove default hit limit in filtering queries
Update Documentation & Code Style
pytest es list eq filter
Update Documentation & Code Style
* review feedback
* fixed test
Co-authored-by: Krak91 <45461739+Krak91@users.noreply.github.com>
2022-08-22 12:42:37 +02:00
Daniel Bichuetti
d5e36ce6b4
fix(translator): write translated text to output documents, while keeping input untouched ( #3077 )
* Set translated text on a copy of original document
* Return new translated list
* Manually generated docs
TODO: check pre-commit
* Hook generated file
* Rename variables for better maintenance
* fix(translator): prevent inputs from being changed
* fix: manual update translator docs
* style(translator): explicit type declaration on List
* docs(translator): re-run pre-commit hook
* style(translator): ignore mypy wrong type check
* docs(translator): re-run pre-commit hook
2022-08-22 04:07:05 -04:00
Julian Risch
bc6f71b5ba
chore: increase version to next release candidate ( #3067 )
* increase version to next release candidate
* generate schema files
2022-08-19 14:49:50 +02:00
Julian Risch
eb0f0da0fd
Prepare 1.7.1 release ( #3061 )
* prepare 1.7.1 release
* Fix schemas
* Update haystack/json-schemas/haystack-pipeline-1.7.1.schema.json
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* change back main to master
* remove newline at end of file
* generate schema file with no newline
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
v1.7.1
2022-08-19 13:24:40 +02:00
Vladimir Blagojevic
be127e5b61
Trigger build failure Slack notify only on main repo (not forks) ( #3039 )
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2022-08-18 06:51:39 -04:00
Massimiliano Pippi
af24ffae55
feat: take the list of models to cache instead of hardcoding one ( #3060 )
* take the list of models to cache as an input
* let nltk find the cache dir on its own
2022-08-18 11:55:29 +02:00
tstadel
1027ab3624
Bump Version to 1.7.1rc ( #3041 )
* bump version to 1.7.1rc
* update openapi
2022-08-18 10:31:57 +02:00
James Briggs
82c9cff3d9
test: update filtering of Pinecone mock to imitate doc store ( #3020 )
* updated filtering of doc store to imitate pinecone
* Update test/mocks/pinecone.py
2022-08-18 09:57:08 +02:00
Sebastian
74b7c2c12a
Pin pyworld to <=0.2.12 ( #3047 )
2022-08-17 08:11:28 +02:00
Massimiliano Pippi
2328097ce0
rename the default branch name ( #3045 )
2022-08-16 20:24:58 +02:00
Tuana Celik
2298155a20
changing Slack to Discord ( #3040 )
* changing Slack to Discord
* Update README.md
* updating contributing
2022-08-15 15:56:16 +03:00
tstadel
baefd32b6f
Upgrade to v1.7.0 and copy docs folder ( #3014 )
* update version to 1.7.0
* copy docs
* update openapi
* generate schemas
* make update_json_schema() idempotent
* update docs, schema and openapi
v1.7.0
2022-08-15 14:20:30 +02:00
Julian Risch
d61755322f
chore: fix typo in API docs ( #3023 )
* chore: fix typo in API docs
* fix openapi
Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>
2022-08-15 13:25:20 +02:00
tstadel
0aa0c68785
Fix broken `MultiLabel` serialization ( #3037 )
* Fix MultiLabel serialization
* update docs
* better comment
* remove unused imports
* remove unused imports (2)
2022-08-15 13:09:18 +02:00
Branden Chan
ff38a20863
docs: update File Classifier Docstring ( #3018 )
* Update docstring
* Trigger pre-commit hook
* Trigger pre-commit hook
* Incorporate reviewer feedback
* Incorporate reviewer feedback
2022-08-15 12:37:28 +02:00
Branden Chan
7312f99584
Update Summarizer Docs ( #3032 )
* Change text to content
* Change text to content
2022-08-15 12:35:41 +02:00
bogdankostic
3a849d6c07
bug: Make `TranslationWrapperPipeline` work with `QuestionAnswerGenerationPipeline` ( #3034 )
* Overwrite output_translator's run method with run_batch
* Fix mypy
* Revert change
* Overwrite run method only with QuestionAnswerGenerationPipeline
2022-08-15 10:05:34 +02:00
Malte Pietsch
1b422ab657
feat: Enable isolated node eval for answer generator nodes (incl. OpenAI Node) ( #3036 )
* enable isolated node eval for answer generator nodes
* adjust comment
* remove unused import
* fix mypy
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
2022-08-14 12:11:23 +02:00
Stefano Fiorucci
4f261a4575
docs: extend tutorial14 about query classification ( #3013 )
* first draft for tutorial extension
* forgotten markdown
* improved tutorial
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* add markdown
* first draft for tutorial extension
* forgotten markdown
* improved tutorial
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* add markdown
* little corrections
* little corrections and add py tutorial
* Update tutorials/Tutorial14_Query_Classifier.ipynb
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update tutorials/Tutorial14_Query_Classifier.ipynb
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update tutorials/Tutorial14_Query_Classifier.ipynb
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update tutorials/Tutorial14_Query_Classifier.ipynb
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* update tutorial webpage
* fix typo
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>
2022-08-12 17:59:47 +02:00
Igor Tarlinskiy
5b06658670
Forbid the key `id` from `Document`s to be written in `WeaviateDocumentStore` ( #2846 )
* Raise error upon duplicate document key found within meta info
* value error msg fix
* Update Documentation & Code Style
* Raise exception instead of asserting
* Update Documentation & Code Style
* add test
2022-08-12 17:50:54 +02:00
Dmitry Goryunov
da7836a931
feat: Support embedding dimensions on DeepsetCloudDocumentStore ( #2995 )
* Add embedding_dim to dc store
* Remove similarity from query params; it is not used
* Remove unused `return_embedding` parameter
* Remove unused param
* Update the documentation
* Update schemas
* Revert openapi changes
* Revert openapi changes
* Fix openapi
* Fix json schema
* Improve docstrings
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Improve logs
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update the docs
* Fix similarity
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-12 11:46:52 +02:00
tstadel
c0fbe45c02
feat: Add `delete_all_files()` to `FileClient` ( #3025 )
* add delete_all_files()
* rename `file` to `files`
* Update haystack/utils/deepsetcloud.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/utils/deepsetcloud.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update haystack/utils/deepsetcloud.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* streamline "If set to None" and "to the API call"
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-12 11:20:30 +02:00