Malte Pietsch
3134b0d679
fix: type of temperature param and adjust defaults for OpenAIAnswerGenerator ( #3073 )
* fix: type of temperature param and adjust defaults
* update schema
* update api docs
2022-09-16 14:11:33 +02:00
Massimiliano Pippi
4ddeb7b14b
chore: fix Windows CI ( #3222 )
* replicate issue
* pin openjdk version
* not sure it's needed
2022-09-16 13:08:30 +02:00
nickchomey
42c963f54b
Update rest_api Docker Compose yamls for recent refactoring of rest_api ( #3197 )
* update rest_api yamls for recent refactoring
* Update docker-compose.yml
2022-09-15 19:47:40 +02:00
Anam Saatvik Reddy
f50b496f03
bug: fix embedding_dim mismatch in DocumentStore ( #3183 )
* match index dim with embed dim (deepset-ai#3090)
* aligned messages across all docstores
* aligned messages across all docstores (deepset-ai#3090)
* aligned messages across all docstores (deepset-ai#3090)
2022-09-15 15:23:53 +02:00
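A minimal sketch of the kind of check this fix describes: compare the dimension configured for the index with the length of each embedding before writing, and fail with one clear, consistent message. The helper name `validate_embedding_dim` and its error text are hypothetical, not Haystack's API.

```python
from typing import Sequence


def validate_embedding_dim(index_dim: int, embeddings: Sequence[Sequence[float]]) -> None:
    """Raise if any embedding's length differs from the index dimension (hypothetical helper)."""
    for i, emb in enumerate(embeddings):
        if len(emb) != index_dim:
            raise ValueError(
                f"Embedding {i} has dimension {len(emb)}, but the index expects "
                f"embedding_dim={index_dim}. Make sure the retriever's embedding size "
                f"matches the document store's embedding_dim."
            )
```

Failing before the write keeps a mismatched embedding from reaching the index, where the resulting error would come from the backend with a backend-specific message.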
Sara Zan
768583d00c
chore: disable Windows ES tests on CI ( #3220 )
* disable Windows ES tests
* Add comments
2022-09-15 15:18:29 +02:00
Daniel Bichuetti
df1f4205b6
feat: add public layout-base extraction support on PDFToTextConverter ( #3137 )
* feat(PDFToTextConverter): add option to get text in physical layout order
* test: add physical layout extraction test to PDFToTextConverter
* refactor: change layout parameter attribution places
* docs: manually trigger pre-commits
* docs: generate new docs to comply with pydoc-markdown style
2022-09-13 16:55:21 +02:00
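Assuming, as in Haystack 1.x, that PDFToTextConverter shells out to the xpdf `pdftotext` command-line tool, physical-layout extraction maps to that tool's `-layout` flag. By default, `pdftotext` outputs text in reading order; `-layout` tries to preserve the page's original physical layout. The sketch below only builds the argument list. The helper `build_pdftotext_command` and the parameter name `keep_physical_layout` are illustrative assumptions.

```python
from typing import List, Optional


def build_pdftotext_command(
    pdf_path: str,
    encoding: str = "UTF-8",
    keep_physical_layout: bool = False,
    first_page: Optional[int] = None,
    last_page: Optional[int] = None,
) -> List[str]:
    """Build a pdftotext argument list (hypothetical helper, not Haystack's API)."""
    cmd = ["pdftotext", "-enc", encoding]
    if keep_physical_layout:
        # -layout keeps the physical layout instead of undoing it into reading order
        cmd.append("-layout")
    if first_page is not None:
        cmd += ["-f", str(first_page)]
    if last_page is not None:
        cmd += ["-l", str(last_page)]
    # "-" as the output file writes the extracted text to stdout
    cmd += [pdf_path, "-"]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` to read the text from `stdout`.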
Kristof Herrmann
da1cc577ae
feat: exponential backoff with exp decreasing batch size for opensearch client ( #3194 )
* Validate custom_mapping properly as an object
* Remove related test
* black
* feat: exponential backoff with exp dec batch size
* added docstring and split doc list
* fix
* fix mypy
* fix
* catch generic exception
* added test
* mypy ignore
* fixed no attribute
* added test
* added tests
* revert strange merge conflicts
* revert merge conflict again
* Update haystack/document_stores/elasticsearch.py
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
* done
* adjust test
* remove not required caplog
* fixed comments
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2022-09-13 14:30:30 +01:00
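The commit title describes retrying bulk writes with an exponentially growing delay while shrinking the batch. Below is a minimal sketch of that pattern. It assumes the write function raises an error on an HTTP 429 ("Too Many Requests") response. On each failure, the batch is split in half, and each half is retried after waiting twice as long as the previous attempt. As a result, the batch size decreases exponentially while the delay increases. `TooManyRequestsError`, `bulk_with_backoff` and the defaults are hypothetical, not the document store's actual code.

```python
import time
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class TooManyRequestsError(Exception):
    """Hypothetical stand-in for an HTTP 429 raised by the search client."""


def split_in_two(items: Sequence[T]) -> List[Sequence[T]]:
    """Split a sequence into two halves of (almost) equal size."""
    mid = (len(items) + 1) // 2
    return [items[:mid], items[mid:]]


def bulk_with_backoff(
    docs: Sequence[T],
    write: Callable[[Sequence[T]], None],
    timeout: float = 1.0,
    remaining_tries: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    try:
        write(docs)
    except TooManyRequestsError:
        remaining_tries -= 1
        if remaining_tries <= 0:
            raise  # give up: the cluster is still overloaded
        sleep(timeout)
        # Retry each half with a doubled delay (exponential backoff)
        for half in split_in_two(docs):
            if half:
                bulk_with_backoff(half, write, timeout * 2, remaining_tries, sleep)
```

Injecting `sleep` keeps the function testable without real waiting. In production, the default `time.sleep` is used.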
Sara Zan
b47c93989b
remove imports redirect ( #3204 )
2022-09-13 11:16:39 +01:00
Sara Zan
49b1c8856e
test: lower low boundary for accuracy in test_calculate_context_similarity_on_non_matching_contexts ( #3199 )
* Change min value
* revert test change and pin rapidfuzz<2.8.0
* duplicate
2022-09-13 09:32:38 +02:00
Massimiliano Pippi
64b0c43885
refactoring: reimplement Docker strategy ( #3162 )
* setup base images
* add cpu flavor
* use the same Dockerfile for cpu and gpu
* better naming, add docs
* add docker workflow
* add missing image input
* change cwd for bake
* also push api images
* try conditional tagging for releases
* revert testing code
* update docker readme
* document variable override
* use Python 3.10
* allow empty HAYSTACK_EXTRAS
* Apply suggestions from code review
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* remove repo description step, can't make it work so far
* add docs to the last step as it's tricky
* manage tags for the newest images
* tests are passing, checking in the last bit
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-09-12 16:33:56 +02:00
Bijay Gurung
21aedc644f
feat: Add option to use MultipleNegativesRankingLoss for EmbeddingRetriever training with sentence-transformers ( #3164 )
* Add option to use MultipleNegativesRankingLoss
Add option to use MultipleNegativesRankingLoss for EmbeddingRetriever
training with sentence-transformers
* Move out losses into separate retriever/_losses.py module
* Remove unused import in retriever/_losses.py
* Apply documentation suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-09-12 09:38:04 +02:00
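MultipleNegativesRankingLoss trains on (query, positive passage) pairs. For each query in a batch, the positives of all other pairs act as negatives, so no explicit negatives are needed. The dependency-free sketch below shows that objective as a mean cross-entropy over scaled cosine similarities. It assumes a scale of 20.0, which is believed to match the sentence-transformers default; the function names are illustrative and not the library's implementation.

```python
import math
from typing import List, Sequence


def cos_sim(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def multiple_negatives_ranking_loss(
    anchors: List[Sequence[float]],
    positives: List[Sequence[float]],
    scale: float = 20.0,
) -> float:
    """Mean cross-entropy where positives[i] is the target for anchors[i]
    and every other positive in the batch serves as an in-batch negative."""
    if len(anchors) != len(positives):
        raise ValueError("anchors and positives must have the same length")
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the row of logits
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)
```

Larger batches supply more in-batch negatives per query, which is why this loss usually benefits from bigger batch sizes.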
Sebastian
fc07799206
feat: Updates docs and types for language param in PreProcessor ( #3186 )
* Small update to language param docs in PreProcessor
2022-09-12 08:52:52 +02:00
Sara Zan
96bb9b5905
bug: validate custom_mapping as an object ( #3189 )
* Validate custom_mapping properly as an object
* Remove related test
* black
2022-09-09 18:03:29 +02:00
Daniel Bichuetti
621e1af74c
refactor: improve support for dataclasses ( #3142 )
* refactor: improve support for dataclasses
* refactor: refactor class init
* refactor: remove unused import
* refactor: testing 3.7 diffs
* refactor: checking meta where is Optional
* refactor: reverting some changes on 3.7
* refactor: remove unused imports
* build: manual pre-commit run
* doc: run doc pre-commit manually
* refactor: post initialization hack for 3.7-3.10 compat.
TODO: investigate another method to improve 3.7 compatibility.
* doc: force pre-commit
* refactor: refactored for both Python 3.7 and 3.9
* docs: manually run pre-commit hooks
* docs: run api docs manually
* docs: fix wrong comment
* refactor: change no type-checked test code
* docs: update primitives
* docs: api documentation
* docs: api documentation
* refactor: minor test refactoring
* refactor: remove unused enumeration on test
* refactor: remove unneeded dir in gitignore
* refactor: exclude all private fields and change meta def
* refactor: add pydantic comment
* refactor: fix for mypy on Python 3.7
* refactor: revert custom init
* docs: update docs to new pydoc-markdown style
* Update test/nodes/test_generator.py
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
2022-09-09 11:31:37 +02:00
Daniel Bichuetti
1a6cbca9b6
feat: add health check endpoint to rest api ( #3168 )
* feat: add /health endpoint to rest api
* refactor: adjust to new dir structure
* fix: add new rest api dependency
* docs: add new openapi schema
* docs: manual black run
* refactor: remove some sys-wide details
* docs: minor description changes
* docs: minor description changes
* docs: generate openapi schemas
* tests: improved tests
* refactor: add cls method decorator
2022-09-08 18:24:16 +02:00
Vladimir Blagojevic
e0d73f3ae0
Replace torch.device(cuda) with torch.device(cuda:0) in devices initialization ( #3184 )
2022-09-08 09:36:38 -04:00
Vladimir Blagojevic
20880c9d41
Add 15 min timeout for downloading cached HF models ( #3179 )
2022-09-07 08:35:09 -04:00
Sebastian
62e7c19011
fix: Reduce GPU to CPU copies at inference ( #3127 )
* Send matrix from gpu to cpu once instead of individual elements
* Moved location of if statement so it would be triggered only when
needed. Provides very modest speedup for large top_k_per_sample
2022-09-07 11:00:05 +02:00
Steven Haley
9a750f7032
docs: Fix the word length splitting; should be set to 100 not 1,000 ( #3133 )
* Fix the word length splitting; should be set to 100 not 1,000 due to limitations of transformer models
* Update documentation for tutorial change
2022-09-07 10:57:54 +02:00
Vladimir Blagojevic
84acb6584f
Type all parameter constructors, add model_version optional parameter where applicable ( #3152 )
2022-09-06 05:05:42 -04:00
Sebastian
20c2320434
Fix for torch device ( #3161 )
2022-09-06 09:03:52 +02:00
Massimiliano Pippi
6790eaf7d8
refactor: update package strategy in rest_api ( #3148 )
* update packaging
* fix author metadata
* add newline
* add empty readme
* fix path to pipeline files
* fix pylint job
* fix metadata
2022-09-05 16:58:43 +02:00
Massimiliano Pippi
e2110644c4
docs: add tests types to CONTRIBUTING.md ( #3158 )
* Update CONTRIBUTING.md
Add the outcome of #2811 to the developers docs
Ideally, newly added tests will follow those requirements while we progressively adapt the existing tests to the new model.
* address review comments
2022-09-05 16:56:48 +02:00
Daniel Bichuetti
e1f399284f
refactor: update dependencies and remove pins ( #3147 )
* refactor: remove azure-core, pydoc and hf-hub pins
* fix: remove extra-comma
* fix: force minimum version of azure forms recognizer
* refactor: allow newer ocr libs
* refactor: update more dependencies and container versions
* refactor: remove extra comment
* docs: pre-commit manual run
* refactor: remove unnecessary dependency
* tests: update weaviate container image version
2022-09-05 14:30:35 +02:00
Massimiliano Pippi
b07fcb7185
feat: add a security policy for Haystack ( #3130 )
* add the security policy
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* include review feedback
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-09-02 12:00:14 +02:00
Branden Chan
d4722c2ec5
Document FARMReader.train() evaluation report log level ( #3129 )
* Mention evaluation report logging level
* Mention evaluation report logging level
2022-09-01 10:58:47 +02:00
Vladimir Blagojevic
356537c883
Standardize devices parameter and device initialization ( #3062 )
* Use devices parameter and initialize devices consistently
2022-08-31 15:30:31 +02:00
Massimiliano Pippi
ffee36c694
pin pydantic to 1.9.2 ( #3126 )
2022-08-31 14:36:40 +02:00
Vladimir Blagojevic
66f3f42a46
fix: Replace multiprocessing tokenization with batched fast tokenization ( #3089 )
* Replace multiprocessing tokenization with batched fast tokenization
* Replace deprecated tokenization method invocations
2022-08-31 07:33:39 -04:00
Stefano Fiorucci
e7771dc18e
bug: adapt UI random question for streamlit 1.12 and pin to streamlit>=1.9.0 ( #3121 )
* adapt for streamlit 1.12.0 and pin to streamlit>=1.9.0
* make pylint happy
2022-08-31 12:35:40 +02:00
Fernando Pereira
911a2fa7e4
feat: Add warnings to PineconeDocumentStore about indexing metadata if filters return no documents ( #3086 )
* black-jupyter format changes
* fix merge
* filters and documents/ids list evaluations fix (for this specific warning context)
2022-08-30 17:02:07 +02:00
Julian Risch
f010a17f04
increase version to next release candidate ( #3115 )
2022-08-29 17:05:44 +02:00
Vladimir Blagojevic
99efab7928
Bump transformers to v4.21.2 ( #3098 )
2022-08-29 11:02:13 -04:00
Sara Zan
e88f1e2577
Add custom_mapping to the list of fields that can contain string-encoded JSON ( #3065 )
2022-08-29 11:10:24 +02:00
Julian Risch
4e518cdddd
chore: increase version for 1.8 release ( #3109 )
* increase version for 1.8 release
* ignore missing-timeout for pylint
v1.8.0
2022-08-26 15:00:14 +02:00
Julian Risch
3e3ff33cdd
feat: add batch evaluation method for pipelines ( #2942 )
* add basic pipeline.eval_batch for qa without filters
* black formatting
* pydoc-markdown
* remove batch eval tests failing due to bugs
* remove comment
* explain commented out tests
* avoid code duplication
* black
* mypy
* pydoc markdown
* add batch option to execute_eval_run
* pydoc markdown
* Apply documentation suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply documentation suggestion from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* add documentation based on review comments
* black
* black
* schema updates
* remove duplicate tests
* add separate method for column reordering
* merge _build_eval_dataframe methods
* pylint ignore in function
* change type annotation of queries to list only
* one-liner addressing review comment on params dict
* markdown files updated
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-25 17:50:57 +02:00
bogdankostic
e2ec0d1c15
feat: FAISS in OpenSearch: check existing index ( #3101 )
* Add check for mapping for existing indices
* Add test
* Check if "method" field exists
2022-08-25 17:33:26 +02:00
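A sketch of the kind of check this commit describes: before reusing an existing index, inspect its `mappings` object and compare the ANN engine of the embedding field with the one requested. The sketch assumes the OpenSearch k-NN `knn_vector` mapping layout, where the ANN settings live in a `method` object with an `engine` key. If that `method` block is missing, the check is skipped; this choice is an assumption that mirrors the commit's "check if 'method' field exists". `existing_knn_engine` and `check_existing_index` are hypothetical helpers.

```python
from typing import Any, Dict, Optional


def existing_knn_engine(mapping: Dict[str, Any], embedding_field: str) -> Optional[str]:
    """Return the ANN engine of an existing knn_vector field, or None when the
    field has no "method" block (e.g. the index was created without ANN settings)."""
    field = mapping.get("properties", {}).get(embedding_field, {})
    method = field.get("method")
    if method is None:
        return None
    return method.get("engine")


def check_existing_index(mapping: Dict[str, Any], embedding_field: str, expected_engine: str) -> None:
    """Raise if the existing index was built with a different engine (hypothetical helper)."""
    engine = existing_knn_engine(mapping, embedding_field)
    if engine is not None and engine != expected_engine:
        raise ValueError(
            f"Existing index uses the '{engine}' engine for '{embedding_field}', "
            f"but '{expected_engine}' was requested. Recreate the index or use another index name."
        )
```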
Julian Risch
cc9d39c360
increase version to next release candidate ( #3100 )
2022-08-25 15:55:34 +02:00
Julian Risch
0950db5032
chore: increase version to 1.7.2 for patch release ( #3097 )
* schema update
* schema update audio nodes
* schema update audio param type
v1.7.2
2022-08-25 13:55:28 +02:00
Sebastian
0cf0568dd0
fix: Use use_auth_token in all cases when loading from the HF Hub ( #3094 )
* Making sure to pass on use_auth_token to all from_pretrained calls
2022-08-25 10:30:03 +02:00
Sara Zan
e92ea4fccb
refactor: rename master into main in documentation and links ( #3063 )
* master->main
* revert master rename
* Revert change to sphinx link and rename master schema
2022-08-24 19:05:12 +02:00
tstadel
92046ce5b5
feat: FAISS in OpenSearch: Support HNSW for dot product and l2 ( #3029 )
* support faiss hnsw
* blacken
* update docs
* improve similarity check
* add tests
* update schema
* set ef_search param correctly
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* regenerate docs
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-24 16:43:48 +02:00
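In the OpenSearch k-NN plugin, a `knn_vector` field declares its ANN method in the index mapping. With the Faiss engine, dot-product similarity corresponds to the `innerproduct` space type and L2 distance to `l2`. The sketch below builds such a field mapping from a similarity name. `faiss_hnsw_field_mapping` is a hypothetical helper, and the `ef_construction` and `m` defaults are illustrative rather than Haystack's values.

```python
from typing import Any, Dict

# Similarity name -> OpenSearch k-NN space_type used with the Faiss HNSW method
FAISS_SPACE_TYPES = {"dot_product": "innerproduct", "l2": "l2"}


def faiss_hnsw_field_mapping(
    similarity: str, embedding_dim: int, ef_construction: int = 80, m: int = 64
) -> Dict[str, Any]:
    """Return a knn_vector field mapping that uses Faiss HNSW (hypothetical helper)."""
    if similarity not in FAISS_SPACE_TYPES:
        raise ValueError(
            f"similarity={similarity!r} is not supported with Faiss HNSW in this sketch; "
            f"use one of {sorted(FAISS_SPACE_TYPES)}"
        )
    return {
        "type": "knn_vector",
        "dimension": embedding_dim,
        "method": {
            "name": "hnsw",
            "space_type": FAISS_SPACE_TYPES[similarity],
            "engine": "faiss",
            "parameters": {"ef_construction": ef_construction, "m": m},
        },
    }
```

Raising on unsupported similarities keeps a misconfigured document store from creating an index whose distance metric silently differs from what the retriever expects.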
James Briggs
9b1b03002f
update to PineconeDocumentStore to remove dependency on SQL db ( #2749 )
* update to PineconeDocumentStore to remove dependency on SQL db
* Update Documentation & Code Style
* typing fixes
* Update Documentation & Code Style
* fixed embedding generator to yield Documents
* Update Documentation & Code Style
* fixes for final typing issues
* fixes for pylint
* Update Documentation & Code Style
* uncomment pinecone tests
* added new params to docstrings
* Update Documentation & Code Style
* Update Documentation & Code Style
* Update haystack/document_stores/pinecone.py
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
* Update haystack/document_stores/pinecone.py
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
* Update Documentation & Code Style
* Update haystack/document_stores/pinecone.py
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
* Update haystack/document_stores/pinecone.py
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
* Update haystack/document_stores/pinecone.py
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
* Update haystack/document_stores/pinecone.py
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
* changes based on comments, updated errors and install
* Update Documentation & Code Style
* mypy
* implement simple filtering in pinecone mock
* typo
* typo in reverse
* account for missing meta key in filtering
* typo
* added metadata filtering to describe index
* added handling for users switching indexes in same doc store, and handling duplicate docs in write
* syntax tweaks
* added index option to document/embedding count calls
* labels implementation in progress
* added metadata fields to be indexed for pinecone tests
* further changes to mock
* WIP implementation of labels+multilabels
* switched to rely on labels namespace rather than filter
* simpler delete_labels
* label fixes, remove debug code
* Apply docstring fixes
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* mypy
* pylint
* docs
* temporarily un-mock Pinecone
* Small Pinecone test suite
* pylint
* Add fake test key to pass the None check
* Add again fake test key to pass the None check
* Add Pinecone to default docstores and fix filters
* Fix field name
* Change field name
* Change field value
* Remove comments
* forgot to upgrade pyproject.toml
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-24 13:27:15 +02:00
Stefano Fiorucci
891707ecaa
bug: handle Optional params in schema validation ( #2980 )
* not working draft
* first draft
* fix
* revert json schema
* better schema
* improvements, support different python versions
* little simplification
* improvements and more tests
* Revert "Merge branch 'handle_optional_params' into origin/main"
This reverts commit 0114cba1f72c9bab23a3ce6a24cb4b346834cf34.
* fix git mess
* handle optional params; schema
* test null values
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-08-24 10:40:19 +02:00
Ofek Lev
f6a4a14790
refactor: update package metadata ( #3079 )
* Update package metadata
* fix yaml
* remove Python version cap
* address review
2022-08-24 09:46:21 +02:00
Branden Chan
6d4031d8f6
Add OpenAI Answer Generator API ( #3050 )
* Add OpenAI Answer Generator API
* Regen tutorials
* Regen md files
* Incorporate reviewer feedback
* Incorporate reviewer feedback
* Incorporate reviewer feedback
* Incorporate reviewer feedback
2022-08-24 09:20:08 +02:00
Malte Pietsch
76af0444cc
feat: add progressbar to upload_files() of deepset Cloud client ( #3069 )
2022-08-23 20:51:08 +02:00
Sebastian
3ea57801ae
feat: Early stopping can be used in Reader and Retriever training ( #3071 )
...
* Add option to set early stopping in training
* Moved EarlyStopping to haystack/utils/early_stopping.py and added EarlyStopping to training Dense retrievers.
2022-08-23 14:18:12 +02:00
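Early stopping interrupts training once the evaluation metric stops improving. The small standalone class below sketches that idea. `EarlyStopper`, its `patience`/`min_delta`/`mode` parameters and the exact stopping rule are assumptions for illustration. The helper added in haystack/utils/early_stopping.py may use different names and behavior.

```python
from typing import Optional


class EarlyStopper:
    """Hypothetical early-stopping helper (not Haystack's EarlyStopping class).

    Signals a stop when the monitored metric has not improved by more than
    `min_delta` for more than `patience` consecutive evaluations.
    """

    def __init__(self, patience: int = 2, min_delta: float = 0.001, mode: str = "min"):
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.patience = patience
        self.min_delta = min_delta
        self.mode = mode
        self.best: Optional[float] = None
        self.evals_without_improvement = 0

    def _improved(self, value: float) -> bool:
        assert self.best is not None
        if self.mode == "min":  # e.g. loss: lower is better
            return value < self.best - self.min_delta
        return value > self.best + self.min_delta  # e.g. F1: higher is better

    def should_stop(self, value: float) -> bool:
        """Record one evaluation result; return True when training should stop."""
        if self.best is None or self._improved(value):
            self.best = value
            self.evals_without_improvement = 0
            return False
        self.evals_without_improvement += 1
        return self.evals_without_improvement > self.patience
```

In a training loop, call `should_stop` after each evaluation and break out when it returns True. A full trainer would typically also save the model whenever `best` improves, so the best checkpoint can be restored afterwards.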
bogdankostic
b03de53716
Use random_sample instead of ndarray for random array ( #3083 )
2022-08-22 13:19:45 +02:00
Daniel Bichuetti
149224fe3a
fix: Crawler quits ChromeDriver on destruction ( #3070 )
* Close Chrome and Selenium WebDriver on destruction
* Fix failed pre-commit hook
2022-08-22 13:08:16 +02:00