Steven Haley
9a750f7032
docs: Fix the word length splitting; should be set to 100 not 1,000 ( #3133 )
...
* Fix the word length splitting; should be set to 100 not 1,000 due to limitations of transformer models
* Update documentation for tutorial change
2022-09-07 10:57:54 +02:00
Vladimir Blagojevic
84acb6584f
Type all parameter constructors, add model_version optional parameter where applicable ( #3152 )
2022-09-06 05:05:42 -04:00
Sebastian
20c2320434
Fix for torch device ( #3161 )
2022-09-06 09:03:52 +02:00
Massimiliano Pippi
6790eaf7d8
refactor: update package strategy in rest_api ( #3148 )
...
* update packaging
* fix author metadata
* add newline
* add empty readme
* fix path to pipeline files
* fix pylint job
* fix metadata
2022-09-05 16:58:43 +02:00
Massimiliano Pippi
e2110644c4
docs: add tests types to CONTRIBUTING.md ( #3158 )
...
* Update CONTRIBUTING.md
Add the outcome of #2811 to the developers docs
Ideally, newly added tests will follow those requirements while we progressively adapt the existing tests to the new model.
* address review comments
2022-09-05 16:56:48 +02:00
Daniel Bichuetti
e1f399284f
refactor: update dependencies and remove pins ( #3147 )
...
* refactor: remove azure-core, pydoc and hf-hub pins
* fix: remove extra-comma
* fix: force minimum version of azure forms recognizer
* refactor: allow newer ocr libs
* refactor: update more dependencies and container versions
* refactor: remove extra comment
* docs: pre-commit manual run
* refactor: remove unnecessary dependency
* tests: update weaviate container image version
2022-09-05 14:30:35 +02:00
Massimiliano Pippi
b07fcb7185
feat: add a security policy for Haystack ( #3130 )
...
* add the security policy
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* include review feedback
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-09-02 12:00:14 +02:00
Branden Chan
d4722c2ec5
Document FARMReader.train() evaluation report log level ( #3129 )
...
* Mention evaluation report logging level
* Mention evaluation report logging level
2022-09-01 10:58:47 +02:00
Vladimir Blagojevic
356537c883
Standardize devices parameter and device initialization ( #3062 )
...
* Use devices parameter and initialize devices consistently
2022-08-31 15:30:31 +02:00
Massimiliano Pippi
ffee36c694
pin pydantic to 1.9.2 ( #3126 )
2022-08-31 14:36:40 +02:00
Vladimir Blagojevic
66f3f42a46
fix: Replace multiprocessing tokenization with batched fast tokenization ( #3089 )
...
* Replace multiprocessing tokenization with batched fast tokenization
* Replace deprecated tokenization method invocations
2022-08-31 07:33:39 -04:00
Stefano Fiorucci
e7771dc18e
bug: adapt UI random question for streamlit 1.12 and pin to streamlit>=1.9.0 ( #3121 )
...
* adapt for streamlit 1.12.0 and pin to streamlit>=1.9.0
* make pylint happy
2022-08-31 12:35:40 +02:00
Fernando Pereira
911a2fa7e4
feat: Add warnings to PineconeDocumentStore about indexing metadata if filters return no documents ( #3086 )
...
* black-jupyter format changes
* fix merge
* filters and documents/ids list evaluations fix (for this specific warning context)
2022-08-30 17:02:07 +02:00
Julian Risch
f010a17f04
increase version to next release candidate ( #3115 )
2022-08-29 17:05:44 +02:00
Vladimir Blagojevic
99efab7928
Bump transformers to v4.21.2 ( #3098 )
2022-08-29 11:02:13 -04:00
Sara Zan
e88f1e2577
Add custom_mapping to the list of fields that can contain string-encoded JSON ( #3065 )
2022-08-29 11:10:24 +02:00
Julian Risch
4e518cdddd
chore: increase version for 1.8 release ( #3109 )
...
* increase version for 1.8 release
* ignore missing-timeout for pylint
v1.8.0
2022-08-26 15:00:14 +02:00
Julian Risch
3e3ff33cdd
feat: add batch evaluation method for pipelines ( #2942 )
...
* add basic pipeline.eval_batch for qa without filters
* black formatting
* pydoc-markdown
* remove batch eval tests failing due to bugs
* remove comment
* explain commented out tests
* avoid code duplication
* black
* mypy
* pydoc markdown
* add batch option to execute_eval_run
* pydoc markdown
* Apply documentation suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply documentation suggestion from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* add documentation based on review comments
* black
* black
* schema updates
* remove duplicate tests
* add separate method for column reordering
* merge _build_eval_dataframe methods
* pylint ignore in function
* change type annotation of queries to list only
* one-liner addressing review comment on params dict
* markdown files updated
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-25 17:50:57 +02:00
bogdankostic
e2ec0d1c15
feat: FAISS in OpenSearch: check existing index ( #3101 )
...
* Add check for mapping for existing indices
* Add test
* Check if "method" field exists
2022-08-25 17:33:26 +02:00
Julian Risch
cc9d39c360
increase version to next release candidate ( #3100 )
2022-08-25 15:55:34 +02:00
Julian Risch
0950db5032
chore: increase version to 1.7.2 for patch release ( #3097 )
...
* schema update
* schema update audio nodes
* schema update audio param type
v1.7.2
2022-08-25 13:55:28 +02:00
Sebastian
0cf0568dd0
fix: Use use_auth_token in all cases when loading from the HF Hub ( #3094 )
...
* Making sure to pass on use_auth_token to all from_pretrained calls
2022-08-25 10:30:03 +02:00
Sara Zan
e92ea4fccb
refactor: rename master into main in documentation and links ( #3063 )
...
* master->main
* revert master rename
* Revert change to sphinx link and rename master schema
2022-08-24 19:05:12 +02:00
tstadel
92046ce5b5
feat: FAISS in OpenSearch: Support HNSW for dot product and l2 ( #3029 )
...
* support faiss hnsw
* blacken
* update docs
* improve similarity check
* add tests
* update schema
* set ef_search param correctly
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* regenerate docs
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-24 16:43:48 +02:00
James Briggs
9b1b03002f
update to PineconeDocumentStore to remove dependency on SQL db ( #2749 )
...
* update to PineconeDocumentStore to remove dependency on SQL db
* Update Documentation & Code Style
* typing fixes
* Update Documentation & Code Style
* fixed embedding generator to yield Documents
* Update Documentation & Code Style
* fixes for final typing issues
* fixes for pylint
* Update Documentation & Code Style
* uncomment pinecone tests
* added new params to docstrings
* Update Documentation & Code Style
* Update Documentation & Code Style
* Update haystack/document_stores/pinecone.py
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
* Update haystack/document_stores/pinecone.py
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
* Update Documentation & Code Style
* Update haystack/document_stores/pinecone.py
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
* Update haystack/document_stores/pinecone.py
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
* Update haystack/document_stores/pinecone.py
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
* Update haystack/document_stores/pinecone.py
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
* changes based on comments, updated errors and install
* Update Documentation & Code Style
* mypy
* implement simple filtering in pinecone mock
* typo
* typo in reverse
* account for missing meta key in filtering
* typo
* added metadata filtering to describe index
* added handling for users switching indexes in same doc store, and handling duplicate docs in write
* syntax tweaks
* added index option to document/embedding count calls
* labels implementation in progress
* added metadata fields to be indexed for pinecone tests
* further changes to mock
* WIP implementation of labels+multilabels
* switched to rely on labels namespace rather than filter
* simpler delete_labels
* label fixes, remove debug code
* Apply docstring fixes
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* mypy
* pylint
* docs
* temporarily un-mock Pinecone
* Small Pinecone test suite
* pylint
* Add fake test key to pass the None check
* Add again fake test key to pass the None check
* Add Pinecone to default docstores and fix filters
* Fix field name
* Change field name
* Change field value
* Remove comments
* forgot to upgrade pyproject.toml
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-24 13:27:15 +02:00
Stefano Fiorucci
891707ecaa
bug: handle Optional params in schema validation ( #2980 )
...
* not working draft
* first draft
* fix
* revert json schema
* better schema
* improvements, support different python versions
* little simplification
* improvements and more tests
* Revert "Merge branch 'handle_optional_params' into origin/main"
This reverts commit 0114cba1f72c9bab23a3ce6a24cb4b346834cf34.
* fix git mess
* handle optional params; schema
* test null values
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-08-24 10:40:19 +02:00
Ofek Lev
f6a4a14790
refactor: update package metadata ( #3079 )
...
* Update package metadata
* fix yaml
* remove Python version cap
* address review
2022-08-24 09:46:21 +02:00
Branden Chan
6d4031d8f6
Add OpenAI Answer Generator API ( #3050 )
...
* Add OpenAI Answer Generator API
* Regen tutorials
* Regen md files
* Incorporate reviewer feedback
* Incorporate reviewer feedback
* Incorporate reviewer feedback
* Incorporate reviewer feedback
2022-08-24 09:20:08 +02:00
Malte Pietsch
76af0444cc
feat: add progressbar to upload_files() of deepset Cloud client ( #3069 )
2022-08-23 20:51:08 +02:00
Sebastian
3ea57801ae
feat: Early stopping can be used in Reader and Retriever training ( #3071 )
...
* Add option to set early stopping in training
* Moved EarlyStopping to haystack/utils/early_stopping.py and added EarlyStopping to training Dense retrievers.
2022-08-23 14:18:12 +02:00
bogdankostic
b03de53716
Use random_sample instead of ndarray for random array ( #3083 )
2022-08-22 13:19:45 +02:00
Daniel Bichuetti
149224fe3a
fix: Crawler quits ChromeDriver on destruction ( #3070 )
...
* Close Chrome and Selenium WebDriver on destruction
* Fix failed pre-commit hook
2022-08-22 13:08:16 +02:00
Daniel Bichuetti
d715d0202d
fix: update ChromeDriver options on restricted environments and add ChromeDriver options as function parameter ( #3043 )
...
* Fix when env does not exist
* Fix missed line
* Set conservative chromedriver options
* Set default options based on environment
* Fix removed line
* Updated documentation
* Generate new schemas manually
* Add arguments via iterator and helper function
* Pre-push doc format
* Use imported Option vs full namespace access
* Manually update schema
* Manually add documentation and schema
* Fix language and documentation
* Fix typo
* Auto generated docs
* Updated documentation
2022-08-22 12:59:33 +02:00
David G
e715dee17d
docs: fixed typo (or old documentation) in ipynb tutorial 3 ( #3033 )
...
* Update Tutorial3_Basic_QA_Pipeline_without_Elasticsearch.ipynb
Just fixed the key in the document dictionary format so `write_documents()` won't raise an error. By the way, the `write_documents()` error message is really self-explanatory.
* Run convert_notebooks_into_webpages.py
Co-authored-by: David Gervasoni <david.gervasoni@trix.ai>
2022-08-22 12:56:30 +02:00
Massimiliano Pippi
97a8d30512
feat: Allow exact list matching with field in Elasticsearch filtering ( #2988 )
...
* ES filtering - allow exact list matching with field
typing fix
Update Documentation & Code Style
remove default hit limit in filtering queries
Update Documentation & Code Style
pytest es list eq filter
Update Documentation & Code Style
* review feedback
* fixed test
Co-authored-by: Krak91 <45461739+Krak91@users.noreply.github.com>
2022-08-22 12:42:37 +02:00
Daniel Bichuetti
d5e36ce6b4
fix(translator): write translated text to output documents, while keeping input untouched ( #3077 )
...
* Set translated text on a copy of original document
* Return new translated list
* Manually generated docs
TODO: check pre-commit
* Hook generated file
* Rename variables for better maintenance
* fix(translator): prevent inputs from being changed
* fix: manual update translator docs
* style(translator): explicit type declaration on List
* docs(translator): re-run pre-commit hook
* style(translator): ignore mypy wrong type check
* docs(translator): re-run pre-commit hook
2022-08-22 04:07:05 -04:00
Julian Risch
bc6f71b5ba
chore: increase version to next release candidate ( #3067 )
...
* increase version to next release candidate
* generate schema files
2022-08-19 14:49:50 +02:00
Julian Risch
eb0f0da0fd
Prepare 1.7.1 release ( #3061 )
...
* prepare 1.7.1 release
* Fix schemas
* Update haystack/json-schemas/haystack-pipeline-1.7.1.schema.json
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* change back main to master
* remove newline at end of file
* generate schema file with no newline
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
v1.7.1
2022-08-19 13:24:40 +02:00
Vladimir Blagojevic
be127e5b61
Trigger build failure Slack notify only on main repo (not forks) ( #3039 )
...
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2022-08-18 06:51:39 -04:00
Massimiliano Pippi
af24ffae55
feat: take the list of models to cache instead of hardcoding one ( #3060 )
...
* take the list of models to cache as an input
* let nltk find the cache dir on its own
2022-08-18 11:55:29 +02:00
tstadel
1027ab3624
Bump Version to 1.7.1rc ( #3041 )
...
* bump version to 1.7.1rc
* update openapi
2022-08-18 10:31:57 +02:00
James Briggs
82c9cff3d9
test: update filtering of Pinecone mock to imitate doc store ( #3020 )
...
* updated filtering of doc store to imitate pinecone
* Update test/mocks/pinecone.py
2022-08-18 09:57:08 +02:00
Sebastian
74b7c2c12a
Pin pyworld to <=0.2.12 ( #3047 )
2022-08-17 08:11:28 +02:00
Massimiliano Pippi
2328097ce0
rename the default branch name ( #3045 )
2022-08-16 20:24:58 +02:00
Tuana Celik
2298155a20
changing Slack to Discord ( #3040 )
...
* changing Slack to Discord
* Update README.md
* updating contributing
2022-08-15 15:56:16 +03:00
tstadel
baefd32b6f
Upgrade to v1.7.0 and copy docs folder ( #3014 )
...
* update version to 1.7.0
* copy docs
* update openapi
* generate schemas
* make update_json_schema() idempotent
* update docs, schema and openapi
v1.7.0
2022-08-15 14:20:30 +02:00
Julian Risch
d61755322f
chore: fix typo in API docs ( #3023 )
...
* chore: fix typo in API docs
* fix openapi
Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>
2022-08-15 13:25:20 +02:00
tstadel
0aa0c68785
Fix broken MultiLabel serialization ( #3037 )
...
* Fix MultiLabel serialization
* update docs
* better comment
* remove unused imports
* remove unused imports (2)
2022-08-15 13:09:18 +02:00
Branden Chan
ff38a20863
docs: update File Classifier Docstring ( #3018 )
...
* Update docstring
* Trigger pre-commit hook
* Trigger pre-commit hook
* Incorporate reviewer feedback
* Incorporate reviewer feedback
2022-08-15 12:37:28 +02:00
Branden Chan
7312f99584
Update Summarizer Docs ( #3032 )
...
* Change text to content
* Change text to content
2022-08-15 12:35:41 +02:00