3803 Commits

Author SHA1 Message Date
Sara Zan
284c759346
Add switch for BiAdaptive and TriAdaptiveModel in Evaluator (#2908)
* Add switch for BiAdaptive and TriAdaptiveModel

* fix import

* black

* padding -> attention
2022-07-29 11:31:52 +02:00
GianiStatie
b78db1cbaf
Use batch_size in QuestionGenerator (#2870)
* Bugfix: batch_size was not passed to self.generate_batch

* Testing pre-push hooks

* Formatting code using black

* Adding black changes

* Adding black changes

* Adding black changes

Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2022-07-29 09:41:34 +02:00
Vladimir Blagojevic
1f5b9bd69b
Explicitly specify all parameters to forward call (#2886)
* Explicitly specify all parameters to forward call

* Use DPREncoder instead of get_language_model in dense retriever

* Black formatting
2022-07-28 13:43:12 +02:00
Sara Zan
330a1c0249
Wrap opensearch imports into safe_import (#2907)
* Wrap opensearch imports into `safe_import`

* black
2022-07-28 12:25:31 +02:00
Massimiliano Pippi
e7627c3f8b
Use opensearch-py in OpenSearchDocumentStore (#2691)
* add Opensearch extras

* let OpenSearchDocumentStore use opensearch-py

* Update Documentation & Code Style

* fix a bug found after adding tests

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-07-28 10:04:49 +02:00
Steven Haley
ae84c5a533
Fix typos in Contributing.md (#2897) 2022-07-28 09:30:25 +02:00
Daniel Bichuetti
1162daa7f3
Pin pyworld dependency to 0.2.12 (#2900) 2022-07-27 19:42:26 +02:00
Sara Zan
b2bd99d799
Recommend installing pre-commit hook on commit (#2890)
* recommend installing hook on commit

* remove change in weaviate docker command
2022-07-27 18:37:35 +02:00
Daniel Fleischer
d91a5b0e15
Typo README.md (#2895) 2022-07-27 16:00:50 +02:00
Zoltan Fedor
adb2b2c312
Add support for BM25 with the Weaviate document store (#2860)
* Upgrading Weaviate used for testing to 1.14.1 from 1.11.0

This has also brought up an issue with one of the tests, which filters for the value "a". The test started to fail because "a" is a default stopword in Weaviate, so I changed it to look for the value "c" instead to get around the stopword issue.

* Weaviate client upgrade

From v3.3.3 to v3.6.0

* Adding BM25 Retrieval to Weaviate

Weaviate now supports BM25 retrieval in experimental mode, with some limitations (for example, it cannot be combined with filters).
This commit adds support for inverted index (BM25) querying against Weaviate.

* Running Black on the recent code changes

* Update Documentation & Code Style

* Fixing linting issues after code changes by black

* The BM25 query needs to be in all lowercase for now

The BM25 query needs to be provided all lowercase while the functionality is in experimental mode in Weaviate.
See https://app.slack.com/client/T0181DYT9KN/C017EG2SL3H/thread/C017EG2SL3H-1658790227.208119

* Fixing method parameter docstring to highlight that they are not supported in Weaviate

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-27 10:07:13 +02:00
Sebastian
128d1e2388
Updating pre-commit-config to remove python version (#2884) 2022-07-26 18:14:34 +02:00
Sara Zan
2d65c380f1
pre-commit hooks (#2819)
* Add pre-commit config

* update contributing guidelines

* try failing the workflow

* add pre-commit to the deps

* updating uninstall instructions

* separate jobs in CI

* make tutorials check fail

* make black check fail

* make openapi check fail

* make yaml schema and api docs checks fail

* highlight the instructions

* Update .pre-commit-config.yaml

Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de>

* Update CONTRIBUTING.md

Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de>

* Update CONTRIBUTING.md

Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de>

* Use black --check

* Add images of the CI

* title level

* feedback

Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de>
2022-07-26 15:02:15 +02:00
Julian Risch
3c81103db7
Remove logging config from Haystack (#2848)
* move logging config from haystack lib to application

* Update Documentation & Code Style

* config logging before importing haystack

* Update Documentation & Code Style

* add logging config to all tutorials

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-25 17:57:30 +02:00
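
Note on #2848: with the logging config moved out of the library, the application has to set it up itself, and has to do so before importing haystack. A minimal sketch using only the standard `logging` module; the format string and the chosen levels are illustrative assumptions, not values taken from the commit.

```python
import logging

# Configure the root logger first. The format and level are assumptions
# chosen for illustration.
logging.basicConfig(
    format="%(levelname)s - %(name)s - %(message)s",
    level=logging.WARNING,
)

# Raise verbosity for the library's own logger hierarchy only.
logging.getLogger("haystack").setLevel(logging.INFO)

# Library imports would follow here, after logging is configured, e.g.:
# import haystack
```

Loggers below `haystack` (for example `haystack.nodes`) inherit the INFO level unless they set their own.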
Sara Zan
5d8476eb58
Restart containers in tutorials.sh (#2858)
* restart tutorials in the loop

* remove container steps in tutorials.yml

* forgotten quotes

* unmatched bracket

* give names to containers

* try to limit the log size

* make the containers restart on the scripts as well

* feedback

* Raise integration tests timeout

* raising limit again
2022-07-25 17:35:36 +02:00
Stefano Fiorucci
7dcef68685
Handle invalid metadata for SQLDocumentStore (#2868)
* modify notebook

* skip invalid metadata

* Update Documentation & Code Style

* fix nonetype

* fix nonetype

* drop nonetype from valid types

* drop nonetype from valid types

* fix

* Update sql.py

* sqlalchemy validation

* removed newlines

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-25 14:57:21 +02:00
Sara Zan
5119acb260
Raise timeout on integration tests (#2880) 2022-07-25 06:43:20 -04:00
Sara Zan
4e45062a00
Simplify language_modeling.py and tokenization.py (#2703)
* Simplification of language_model.py and tokenization.py to remove code duplication

Co-authored-by: vblagoje <dovlex@gmail.com>
2022-07-22 16:29:30 +02:00
Massimiliano Pippi
8ee2b6b403
Add a custom pydoc renderer for Readme.io (#2825)
* add custom pydoc renderer

* create an example

* revert example code
2022-07-22 10:43:51 +02:00
tstadel
11c46006df
Fix corrupted csv from EvaluationResult.save() (#2854)
* fix corrupted csv if text contains \r chars; make csv serialization configurable

* Update Documentation & Code Style

* incorporate feedback

* Update Documentation & Code Style

* adjust columns to be converted during loading

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-21 16:31:07 +02:00
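
Note on #2854: the failure mode is a text field containing a bare `\r`, which a CSV reader can mistake for a row terminator unless the field is quoted. The sketch below shows the general idea with the standard `csv` module; it is not the code from the commit, and the `roundtrip` helper is a hypothetical name.

```python
import csv
import io


def roundtrip(rows):
    """Write rows to an in-memory CSV and read them back."""
    # newline="" disables newline translation, so an embedded "\r"
    # inside a quoted field survives unchanged.
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL)  # quote every field
    writer.writerows(rows)
    buf.seek(0)
    return list(csv.reader(buf))


rows = [["id", "text"], ["1", "first\rsecond"]]
restored = roundtrip(rows)
```

Making the quoting mode a parameter, as the commit message suggests for the serialization settings, lets callers pick the behavior their downstream tools expect.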
Stefano Fiorucci
e350781825
Exclude docker from Tutorial 15 (#2861)
* modify notebook

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-21 10:01:25 +02:00
Daniel Bichuetti
3948b997b2
Add support for custom trained PunktTokenizer in PreProcessor (#2783)
* Add support for model folder into BasePreProcessor

* First draft of custom model on PreProcessor

* Update Documentation & Code Style

* Update tests to support custom models

* Update Documentation & Code Style

* Test for wrong models in custom folder

* Default to ISO names on custom model folder

Use long names only when needed

* Update Documentation & Code Style

* Refactoring language names usage

* Update fallback logic

* Check unpickling error

* Updated tests using parametrize

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

* Refactored common logic

* Add format control to NLTK load

* Tests improvements

Add a sample for specialized model

* Update Documentation & Code Style

* Minor log text update

* Log model format exception details

* Change pickle protocol version to 4 for 3.7 compat

* Removed unnecessary model folder parameter

Changed logic comparisons

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

* Update Documentation & Code Style

* Removed unused import

* Change errors with warnings

* Change to absolute path

* Rename sentence tokenizer method

Co-authored-by: tstadel

* Check document content is a string before process

* Change to log errors and not warnings

* Update Documentation & Code Style

* Improve split sentences method

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

* Update Documentation & Code Style

* Empty commit - trigger workflow

* Remove superfluous parameters

Co-authored-by: tstadel

* Explicit None checking

Co-authored-by: tstadel

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-07-21 09:50:45 +02:00
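
Note on #2783 ("Change pickle protocol version to 4 for 3.7 compat"): pickle protocol 5 was added in Python 3.8, so a model file meant to load on Python 3.7 has to be written with protocol 4 or lower. A minimal sketch; the helper names and the sample payload are hypothetical and do not come from the commit.

```python
import pickle

# Highest pickle protocol that Python 3.7 can read.
PY37_COMPATIBLE_PROTOCOL = 4


def dump_model(obj) -> bytes:
    """Serialize obj so that Python 3.7 can still load it."""
    return pickle.dumps(obj, protocol=PY37_COMPATIBLE_PROTOCOL)


def load_model(data: bytes):
    """Deserialize data, reporting a malformed file as a ValueError."""
    try:
        return pickle.loads(data)
    except pickle.UnpicklingError as err:
        raise ValueError("File is not a valid pickled model") from err


# Stand-in payload; a real tokenizer model would be a trained object.
payload = dump_model({"abbreviations": ["dr", "prof"]})
model = load_model(payload)
```

As with any pickle, only load model files from a trusted source.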
Kristof Herrmann
f51587b4ad
🐛 fix: update deployment status codes (#2713)
* 🐛 fix: update deployment status codes

* Update Documentation & Code Style

* adjust error log

* added tests for failed state

* added valid initial states

* fix

* fix tests

* add test

* updated comments

* uncommented code again

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>
2022-07-21 09:04:45 +02:00
Stefano Fiorucci
de6b9c3d3e
Remove deprecated method prepare_seq2seq_batch (#2852)
* Remove deprecated method prepare_seq2seq_batch
2022-07-20 16:49:54 +02:00
James Briggs
a4e197c21a
changed mock pinecone to use dict rather than list index (#2845) 2022-07-19 15:28:22 +02:00
kekayan
925eeddf0a
remove unnecessary if else block #2835 (#2842) 2022-07-19 15:25:40 +02:00
Stefano Fiorucci
baf5ef81f7
Validate OpenAI response (#2844)
* openai response check

* Update Documentation & Code Style

* Update haystack/nodes/answer_generator/openai.py

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>

* Update Documentation & Code Style

* correct indentation

* add OpenAIError

* raise OpenAIError

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
2022-07-19 11:54:50 +02:00
tstadel
9ad90b2e23
fix healthcheck cmds for annotation tool postgres (#2840) 2022-07-18 18:31:22 +02:00
Sara Zan
48644b23fb
Enable CI on tutorials (#2801)
* enable ci on tutorials

* Disable all path restrictions for safety

* actually comment out the paths block

* remove comment
2022-07-18 17:59:55 +02:00
Massimiliano Pippi
632cd1c141
Allow values that are not dictionaries in the request params in the /search endpoint (#2720)
* let params contain something else than dictionaries

* rewrite the test same style as the main branch
2022-07-15 13:24:29 +02:00
Sara Zan
6b39fbd39c
Mocking Pinecone tests (#2778)
* Integrating the mock into conftest.py

* re-enable workflow

* delete_all

* Update Documentation & Code Style

* remove ValueError

* Add empty response

* wrong condition

* return response

* revert removal of delete_all

* change mock

* Update Documentation & Code Style

* test for rest api, to revert

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-14 20:03:33 +02:00
tstadel
e6d8bcdf9b
Fix gold_contexts_similarity for table retrieval evaluation (#2815)
* fix gold_contexts_similarity for table documents

* check for type of gold_context
2022-07-14 17:59:20 +02:00
Massimiliano Pippi
82df677ebf
API tests (#2738)
* clean up tests and run earlier

* use change detection

* better naming, skip ES

* more cleanup

* fix job name

* dummy commit to trigger the CI

* mock away the PDF converter

* make the test compatible with 3.7

* removed leftover

* always run the api tests, use a matrix for the OS

* refactor all the tests

* remove outdated dependency

* pylint

* new abstract method

* adjust for older python versions

* rename pipeline file

* address PR comments
2022-07-14 15:36:28 +02:00
Branden Chan
0388284d71
Clean OpenAIAnswerGenerator docstrings (#2797)
* Clean OpenAIAnswerGenerator docstrings

* Incorporate reviewer feedback

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-07-14 09:35:30 +02:00
Vladimir Blagojevic
2a7e333d9a
Tutorial 12: add introduction (#2798)
* Tutorial 12: add introduction

* PR review for Tutorial 12: add introduction

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-13 17:44:19 +02:00
Julian Risch
f599ce9458
Change "text" to "content" as dict key (#2800)
* change "text" to "content" as dict key

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-13 16:36:06 +02:00
Sara Zan
d8e7aaeacc
API key check in OpenAIAnswerGenerator (#2791)
* api key check in node and tests

* Clarify skip message

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-12 14:05:47 +02:00
Sara Zan
4d2a06989d
Fix YAML validation for ElasticsearchDocumentStore.custom_query (#2789)
* Add exception for  in the validation code

* Update Documentation & Code Style

* Add tests

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-12 13:49:06 +02:00
Sara Zan
091711b8c4
Fix Tutorials and Tutorials (nightly) (#2737)
* Remove caching and install audio deps

* Fix `Tutorials` as well

* Run all tutorials even though some fail

* Forgot fi

* fix failure condition

* proper bash string equality

* Enable debug logs

* remove audio files

* Update Documentation & Code Style

* Use the setup action in the Tutorial CI as well

* Try with a file that exists

* Update Documentation & Code Style

* Fix the comments in the tutorials

* Update Documentation & Code Style

* Fix tutorials.sh

* Remove debug logging

* import pprint and try editable install

* Update Documentation & Code Style

* extract no run list

* Add tutorial18 to no run list nightly

* import pprint correctly

* Update Documentation & Code Style

* try making site-packages editable

* Make pythonpath editable every time Tut17 is run on CI

* typo

* fix imports in tut5

* add git clean

* Update Documentation & Code Style

* add comments and remove `-e`

* accidentally deleted a line

* Update .github/utils/tutorials.sh

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2022-07-12 11:22:17 +02:00
Sowmiya Jaganathan
4d8f40425b
Passing the meta-data in the summarizer response (#2179)
* Passing all the meta-data in the summarizer

* Disable metadata forwarding if `generate_single_summary` is `True`

* Update Documentation & Code Style

* simplify tests

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-11 17:28:36 +02:00
Daniel Augustus Bichuetti Silva
1706729e26
Prevent PDFToTextConverter from failing on PDFs with spaces in their names (#2786)
* Change split logic to list

* Fix wrong parameter for run

* Fix mypy error

* Fix layout/raw parameter

* Add test for filename with whitespaces on PDFToText

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-11 13:30:33 +02:00
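
Note on #2786 ("Change split logic to list"): splitting a command string on whitespace breaks any path that contains a space, while building the argument list directly keeps the path as one argument. A minimal sketch of the difference; `build_pdftotext_command` is a hypothetical helper, not the converter's actual code.

```python
from pathlib import Path
from typing import List


def build_pdftotext_command(file_path: Path, layout: bool = True) -> List[str]:
    """Build the argument list for pdftotext without string splitting."""
    mode = "-layout" if layout else "-raw"
    # "-" tells pdftotext to write the extracted text to stdout.
    return ["pdftotext", mode, str(file_path), "-"]


pdf = Path("my report.pdf")

# Naive approach: the file name is torn into two arguments.
naive = f"pdftotext -layout {pdf} -".split()

# List approach: the file name stays intact.
safe = build_pdftotext_command(pdf)
```

The list form can be passed to `subprocess.run` as-is, with no shell quoting involved.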
Daniel Augustus Bichuetti Silva
77a513fe49
Fix crawler long file names (#2723)
* Changing the name that crawled page is saved to avoid long file names error on some file systems

* Custom naming function for saving crawled files

* Update Documentation & Code Style

* Remove bad characters on file name and prefix

* Add test for naming function

* Update Documentation & Code Style

* Fix expensive regex recalculation and linter warns

* Check for exceptions on file dump

* Remove param_naming variable

* Fix file paths on Windows, Linux and Mac

* Update Documentation & Code Style

* Test using one of the docstrings examples

* Change default naming function
Update docstrings

* Applying formatting rules

* Update Documentation & Code Style

* Fix mypy incompatible assignment error

* Remove unused type declaration

* Fix typo

* Update tests for naming function

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-11 12:16:32 +02:00
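
Note on #2723: the commits above describe a naming function that strips bad characters and keeps file names short enough for restrictive file systems. The sketch below is one possible way to do that, not the crawler's actual default; the function name, the 100-character limit, and the hash suffix are all assumptions.

```python
import hashlib
import re

# Compiled once at import time rather than on every call.
_UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9_.-]+")


def safe_file_name(url: str, max_length: int = 100) -> str:
    """Derive a short, file-system-safe name from a URL."""
    # Replace each run of characters outside the allowed set with "_".
    cleaned = _UNSAFE_CHARS.sub("_", url).strip("_")
    # A short digest of the full URL keeps truncated names distinguishable.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:8]
    prefix = cleaned[: max_length - len(digest) - 1]
    return f"{prefix}_{digest}"


name = safe_file_name("https://example.com/docs/" + "a" * 500)
```

The result never exceeds `max_length` characters and is stable for a given URL, so re-crawling a page overwrites the same file.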
Malte Pietsch
ba08fc86f5
Add node to use OpenAI's GPT-3 for QA (#2605)
* first draft of openai node for QA

* Update Documentation & Code Style

* fix mypy. add node to inits

* Update Documentation & Code Style

* fix linter

* Adapt OpenAIGenerator to completions endpoint

* Update Documentation & Code Style

* Fix pylint

* Fix doc strings

* Make use of temperature

* Make use of api key in tests

* Adapt doc strings

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-07-08 13:59:27 +02:00
Agnieszka Marzec
425da1fd31
Fix load_from_yaml example in the Pipelines tutorial (#2774)
* Fix load from yaml example and image

* Update Documentation & Code Style

* Fixed pipeline example

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-08 11:22:11 +02:00
James Briggs
ea40387b97
added mock pinecone client (#2770) 2022-07-07 19:51:30 +02:00
tstadel
d21b066fc7
fix pipeline run loop on joined pipelines without debug flag (#2777)
* fix pipeline run loop on joined pipelines without debug flag

* use .keys() consistently
2022-07-07 16:47:59 +02:00
bogdankostic
195aed942f
Add update_document_meta to InMemoryDocumentStore (#2689)
* Add update_document_meta to InMemoryDocumentStore

* Fix typo

* Update Documentation & Code Style

* Add update_document_meta to BaseDocumentStore

* Update Documentation & Code Style

* Fix mypy

* Update Documentation & Code Style

* Add update_document_meta to MockDocumentStore

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-07 15:44:07 +02:00
tstadel
45136badfe
Fix _debug info getting lost for previous nodes when using join nodes (#2776)
* fix debug output for pipelines with join nodes

* add test

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-07 15:10:13 +02:00
Vladimir Blagojevic
a766b70a8f
Tutorial 18: Open in Colab doesn't work in Firefox (#2767)
* Tutorial 18: Open in Colab doesn't work in Firefox

* Tutorial 18: Open in Colab doesn't work in Firefox v2

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-06 10:51:09 -04:00
Tuana Celik
917afb1530
Trying out some smaller images for docs (#2772) 2022-07-06 16:11:23 +02:00
tstadel
e9219f4dc2
Fix confusing elasticsearch exception (#2763)
* convert confusing exception to warning and add no docs case.

* blacken

* fix test
2022-07-06 15:40:51 +02:00