2539 Commits

Author SHA1 Message Date
Julian Risch
b685409c78
chore: add topic tags to auto generation of release notes (#3008) 2022-08-09 17:12:42 +02:00
bogdankostic
5c3bfad078
feat: Add page number to Documents coming from PDFConverters and PreProcessor (#2932)
* Add page number to Documents coming from PDFConverters and PreProcessor

* Fix mypy

* Update API Docs

* Update API Docs

* Remove unused imports

* Generate JSON schema

* Generate JSON schema

* Make test variable shorter

* Make regex a separate function

* Move counting of page breaks to a function

* Generate JSON schema

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update API Documentation

* Don't create instance for testing staticmethod

* Update haystack/nodes/preprocessor/preprocessor.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-09 15:55:27 +02:00
Stefano Fiorucci
09707b576a
Make MultiLabel preserve order (#2956)
* try simple approach

* added test

* add requested test
2022-08-09 15:53:24 +02:00
Branden Chan
dfeb171686
Add API page for util functions (#2863)
* Clean OpenAIAnswerGenerator docstrings

* Incorporate reviewer feedback

* Update Documentation & Code Style

* Improve id_hash_keys description

* Simplify id_hash_keys description

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-08-09 14:53:45 +02:00
Vladimir Blagojevic
50f7d660e2
Add slack hook for test failures (#2996) 2022-08-09 08:27:52 -04:00
Massimiliano Pippi
862ac31b5c
bump streamlit version (#3002) 2022-08-09 10:52:41 +02:00
Stefano Fiorucci
4a63484916
feat: Extend TransformersQueryClassifier: clean version (#2965)
* extend query classifier in one commit

* variable number of outgoing edges

* improve tests

* fix unused import

* lightweight approach

* fix _calculate_outgoing_edges

* remove duplicate label validation

* Remove print
2022-08-09 09:43:33 +02:00
MichelBartels
c91316e862
feat: add gradient accumulation in FARMReader (#2925)
* expose gradient accumulation to train function of FARMReader

* add documentation for gradient accumulation

* Update Documentation & Code Style

* doc string improvements

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* doc string improvements

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* doc string improvements

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-08 18:42:21 +02:00
Sara Zan
82448efa4f
feat: warn users if they're calling get_all_labels on a document index and vice-versa (Elasticsearch & Opensearch only) (#2990)
* Add fix to ES

* Update haystack/document_stores/elasticsearch.py
2022-08-08 16:50:42 +02:00
Vladimir Blagojevic
d1f8b7118c
Add progress bar to batch run component ops (#2864)
* Add progress bar to batch run component ops

* Update docs

* Update schema

* PR review: thanks Bogdan
2022-08-08 09:32:44 -04:00
Massimiliano Pippi
0e8efdafa9
Add enhanced pydoc-markdown pre-hook (#2979)
* add pydoc-markdown pre-hook

* add more comments, remove debug prints
2022-08-08 12:41:21 +02:00
Sara Zan
1a0a4c8836
Remove pipes from code block (#2973)
* Remove pipes

* Generate md
2022-08-05 19:18:57 +02:00
James Briggs
4ba2444652
Update CONTRIBUTING.md (#2975) 2022-08-05 19:00:18 +02:00
Tobias Wochinger
065173fe5e
chore: add PR template (#2883)
* chore: add PR template

* ci: update PR template after latest discussions in Notion

* Apply suggestions from code review

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

* Apply suggestions from code review

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* Update .github/pull_request_template.md

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* docs: re-order and add link

* docs: add new conventions to contributor guidelines

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2022-08-05 18:14:18 +02:00
Vladimir Blagojevic
4f8d11c591
Update Seq2SeqGenerator API documentation (#2970)
* Seq2SeqGenerator - update API docs
2022-08-05 17:39:23 +02:00
Sebastian
88cab19bd9
Remove unused variable (#2974) 2022-08-05 16:41:11 +02:00
Vladimir Blagojevic
762a12fcb1
Print eval reports improvements (#2941) 2022-08-04 11:21:27 -04:00
Sebastian
1b86b715b3
Better check for "DebertaV2" architecture in Trainer.train (#2966)
* Update haystack/modeling/training/base.py to better check for "DebertaV2" architecture

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-08-04 16:30:06 +02:00
Bilge Yücel
489699bd98
Fix docs code format for sentence transformers (#2957)
Co-authored-by: bilge4 <bilge@techwolf.ai>
2022-08-04 12:31:42 +02:00
Vladimir Blagojevic
368828fd4a
Component batch_size should be defined rather than Optional (#2958)
* Ensure batch_size for components is defined rather than Optional

* PR review - update schema
2022-08-04 12:20:28 +02:00
Vladimir Blagojevic
515a85d633
Update contributing guide, clarify when '.[all]' install is needed (#2961) 2022-08-04 12:20:07 +02:00
tstadel
b042dd9c82
Fix validation for dynamic outgoing edges (#2850)
* fix validation for dynamic outgoing edges

* Update Documentation & Code Style

* use class outgoing_edges as fallback if no instance is provided

* implement classmethod approach

* readd comment

* fix mypy

* fix tests

* set outgoing_edges for all components

* set outgoing_edges for mocks too

* set document store outgoing_edges to 1

* set last missing outgoing_edges

* enforce BaseComponent subclasses to define outgoing_edges

* override _calculate_outgoing_edges for FileTypeClassifier

* remove superfluous test

* set rest_api's custom component's outgoing_edges

* Update docstring

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

* remove unnecessary else

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-08-04 10:27:50 +02:00
Massimiliano Pippi
40d07c2038
Enable Opensearch unit tests in Windows CI (#2936)
* enable Opensearch unit tests under Win

* move unit tests into a dedicated job

* skip audio tests on missing dependencies

* avoid failing test collection when soundfile is not available

* Update .github/workflows/tests.yml

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-08-03 19:19:07 +02:00
Francesco Castelli
1b238c880b
Generalize <sep>, <pad> and </s> tokens of QuestionGenerator node (#2769)
* fixed tokens in question generation

* simplified assignment

* same behavior also for pad and eos

* use skip_special_tokens in batch_decode

* fixed black error and update docs

* fixed schemas ci error

* JSON schemas

* Add git diff to debug schema issues

* opensearch schema was missing

* Add missing instruction in the workflow error message

* typo
2022-08-03 18:51:34 +02:00
Zoltan Fedor
1e20818328
Ability to run Ray Serve detached (#2945)
* Ability to run Ray Serve detached

Fixes #2944

Ability to run Ray Serve detached - to allow running multiple instances of the app (HA).

See https://docs.ray.io/en/latest/serve/package-ref.html#core-apis

* Generating the docs

* Re-trigger the CI pipeline

* Retrigger the CI Pipeline

* Typo in docstrings

* Fixing docstring and typing issues

* Regenerating docs

* [EMPTY] Re-trigger CI

* [EMPTY] Re-trigger CI

* Refactoring to allow any number of args for the `serve.start()` method

There seem to be additional arguments to the `serve.start()` method, so we should probably cover all of them at once, instead of only the `detached` option.

* [EMPTY] Re-trigger CI

* Test whether the ServeControllerClient in fact has the supplied `detached` parameter
2022-08-03 18:49:03 +02:00
Bijay Gurung
717796c587
Tutorial 06: Replace DPR with EmbeddingRetriever (#2910)
* Tutorial 06: Replace DPR with EmbeddingRetriever

Closes #2887

* Add updated tutorials/6.md file

Replace `DensePassageRetriever` with `EmbeddingRetriever`

* Update Tutorial 06 based on PR feedback

* Further updates to Tutorial-06 according to review feedback

* [Tutorial 06] Put in review feedback for the py file
2022-08-03 18:43:54 +02:00
Massimiliano Pippi
3728a95de6
fix docker tag for cuda (#2952) 2022-08-03 17:59:46 +02:00
Zoltan Fedor
7b97bbbff0
Extending the Ray Serve integration to allow attributes for Serve deployments (#2918)
* Extending the Ray Serve integration to allow attributes for Serve deployments

This closes #2917

We should be able to set Ray Serve attributes for the nodes of pipelines, like amount of GPU to use, max_concurrent_queries, etc.

Now this is possible from the pipeline yaml file for each node of the pipeline.

* Ran black and regenerated the json schemas

* Fixing the JSON Schema generation

* Trying to fix the schema CI test issue

* Fixing the test and the schemas

Python 3.8 was generating a different schema than the one Python 3.7 creates in the CI. You MUST use Python 3.7 to generate the schemas, otherwise the CIs will fail.

* Merge the two Ray pipeline test cases

* Generate the JSON schemas again after `$ pip install .[all]`

* Removing `haystack/json-schemas/haystack-pipeline-1.16.schema.json`

This was generated by the JSON generator, but based on @ZanSara's instructions, I am removing it.

* Making changes based on @ZanSara's request - the newly requested test is failing

* Fixing the JSON schema generation again

* Renaming `replicas` and moving it under `serve_deployment_kwargs`

* add extras validation, untested

* Documentation update

* Black

* [EMPTY] Re-trigger CI

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
2022-08-03 16:38:22 +02:00
Sara Zan
669f6f0128
Add git diff to schema checks (#2959) 2022-08-03 09:46:38 -04:00
Massimiliano Pippi
e766bb8684
add code owners (#2950)
* add code owners

* add tutorials folder
2022-08-03 10:48:30 +02:00
Sebastian
bde3261b07
Update minimum selenium version supported for crawler (#2921)
* Update minimum requirement for selenium for using the crawler

* Updating pin of grpcio to match default in google colab

* Adding requests requirement
2022-08-03 10:11:18 +02:00
tstadel
2c56305ed3
Fix serialization of numpy arrays and pandas dataframes in REST API (#2838)
* correct serialization of numpy arrays and pandas dataframes

* Update Documentation & Code Style

* set additional json_encoders globally

* Update Documentation & Code Style

* add tests for non primitive return types

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-08-02 09:49:32 +02:00
Vladimir Blagojevic
86d56b4dfe
Add HF model caching for integration tests (#2909)
* Add HF model caching for integration tests

* Remove windows mode caching - not worth it
2022-07-29 18:17:05 +02:00
Steven Haley
6b7d4a0514
Bug fix Weaviate document deletion (#2899)
* Bug fix Weaviate document deletion

If no filters param is passed in, then the original code retrieves *all* documents before then deleting by their IDs. There's no need for that, since we can delete by their IDs directly.

* Edit comment to clarify deletion and recreation

* Write unit tests for bug fix
2022-07-29 17:21:25 +02:00
Sara Zan
434b1c3682
Disable a few checks in the pre-commit hook (#2929)
* Disable small checks giving trouble to pydoc-markdown and JSON Schema

* Add instructions for JSON schema generator in the workflow logs
2022-07-29 17:02:56 +02:00
Sara Zan
3157e20dff
Change black pre-commit hook into black-jupyter (#2928)
* change black into black-jupyter

* Revert tutorial changes

This reverts commit dd3c5d954d6a9eed41b849e6a3d14269019bf21b.

* finalize pre-commit changes
2022-07-29 15:56:22 +02:00
Sara Zan
284c759346
Add switch for BiAdaptive and TriAdaptiveModel in Evaluator (#2908)
* Add switch for BiAdaptive and Triadaptive Model

* fix import

* black

* padding -> attention
2022-07-29 11:31:52 +02:00
GianiStatie
b78db1cbaf
Use batch_size in QuestionGenerator (#2870)
* Bugfix: batch_size was not passed to self.generate_batch

* Testing pre-push hooks

* Formatting code using black

* Adding black changes

* Adding black changes

* Adding black changes

Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2022-07-29 09:41:34 +02:00
Vladimir Blagojevic
1f5b9bd69b
Explicitly specify all parameters to forward call (#2886)
* Explicitly specify all parameters to forward call

* Use DPREncoder instead of get_language_model in dense retriever

* Black formatting
2022-07-28 13:43:12 +02:00
Sara Zan
330a1c0249
Wrap opensearch imports into safe_import (#2907)
* Wrap opensearch imports into `safe_import`

* black
2022-07-28 12:25:31 +02:00
Massimiliano Pippi
e7627c3f8b
Use opensearch-py in OpenSearchDocumentStore (#2691)
* add Opensearch extras

* let OpenSearchDocumentStore use opensearch-py

* Update Documentation & Code Style

* fix a bug found after adding tests

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-07-28 10:04:49 +02:00
Steven Haley
ae84c5a533
Fix typos in Contributing.md (#2897) 2022-07-28 09:30:25 +02:00
Daniel Bichuetti
1162daa7f3
Pin pyworld dependency to 0.2.12 (#2900) 2022-07-27 19:42:26 +02:00
Sara Zan
b2bd99d799
Recommend installing pre-commit hook on commit (#2890)
* recommend installing hook on commit

* remove change in weaviate docker command
2022-07-27 18:37:35 +02:00
Daniel Fleischer
d91a5b0e15
Typo README.md (#2895) 2022-07-27 16:00:50 +02:00
Zoltan Fedor
adb2b2c312
Add support for BM25 with the Weaviate document store (#2860)
* Upgrading Weaviate used for testing to 1.14.1 from 1.11.0

This has also brought up an issue with one of the tests, which filters for the value "a". This test has started to fail, as "a" is a default stopword in Weaviate, so I have changed the test to look for the value "c" instead of "a" to get around the stopword issue.

* Weaviate client upgrade

From v3.3.3 to v3.6.0

* Adding BM25 Retrieval to Weaviate

Weaviate now supports BM25 retrieval in experimental mode and with some limitations (for example, it cannot be combined with filters).
This commit adds support for inverted index (BM25) querying against Weaviate.

* Running Black on the recent code changes

* Update Documentation & Code Style

* Fixing linting issues after code changes by black

* The BM25 query needs to be in all lowercase for now

The BM25 query needs to be provided all lowercase while the functionality is in experimental mode in Weaviate.
See https://app.slack.com/client/T0181DYT9KN/C017EG2SL3H/thread/C017EG2SL3H-1658790227.208119

* Fixing method parameter docstring to highlight that they are not supported in Weaviate

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-27 10:07:13 +02:00
Sebastian
128d1e2388
Updating pre-commit-config to remove python version (#2884) 2022-07-26 18:14:34 +02:00
Sara Zan
2d65c380f1
pre-commit hooks (#2819)
* Add pre-commit config

* update contributing guidelines

* try failing the workflow

* add pre-commit to the deps

* updating uninstall instructions

* separate jobs in CI

* make tutorials check fail

* make black check fail

* make openapi check fail

* make yaml schema and api docs checks fail

* highlight the instructions

* Update .pre-commit-config.yaml

Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de>

* Update CONTRIBUTING.md

Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de>

* Update CONTRIBUTING.md

Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de>

* Use black --check

* Add images of the CI

* title level

* feedback

Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de>
2022-07-26 15:02:15 +02:00
Julian Risch
3c81103db7
Remove logging config from Haystack (#2848)
* move logging config from haystack lib to application

* Update Documentation & Code Style

* config logging before importing haystack

* Update Documentation & Code Style

* add logging config to all tutorials

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-25 17:57:30 +02:00
Sara Zan
5d8476eb58
Restart containers in tutorials.sh (#2858)
* restart tutorials in the loop

* remove container steps in tutorials.yml

* forgotten quotes

* unmatched bracket

* give names to containers

* try to limit the log size

* make the containers restart on the scripts as well

* feedback

* Raise integration tests timeout

* raising limit again
2022-07-25 17:35:36 +02:00