207 Commits

Author SHA1 Message Date
Stefano Fiorucci
4f261a4575
docs: extend tutorial14 about query classification (#3013)
* first draft for tutorial extension

* forgotten markdown

* improved tutorial

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* add markdown

* little corrections

* little corrections and add py tutorial

* Update tutorials/Tutorial14_Query_Classifier.ipynb

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update tutorials/Tutorial14_Query_Classifier.ipynb

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update tutorials/Tutorial14_Query_Classifier.ipynb

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update tutorials/Tutorial14_Query_Classifier.ipynb

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* update tutorial webpage

* fix typo

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>
2022-08-12 17:59:47 +02:00
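Tutorial 14 routes keyword queries and natural-language questions down different pipeline branches using a trained classifier node. To show only the routing idea, here is a toy rule-based stand-in. The question-word list, the "question"/"keyword" labels, and the `output_1`/`output_2` edge names are assumptions for illustration; a real classifier learns these cues instead of hard-coding them.

```python
# Toy keyword-vs-question router (hypothetical; not Haystack's classifier).
QUESTION_WORDS = {"who", "what", "when", "where", "why", "how", "which",
                  "is", "are", "does", "do", "can"}


def classify_query(query: str) -> str:
    """Return "question" or "keyword" using simple surface cues."""
    text = query.strip()
    tokens = text.lower().split()
    if not tokens:
        return "keyword"
    if text.endswith("?") or tokens[0] in QUESTION_WORDS:
        return "question"
    return "keyword"


def route(query: str) -> str:
    """Map the label to an outgoing edge name (assumed naming)."""
    return "output_1" if classify_query(query) == "question" else "output_2"


if __name__ == "__main__":
    for q in ["Who is the father of Arya Stark?", "arya stark father"]:
        print(q, "->", classify_query(q), route(q))
```

A rule like this breaks quickly (for example, a keyword query that happens to start with "do"), which is why the tutorial relies on a model instead.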
Bijay Gurung
717796c587
Tutorial 06: Replace DPR with EmbeddingRetriever (#2910)
* Tutorial 06: Replace DPR with EmbeddingRetriever

Closes #2887

* Add updated tutorials/6.md file

Replace `DensePassageRetriever` with `EmbeddingRetriever`

* Update Tutorial 06 based on PR feedback

* Further updates to Tutorial-06 according to review feedback

* [Tutorial 06] Put in review feedback for the py file
2022-08-03 18:43:54 +02:00
Julian Risch
3c81103db7
Remove logging config from Haystack (#2848)
* move logging config from haystack lib to application

* Update Documentation & Code Style

* config logging before importing haystack

* Update Documentation & Code Style

* add logging config to all tutorials

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-25 17:57:30 +02:00
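#2848 moves logging setup out of the library and into the application, and the tutorials now configure logging before importing haystack. A minimal sketch of that pattern with the standard `logging` module; the format string and levels are illustrative choices, not necessarily the exact ones used in the tutorials.

```python
import logging

# The application, not the library, owns the logging setup. Configure it
# first so messages emitted while haystack is imported already use it.
logging.basicConfig(
    format="%(levelname)s - %(name)s - %(message)s",
    level=logging.WARNING,  # quiet default for all other libraries
)
# Raise verbosity only for the "haystack" logger and its children.
logging.getLogger("haystack").setLevel(logging.INFO)

# haystack imports would follow here (omitted in this sketch)
```

Child loggers such as `haystack.nodes` have no level of their own, so they inherit INFO from `haystack`, while other packages stay at WARNING.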
Stefano Fiorucci
e350781825
Exclude docker from Tutorial 15 (#2861)
* modify notebook

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-21 10:01:25 +02:00
Vladimir Blagojevic
2a7e333d9a
Tutorial 12: add introduction (#2798)
* Tutorial 12: add introduction

* PR review for Tutorial 12: add introduction

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-13 17:44:19 +02:00
Sara Zan
091711b8c4
Fix Tutorials and Tutorials (nightly) (#2737)
* Remove caching and install audio deps

* Fix `Tutorials` as well

* Run all tutorials even though some fail

* Forgot fi

* fix failure condition

* proper bash string equality

* Enable debug logs

* remove audio files

* Update Documentation & Code Style

* Use the setup action in the Tutorial CI as well

* Try with a file that exists

* Update Documentation & Code Style

* Fix the comments in the tutorials

* Update Documentation & Code Style

* Fix tutorials.sh

* Remove debug logging

* import pprint and try editable install

* Update Documentation & Code Style

* extract no run list

* Add tutorial18 to no run list nightly

* import pprint correctly

* Update Documentation & Code Style

* try making site-packages editable

* Make pythonpath editable every time Tut17 is run on CI

* typo

* fix imports in tut5

* add git clean

* Update Documentation & Code Style

* add comments and remove `-e`

* accidentally deleted a line

* Update .github/utils/tutorials.sh

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2022-07-12 11:22:17 +02:00
Agnieszka Marzec
425da1fd31
Fix load_from_yaml example in the Pipelines tutorial (#2774)
* Fix load from yaml example and image

* Update Documentation & Code Style

* Fixed pipeline example

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-08 11:22:11 +02:00
Vladimir Blagojevic
a766b70a8f
Tutorial 18: Open in Colab doesn't work in Firefox (#2767)
* Tutorial 18: Open in Colab doesn't work in Firefox

* Tutorial 18: Open in Colab doesn't work in Firefox v2

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-06 10:51:09 -04:00
Vladimir Blagojevic
ffb7e4e4bd
GPL tutorial - add GPU header and open in colab button (#2736)
* GPL tutorial - add GPU header and open in colab button

* Add GPL tutorial to run exclusion list
2022-07-04 05:23:39 -04:00
Vladimir Blagojevic
b08c5f81d1
Add GPL adaptation tutorial (#2632)
* Add GPL adaptation tutorial

* Latest round of Aga's corrections

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-26 02:44:57 -04:00
Stefano Fiorucci
b01a7c2259
Add InMemoryKnowledgeGraph (#2678)
* draft for InMemoryKnowledgeGraph

* remove comments

* Update Documentation & Code Style

* fix import and signature

* Fix dependencies for in_memory_knowlede_graph

* updated tutorials

* Update Documentation & Code Style

* fix bug in notebook

* fix other notebook bug

* Update Documentation & Code Style

* improved tutorial notebook

* Update Documentation & Code Style

* better implementation of InMemoryKnowledgeGraph

* fix

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-22 19:16:33 +02:00
Rob Pasternak
b87c0c950b
Tutorial 14 edit (#2663)
* Rewrite Tutorial 14 for increased user-friendliness

* Update Tutorial14 .py file to match .ipynb file

* Update Documentation & Code Style

* unblock the ci

* ignore error in jitterbit/get-changed-files

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
2022-06-22 13:03:07 +02:00
bogdankostic
b16430b61e
Tutorial 4: Set similarity to "cosine" in DocStore initialization (#2673)
* Set similarity to cosine in DocStore initialization

* Update Documentation & Code Style

* Set `scale_score` to `False`

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-20 18:47:09 +02:00
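#2673 makes Tutorial 4 create its document store with `similarity="cosine"` and sets `scale_score` to `False`. The plain-Python sketch below shows what cosine similarity computes compared with a raw dot product, plus one common way of rescaling a cosine score from [-1, 1] to [0, 1]. The `(score + 1) / 2` mapping is an assumption about what score scaling looks like; the document store's actual scaling is not described in this log.

```python
import math
from typing import Sequence


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Dot product divided by the product of vector lengths: ignores magnitude.
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


def scale_to_unit_interval(cosine: float) -> float:
    # Assumed rescaling of a cosine score from [-1, 1] to [0, 1].
    return (cosine + 1) / 2


query = [1.0, 2.0]
doc = [2.0, 4.0]  # same direction as the query, twice the length
print(dot_product(query, doc))        # 10.0
print(cosine_similarity(query, doc))  # approximately 1.0
```

The dot product grows with vector length, whereas cosine only measures direction, which is why the choice of similarity should match how the embedding model was trained.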
Sara Zan
584e046642
AnswerToSpeech (#2584)
* Add new audio answer primitives

* Add AnswerToSpeech

* Add dependency group

* Update Documentation & Code Style

* Extract TextToSpeech in a helper class, create DocumentToSpeech and primitives

* Add tests

* Update Documentation & Code Style

* Add ability to compress audio and more tests

* Add audio group to test, all and all-gpu

* fix pylint

* Update Documentation & Code Style

* Accidental git tag

* Try pleasing mypy

* Update Documentation & Code Style

* fix pylint

* Add warning for missing OS library and support in CI

* Try fixing mypy

* Update Documentation & Code Style

* Add docs, simplify args for audio nodes and add tutorials

* Fix mypy

* Fix run_batch

* Feedback on tutorials

* fix mypy and pylint

* Fix mypy again

* Fix mypy yet again

* Fix the ci

* Fix dicts merge and install ffmpeg on CI

* Make the audio nodes import safe

* Trying to increase tolerance in audio test

* Fix import paths

* fix linter

* Update Documentation & Code Style

* Add audio libs in unit tests

* Update _text_to_speech.py

* Update answer_to_speech.py

* Use dedicated dataset & update telemetry

* Remove  and use distilled roberta

* Revert special primitives so that the nodes run in indexing

* Improve tutorials and fix smaller bugs

* Update Documentation & Code Style

* Fix serialization issue

* Update Documentation & Code Style

* Improve tutorial

* Update Documentation & Code Style

* Update _text_to_speech.py

* Minor lg updates

* Minor lg updates to tutorial

* Making indexing work in tutorials

* Update Documentation & Code Style

* Improve docstrings

* Try to use GPU when available

* Update Documentation & Code Style

* Fix mypy and pylint

* Try to pass the device correctly

* Update Documentation & Code Style

* Use type of device

* use .cpu()

* Improve .ipynb

* update apt index to be able to download libsndfile1

* Fix SpeechDocument.from_dict()

* Change pip URL

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-06-15 10:13:18 +02:00
Sara Zan
735ffa635b
[CI refactoring] Tutorials on CI (#2547)
* Experimental Ci workflow for running tutorials

* Run on every push for now

* Not starting?

* Disabling paths temporarily

* Sort tutorials in natural order

* Install ipython

* remove ipython install

* Try running ipython with sudo

* env.pythonLocation

* Skipping tutorial2 and 9 for speed

* typo

* Use one runner per tutorial, for now

* Typo in dependent job

* Missing quotes broke scripts matrix

* Simplify setup for the tutorials, try to prevent containers conflict

* Remove needless job dependencies

* Try prevent cache issues, fix small Tut10 bug

* Missing deps for running notebook tutorials

* Create three groups of tutorials excluding the longest among them

* remove deps

* use proper bash loop

* Try with a single string

* Fix typo in echo

* Forgot do

* Typo

* Try to make the GraphDB tutorial without launching its own container

* Run notebook and script together

* Whitespace

* separate scripts and notebooks execution

* Run notebooks first

* Try caching the GoT data before running the scripts

* add note

* fix mkdir

* Fix path

* Update Documentation & Code Style

* missing -r

* Fix folder numbering

* Run notebooks as well

* Typo in notebook command

* complete path in notebook command

* Try with TIKA_LOG_PATH

* Fix folder naming

* Do not use cached data in Tut9

* extracting the number better

* Small tweaks

* Same fix on Tut10 on the notebook

* Exclude GoT cache for tut5 too

* Remove faiss files after tutorial run

* Layout

* fix remove command

* Fix path in tut10 notebook

* Fix typo in node name in tut14

* Third block was too long, rebalancing

* Reduce GoT dataset even more, why waste time after all...

* Fix paths in tut10 again

* do git clean to make sure to cleanup everything (breaks post Python)

* Remove ES file with bad permission at the end of the run

* Split first block, takes >30mins

* take out tut15 for a moment, has an actual bug

* typo

* Forgot rm option

* Simply remove all ES files

* Improve logs of GoT reduction

* Exclude also tut16 from cache to try fix bug

* Replace ll with ls

* Reintroduce 15_TableQA

* Small regrouping

* regrouping to make the min num of runners go for about 30mins

* Add cron schedule and PR paths conditions

* Add some timing information

* Separate tutorials by diff and tutorials by cron

* temp add pull_request to tutorials nightly

* Add badge in README to keep track of the nightly tutorials run

* Remove prefixes from data folder names

* Add fetch depth to get diff with master

* Fix paths again

* typo

* Exclude long-running ones

* Typo

* Fix tutorials.yml as well

* Use head_ref

* Using an action for now

* exclude other files

* Use only the correct command to run the tutorial

* Add long running tutorials in separate runners, just for experiment

* Factor out the complex bash script

* Pass the python path to the bash script

* Fix paths

* adding log statement

* Missing dollarsign

* Resetting variable in loop

* using mini GoT dataset and improving bash script

* change dataset name

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-15 09:53:36 +02:00
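#2547 runs the tutorials on CI and, among other steps, sorts them "in natural order" and extracts each tutorial's number. The CI does this in a bash script; the Python sketch below only illustrates the idea, and the example file names and the `tutorial_number` helper are illustrative assumptions.

```python
import re
from typing import List, Union


def natural_key(name: str) -> List[Union[int, str]]:
    # re.split with a capturing group alternates text runs (even indices)
    # and digit runs (odd indices), so keys always compare same-type items.
    parts = re.split(r"(\d+)", name)
    return [int(part) if i % 2 else part.lower() for i, part in enumerate(parts)]


def tutorial_number(name: str) -> int:
    # Hypothetical helper: pull N out of "TutorialN_...".
    match = re.search(r"Tutorial(\d+)", name)
    if match is None:
        raise ValueError(f"not a tutorial file: {name}")
    return int(match.group(1))


files = [
    "Tutorial10_Knowledge_Graph.py",
    "Tutorial2_Finetune_a_model_on_your_data.py",
    "Tutorial1_Basic_QA_Pipeline.py",
]
print(sorted(files))                   # lexicographic: Tutorial10 comes first
print(sorted(files, key=natural_key))  # Tutorial1, Tutorial2, Tutorial10
```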
tstadel
66c7d1a7ee
Add execute_eval_run example to Tutorial 5 (#2459)
* add execute_eval_run.ipynb

* update Tutorial 5

* Update Documentation & Code Style

* change experiment name

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-13 09:19:12 +02:00
Ryan Russell
c1b7948e10
Improve Docs Readability (#2617)
Signed-off-by: Ryan Russell <git@ryanrussell.org>
2022-06-03 09:57:40 +02:00
bogdankostic
61d9429c25
Simplify loading of EmbeddingRetriever (#2619)
* Infer model format for EmbeddingRetriever automatically

* Update Documentation & Code Style

* Adapt conftest to automatic inference of model_format

* Update Documentation & Code Style

* Fix tests

* Update Documentation & Code Style

* Fix tests

* Adapt tutorials

* Update Documentation & Code Style

* Add test for similarity scores with sentence transformers

* Adapt doc string and warning message

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-02 15:05:29 +02:00
Julian Risch
b2a2c10fae
Update milvus installation instructions to v2 (#2598) 2022-05-25 17:22:04 +02:00
tstadel
dd8dc588b1
fix eval with context matching in table qa use cases (#2597) 2022-05-25 16:26:29 +02:00
tstadel
7caca41c5d
Support context matching in pipeline.eval() (#2482)
* calculate context pred metrics

* Update Documentation & Code Style

* extend doc_relevance_col values

* fix import order

* Update Documentation & Code Style

* fix mypy

* fix typings literal import

* add option for custom document_id_field

* Update Documentation & Code Style

* fix tests and dataframe col-order

* Update Documentation & Code Style

* rename content to context in eval dataframe

* add backward compatibility to EvaluationResult.load()

* Update Documentation & Code Style

* add docstrings

* Update Documentation & Code Style

* support sas

* Update Documentation & Code Style

* add answer_scope param

* Update Documentation & Code Style

* rework doc_relevance_col and keep document_id col in case of custom_document_id_field

* Update Documentation & Code Style

* improve docstrings

* Update Documentation & Code Style

* rename document_relevance_criterion into document_scope

* Update Documentation & Code Style

* add document_scope and answer_scope to print_eval_report

* support all new features in execute_eval_run()

* fix imports

* fix mypy

* Update Documentation & Code Style

* rename pred_label_sas_grid into pred_label_matrix

* update dataframe schema and sorting

* Update Documentation & Code Style

* pass through context_matching params and extend document_scope test

* Update Documentation & Code Style

* add answer_scope tests

* fix context_matching_threshold for document metrics

* shorten dataframe apply calls

* Update Documentation & Code Style

* fix queries getting lost if nothing was retrieved

* Update Documentation & Code Style

* Update Documentation & Code Style

* use document_id scopes

* Update Documentation & Code Style

* fix answer_scope literal

* Update Documentation & Code Style

* update the docs (lg changes)

* Update Documentation & Code Style

* update tutorial 5

* Update Documentation & Code Style

* fix tests

* Add minor lg updates

* final docstring changes

* fix single quotes in docstrings

* Update Documentation & Code Style

* dataframe scopes added for each column

* better docstrings for context_matching params

* Update Documentation & Code Style

* fix summarizer eval test

* Update Documentation & Code Style

* fix test

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2022-05-24 18:11:52 +02:00
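#2482 adds context matching to `pipeline.eval()`: a retrieved document can count as relevant when its text is similar enough to the labeled context, not only when the document IDs match, and a `context_matching_threshold` controls how similar is "enough". A minimal sketch of the idea using `difflib` from the standard library; the 0-100 scale, the default threshold of 65.0, and the use of `SequenceMatcher` are assumptions, since Haystack uses its own fuzzy-matching routine.

```python
from difflib import SequenceMatcher


def context_similarity(predicted: str, labeled: str) -> float:
    # Similarity on a 0-100 scale (difflib-based stand-in).
    return 100.0 * SequenceMatcher(None, predicted, labeled).ratio()


def context_matches(predicted: str, labeled: str, threshold: float = 65.0) -> bool:
    # A retrieved context counts as relevant if it is close enough to the
    # labeled context, even when the document IDs differ.
    return context_similarity(predicted, labeled) >= threshold


print(context_matches(
    "Arya Stark is the daughter of Eddard Stark.",
    "Arya Stark is the daughter of Eddard Stark",
))
```

This matters, for example, when documents are re-indexed with a different split and receive new IDs while the labeled passages stay the same.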
bogdankostic
867695ad0c
Change signature of queries param in batch methods (#2575)
* Change signature of queries param in batch methods

* Update Documentation & Code Style

* Fix mypy

* Remove unused import

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-24 12:33:45 +02:00
ClaMnc
2b11981b08
set top_k to 5 in SAS to be consistent (#2550)
* set top_k to 5 in SAS to be consistent

* set top_k to 5 in SAS to be consistent
2022-05-16 10:29:03 +02:00
Sara Zan
00aa1f41d7
convert_files_to_docs typo (#2546) 2022-05-13 16:38:43 +02:00
bogdankostic
738e008020
Add run_batch method to all nodes and Pipeline to allow batch querying (#2481)
* Add run_batch methods for batch querying

* Update Documentation & Code Style

* Fix mypy

* Update Documentation & Code Style

* Fix mypy

* Fix linter

* Fix tests

* Update Documentation & Code Style

* Fix tests

* Update Documentation & Code Style

* Fix mypy

* Fix rest api test

* Update Documentation & Code Style

* Add Doc strings

* Update Documentation & Code Style

* Add batch_size as attribute to nodes supporting batching

* Adapt error messages

* Adapt type of filters in retrievers

* Revert change about truncation_warning in summarizer

* Unify multiple_doc_lists tests

* Use smaller models in extractor tests

* Add return types to JoinAnswers and RouteDocuments

* Adapt return statements in reader's run_batch method

* Allow list of filters

* Adapt error messages

* Update Documentation & Code Style

* Fix tests

* Fix mypy

* Adapt print_questions

* Remove disabling warning about too many public methods

* Add flag for pylint to disable warning about too many public methods in pipelines/base.py and document_stores/base.py

* Add type check

* Update Documentation & Code Style

* Adapt tutorial 11

* Update Documentation & Code Style

* Add query_batch method for DCDocStore

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-11 11:11:00 +02:00
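#2481 adds a `run_batch` method to nodes and `Pipeline` so that several queries can be processed in one call, and #2575 later changes the signature of the `queries` parameter of the batch methods. The toy node below is hypothetical and only contrasts the two call shapes: `run` takes one query and returns one document list, while `run_batch` takes a list of queries and returns one document list per query. Whether Haystack's nodes return exactly this structure is an assumption here.

```python
from typing import Dict, List

Doc = Dict[str, str]


class ToyKeywordRetriever:
    """Hypothetical node: ranks documents by word overlap with the query."""

    def __init__(self, documents: List[Doc]):
        self.documents = documents

    def _retrieve(self, query: str, top_k: int) -> List[Doc]:
        terms = set(query.lower().split())

        def overlap(doc: Doc) -> int:
            return len(terms & set(doc["content"].lower().split()))

        hits = [doc for doc in self.documents if overlap(doc) > 0]
        return sorted(hits, key=overlap, reverse=True)[:top_k]

    def run(self, query: str, top_k: int = 3) -> Dict[str, List[Doc]]:
        # One query in, one list of documents out.
        return {"documents": self._retrieve(query, top_k)}

    def run_batch(self, queries: List[str], top_k: int = 3) -> Dict[str, List[List[Doc]]]:
        # Many queries in, one list of documents per query out.
        return {"documents": [self._retrieve(q, top_k) for q in queries]}


docs = [{"content": "Arya Stark is a Stark"}, {"content": "Jon Snow lives at Winterfell"}]
node = ToyKeywordRetriever(docs)
print(node.run_batch(["arya stark", "jon snow"])["documents"])
```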
bogdankostic
5378a9ab48
Fix tutorials 4, 7 and 8 (#2526)
* Fix tutorials 4, 7 and 8

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-11 09:17:05 +02:00
MichelBartels
c7e39e5225
Replace TableTextRetriever with EmbeddingRetriever in Tutorial 15 (#2479)
* replace TableTextRetriever with EmbeddingRetriever in Tutorial 15

* Update Documentation & Code Style

* fix bug

* Update Documentation & Code Style

* update tutorial 15 outputs

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-20-212.eu-west-1.compute.internal>
2022-05-05 10:12:44 +02:00
MichelBartels
5d98810a17
Raise error if torch-scatter is not installed or wrong version is installed (#2486)
* automatically download correct torch-scatter version

* raise error if torch-scatter is not installed

* Update Documentation & Code Style

* catch all import errors and fix linter

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-05 10:12:10 +02:00
Sara Zan
01ea4bf21f
Change default encoding for PDFToTextConverter from Latin 1 to UTF-8 (#2420)
* Change default encoding for PDFToTextConverter

* Update Documentation & Code Style

* Improve docstring

* Update Documentation & Code Style

* Add list of ligatures to ignore and add the possibility to modify such list at need

* Add docstring

* Add tests

* Rename parameter

* Update Documentation & Code Style

* Move implementation into the base converter to make mypy happier

* Update Documentation & Code Style

* mypy and pylint

* mypy

* move encoding parameter to init of PDFToTextConverter

* Update Documentation & Code Style

* make utf8 default and fix mypy

* Update Documentation & Code Style

* Update Documentation & Code Style

* remove note on encoding in tutorial8

* Update Documentation & Code Style

* skip OCRConverter and test converter.run

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2022-05-04 17:01:45 +02:00
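#2420 switches the default encoding of `PDFToTextConverter` from Latin 1 to UTF-8 and adds a modifiable list of ligatures to ignore. The plain-Python sketch below shows why the encoding matters (UTF-8 bytes decoded as Latin-1 turn into mojibake) and one plausible way a ligature list could be applied. The function names, the ligature table, and the reading of "ignore" as "leave unexpanded" are assumptions, not the converter's actual code.

```python
from typing import Iterable

# Common Latin typographic ligatures (Unicode Alphabetic Presentation Forms).
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}


def decode_extracted_text(raw: bytes, encoding: str = "UTF-8") -> str:
    # Decode the bytes produced by a text extractor; UTF-8 by default.
    return raw.decode(encoding)


def expand_ligatures(text: str, ligatures_to_keep: Iterable[str] = ()) -> str:
    # Replace each known ligature with its letters, except those the caller keeps.
    keep = set(ligatures_to_keep)
    table = {ord(lig): letters for lig, letters in LIGATURES.items() if lig not in keep}
    return text.translate(table)


raw = "café \ufb01le".encode("utf-8")
print(raw.decode("latin-1"))                          # mojibake
print(expand_ligatures(decode_extracted_text(raw)))  # café file
```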
Ahmed Nabil
9cdd719a6d
Update xpdfreader package installation (#2491)
This update fixes the exception `Exception: pdftotext is not installed. It is part of xpdf or poppler-utils software suite.`, so converting PDFs works again.
2022-05-03 18:09:41 +02:00
Tuana Celik
b6e369d1ca
changing the name of the retrievers from es_retriever to retriever (#2487)
* changing the name of the retrievers from es_retriever to retriever

* Update Documentation & Code Style

* name fix 2

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-03 18:08:23 +02:00
Tuana Celik
d49e92e21c
ElasticsearchRetriever to BM25Retriever (#2423)
* change class names to bm25

* Update Documentation & Code Style

* Update Documentation & Code Style

* Update Documentation & Code Style

* Add back all_terms_must_match

* fix syntax

* Update Documentation & Code Style

* Update Documentation & Code Style

* Creating a wrapper for old ES retriever with deprecated wrapper

* Update Documentation & Code Style

* New method for deprecating old ESRetriever

* New attempt for deprecating the ESRetriever

* Reverting to the simplest solution - warning logged

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-04-26 16:09:39 +02:00
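#2423 renames `ElasticsearchRetriever` to `BM25Retriever` and keeps the old name working with a logged deprecation warning; the new name refers to the BM25 ranking function that the retriever relies on. For reference, a compact sketch of Okapi BM25 scoring with the Lucene-style IDF. `k1=1.2` and `b=0.75` are the usual Elasticsearch defaults, and whitespace tokenization is a simplification of real text analysis.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score every document in `docs` against `query` with Okapi BM25."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = ["arya stark of winterfell", "jon snow of the night's watch", "the iron throne"]
print(bm25_scores("arya stark", docs))
```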
Sara Zan
ba9c976bfe
Update pdftotext link (#2432)
* Update pdftotext link

* Update Documentation & Code Style

* Update Tutorial8_Preprocessing.ipynb

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-19 14:30:18 +02:00
Sebastian
3d42b70fbb
Added macos version of xpdf in tutorial 8 (#2424)
* Added macos version of xpdf in tutorial 8

* mini-error
2022-04-14 15:31:40 +02:00
Branden Chan
75dcfd3fab
Delete files in docs/_src (#2322)
* Delete files in _src

* Filter unused images and re-add images that were in use in docs/img

* Remove all usages of user-images.githubusercontent.com

Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2022-04-12 16:19:03 +02:00
tstadel
8342a6c1d6
Fix eval discrepancies (#2381)
* fix eval discrepancies

* Update Documentation & Code Style

* fix reader eval comparison

* Update Documentation & Code Style

* slightly improve messed up top_n_f1 func

* add no_answer hint to reader.eval metrics

* fix tut5

* Update Documentation & Code Style

* correct doc_relevance_col in tests

* Update Documentation & Code Style

* redefine recall metrics for no_answers

* fix bugs in EvalAnswers

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-12 09:24:22 +02:00
MichelBartels
fc1cb63bcc
Fix RouteDocuments documentation (#2380)
* fix RouteDocuments documentation

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-31 11:45:02 +02:00
MichelBartels
eb514a6167
Add evaluation and document conversion to tutorial 15 (#2325)
* update tutorial 15 with newer features

* Update Documentation & Code Style

* fix tutorial 15

* update telemetry with tutorial changes

* Update Documentation & Code Style

* remove error output

* add output

* update non-notebook tutorial 15

* Update Documentation & Code Style

* delete distracting output from tutorial 15 notebook

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-29 17:09:05 +02:00
bogdankostic
834f8c4902
Change return types of indexing pipeline nodes (#2342)
* Change return types of file converters

* Change return types of preprocessor

* Change return types of crawler

* Adapt utils functions to new return types

* Adapt __init__.py to new method names

* Prevent circular imports

* Update Documentation & Code Style

* Let DocStores' run method accept Documents

* Adapt tests to new return types

* Update Documentation & Code Style

* Put "# type: ignore" to right place

* Remove id_hash_keys property from Document primitive

* Update Documentation & Code Style

* Adapt tests to new return types and missing id_hash_keys property

* Fix mypy

* Fix mypy

* Adapt PDFToTextOCRConverter

* Remove id_hash_keys from RestAPI tests

* Update Documentation & Code Style

* Rename tests

* Remove redundant setting of content_type="text"

* Add DeprecationWarning

* Add id_hash_keys to elasticsearch_index_to_document_store

* Change document type from dict to Document in PreProcessor test

* Fix file path in Tutorial 5

* Remove added output in Tutorial 5

* Update Documentation & Code Style

* Fix file_paths in Tutorial 9 + fix gz files in fetch_archive_from_http

* Adapt tutorials to new return types

* Adapt tutorial 14 to new return types

* Update Documentation & Code Style

* Change assertions to HaystackErrors

* Import HaystackError correctly

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-29 13:53:35 +02:00
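#2342 changes indexing nodes (file converters, preprocessor, crawler) to return `Document` objects instead of dictionaries, lets document stores' `run` method accept Documents, and renames the related utility functions. A minimal stand-in to show the dict-to-object shift; this `Document` dataclass and the `dicts_to_documents` helper are hypothetical and much simpler than Haystack's own `Document` primitive.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    """Hypothetical, simplified stand-in for a document primitive."""
    content: str
    content_type: str = "text"
    meta: Dict[str, Any] = field(default_factory=dict)


def dicts_to_documents(dicts: List[Dict[str, Any]]) -> List[Document]:
    # Old style: nodes passed plain dicts around; new style: typed objects.
    return [
        Document(
            content=d["content"],
            content_type=d.get("content_type", "text"),
            meta=dict(d.get("meta", {})),  # copy so documents don't share meta
        )
        for d in dicts
    ]


docs = dicts_to_documents([{"content": "hello", "meta": {"name": "a.txt"}}])
print(docs[0])
```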
mkkuemmel
04b56f0b1c
Replace dpr with embeddingretriever tut14 (#2336)
* add updated graph images for tutorial14

* ipynb: replaced DPR with EmbeddingRetriever, added TODO for further inspection of failing code

* Revert "ipynb: replaced DPR with EmbeddingRetriever, added TODO for further inspection of failing code"

This reverts commit f4b6f3e1dbbedfd1bbe5e0e33645899dbea5d924.

* ipynb: replaced DPR with EmbeddingRetriever, added TODO for further inspection of failing code

* ipynb: quick fix to avoid failure in print_answers

* py: quick fix to avoid failure in print_answers

* Update Documentation & Code Style

* ipynb: remove DPR, remove images

* Revert "ipynb: remove DPR, remove images"

This reverts commit dfa1e7585da6743fcf97488405c356bf935a976d.

* ipynb: remove DPR, remove images

* py: replace DPR with EmbeddingRetriever

* Update Documentation & Code Style

* correcting a typo

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: TuanaCelik <tuana.celik@deepset.ai>
2022-03-28 16:54:49 +02:00
Raphaël Merx
4ebb71d42d
Fix link to squad_to_dpr.py in DPR train tutorial (#2334)
* Fix link to squad_to_dpr.py in DPR train tutorial

* update tutorial 9
2022-03-25 12:05:12 +01:00
Julian Risch
cec0137693
Change document attribute from text to content (#2352)
* Change document attribute from text to content

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-23 16:55:01 +03:00
Julian Risch
7ffeccece6
Fix tutorial dataset paths (#2340)
* fix tutorial 4 dataset path

* fix tutorial 8 dataset path

* fix tutorial 10 event

* Update Documentation & Code Style

* fix send event for tutorial 15

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-22 09:19:50 +01:00
Julian Risch
ac5617e757
Add basic telemetry features (#2314)
* add basic telemetry features

* change pipeline_config to _component_config

* Update Documentation & Code Style

* add super().__init__() calls to error classes

* make posthog mock work with python 3.7

* Update Documentation & Code Style

* update link to docs web page

* log exceptions, send event for raised HaystackErrors, refactor Path(CONFIG_PATH)

* add comment on send_event in BaseComponent.init() and fix mypy

* mock NonPrivateParameters and fix pylint undefined-variable

* Update Documentation & Code Style

* check model path contains multiple /

* add test for writing to file

* add test for en-/disable telemetry

* Update Documentation & Code Style

* merge file deletion methods and ignore pylint global statement

* Update Documentation & Code Style

* set env variable in demo to activate telemetry

* fix mock of HAYSTACK_TELEMETRY_ENABLED

* fix mypy and linter

* add CI as env variable to execution contexts

* remove threading, add test for custom error event

* Update Documentation & Code Style

* simplify config/log file deletion

* add test for final event being sent

* force writing config file in test

* make test compatible with python 3.7

* switch to posthog production server

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-21 11:58:51 +01:00
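#2314 adds basic telemetry that can be switched on and off through the `HAYSTACK_TELEMETRY_ENABLED` environment variable, whose name appears in the commit list above. The helper below is a hypothetical sketch of reading such a flag. Treating a missing variable as enabled and accepting "false", "0", "no", and "off" as opt-out values are assumptions, so check the telemetry documentation for the values Haystack actually accepts.

```python
import os
from typing import Mapping, Optional

OPT_OUT_VALUES = {"false", "0", "no", "off"}  # assumed opt-out spellings


def telemetry_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical check of the HAYSTACK_TELEMETRY_ENABLED flag."""
    env = os.environ if env is None else env
    value = env.get("HAYSTACK_TELEMETRY_ENABLED", "True")
    return value.strip().lower() not in OPT_OUT_VALUES


print(telemetry_enabled({"HAYSTACK_TELEMETRY_ENABLED": "False"}))  # False
```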
mkkuemmel
06497da748
ipynb: inserted links to graph images (#2309)
* ipynb: inserted links to graph images

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-15 11:20:31 +01:00
mkkuemmel
a1040a17b2
Replace dpr with embeddingretriever tut11 (#2287)
* images for tutorial 11 in .github folder for easy access

* ipynb: changed DPR to EmbeddingRetriever, incl. new graphs of pipelines

* Update Documentation & Code Style

* moved images into correct folder

* removed images path

* Update Documentation & Code Style

* fixed debugging run of p_classifier

* Update Documentation & Code Style

* Revert debug param change

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: brandenchan <brandenchan@icloud.com>
2022-03-15 08:30:00 +01:00
Vladimir Blagojevic
6c0094b5ad
Update LFQA with the latest LFQA seq2seq and retriever models (#2210)
* Register BartEli5Converter for vblagoje/bart_lfqa model

* Update LFQA unit tests

* Update LFQA tutorials
2022-03-08 15:11:41 +01:00
tstadel
dde9d59271
fix pip backtracking issue (#2281)
* fix pip backtracking issue

* restrict azure-core version

* Remove the trailing comma

* Add skip_magic_trailing_comma in pyproject.toml for pydoc compatibility

* Pin pydoc-markdown _again_

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-07 19:25:33 +01:00
mkkuemmel
5951fc463e
Replace dpr with embeddingretriever tut5 (#2274)
* ipynb: EmbeddingRetriever made more prominent than DPR

* ipynb: EmbeddingRetriever more prominent than DPR

* Update Documentation & Code Style

* indentation fix

* Update Documentation & Code Style

* py: EmbeddingRetriever more prominent than DPR

* indentation fix

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-04 11:29:48 +01:00
bogdankostic
c5542bd3fb
Add RouteDocuments and JoinAnswers nodes (#2256)
* Add SplitDocumentList and JoinAnswer nodes

* Update Documentation & Code Style

* Add tests + adapt tutorial

* Update Documentation & Code Style

* Remove branch from installation path in Tutorial

* Update Documentation & Code Style

* Fix typing

* Update Documentation & Code Style

* Change name of SplitDocumentList to RouteDocuments

* Update Documentation & Code Style

* Adapt tutorials to new name

* Add test for JoinAnswers

* Update Documentation & Code Style

* Adapt name of test for JoinAnswers node

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-01 17:42:11 +01:00
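#2256 adds `RouteDocuments` (first named `SplitDocumentList`) and `JoinAnswers`, and adapts a tutorial to send text and table documents down different branches and merge the resulting answers. A toy sketch of both ideas on plain dicts; the `output_1`/`output_2` split by `content_type` and the concatenate-then-sort-by-score join with an optional `top_k_join` are assumptions modeled on the node names, not the nodes' code.

```python
from typing import Any, Dict, List, Optional

Item = Dict[str, Any]


def route_documents(documents: List[Item]) -> Dict[str, List[Item]]:
    # Text documents go to output_1, table documents to output_2 (assumed mapping).
    routed: Dict[str, List[Item]] = {"output_1": [], "output_2": []}
    for doc in documents:
        key = "output_2" if doc.get("content_type") == "table" else "output_1"
        routed[key].append(doc)
    return routed


def join_answers(answer_lists: List[List[Item]], top_k_join: Optional[int] = None) -> List[Item]:
    # Concatenate the answers from all branches and keep the best-scored ones.
    joined = [answer for answers in answer_lists for answer in answers]
    joined.sort(key=lambda answer: answer["score"], reverse=True)
    return joined if top_k_join is None else joined[:top_k_join]


docs = [
    {"content": "Arya is a Stark.", "content_type": "text"},
    {"content": "| name | house |", "content_type": "table"},
]
print(route_documents(docs))
```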