1461 Commits

Author SHA1 Message Date
Sara Zan
7167a26483
Small fixes to the public demo (#1781)
* Make Streamlit tolerant of Haystack not knowing its version, and add a special error for docstore issues

* Add workaround for a Streamlit bug

* Make default filters value an empty dict

* Return more context for each answer in the REST API

* Make the hs_version call non-blocking by adding a very short timeout

* Add disclaimer on low confidence answer

* Use the no-answer feature of the reader to highlight questions with no good answer
2021-11-22 19:06:08 +01:00
Julian Risch
9211c4c64d
initialize doc store with doc and label index in tutorial 5 (#1730)
* initialize doc store with doc and label index

* change ipynb according to py for tutorial 5

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-22 15:18:02 +01:00
Julian Risch
845905e418
ignore empty filters parameter (#1783)
* ignore empty filters parameter

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-22 09:36:14 +01:00
Kristof Herrmann
a8c2cdc565
private hugging face models for retrievers (#1785)
* private dpr

* Add latest docstring and tutorial changes

* added parameters to child functions

* Add latest docstring and tutorial changes

* added tableextractor

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-22 09:24:02 +01:00
Kristof Herrmann
8aa4ca29c2
Huggingface private model support via API tokens (FARMReader) (#1775)
* passed kwargs to model loading

* Pass Auth token explicitly

* add use_auth_token to get_language_model_class

* added use_auth_token parameter at FARMReader

* Add latest docstring and tutorial changes

* added docs for parameter `use_auth_token`

* Add latest docstring and tutorial changes

* adding docs link

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-19 16:48:31 +01:00
C V Goudar
a9a379784a
Facilitate concurrent query / indexing in Elasticsearch with dense retrievers (new skip_missing_embeddings param) (#1762)
* Filtering records not having embeddings

* Added support for the skip_missing_embeddings flag. The default behavior is to throw an error when embeddings are missing. If skip_missing_embeddings=True, documents without embeddings are ignored for vector similarity

* Fix for below error:
haystack/document_stores/elasticsearch.py:852: error: Need type annotation for "script_score_query"

* docstring for skip_missing_embeddings parameter

* Raise an exception when no documents with embeddings are found for the EmbeddingRetriever.

* Default skip_missing_embeddings to True

* Explicitly check if embeddings are present if no results are returned by EmbeddingRetriever for Elasticsearch

* Added test case based on Julian's input

* Added test case based on Julian's input. Fix pytest error on the test case

* Added test case based on Julian's input. Fix pytest error on the test case

* Added test case based on Julian's input. Fix pytest error on the test case

* Simplify code by using get_embed_count

* Adjust docstring & error msg slightly

* Revert error msg

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-11-19 14:50:23 +01:00
Sara Zan
d81897535e
Public demo (#1747)
* Queries now run only when pressing RUN. File upload hidden. Question is not sent if the textbox is empty.

* Add latest docstring and tutorial changes

* Tidy up: remove needless state, add comments, fix minor bugs

* Had to add results to the status to avoid some bugs in eval mode

* Added 'credits'

* Add footers, update requirements, some random questions for the evaluation

* Add requested changes

* Temporarily roll back the UI to the old GoT dataset

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-19 11:34:32 +01:00
Malte Pietsch
c0892717a0
Fix usage of filters in /query endpoint in REST API (#1774)
* WIP filter refactoring

* fix filter formatting

* remove inplace modification of filters
2021-11-18 18:13:03 +01:00
bogdankostic
31e22012da
Allow TableReader models without aggregation classifier (#1772) 2021-11-18 09:59:07 +01:00
bogdankostic
5e36988b31
Support Tables in all DocumentStores (#1744)
* Add support for tables in SQLDocumentStore, FAISSDocumentStore and MilvusDocumentStore

* Add support for WeaviateDocumentStore

* Make sure that embedded meta fields are strings + add embedding_dim to WeaviateDocStore in test config

* Add latest docstring and tutorial changes

* Represent tables in WeaviateDocumentStore as nested lists

* Fix mypy

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-17 16:41:04 +01:00
Sara Zan
faf41df65c
Pipelines now tolerate custom _debug content (#1756)
* Pipelines now tolerate custom _debug content
2021-11-17 15:50:56 +01:00
tstadel
0021668394
exclude test_summarizer_translation.py for windows_ci (#1759) 2021-11-16 10:13:16 +01:00
Julian Risch
f3e46b8cc7
fix import of EvaluationResult in test case 2021-11-16 09:55:09 +01:00
tstadel
956d5bba43
Split summarizer tests in order to make windows CI work again (#1757)
* separate testfile for summarizer with translation

* Add latest docstring and tutorial changes

* import SPLIT_DOCS from test_summarizer

* add workflow_dispatch to windows_ci

* add workflow_dispatch to linux_ci

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-15 18:49:49 +01:00
tstadel
59e04cba05
Multi query eval (#1746)
* add eval() to pipeline

* Add latest docstring and tutorial changes

* support multiple queries in eval()

* Add latest docstring and tutorial changes

* keep single query test

* fix EvaluationResult node_results default

* adjust docstrings

* Add latest docstring and tutorial changes

* minor improvements from comments

* Add latest docstring and tutorial changes

* move EvaluationResult and calculate_metrics to schema

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-15 14:51:11 +01:00
nishanthcgit
cf603042b2
Capitalize starting letter in params (#1750)
* Capitalize starting letter in params

Capitalized the starting letter in code examples for params, in keeping with the latest node names, where the first letter is capitalized.
Refer: https://github.com/deepset-ai/haystack/issues/1748

* Update standard_pipelines.py

Capitalized some starting letters in the docstrings in keeping with the updated node names for standard pipelines
2021-11-15 12:38:13 +01:00
Sara Zan
1a10de506c
Split pipeline tests into three suites (#1755)
* Split pipeline tests into three suites

* Will this trigger the CI?

* Rename duplicate test into test_most_similar_documents_pipeline

* Fixing a bug that was probably never noticed
2021-11-15 12:16:27 +01:00
Sara Zan
09a462d756
Fix print_answers (#1743)
* Fix a specific path of print_answers that assumed answers were dictionaries

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-15 09:50:09 +01:00
Sara Zan
ea3abd305b
Fix a few details of some tutorials (#1733)
* Make Tutorial10 use print instead of logs and fix a typo in Tutorial15

* Add a type check in 'print_answers'

* Add same checks to print_documents and print_questions

* Make RAGenerator return Answers instead of dictionaries

* Fix RAGenerator tests

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-12 16:44:28 +01:00
Sara Zan
85a08d671a
Fix another self.device/s typo (#1734)
* Fix yet another self.device(s) typo

* Add typing to 'initialize_device_settings' to try to prevent future issues

* Fix bug in Tutorial5

* Fix the same bug in the notebook

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-11 17:18:06 +01:00
Branden Chan
8082549663
Add debugging example to tutorial (#1731)
* Add debugging example to tutorial

* Add latest docstring and tutorial changes

* Remove Objects suffix

* Add latest docstring and tutorial changes

* Revert "Remove Objects suffix"

This reverts commit 6681cb06510b080775994effe6a50bae42254be4.

* Revert unintentional commit

* Add third debugging option

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-11 14:45:06 +01:00
Julian Risch
7059344d9e
Change answer aggregation key to (doc_id, query) instead of (label_id, query) (#1726) 2021-11-11 13:02:46 +01:00
Branden Chan
81f82b1b95
Update API Reference Pages for v1.0 (#1729)
* Create new API pages and update existing ones

* Create query classifier page

* Remove Objects suffix
2021-11-11 12:44:29 +01:00
tstadel
158460504b
Make FAISSDocumentStore work with yaml (#1727)
* add faiss_index_path and faiss_config_path

* Add latest docstring and tutorial changes

* remove duplicate cleaning stuff

* refactoring + test for invalid param combination

* adjust type hints

* Add latest docstring and tutorial changes

* add documentation to @preload_index

* Add latest docstring and tutorial changes

* recursive __init__ instead of decorator

* Add latest docstring and tutorial changes

* validate instead of check

* combine ifs

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-11 11:02:22 +01:00
Sara Zan
42c8edca54
Simplify logs management (#1696)
* Move each haystack module's logger configuration into the respective file and configure the handlers properly

* Implement most changes from #1714

* Remove accidentally committed git merge tags ':D

* Remove the debug logs capture feature

* Remove more references to debug_logs

* Fix issue with FARMReader that somehow made it to master

* Add devices parameter to Inferencer

* Change log level of the APEX message to DEBUG and lower the 'Starting <docstore>...' messages to DEBUG as well

* Change log level of a few logs from modeling

* Silence the transformers warning

* Remove empty line below the workers :)

* Fix two more levels in the tutorials logs

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2021-11-11 10:16:25 +01:00
Malte Pietsch
b28dd823ef
Improve OpenAPI spec (#1700)
* improve OpenAPI spec

* move to automatic generation of better operationIDs
2021-11-11 09:40:58 +01:00
tstadel
14515a861b
Tutorial for DocumentClassifier at Index Time (#1697)
* basic example of document classifier in preprocessing logic

* add batch_size to TransformersDocumentClassifier

* complete tutorial16

* Add latest docstring and tutorial changes

* fix missing batch_size

* add notebook

* test for batch_size use added

* add tutorial 16 to headers.py

* Add latest docstring and tutorial changes

* make DocumentClassifier indexing pipeline ready

* Add latest docstring and tutorial changes

* flexibility improvements for DocumentClassifier in Pipelines

* Add latest docstring and tutorial changes

* fix index time usage

* remove query from documentclassifier tests

* improve classification_field resolving + minor fixes

* Add latest docstring and tutorial changes

* tutorial 16 extended with zero shot and pipelines

* Add latest docstring and tutorial changes

* install graphviz in notebook

* Add latest docstring and tutorial changes

* remove convert_to_dicts

* Add latest docstring and tutorial changes

* Fix typo

* Add latest docstring and tutorial changes

* remove retriever from indexing pipeline

* Add latest docstring and tutorial changes

* fix save_to_yaml when using FileTypeClassifier

* emphasize the impact with zero shot classification

* Add latest docstring and tutorial changes

* adjust use_gpu to boolean in test

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-11-09 18:43:00 +01:00
Sara Zan
91cafb49bb
Improve tutorials' output (#1694)
* Modify __str__ and __repr__ for Document and Answer

* Rename QueryClassifier in Tutorial11

* Improve the output of tutorial1

* Make the output of Tutorial8 a bit less dense

* Add a print_questions util to print the output of question generating pipelines

* Replace custom printing with the new utility in Tutorial13

* Ensure all output is printed with minimal details in Tutorial14 and add some titles

* Minor change to print_answers

* Make tutorial3's output the same as tutorial1

* Add __repr__ to Answer and fix to_dict()

* Fix a bug in the Document and Answer's __str__ method

* Improve print_answers, print_documents and print_questions

* Using print_answers in Tutorial7 and fixing typo in the utils

* Remove duplicate line in Tutorial12

* Use print_answers in Tutorial4

* Add explanation of what the documents in the output of the basic QA pipeline are

* Move the fields constant into print_answers

* Normalize all 'minimal' to 'minimum' (they were mixed up)

* Improve the sample output to include all fields from Document and Answer

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-09 15:09:26 +01:00
Alon Eirew
861522b6b1
fix #1687 (#1688) 2021-11-09 12:52:07 +01:00
bogdankostic
cd8666f904
Standardize initialisation of device settings (#1683)
* Use initialize_device_settings in all nodes

* Set StreamHandler level to INFO

* Add latest docstring and tutorial changes

* work in progress

* Standardize device initialization

* Add latest docstring and tutorial changes

* Adapt device initialization in Reader's train method

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-09 12:44:20 +01:00
Timo Moeller
5ac89332a2
Fix Typo in TableQA Tutorial (#1690) 2021-11-08 17:14:04 +01:00
Julian Risch
c9087da2ac
rename text variable of document to content (#1704) 2021-11-08 17:07:36 +01:00
bogdankostic
5654ad1243
Fix error when model does not select any cells (#1703) 2021-11-08 15:31:57 +01:00
Julian Risch
892ce4a760
Make Weaviate more compliant with other doc stores (UUIDs and dummy embeddings) (#1656)
* create uuid and dummy embedding in weaviate doc store

* handle and test for duplicate non-uuid-formatted ids in weaviate

* add uuid and dummy embedding to doc strings

* Add latest docstring and tutorial changes

* Upgrade weaviate

* Include weaviate in common doc store test cases

* Add latest docstring and tutorial changes

* Exclude weaviate doc store from eval tests

* Incorporate index name in uuid generation

* Ignore mypy error

* Fix typo

* Restore DOCS without uuid and embeddings generated by weaviate

* Supply docs for retriever tests as fixture

* Limit scope of fixture to function instead of session

* Add comments

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-04 09:27:12 +01:00
Branden Chan
4ca1937775
Standardize similarity argument description (#1684)
* Standardize similarity argument description

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-02 14:53:26 +01:00
fingoldo
27793814cf
Cosine similarity for the rest of DocStores. (#1569)
* Added a uniform normalization method to each of the DocStores (implemented), so that Milvus and Weaviate doc stores can now use cosine similarity, plus a future method for normalizing existing embeddings (empty for now).

* Fixed a typo.

* Fixed lots of stuff. Performed local tests.

* Fixed scores representation for cosine. Assuming Weaviate's representation needs no change.

* fixes as per discussion

* Trigger CI

* resolving conflicts

* small typo

* fixed a param type

* cleaned up some leftovers from conflict resolution

* commented out fastmath for a moment

* fixing tests

* added docstore for small vectors

* test

* fixed document_store_cosine_small

* cosine tests fixes

* fixed document_store_cosine_small

* fixed weaviate index name and lowered rtol for ES

* increased rtol

* added explicit doc_ids for weaviate, excluded ES, included Inmemory

* resolving mismatch

* fixing a typo

* flatten normalize_embedding()

* fix import for test

* standardize normalize_embeddings across doc stores

* Add latest docstring and tutorial changes

* going for the faster plain dot prod

Co-authored-by: fingoldo <fingoldo@gmail.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-01 13:42:32 +01:00
Julian Risch
c8df4763f8
disable file upload for InMemoryDocStore (#1677) 2021-11-01 10:39:13 +01:00
Julian Risch
efdcd24d70
fixed typo (#1680)
* fixed typo

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-01 10:38:39 +01:00
Branden Chan
da90acf650
Update README.md (#1682) 2021-10-29 18:19:21 +02:00
Branden Chan
b9ea9a8ae0
Add collapsing sections to readme (#1663)
* Add collapsing sections to readme

* Add emojis

* Test new collapse style

* Test formatting

* Test formatting

* Test formatting

* Test formatting
2021-10-29 16:39:58 +02:00
bogdankostic
9025615be7
Add TableQA tutorial (#1670)
* Add TableQA tutorial

* Add tutorial header

* Add latest docstring and tutorial changes

* Add more details

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-29 11:07:13 +02:00
bogdankostic
7d5078fbb6
Add TableTextRetriever to node's __init__.py (#1678) 2021-10-29 11:02:47 +02:00
Lalit Pagaria
e5b4b62d75
Add CI for windows runner (#1458)
* Feat: Remove use of temp file while downloading archive from URL, and add CI for Windows and Mac platforms

* Windows CI installs PyTorch GPU by default, so update CI to pick the CPU version

* fixing mac cache build issue

* updating windows pip install command for torch

* another attempt

* updating ci

* Adding sudo

* fixing ls failure on windows

* another attempt to fix build issue

* Saving env variable of test files

* Adding debug log

* Github action differ on windows

* adding debug

* another attempt

* Windows has a different way to receive env

* fixing template

* minor fix

* Adding debug

* Removing use of json

* Adding back fromJson

* adding toJson

* removing print

* another attempt

* disabling parallel run at least for testing

* installing docker for mac runner

* correcting docker install command

* Linux Docker containers are not supported on Windows

* Removing mac changes

* Upgrading pytorch

* using lts pytorch

* Separating win and ubuntu

* Install java 11

* enabling linux container env

* docker cli command

* docker cli command

* start elastic service

* List all service

* correcting service name

* Attempt to fix multiple test run

* convert to json

* another attempt to check

* Updating build cache step

* attempt

* Add tika

* Separating windows CI

* Changing CI name

* Skipping test which does not work in windows

* Skipping tests for windows

* create cleanup function in conftest

* adding skipif marker on tests

* Run windows PR on only push to master

* Addressing review comments

* Enabling windows ci for this PR

* Tika init is being called when importing tika function

* handling tika import issue

* handling tika import issue in test

* Fixing import issue

* removing tika fixture

* Removing fixture from tests

* Disable windows ci on pull request

* Add back extra pytorch install step

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-10-29 10:22:28 +02:00
Sara Zan
08341f5698
Raise a warning if the 'query' param of the 'query' method of 'ElasticsearchDocumentStore' is not a string. (#1674) 2021-10-29 10:10:03 +02:00
ju-gu
ec816339bf
Fix docstring of crawler (#1673) 2021-10-28 18:33:20 +02:00
Julian Risch
33b2663fdc
ensure tf-idf matrix calculation before retrieval (#1665)
* ensure tf-idf matrix calculation before retrieval

* Run fit() automatically if new documents have been added

* Add latest docstring and tutorial changes

* Fix type error

* Add test case for tfidf retriever yaml pipeline

* Use InMemoryDocStore and add 2nd test case

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-28 16:48:06 +02:00
Sara Zan
eab475bb5d
Rename every occurrence of 'embed_passages' with 'embed_documents' (#1667)
* Rename every occurrence of 'embed_passages' with 'embed_documents'

* Remove aliased method embed_documents

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-28 12:17:56 +02:00
Timo Moeller
6892955e95
Add execute permissions (#1666) 2021-10-27 17:35:34 +02:00
Sara Zan
fd184d607f
Add a restart policy on-failure to all containers 2021-10-27 17:07:36 +02:00
Branden Chan
171fd7be38
Update README.md (#1653)
* Update README.md

* Incorporate link into Haystack logo

* Fix jobs link

* Update tutorials and demo

* Change order of sections

* Rename tutorial section

* Create jobs and community sections

* Change wording

* Change section title

* Change wording

* Add tutorial links and pipeline image
2021-10-27 15:55:34 +02:00