* Make Streamlit tolerant of Haystack not knowing its version, and add a special error for docstore issues
* Add workaround for a Streamlit bug
* Make default filters value an empty dict
* Return more context for each answer in the rest api
* Make the hs_version call non-blocking by adding a very short timeout
* Add a disclaimer on low-confidence answers
* Use the no-answer feature of the reader to highlight questions with no good answer
* initialize doc store with doc and label index
* Update the Tutorial 5 notebook (.ipynb) to match the .py script
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
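The hs_version bullets above describe a version check that must neither block the UI nor fail when the version is unknown. A minimal sketch, assuming a hypothetical helper `get_backend_version` and an endpoint that returns JSON shaped like `{"hs_version": "..."}` (the URL and response shape are assumptions, not taken from this log):

```python
import json
import urllib.request


def get_backend_version(url: str, timeout: float = 0.5) -> str:
    """Ask the backend for its version without stalling the caller for long."""
    try:
        # The short timeout bounds how long a slow or absent backend can block the UI
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # Connection errors, timeouts, malformed URLs or invalid JSON: degrade gracefully
        return "unknown"
    if isinstance(payload, dict):
        return str(payload.get("hs_version", "unknown"))
    return "unknown"
```

The UI can then show a generic message when the result is `"unknown"` instead of crashing.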
* Filtering records not having embeddings
* Added support for the skip_missing_embeddings flag. The default behavior is to throw an error when embeddings are missing. If skip_missing_embeddings=True, documents without embeddings are ignored for vector similarity
* Fix the following error:
haystack/document_stores/elasticsearch.py:852: error: Need type annotation for "script_score_query"
* docstring for skip_missing_embeddings parameter
* Raise an exception when no documents with embeddings are found for the EmbeddingRetriever
* Default skip_missing_embeddings to True
* Explicitly check if embeddings are present if no results are returned by EmbeddingRetriever for Elasticsearch
* Added test case based on Julian's input
* Added test case based on Julian's input. Fix pytest error in the test case
* Simplify code by using get_embed_count
* Adjust docstring & error msg slightly
* Revert error msg
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
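A minimal sketch of the skip_missing_embeddings behavior described above, using a hypothetical `Doc` dataclass and `docs_for_vector_search` helper rather than Haystack's own classes. It mirrors the final state of the change (default `True`) plus the explicit error when no document has an embedding:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Doc:
    # Hypothetical stand-in for a stored document
    id: str
    embedding: Optional[List[float]] = None


def docs_for_vector_search(docs: List[Doc], skip_missing_embeddings: bool = True) -> List[Doc]:
    """Select the documents that can take part in vector similarity search."""
    missing = [d.id for d in docs if d.embedding is None]
    if missing and not skip_missing_embeddings:
        # Strict mode: refuse to query while some documents lack embeddings
        raise ValueError(f"Documents without embeddings: {missing}")
    with_embeddings = [d for d in docs if d.embedding is not None]
    if not with_embeddings:
        # Nothing to compare against: raise a clear error instead of silently returning nothing
        raise ValueError("No documents with embeddings found; compute embeddings first.")
    return with_embeddings
```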
* Queries now run only when pressing RUN. File upload hidden. Question is not sent if the textbox is empty.
* Add latest docstring and tutorial changes
* Tidy up: remove needless state, add comments, fix minor bugs
* Had to add results to the status to avoid some bugs in eval mode
* Added 'credits'
* Add footers, update requirements, some random questions for the evaluation
* Add requested changes
* Temporarily roll back the UI to the old GoT dataset
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Add support for tables in SQLDocumentStore, FAISSDocumentStore and MilvusDocumentStore
* Add support for WeaviateDocumentStore
* Make sure that embedded meta fields are strings + add embedding_dim to WeaviateDocStore in test config
* Add latest docstring and tutorial changes
* Represent tables in WeaviateDocumentStore as nested lists
* Fix mypy
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* separate testfile for summarizer with translation
* Add latest docstring and tutorial changes
* import SPLIT_DOCS from test_summarizer
* add workflow_dispatch to windows_ci
* add workflow_dispatch to linux_ci
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Capitalize the first letter of node names in params
Capitalized the first letter of node names in the params of code examples, in keeping with the latest node names, which start with a capital letter.
Refer: https://github.com/deepset-ai/haystack/issues/1748
* Update standard_pipelines.py
Capitalized the first letter of some node names in the docstrings, in keeping with the updated node names for standard pipelines
* Split pipeline tests into three suites
* Will this trigger the CI?
* Rename duplicate test into test_most_similar_documents_pipeline
* Fixing a bug that was probably never noticed
* Fix a specific path of print_answers that was assuming answers are dictionaries
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Make Tutorial10 use print instead of logs and fix a typo in Tutorial15
* Add a type check in 'print_answers'
* Add same checks to print_documents and print_questions
* Make RAGenerator return Answers instead of dictionaries
* Fix RAGenerator tests
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
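The type checks added to print_answers boil down to accepting both legacy dictionaries and Answer objects. A sketch with a hypothetical `Answer` dataclass and `answer_fields` helper (not Haystack's actual classes or signatures):

```python
from dataclasses import dataclass
from typing import Any, Dict, Sequence, Union


@dataclass
class Answer:
    # Hypothetical stand-in for an Answer object
    answer: str
    score: float


def answer_fields(ans: Union[Answer, Dict[str, Any]],
                  fields: Sequence[str] = ("answer", "score")) -> Dict[str, Any]:
    """Extract the requested fields from either a dict or an Answer object."""
    if isinstance(ans, dict):
        return {f: ans.get(f) for f in fields}
    if isinstance(ans, Answer):
        return {f: getattr(ans, f) for f in fields}
    # Fail loudly on anything else instead of assuming a dictionary
    raise TypeError(f"Expected Answer or dict, got {type(ans).__name__}")
```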
* Fix yet another self.device(s) typo
* Add typing to 'initialize_device_settings' to try to prevent future issues
* Fix bug in Tutorial5
* Fix the same bug in the notebook
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Move each haystack module's logger configuration into the respective file and configure the handlers properly
* Implement most changes from #1714
* Remove accidentally committed git merge markers :D
* Remove the debug logs capture feature
* Remove more references to debug_logs
* Fix issue with FARMReader that somehow made it to master
* Add devices parameter to Inferencer
* Change the log level of the APEX message to DEBUG and lower the 'Starting <docstore>...' messages to DEBUG as well
* Change log level of a few logs from modeling
* Silence the transformers warning
* Remove empty line below the workers :)
* Fix two more log levels in the tutorials
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
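Moving logger configuration into each module typically means every file creates a named logger while one package-level handler produces the output. A sketch under that assumption; `configure_logging` and its default package name are hypothetical:

```python
import logging


def configure_logging(package: str = "haystack", level: int = logging.INFO) -> logging.Logger:
    """Attach a single StreamHandler to the package logger; module loggers propagate to it."""
    pkg_logger = logging.getLogger(package)
    if not pkg_logger.handlers:  # avoid duplicated output if called more than once
        handler = logging.StreamHandler()
        handler.setLevel(level)
        handler.setFormatter(logging.Formatter("%(levelname)s - %(name)s - %(message)s"))
        pkg_logger.addHandler(handler)
    pkg_logger.setLevel(level)
    return pkg_logger


# Inside a module such as <package>/document_stores/faiss.py one would write:
#     logger = logging.getLogger(__name__)
# and messages from it reach the package handler through propagation.
```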
* Modify __str__ and __repr__ for Document and Answer
* Rename QueryClassifier in Tutorial11
* Improve the output of tutorial1
* Make the output of Tutorial8 a bit less dense
* Add a print_questions util to print the output of question generating pipelines
* Replace custom printing with the new utility in Tutorial13
* Ensure all output is printed with minimal details in Tutorial14 and add some titles
* Minor change to print_answers
* Make tutorial3's output the same as tutorial1
* Add __repr__ to Answer and fix to_dict()
* Fix a bug in the Document and Answer's __str__ method
* Improve print_answers, print_documents and print_questions
* Use print_answers in Tutorial7 and fix a typo in the utils
* Remove duplicate line in Tutorial12
* Use print_answers in Tutorial4
* Add explanation of what the documents in the output of the basic QA pipeline are
* Move the fields constant into print_answers
* Normalize all 'minimal' to 'minimum' (they were mixed up)
* Improve the sample output to include all fields from Document and Answer
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Use initialize_device_settings in all nodes
* Set StreamHandler level to INFO
* Add latest docstring and tutorial changes
* work in progress
* Standardize device initialization
* Add latest docstring and tutorial changes
* Adapt device initialization in Reader's train method
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* create a UUID and a dummy embedding in the Weaviate doc store
* handle and test for duplicate non-uuid-formatted ids in weaviate
* add uuid and dummy embedding to doc strings
* Add latest docstring and tutorial changes
* Upgrade weaviate
* Include weaviate in common doc store test cases
* Add latest docstring and tutorial changes
* Exclude weaviate doc store from eval tests
* Incorporate index name in uuid generation
* Ignore mypy error
* Fix typo
* Restore DOCS without uuid and embeddings generated by weaviate
* Supply docs for retriever tests as fixture
* Limit scope of fixture to function instead of session
* Add comments
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
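A sketch of deterministic UUID assignment for ids that are not already UUIDs, incorporating the index name as the bullets describe. Using `uuid.uuid5` with `NAMESPACE_URL` and an `index:id` key are assumptions for illustration, and `to_weaviate_id` is a hypothetical name:

```python
import uuid


def to_weaviate_id(doc_id: str, index: str) -> str:
    """Keep valid UUIDs as-is; otherwise derive a deterministic UUID from index and id."""
    try:
        return str(uuid.UUID(doc_id))
    except ValueError:
        # uuid5 is deterministic: the same (index, id) pair always yields the same UUID,
        # while the same id in a different index yields a different one
        return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{index}:{doc_id}"))
```

Determinism matters here: re-indexing the same document maps to the same UUID, so duplicates can be detected rather than silently multiplied.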
* Added a uniform normalization method to each of the DocStores so that the Milvus and Weaviate doc stores can now use cosine similarity, plus a placeholder method for normalizing existing embeddings (empty for now).
* Fixed a typo.
* Fixed lots of stuff. Performed local tests.
* Fixed the score representation for cosine. Assuming Weaviate's representation needs no change.
* fixes as per discussion
* Trigger CI
* resolving conflicts
* small typo
* fixed a param type
* cleaned up some leftovers from conflict resolution
* commented out fastmath for a moment
* fixing tests
* added docstore for small vectors
* test
* fixed document_store_cosine_small
* cosine tests fixes
* fixed document_store_cosine_small
* fixed weaviate index name and lowered rtol for ES
* increased rtol
* added explicit doc_ids for Weaviate, excluded ES, included InMemory
* resolving mismatch
* fixing a typo
* flatten normalize_embedding()
* fix import for test
* standardize normalize_embeddings across doc stores
* Add latest docstring and tutorial changes
* switch to the faster plain dot product
Co-authored-by: fingoldo <fingoldo@gmail.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
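The normalization bullets rest on one identity: for unit-length vectors, cosine similarity equals the plain dot product, so a store that ranks by inner product can serve cosine queries once embeddings are normalized. A pure-Python sketch; the function names are hypothetical:

```python
import math
from typing import List, Sequence


def normalize_embedding(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return list(vec)
    return [x / norm for x in vec]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_via_dot(a: Sequence[float], b: Sequence[float]) -> float:
    # With both vectors at unit length, the dot product is exactly the cosine similarity
    return dot(normalize_embedding(a), normalize_embedding(b))
```

In practice the stored embeddings are normalized once at write time and each query once at search time, so every comparison is a cheap dot product.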
* Feat: Remove the use of a temp file when downloading an archive from a URL, and add CI for the Windows and Mac platforms
* The Windows CI installs the GPU version of PyTorch by default, so update the CI to pick the CPU version
* fixing mac cache build issue
* updating windows pip install command for torch
* another attempt
* updating ci
* Adding sudo
* fixing ls failure on windows
* another attempt to fix build issue
* Saving env variable of test files
* Adding debug log
* GitHub Actions differ on Windows
* adding debug
* another attempt
* Windows has a different way of receiving env variables
* fixing template
* minor fix
* Adding debug
* Removing use of json
* Adding back fromJson
* adding toJson
* removing print
* another attempt
* disabling parallel run at least for testing
* installing docker for mac runner
* correcting docker install command
* Linux Docker containers are not supported on Windows
* Removing mac changes
* Upgrading pytorch
* using lts pytorch
* Separating win and ubuntu
* Install java 11
* enabling linux container env
* docker cli command
* start elastic service
* List all service
* correcting service name
* Attempt to fix multiple test run
* convert to json
* another attempt to check
* Updating build cache step
* attempt
* Add tika
* Separating windows CI
* Changing CI name
* Skip tests that do not work on Windows
* Skip tests on Windows
* create cleanup function in conftest
* adding skipif marker on tests
* Run the Windows CI only on push to master
* Addressing review comments
* Enabling windows ci for this PR
* Tika init is called when importing a tika function
* handling tika import issue
* handling tika import issue in test
* Fixing import issue
* removing tika fixture
* Removing fixture from tests
* Disable windows ci on pull request
* Add back extra pytorch install step
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* ensure tf-idf matrix calculation before retrieval
* Run fit() automatically if new documents have been added
* Add latest docstring and tutorial changes
* Fix type error
* Add test case for tfidf retriever yaml pipeline
* Use InMemoryDocStore and add 2nd test case
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
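A toy sketch of the lazy refit described above: the retriever remembers how many documents it saw at its last fit() and refits before retrieval if that count has changed. `TfidfSketch` and its scoring are hypothetical simplifications, not Haystack's TfidfRetriever:

```python
import math
from collections import Counter
from typing import Dict, List


class TfidfSketch:
    """Hypothetical minimal TF-IDF retriever that refits when the corpus grows."""

    def __init__(self, store: List[str]):
        self.store = store          # stand-in for a document store (shared list of texts)
        self._fitted_count = -1     # number of documents seen at the last fit()
        self.idf: Dict[str, float] = {}

    def fit(self) -> None:
        n = len(self.store)
        df = Counter(term for text in self.store for term in set(text.lower().split()))
        self.idf = {t: math.log(n / c) + 1.0 for t, c in df.items()}
        self._fitted_count = n

    def retrieve(self, query: str, top_k: int = 1) -> List[str]:
        # Refit lazily if documents were added since the last fit()
        if self._fitted_count != len(self.store):
            self.fit()
        terms = set(query.lower().split())

        def score(text: str) -> float:
            tf = Counter(text.lower().split())
            return sum(tf[t] * self.idf.get(t, 0.0) for t in terms)

        return sorted(self.store, key=score, reverse=True)[:top_k]
```

Comparing document counts is cheap but only catches additions and deletions; replacing a document in place would need a different change signal.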
* Update README.md
* Incorporate link into Haystack logo
* Fix jobs link
* Update tutorials and demo
* Change order of sections
* Rename tutorial section
* Create jobs and community sections
* Change wording
* Change section title
* Change wording
* Add tutorial links and pipeline image