* Add support for tables in SQLDocumentStore, FAISSDocumentStore and MilvuDocumentStore
* Add support for WeaviateDocumentStore
* Make sure that embedded meta fields are strings + add embedding_dim to WeaviateDocStore in test config
* Add latest docstring and tutorial changes
* Represent tables in WeaviateDocumentStore as nested lists
* Fix mypy
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* separate testfile for summarizer with translation
* Add latest docstring and tutorial changes
* import SPLIT_DOCS from test_summarizer
* add workflow_dispatch to windows_ci
* add worflow_dispatch to linux_ci
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Capitalize starting letter in params
Capitalized the starting letter in code examples for params in keeping with the latest names for nodes where first letter is capitalized.
Refer: https://github.com/deepset-ai/haystack/issues/1748
* Update standard_pipelines.py
Capitalized some starting letters in the docstrings in keeping with the updated node names for standard pipelines
* Split pipeline tests into three suites
* Will this trigger the CI?
* Rename duplicate test into test_most_similar_documents_pipeline
* Fixing a bug that was probably never noticed
* Fix a specific path of print_answers that was assuming answers are dictionaries
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Make Tutorial10 use print instead of logs and fix a typo in Tutoria15
* Add a type check in 'print_answers'
* Add same checks to print_documents and print_questions
* Make RAGenerator return Answers instead of dictionaries
* Fix RAGenerator tests
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Fix yet another self.device(s) typo
* Add typing to 'initialize_device_settings' to try prevent future issues
* Fix bug in Tutorial5
* Fix the same bug in the notebook
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Move each haystack module's logger configuration into the respective file and configure the handlers properly
* Implement most changes from #1714
* Remove accidentally committed git merge tags ':D
* Remove the debug logs capture feature
* Remove more references to debug_logs
* Fix issue with FARMReader that somehow made it to master
* Add devices parameter to Inferencer
* Change log of APEX message to DEBUG and lower the 'Starting <docstore>...' messages to DEBUG as well
* Change log level of a few logs from modeling
* Silence the transformers warning
* Remove empty line below the workers :)
* Fix two more levels in the tutorials logs
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Modify __str__ and __repr__ for Document and Answer
* Rename QueryClassifier in Tutorial11
* Improve the output of tutorial1
* Make the output of Tutorial8 a bit less dense
* Add a print_questions util to print the output of question generating pipelines
* Replace custom printing with the new utility in Tutorial13
* Ensure all output is printed with minimal details in Tutorial14 and add some titles
* Minor change to print_answers
* Make tutorial3's output the same as tutorial1
* Add __repr__ to Answer and fix to_dict()
* Fix a bug in the Document and Answer's __str__ method
* Improve print_answers, print_documents and print_questions
* Using print_answers in Tutorial7 and fixing typo in the utils
* Remove duplicate line in Tutorial12
* Use print_answers in Tutorial4
* Add explanation of what the documents in the output of the basic QA pipeline are
* Move the fields constant into print_answers
* Normalize all 'minimal' to 'minimum' (they were mixed up)
* Improve the sample output to include all fields from Document and Answer
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Use initialize_device_settings in all nodes
* Set StreamHandler level to INFO
* Add latest docstring and tutorial changes
* work in progress
* Standardize device initialization
* Add latest docstring and tutorial changes
* Adapt device initialization in Reader's train method
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* create uuid and dummy embeddding in weaviate doc store
* handle and test for duplicate non-uuid-formatted ids in weaviate
* add uuid and dummy embedding to doc strings
* Add latest docstring and tutorial changes
* Upgrade weaviate
* Include weaviate in common doc store test cases
* Add latest docstring and tutorial changes
* Exclude weaviate doc store from eval tests
* Incorporate index name in uuid generation
* Ignore mypy error
* Fix typo
* Restore DOCS without uuid and embeddings generated by weaviate
* Supply docs for retriever tests as fixture
* Limit scope of fixture to function instead of session
* Add comments
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Added uniform normalization method to each of the DocStores (implemented), so that now Milvus and Weaviate doc stores can use cosine similarity, plus future method for making existing embeddings normaziled (empty for now).
* Fixed a typo.
* Fixed lots of stuff. Performed local tests.
* Fixed scores representation for cosine. Assuming Weavieate's rep needs no change.
* fixes as per discussion
* Trigger CI
* resolving conflicts
* small typo
* fixed a param type
* cleaned some conflicts resolving left overs
* commented out fastmath for a moment
* fixing tests
* added docstore for small vectors
* test
* fixed document_store_cosine_small
* cosine tests fixes
* fixed document_store_cosine_small
* fixed weaviate index name and lowered rtol for ES
* increased rtol
* added explicit doc_ids for weaviate, excluded ES, included Inmemory
* resolving mismatch
* fixing a typo
* flatten normalize_embedding()
* fix import for test
* standardize normalize_embeddings across doc stores
* Add latest docstring and tutorial changes
* going for the faster plain dot prod
Co-authored-by: fingoldo <fingoldo@gmail.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Feat: Removing use of temp file while downloading archive from url along with adding CI for windows and mac platform
* Windows CI by default installing pytorch gpu hence updating CI to pick cpu version
* fixing mac cache build issue
* updating windows pip install command for torch
* another attempt
* updating ci
* Adding sudo
* fixing ls failure on windows
* another attempt to fix build issue
* Saving env variable of test files
* Adding debug log
* Github action differ on windows
* adding debug
* anohter attempt
* Windows have different ways to receive env
* fixing template
* minor fx
* Adding debug
* Removing use of json
* Adding back fromJson
* addin toJson
* removing print
* anohter attempt
* disabling parallel run at least for testing
* installing docker for mac runner
* correcting docker install command
* Linux dockers are not suported in windows
* Removing mac changes
* Upgrading pytorch
* using lts pytorch
* Separating win and ubuntu
* Install java 11
* enabling linux container env
* docker cli command
* docker cli command
* start elastic service
* List all service
* correcting service name
* Attempt to fix multiple test run
* convert to json
* another attempt to check
* Updating build cache step
* attempt
* Add tika
* Separating windows CI
* Changing CI name
* Skipping test which does not work in windows
* Skipping tests for windows
* create cleanup function in conftest
* adding skipif marker on tests
* Run windows PR on only push to master
* Addressing review comments
* Enabling windows ci for this PR
* Tika init is being called when importing tika function
* handling tika import issue
* handling tika import issue in test
* Fixing import issue
* removing tika fixure
* Removing fixture from tests
* Disable windows ci on pull request
* Add back extra pytorch install step
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* ensure tf-idf matrix calculation before retrieval
* Run fit() automatically if new documents have been added
* Add latest docstring and tutorial changes
* Fix type error
* Add test case for tfidf retriever yaml pipeline
* Use InMemoryDocStore and add 2nd test case
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update README.md
* Incorporate link into Haystack logo
* Fix jobs link
* Update tutorials and demo
* Change order of sections
* Rename tutorial section
* Create jobs and community sections
* Change wording
* Change section title
* Change wording
* Add tutorial links and pipeline image
* Truncate too large tables for TableReader
* Add documentation
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Files moved, imports all broken
* Fix most imports and docstrings into
* Fix the paths to the modules in the API docs
* Add latest docstring and tutorial changes
* Add a few pipelines that were lost in the inports
* Fix a bunch of mypy warnings
* Add latest docstring and tutorial changes
* Create a file_classifier module
* Add docs for file_classifier
* Fixed most circular imports, now the REST API can start
* Add latest docstring and tutorial changes
* Tackling more mypy issues
* Reintroduce from FARM and fix last mypy issues hopefully
* Re-enable old-style imports
* Fix some more import from the top-level package in an attempt to sort out circular imports
* Fix some imports in tests to new-style to prevent failed class equalities from breaking tests
* Change document_store into document_stores
* Update imports in tutorials
* Add latest docstring and tutorial changes
* Probably fixes summarizer tests
* Improve the old-style import allowing module imports (should work)
* Try to fix the docs
* Remove dedicated KnowledgeGraph page from autodocs
* Remove dedicated GraphRetriever page from autodocs
* Fix generate_docstrings.sh with an updated list of yaml files to look for
* Fix some more modules in the docs
* Fix the document stores docs too
* Fix a small issue on Tutorial14
* Add latest docstring and tutorial changes
* Add deprecation warning to old-style imports
* Remove stray folder and import Dict into dense.py
* Change import path for MLFlowLogger
* Add old loggers path to the import path aliases
* Fix debug output of convert_ipynb.py
* Fix circular import on BaseRetriever
* Missed one merge block
* re-run tutorial 5
* Fix imports in tutorial 5
* Re-enable squad_to_dpr CLI from the root package and move get_batches_from_generator into document_stores.base
* Add latest docstring and tutorial changes
* Fix typo in utils __init__
* Fix a few more imports
* Fix benchmarks too
* New-style imports in test_knowledge_graph
* Rollback setup.py
* Rollback squad_to_dpr too
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>