* Add knowledge graph module
* Fix type hint
* Add graph retriever module
* Change type annotations, change return format
* Add graph retriever that executes questions as sparql queries
* Linking only those entities that are in the knowledge graph
* Added logging and using relations extracted from the knowledge graph for linking
* Preventing entity linking from linking the same token to multiple entities
* Pruning triples that have no variables for select and count queries
* Support knowledge graphs with Pipelines
* Add text2sparql
* Entity linking and relation linking consider more special cases now based on evaluation on labelled data
* Separating example code from KGQA implementation
* Add eval on combined extractive and kg questions
* Remove references to hp-test
* Add fields sparql_query and long_answer_list to metadata
* Removing modular Question2SPARQL approach
* Removing additional classes used for modular kgqa approach
* preparing lcquad data
* change graph db
* Translating namespaces in knowledge graph queries
* Creating graphdb index and loading triples from .ttl file
* Fetching graph config files, triples and model from S3
* Fix incompatibility issues with BaseGraphRetriever and BaseComponent
* Removing unused utility functions
* Adding doc strings and tutorial header
* Adding sparqlwrapper dependency
* Moving tutorial header
* Sorting tutorials by number within name of notebook
* Add latest docstring and tutorial changes
* Creating test cases for knowledge graph
* Changing knowledge graph example to harry potter
* Add latest docstring and tutorial changes
* Adapting the tutorial notebook to harry potter example
* Add GraphDB fixture for tests
* Add latest docstring and tutorial changes
* Added GraphDB docker launch to CI
* Use correct GraphDB fixture
* Check if GraphDB instance is already running
* Renaming question/query and incorporating other feedback from Timo and Tanay
* Removed type annotation
* Add latest docstring and tutorial changes
Co-authored-by: oryx1729 <oryx1729@protonmail.com>
Co-authored-by: Timo Moeller <timo.moeller@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Adding translator with support for many generic input parameters
* Making dict_key generic
* Fixing mypy issue
* Adding pipeline and using opus models
* Add latest docstring and tutorial changes
* Adding test cases for end-to-end translation for generator, summarizer etc
* raise error for join and merge nodes
* Fix test failure
* add docstrings. add usage documentation. rm skip_special_tokens param
* Add latest docstring and tutorial changes
* fix code snippets in md
* Adding few extra configuration parameters and fixing tests
* Fixing mypy issue and updating usage document
* fix for mypy issue in pipeline.py
* reverting renaming of pytest_collection_modifyitems method
* Addressing review comments
* setting skip_special_tokens to True
* removing model_max_length argument as None type is not supported by many models
* Removing padding parameter. Better to leave it as default, otherwise it causes a tensor size mismatch error. If this option is required by users, it can be added later.
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* fix encoding of pdftotext. fix version in download instructions
* fix test
* Add latest docstring and tutorial changes
* make latin-1 default encoding again
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Make batchwise adding of evaluation data possible
* Fix typos in docstrings
* Merge add_eval_data and add_eval_data_batchwise
* Improve import statements
* Move add_eval_data to BaseDocumentStore
* Add batch_size param to write_documents and write_labels in EsDocStore
* Adjust docstring
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* Integration of SummarizationQAPipeline with Haystack.
* Moving summarizer tests because of OOM issue
* Fixing typo
* Splitting summarizer test in separate ci step
* Removing sysctl configuration as we are already running Elasticsearch in a Docker container
* fixing mypy issue
* update parameter names and docstrings
* update parameter names in BaseSummarizer
* rename pipeline
* change return type of summarizer from answer to document
* change scope of doc store fixture
* revert scope
* temp. disable test_faiss_index_save_and_load()
* fix mypy. change order for mypy in CI
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* Using column names instead of ORM objects for the get all documents call
* Separating meta search from documents. This saves memory by not duplicating document.text
* Fixing mypy issue
* SQLite has a limit on the number of host variables, hence using batching to fetch meta information
* Query meta only if meta field is not Null in DocOrm
* Add batch_size to other functions except label
* meta can be None, so fix that issue
* Dummy commit to trigger CI
* Using chunked dictionary
* Upgrading faiss
* reverting change related to faiss upgrade
* Changing DB name in test_faiss_retrieving test as it might interfere with existing files by corrupting the DB file
* Updating doc string related to batch_size
* Update docstring for batch_size
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* WIP: First version of preprocessing tutorial
* stride renamed to overlap, ipynb and py files created
* rename split_stride in test
* Update preprocessor api documentation
* define order for markdown files
* define order of modules in api docs
* Add colab links
* Incorporate review feedback
Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>
* Update preprocessor.py
Concatenation of sentences done correctly. Stride functionality enabled for splitting by words while respecting sentence boundaries.
* Simplify code, add test
Co-authored-by: Krak91 <45461739+Krak91@users.noreply.github.com>