220 Commits

Author SHA1 Message Date
Julian Risch
892ce4a760
Make weaviate more compliant to other doc stores (UUIDs and dummy embedddings) (#1656)
* create uuid and dummy embeddding in weaviate doc store

* handle and test for duplicate non-uuid-formatted ids in weaviate

* add uuid and dummy embedding to doc strings

* Add latest docstring and tutorial changes

* Upgrade weaviate

* Include weaviate in common doc store test cases

* Add latest docstring and tutorial changes

* Exclude weaviate doc store from eval tests

* Incorporate index name in uuid generation

* Ignore mypy error

* Fix typo

* Restore DOCS without uuid and embeddings generated by weaviate

* Supply docs for retriever tests as fixture

* Limit scope of fixture to function instead of session

* Add comments

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-04 09:27:12 +01:00
Branden Chan
4ca1937775
Standardize similarity argument description (#1684)
* Standardize argument similarity argument description

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-02 14:53:26 +01:00
fingoldo
27793814cf
Cosine similarity for the rest of DocStores. (#1569)
* Added uniform normalization method to each of the DocStores (implemented), so that now Milvus and Weaviate doc stores can use cosine similarity, plus future method for making existing embeddings normaziled (empty for now).

* Fixed a typo.

* Fixed lots of stuff. Performed local tests.

* Fixed scores representation for cosine. Assuming Weavieate's rep needs no change.

* fixes as per discussion

* Trigger CI

* resolving conflicts

* small typo

* fixed a param type

* cleaned some conflicts resolving left overs

* commented out fastmath for a moment

* fixing tests

* added docstore for small vectors

* test

* fixed document_store_cosine_small

* cosine tests fixes

* fixed document_store_cosine_small

* fixed weaviate index name and lowered rtol for ES

* increased rtol

* added explicit doc_ids for weaviate, excluded ES, included Inmemory

* resolving mismatch

* fixing a typo

* flatten normalize_embedding()

* fix import for test

* standardize normalize_embeddings across doc stores

* Add latest docstring and tutorial changes

* going for the faster plain dot prod

Co-authored-by: fingoldo <fingoldo@gmail.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-01 13:42:32 +01:00
Julian Risch
efdcd24d70
fixed typo (#1680)
* fixed typo

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-01 10:38:39 +01:00
bogdankostic
9025615be7
Add TableQA tutorial (#1670)
* Add TableQA tutorial

* Add tutorial header

* Add latest docstring and tutorial changes

* Add more details

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-29 11:07:13 +02:00
Julian Risch
33b2663fdc
ensure tf-idf matrix calculation before retrieval (#1665)
* ensure tf-idf matrix calculation before retrieval

* Run fit() automatically if new documents have been added

* Add latest docstring and tutorial changes

* Fix type error

* Add test case for tfidf retriever yaml pipeline

* Use InMemoryDocStore and add 2nd test case

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-28 16:48:06 +02:00
Sara Zan
eab475bb5d
Rename every occurrence of 'embed_passages' with 'embed_documents' (#1667)
* Rename every occurrence of 'embed_passages' with 'embed_documents'

* Remove aliased method embed_documents

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-28 12:17:56 +02:00
bogdankostic
0c80ac9e62
Truncate too large tables for TableReader (#1662)
* Truncate too large tables for TableReader

* Add documentation

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-27 15:46:59 +02:00
Timo Moeller
1d3f63ac2e
Allow setting of scroll param in ElasticsearchDocumentStore (#1645)
* remove scroll param in ES call

* Add scroll param to ES init

* Add latest docstring and tutorial changes

* Add scroll to set_config

* remove trailing comma

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-10-27 11:07:13 +02:00
Branden Chan
9b2f40100d
Replace Haystack banner for readme (#1654)
* Replace haystack banner for readme

* Replace haystack banner for readme

* Update README.md

* Crop image

* Update README.md

revert to image from master branch
2021-10-26 17:59:45 +02:00
Sara Zan
13510aa753
Refactoring of the haystack package (#1624)
* Files moved, imports all broken

* Fix most imports and docstrings into

* Fix the paths to the modules in the API docs

* Add latest docstring and tutorial changes

* Add a few pipelines that were lost in the inports

* Fix a bunch of mypy warnings

* Add latest docstring and tutorial changes

* Create a file_classifier module

* Add docs for file_classifier

* Fixed most circular imports, now the REST API can start

* Add latest docstring and tutorial changes

* Tackling more mypy issues

* Reintroduce  from FARM and fix last mypy issues hopefully

* Re-enable old-style imports

* Fix some more import from the top-level  package in an attempt to sort out circular imports

* Fix some imports in tests to new-style to prevent failed class equalities from breaking tests

* Change document_store into document_stores

* Update imports in tutorials

* Add latest docstring and tutorial changes

* Probably fixes summarizer tests

* Improve the old-style import allowing module imports (should work)

* Try to fix the docs

* Remove dedicated KnowledgeGraph page from autodocs

* Remove dedicated GraphRetriever page from autodocs

* Fix generate_docstrings.sh with an updated list of yaml files to look for

* Fix some more modules in the docs

* Fix the document stores docs too

* Fix a small issue on Tutorial14

* Add latest docstring and tutorial changes

* Add deprecation warning to old-style imports

* Remove stray folder and import Dict into dense.py

* Change import path for MLFlowLogger

* Add old loggers path to the import path aliases

* Fix debug output of convert_ipynb.py

* Fix circular import on BaseRetriever

* Missed one merge block

* re-run tutorial 5

* Fix imports in tutorial 5

* Re-enable squad_to_dpr CLI from the root package and move get_batches_from_generator into document_stores.base

* Add latest docstring and tutorial changes

* Fix typo in utils __init__

* Fix a few more imports

* Fix benchmarks too

* New-style imports in test_knowledge_graph

* Rollback setup.py

* Rollback squad_to_dpr too

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-25 15:50:23 +02:00
bogdankostic
51acf779f2
Add TableTextRetriever (#1529)
* first draft / notes on new primitives

* wip label / feedback refactor

* rename doc.text -> doc.content. add doc.content_type

* add datatype for content

* remove faq_question_field from ES and weaviate. rename text_field -> content_field in docstores. update tutorials for content field

* update converters for . Add warning for empty

* renam label.question -> label.query. Allow sorting of Answers.

* WIP primitives

* update ui/reader for new Answer format

* Improve Label. First refactoring of MultiLabel. Adjust eval code

* fixed workflow conflict with introducing new one (#1472)

* Add latest docstring and tutorial changes

* make add_eval_data() work again

* fix reader formats. WIP fix _extract_docs_and_labels_from_dict

* fix test reader

* Add latest docstring and tutorial changes

* fix another test case for reader

* fix mypy in farm reader.eval()

* fix mypy in farm reader.eval()

* WIP ORM refactor

* Add latest docstring and tutorial changes

* fix mypy weaviate

* make label and multilabel dataclasses

* bump mypy env in CI to python 3.8

* WIP refactor Label ORM

* WIP refactor Label ORM

* simplify tests for individual doc stores

* WIP refactoring markers of tests

* test alternative approach for tests with existing parametrization

* WIP refactor ORMs

* fix skip logic of already parametrized tests

* fix weaviate behaviour in tests - not parametrizing it in our general test cases.

* Add latest docstring and tutorial changes

* fix some tests

* remove sql from document_store_types

* fix markers for generator and pipeline test

* remove inmemory marker

* remove unneeded elasticsearch markers

* add dataclasses-json dependency. adjust ORM to just store JSON repr

* ignore type as dataclasses_json seems to miss functionality here

* update readme and contributing.md

* update contributing

* adjust example

* fix duplicate doc handling for custom index

* Add latest docstring and tutorial changes

* fix some ORM issues. fix get_all_labels_aggregated.

* update drop flags where get_all_labels_aggregated() was used before

* Add latest docstring and tutorial changes

* add to_json(). add + fix tests

* fix no_answer handling in label / multilabel

* fix duplicate docs in memory doc store. change primary key for sql doc table

* fix mypy issues

* fix mypy issues

* haystack/retriever/base.py

* fix test_write_document_meta[elastic]

* fix test_elasticsearch_custom_fields

* fix test_labels[elastic]

* fix crawler

* fix converter

* fix docx converter

* fix preprocessor

* fix test_utils

* fix tfidf retriever. fix selection of docstore in tests with multiple fixtures / parameterizations

* Add latest docstring and tutorial changes

* fix crawler test. fix ocrconverter attribute

* fix test_elasticsearch_custom_query

* fix generator pipeline

* fix ocr converter

* fix ragenerator

* Add latest docstring and tutorial changes

* fix test_load_and_save_yaml for elasticsearch

* fixes for pipeline tests

* fix faq pipeline

* fix pipeline tests

* Add latest docstring and tutorial changes

* Add MultimodalRetriever

* Add latest docstring and tutorial changes

* fix weaviate

* Add latest docstring and tutorial changes

* trigger CI

* satisfy mypy

* Add latest docstring and tutorial changes

* satisfy mypy

* Add latest docstring and tutorial changes

* trigger CI

* fix question generation test

* fix ray. fix Q-generation

* fix translator test

* satisfy mypy

* wip refactor feedback rest api

* fix rest api feedback endpoint

* fix doc classifier

* remove relation of Labels -> Docs in SQL ORM

* fix faiss/milvus tests

* fix doc classifier test

* fix eval test

* fixing eval issues

* Add latest docstring and tutorial changes

* fix mypy

* WIP replace dataclasses-json with manual serialization

* Add methods to MultimodalRetriever

* Add latest docstring and tutorial changes

* revert to dataclass-json serialization for now. remove debug prints.

* update docstrings

* fix extractor. fix Answer Span init

* fix api test

* keep meta data of answers in reader.run()

* fix meta handling

* adress review feedback

* Add latest docstring and tutorial changes

* make document=None for open domain labels

* add import

* fix print utils

* fix rest api

* Add methods and tests

* Add latest docstring and tutorial changes

* Fix mypy

* Add latest docstring and tutorial changes

* Add type hints and doc strings

* Make use of initialize_device_settings

* Move serialization of pd.DataFrame to schema.py

* Fix mypy

* Adapt Document's from_dict method

* Update docstrings

* Add latest docstring and tutorial changes

* Fix mypy

* Fix mypy

* Fix Document's from_dict method

* Fix Document's to_dict method

* Change handling of table metadata

* Add latest docstring and tutorial changes

* Change naming from Multimodal to TableText

* Turn off tokenizers_parallelism in retriever tests

* Add latest docstring and tutorial changes

* Remove turning off tokenizers_parallelism in retriever tests

* Adapt convert_es_hit_to_document

* Change embed_surrounding_context to embed_meta_fields

* Add latest docstring and tutorial changes

* Add check if torch.distributed is available

* Set n_gpu to 0 in training test

* Set HIP_LAUNCH_BLOCKING to 1

* Set HIP_LAUNCH_BLOCKING to "1"

* Set use_gpu to False

* Use DataParallel only if more than one device

* Remove --find-links=https://download.pytorch.org/whl/torch_stable.html

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-25 12:27:02 +02:00
Julian Risch
6033319cfe
Fix parameter names in tutorial 5 and 12 (#1639)
* Fix parameter names in tutorial 5

* Update parameters in tutorial notebook

* Add latest docstring and tutorial changes

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-22 17:22:51 +02:00
Timo Moeller
9dc125df9d
Bugfix Tutorial 5 parameters, adjust default split length (#1635)
Bugfix parameters, adjust default split length, add sentencetransformers
2021-10-22 16:03:12 +02:00
Julian Risch
52e1fc991e
Update jobs link to personio (#1611)
* Update jobs link to personio

* Add latest docstring and tutorial changes

* Change jobs link to main website

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-21 11:42:32 +02:00
Julian Risch
f2a3f95ab6
add note on gpu runtime to tutorial 13 (#1614)
* add note on gpu runtime to tutorial 13

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-20 09:55:56 +02:00
Julian Risch
4ed2b90bca
Add delete_labels() except for weaviate doc store (#1604)
* Add delete_labels() except for weaviate doc store

* Add latest docstring and tutorial changes

* Add test for delete_labels()

* Adapt filter for label deletion to different doc stores in test

* Allow delete labels by _id in elasticsearch

* Add latest docstring and tutorial changes

* Add latest docstring and tutorial changes

* re-add bugfix after merge

* Add ids as optional parameter

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-19 17:20:28 +02:00
Sara Zan
9722bbf1e1
DPR training: Rename TransformersAdamW to AdamW (#1613)
* Rename TransformersAdamW into simply AdamW (probably changed in transformers at some point)

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-19 16:18:30 +02:00
Sara Zan
575e64333c
Delete documents by ID in all document stores (#1606)
* Modify BaseDocumentStore.delete_documents() signature, implement ElasticSearch, and add tests

* Add implementation for InMemory

* Implement for SQL, FAISS and Milvus too

* Add tests for faiss and milvus

* Fix delete_all_documents

* Implement deletion by ID for weaviate

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Co-authored-by: sarthakj2109 <54064348+sarthakj2109@users.noreply.github.com>

Co-authored-by: prafgup <prafulgupta6@gmail.com>

Co-authored-by: ankh6 <andynzemokalumu@live.be>
2021-10-19 12:30:15 +02:00
Malte Pietsch
eb95f0e8aa
Add more flexible options for model downloads (Proxies, resume_download, local_files_only...) (#1256)
* allow passing more options for model/tokenizer download from remote

* temporarily change dependency to current farm master

* Add latest docstring and tutorial changes

* add kwargs

* add docstrings

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-18 15:47:36 +02:00
bogdankostic
655d721371
Add Table Reader (#1446)
* first draft / notes on new primitives

* wip label / feedback refactor

* rename doc.text -> doc.content. add doc.content_type

* add datatype for content

* remove faq_question_field from ES and weaviate. rename text_field -> content_field in docstores. update tutorials for content field

* update converters for . Add warning for empty

* Add first draft of TableReader

* renam label.question -> label.query. Allow sorting of Answers.

* Add calculation of answer scores

* WIP primitives

* Adapt input and output to new primitives

* Add doc strings

* Add tests

* update ui/reader for new Answer format

* Improve Label. First refactoring of MultiLabel. Adjust eval code

* fixed workflow conflict with introducing new one (#1472)

* Add latest docstring and tutorial changes

* make add_eval_data() work again

* fix reader formats. WIP fix _extract_docs_and_labels_from_dict

* fix test reader

* Add latest docstring and tutorial changes

* fix another test case for reader

* fix mypy in farm reader.eval()

* fix mypy in farm reader.eval()

* WIP ORM refactor

* Add latest docstring and tutorial changes

* fix mypy weaviate

* make label and multilabel dataclasses

* bump mypy env in CI to python 3.8

* WIP refactor Label ORM

* WIP refactor Label ORM

* simplify tests for individual doc stores

* WIP refactoring markers of tests

* test alternative approach for tests with existing parametrization

* WIP refactor ORMs

* fix skip logic of already parametrized tests

* fix weaviate behaviour in tests - not parametrizing it in our general test cases.

* Add latest docstring and tutorial changes

* fix some tests

* remove sql from document_store_types

* fix markers for generator and pipeline test

* remove inmemory marker

* remove unneeded elasticsearch markers

* add dataclasses-json dependency. adjust ORM to just store JSON repr

* ignore type as dataclasses_json seems to miss functionality here

* update readme and contributing.md

* update contributing

* adjust example

* fix duplicate doc handling for custom index

* Add latest docstring and tutorial changes

* fix some ORM issues. fix get_all_labels_aggregated.

* update drop flags where get_all_labels_aggregated() was used before

* Add latest docstring and tutorial changes

* add to_json(). add + fix tests

* fix no_answer handling in label / multilabel

* fix duplicate docs in memory doc store. change primary key for sql doc table

* fix mypy issues

* fix mypy issues

* haystack/retriever/base.py

* fix test_write_document_meta[elastic]

* fix test_elasticsearch_custom_fields

* fix test_labels[elastic]

* fix crawler

* fix converter

* fix docx converter

* fix preprocessor

* fix test_utils

* fix tfidf retriever. fix selection of docstore in tests with multiple fixtures / parameterizations

* Add latest docstring and tutorial changes

* fix crawler test. fix ocrconverter attribute

* fix test_elasticsearch_custom_query

* fix generator pipeline

* fix ocr converter

* fix ragenerator

* Add latest docstring and tutorial changes

* fix test_load_and_save_yaml for elasticsearch

* fixes for pipeline tests

* fix faq pipeline

* fix pipeline tests

* Add latest docstring and tutorial changes

* fix weaviate

* Add latest docstring and tutorial changes

* trigger CI

* satisfy mypy

* Add latest docstring and tutorial changes

* satisfy mypy

* Add latest docstring and tutorial changes

* trigger CI

* fix question generation test

* fix ray. fix Q-generation

* fix translator test

* satisfy mypy

* wip refactor feedback rest api

* fix rest api feedback endpoint

* fix doc classifier

* remove relation of Labels -> Docs in SQL ORM

* fix faiss/milvus tests

* fix doc classifier test

* fix eval test

* fixing eval issues

* Add latest docstring and tutorial changes

* fix mypy

* WIP replace dataclasses-json with manual serialization

* Add latest docstring and tutorial changes

* revert to dataclass-json serialization for now. remove debug prints.

* update docstrings

* fix extractor. fix Answer Span init

* fix api test

* Adapt answer format

* Add latest docstring and tutorial changes

* keep meta data of answers in reader.run()

* Fix mypy

* fix meta handling

* adress review feedback

* Add latest docstring and tutorial changes

* Allow inference on GPU

* Remove automatic aggregation

* Add automatic aggregation

* Add latest docstring and tutorial changes

* Add torch-scatter dependency

* Add wheel to torch-scatter dependency

* Fix requirements

* Fix requirements

* Fix requirements

* Adapt setup.py to allow for wheels

* Fix requirements

* Fix requirements

* Add type hints and code snippet

* Add latest docstring and tutorial changes

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-15 16:34:48 +02:00
Julian Risch
5ec29a5283
Limit generator tests to memory doc store; split pipeline tests (#1602)
* Limit generator tests to memory doc store; split pipeline tests

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-15 15:37:46 +02:00
Malte Pietsch
99c8046367
Fix Tutorials (#1594)
* fix response format of DocumentSearchPipeline

* Add latest docstring and tutorial changes

* fix typos

* change prints in tutorial 4

* Add latest docstring and tutorial changes

* fix tutorial 13

* Add latest docstring and tutorial changes

* remove unused import

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-14 11:49:35 +02:00
Malte Pietsch
caba590576
Fix answer format in ui (#1591)
* fix answer format in ui

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-13 16:48:33 +02:00
Malte Pietsch
4a6c9302b3
Redesign primitives - Document, Answer, Label (#1398)
* first draft / notes on new primitives

* wip label / feedback refactor

* rename doc.text -> doc.content. add doc.content_type

* add datatype for content

* remove faq_question_field from ES and weaviate. rename text_field -> content_field in docstores. update tutorials for content field

* update converters for . Add warning for empty

* renam label.question -> label.query. Allow sorting of Answers.

* WIP primitives

* update ui/reader for new Answer format

* Improve Label. First refactoring of MultiLabel. Adjust eval code

* fixed workflow conflict with introducing new one (#1472)

* Add latest docstring and tutorial changes

* make add_eval_data() work again

* fix reader formats. WIP fix _extract_docs_and_labels_from_dict

* fix test reader

* Add latest docstring and tutorial changes

* fix another test case for reader

* fix mypy in farm reader.eval()

* fix mypy in farm reader.eval()

* WIP ORM refactor

* Add latest docstring and tutorial changes

* fix mypy weaviate

* make label and multilabel dataclasses

* bump mypy env in CI to python 3.8

* WIP refactor Label ORM

* WIP refactor Label ORM

* simplify tests for individual doc stores

* WIP refactoring markers of tests

* test alternative approach for tests with existing parametrization

* WIP refactor ORMs

* fix skip logic of already parametrized tests

* fix weaviate behaviour in tests - not parametrizing it in our general test cases.

* Add latest docstring and tutorial changes

* fix some tests

* remove sql from document_store_types

* fix markers for generator and pipeline test

* remove inmemory marker

* remove unneeded elasticsearch markers

* add dataclasses-json dependency. adjust ORM to just store JSON repr

* ignore type as dataclasses_json seems to miss functionality here

* update readme and contributing.md

* update contributing

* adjust example

* fix duplicate doc handling for custom index

* Add latest docstring and tutorial changes

* fix some ORM issues. fix get_all_labels_aggregated.

* update drop flags where get_all_labels_aggregated() was used before

* Add latest docstring and tutorial changes

* add to_json(). add + fix tests

* fix no_answer handling in label / multilabel

* fix duplicate docs in memory doc store. change primary key for sql doc table

* fix mypy issues

* fix mypy issues

* haystack/retriever/base.py

* fix test_write_document_meta[elastic]

* fix test_elasticsearch_custom_fields

* fix test_labels[elastic]

* fix crawler

* fix converter

* fix docx converter

* fix preprocessor

* fix test_utils

* fix tfidf retriever. fix selection of docstore in tests with multiple fixtures / parameterizations

* Add latest docstring and tutorial changes

* fix crawler test. fix ocrconverter attribute

* fix test_elasticsearch_custom_query

* fix generator pipeline

* fix ocr converter

* fix ragenerator

* Add latest docstring and tutorial changes

* fix test_load_and_save_yaml for elasticsearch

* fixes for pipeline tests

* fix faq pipeline

* fix pipeline tests

* Add latest docstring and tutorial changes

* fix weaviate

* Add latest docstring and tutorial changes

* trigger CI

* satisfy mypy

* Add latest docstring and tutorial changes

* satisfy mypy

* Add latest docstring and tutorial changes

* trigger CI

* fix question generation test

* fix ray. fix Q-generation

* fix translator test

* satisfy mypy

* wip refactor feedback rest api

* fix rest api feedback endpoint

* fix doc classifier

* remove relation of Labels -> Docs in SQL ORM

* fix faiss/milvus tests

* fix doc classifier test

* fix eval test

* fixing eval issues

* Add latest docstring and tutorial changes

* fix mypy

* WIP replace dataclasses-json with manual serialization

* Add latest docstring and tutorial changes

* revert to dataclass-json serialization for now. remove debug prints.

* update docstrings

* fix extractor. fix Answer Span init

* fix api test

* keep meta data of answers in reader.run()

* fix meta handling

* adress review feedback

* Add latest docstring and tutorial changes

* make document=None for open domain labels

* add import

* fix print utils

* fix rest api

* adress review feedback

* Add latest docstring and tutorial changes

* fix mypy

Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-13 14:23:23 +02:00
Malte Pietsch
9650f7aed1
Add debug and debug_logs params to standard pipelines (#1586)
* add debug and debug_logs to standard pipelines

* Add latest docstring and tutorial changes

* fix params

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-12 16:00:48 +02:00
Sara Zan
6354528336
Add /documents/get_by_filters endpoint (#1580)
* Add endpoint to get documents by filter

* Add test for /documents/get_by_filter and extend the delete documents test

* Add rest_api/file-upload to .gitignore

* Make sure the document store is empty for each test

* Improve docstrings of delete_documents_by_filters and get_documents_by_filters

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-12 10:53:54 +02:00
Malte Pietsch
38652dd4dd
Enable GPU usage for QuestionGenerator (#1571)
* enable GPU usage for question generator

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-08 12:17:48 +02:00
Sara Zan
54947cb840
Return intermediate nodes output in pipelines (#1558)
* First rough implementation

* Add a flag to dump the debug logs to the console as well

* Typing run() and _dispatch_run()

* Allow debug and debug_logs to be passed as arguments of run()

* Avoid overwriting _debug, later we might want to store other objects in it

* Put logs under a separate key of the _debug dictionary and add input and output of the node alongside it

* Introduce global arguments for pipeline.run() that get applied to every node when defined

* Change default values of debug variables to None, otherwise their default would override the params values

* Remove a potential infinite recursion on the overridden __getattr__

* Do not append the output of the last node in the _debug key, it causes infinite recursion

* Add tests

* Move the input/output collection into _dispatch_run to gather only relevant info

* Add partial Pipeline.run() docstring

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-10-07 22:13:25 +02:00
Julian Risch
7e063b77d2
Format doc classifier usage example (#1550)
* Format doc classifier usage example

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-01 15:01:19 +02:00
Julian Risch
24483d7bad
TransformersDocumentClassifier replacing FARMClassifier (#1540)
* Initial draft of TransformersClassifier

* Add transformers classifier implementation

* Add test for SentenceTransformersClassifier

* Add truncation and corresponding test case to Classifier

* Add zero-shot classification and test

* Add document classifier documentation

* Add latest docstring and tutorial changes

* print meta data with print_documents()

* Add latest docstring and tutorial changes

* Remove top_k param from Classifier usage example

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-01 11:22:56 +02:00
Julian Risch
0e7338f0c6
Remove mentions of FARM from Ranker comments (#1535)
* Remove mentions of FARM from Ranker comments

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-29 11:57:30 +02:00
Sara Zan
a30a826c6c
Standardize delete_documents(filter=...) across all document stores (#1509)
* Make InMemoryDocumentStore accept and apply filters in delete_documents()

* Modify test_document_store.py to test the filtered deletion in memory, sql and milvus too

* Make FAISSDocumentStore accept and properly apply filters in delete_documents()

* Add latest docstring and tutorial changes

* Remove accidentally duplicated test

* Remove unnecessary decorators from test/test_document_store.py::test_delete_documents_with_filters

* Add embeddings count test for FAISS and Milvus; Milvus fails it.

* Fixed a bug that made Milvus not deleting embeddings

* Remove batch size parametrization in tests & update all documentstore's docstrings with a filter example

* Add latest docstring and tutorial changes

Co-authored-by: prafgup <prafulgupta6@gmail.com>
2021-09-29 09:27:06 +02:00
Julian Risch
f9d2f786ca
Replace FARM import statements; add dependencies (#1492)
* Replace FARM import statements; add dependencies

* Add InferenceProc., TextCl.Proc., TextPairCl.Proc.

* Remove FARMRanker, add type annotations, rename max_sample

* Add sample_to_features_text for InferenceProc.

* Fix type annotations: model_name_or_path is str not Path

* Fix mypy errors: implement _create_dataset in TextCl.Proc.

* Add task_type "embeddings" in Inferencer

* Allow loading AdaptiveModel for embedding task

* Add SQuAD eval metrics; enable InferenceProc for embedding task

* Add baskets as param to log_samples and handle empty basket list in log_samples

* Remove unused dependencies

* Remove FARMClassifier (doc classificer) due to ref to TextClassificationHead

* Remove FARMRanker and Classifier from doc generation scripts

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-28 16:34:24 +02:00
Malte Pietsch
183fd5ae5a
Simplify tests & allow running on individual doc stores (#1487)
* simplify tests for individual doc stores

* WIP refactoring markers of tests

* test alternative approach for tests with existing parametrization

* fix skip logic of already parametrized tests

* fix weaviate behaviour in tests - not parametrizing it in our general test cases.

* Add latest docstring and tutorial changes

* fix some tests

* remove sql from document_store_types

* fix markers for generator and pipeline test

* remove inmemory marker

* remove unneeded elasticsearch markers

* update readme and contributing.md

* update contributing

* adjust example

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-27 10:52:07 +02:00
bogdankostic
c644e2b4d0
Add comment to tutorial notebooks about restarting runtime in colab (#1486)
* Add comment to tutorial notebooks about restarting runtime in colab

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-23 14:36:20 +02:00
Julian Risch
d569e66bc7
Update Tutorial1_Basic_QA_Pipeline.ipynb (#1489)
* Update Tutorial1_Basic_QA_Pipeline.ipynb

passing params to pipeline as dict

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-22 16:35:20 +02:00
Branden Chan
bddee2def4
Define SAS model in notebook (#1485)
* Define SAS model in notebook

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-21 17:05:16 +02:00
Branden Chan
2c4baa7f4e
Regenerate API and Tutorial md files (#1480)
* Change punctuation

* Add latest docstring and tutorial changes

* Change punctuation

* Add documentation for Docs2Answer

* Add latest docstring and tutorial changes

* Generate new API docs

* Replace Finder with Pipeline

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-21 14:42:18 +02:00
Markus Paff
39845c0624
Automate updates docstrings tutorials (#1461)
* remove not needed githab actions and reactivate docstrings and tutorial generation

* test workflow

* update pydoc version

* update python version

* update watchdog

* move to latest version pydoc-markdown

* remove version check

* Add latest docstring and tutorial changes

* remove test workflow

* test for param docstrings

* pin pydoc-markdown version

* add test workflow

* pin watchdog version

* Add latest docstring and tutorial changes

* update original workflow and delete test

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-17 13:44:31 +02:00
oryx1729
9dd7c74f4f
Refactor communication between Pipeline Components (#1321) 2021-09-10 11:41:16 +02:00
Bob van Luijt
c0cc8bc80f
Bump Weaviate version to 1.7.0 (#1412)
* Bump Weaviate

* Bump Weaviate

* Bump Weaviate client

* Bump Weaviate

* Revert client version

There is a change in the client API that needs to be addressed before bumping its version
2021-09-05 09:28:55 +02:00
Ikram Ali
3fc7f3f695
[docs] crawler api docs updated. (#1388) 2021-09-01 12:07:32 +02:00
Branden Chan
1938fb001b
Add support for no Docker envs in Tutorial 13 (#1365)
* Add support for no docker envs e.g. colab

* Generate md
2021-08-31 15:22:51 +02:00
Shahrukh Khan
c3d8aa0643
Add query classifier usage docs (#1348)
* Create query_classifier.md

* Update query_classifier.md

* Update query_classifier.md

* Update query_classifier.md

* Update query_classifier.md

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-08-24 15:56:11 +02:00
Markus Paff
cac15310bd
adding tutorial 13 and 14 (#1364) 2021-08-23 11:37:06 +02:00
Markus Paff
ff2049cd45
updated tutorials (#1359) 2021-08-19 21:16:56 +02:00
Bob van Luijt
ba071cc052
Bump Weaviate version (#1336) 2021-08-12 09:54:09 +02:00
Markus Paff
7569ab97dd
Add faq annotation (#1333)
* add annotation faq to read.me

* design fix

* add faq to docs page

* changed format
2021-08-10 14:55:31 +02:00
Malte Pietsch
a0921f0c35
Remove Finder (#1326)
* deprecate finder

* remove import

* add doc section for moving from finder to pipelines
2021-08-09 13:41:40 +02:00