haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-12-27 23:18:37 +00:00

Author	SHA1	Message	Date
Sara Zan	fd184d607f	Add a restart policy `on-failure` to all containers	2021-10-27 17:07:36 +02:00
Branden Chan	171fd7be38	Update README.md (#1653 ) * Update README.md * Incorporate link into Haystack logo * Fix jobs link * Update tutorials and demo * Change order of sections * Rename tutorial section * Create jobs and community sections * Change wording * Change section title * Change wording * Add tutorial links and pipeline image	2021-10-27 15:55:34 +02:00
bogdankostic	0c80ac9e62	Truncate too large tables for TableReader (#1662 ) * Truncate too large tables for TableReader * Add documentation * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-10-27 15:46:59 +02:00
Timo Moeller	1d3f63ac2e	Allow setting of `scroll` param in ElasticsearchDocumentStore (#1645 ) * remove scroll param in ES call * Add scroll param to ES init * Add latest docstring and tutorial changes * Add scroll to set_config * remove trailing comma Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-10-27 11:07:13 +02:00
Julian Risch	e106eb41a7	Remove trailing comma in import statement (#1655 )	2021-10-27 10:11:22 +02:00
Branden Chan	9b2f40100d	Replace Haystack banner for readme (#1654 ) * Replace haystack banner for readme * Replace haystack banner for readme * Update README.md * Crop image * Update README.md revert to image from master branch	2021-10-26 17:59:45 +02:00
Andrey A	33892cf609	Link the logo in readme to the website (#1649 )	2021-10-26 15:04:58 +02:00
Sara Zan	6f6f2357fd	Fix import in Milvus2DocumentStore to import directly from the file and remove circular import with retriever (#1646 )	2021-10-26 11:47:25 +02:00
Sara Zan	13510aa753	Refactoring of the `haystack` package (#1624 ) * Files moved, imports all broken * Fix most imports and docstrings into * Fix the paths to the modules in the API docs * Add latest docstring and tutorial changes * Add a few pipelines that were lost in the inports * Fix a bunch of mypy warnings * Add latest docstring and tutorial changes * Create a file_classifier module * Add docs for file_classifier * Fixed most circular imports, now the REST API can start * Add latest docstring and tutorial changes * Tackling more mypy issues * Reintroduce from FARM and fix last mypy issues hopefully * Re-enable old-style imports * Fix some more import from the top-level package in an attempt to sort out circular imports * Fix some imports in tests to new-style to prevent failed class equalities from breaking tests * Change document_store into document_stores * Update imports in tutorials * Add latest docstring and tutorial changes * Probably fixes summarizer tests * Improve the old-style import allowing module imports (should work) * Try to fix the docs * Remove dedicated KnowledgeGraph page from autodocs * Remove dedicated GraphRetriever page from autodocs * Fix generate_docstrings.sh with an updated list of yaml files to look for * Fix some more modules in the docs * Fix the document stores docs too * Fix a small issue on Tutorial14 * Add latest docstring and tutorial changes * Add deprecation warning to old-style imports * Remove stray folder and import Dict into dense.py * Change import path for MLFlowLogger * Add old loggers path to the import path aliases * Fix debug output of convert_ipynb.py * Fix circular import on BaseRetriever * Missed one merge block * re-run tutorial 5 * Fix imports in tutorial 5 * Re-enable squad_to_dpr CLI from the root package and move get_batches_from_generator into document_stores.base * Add latest docstring and tutorial changes * Fix typo in utils __init__ * Fix a few more imports * Fix benchmarks too * New-style imports in test_knowledge_graph * Rollback setup.py * Rollback squad_to_dpr too Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-10-25 15:50:23 +02:00
bogdankostic	51acf779f2	Add TableTextRetriever (#1529 ) * first draft / notes on new primitives * wip label / feedback refactor * rename doc.text -> doc.content. add doc.content_type * add datatype for content * remove faq_question_field from ES and weaviate. rename text_field -> content_field in docstores. update tutorials for content field * update converters for . Add warning for empty * renam label.question -> label.query. Allow sorting of Answers. * WIP primitives * update ui/reader for new Answer format * Improve Label. First refactoring of MultiLabel. Adjust eval code * fixed workflow conflict with introducing new one (#1472) * Add latest docstring and tutorial changes * make add_eval_data() work again * fix reader formats. WIP fix _extract_docs_and_labels_from_dict * fix test reader * Add latest docstring and tutorial changes * fix another test case for reader * fix mypy in farm reader.eval() * fix mypy in farm reader.eval() * WIP ORM refactor * Add latest docstring and tutorial changes * fix mypy weaviate * make label and multilabel dataclasses * bump mypy env in CI to python 3.8 * WIP refactor Label ORM * WIP refactor Label ORM * simplify tests for individual doc stores * WIP refactoring markers of tests * test alternative approach for tests with existing parametrization * WIP refactor ORMs * fix skip logic of already parametrized tests * fix weaviate behaviour in tests - not parametrizing it in our general test cases. * Add latest docstring and tutorial changes * fix some tests * remove sql from document_store_types * fix markers for generator and pipeline test * remove inmemory marker * remove unneeded elasticsearch markers * add dataclasses-json dependency. adjust ORM to just store JSON repr * ignore type as dataclasses_json seems to miss functionality here * update readme and contributing.md * update contributing * adjust example * fix duplicate doc handling for custom index * Add latest docstring and tutorial changes * fix some ORM issues. 
fix get_all_labels_aggregated. * update drop flags where get_all_labels_aggregated() was used before * Add latest docstring and tutorial changes * add to_json(). add + fix tests * fix no_answer handling in label / multilabel * fix duplicate docs in memory doc store. change primary key for sql doc table * fix mypy issues * fix mypy issues * haystack/retriever/base.py * fix test_write_document_meta[elastic] * fix test_elasticsearch_custom_fields * fix test_labels[elastic] * fix crawler * fix converter * fix docx converter * fix preprocessor * fix test_utils * fix tfidf retriever. fix selection of docstore in tests with multiple fixtures / parameterizations * Add latest docstring and tutorial changes * fix crawler test. fix ocrconverter attribute * fix test_elasticsearch_custom_query * fix generator pipeline * fix ocr converter * fix ragenerator * Add latest docstring and tutorial changes * fix test_load_and_save_yaml for elasticsearch * fixes for pipeline tests * fix faq pipeline * fix pipeline tests * Add latest docstring and tutorial changes * Add MultimodalRetriever * Add latest docstring and tutorial changes * fix weaviate * Add latest docstring and tutorial changes * trigger CI * satisfy mypy * Add latest docstring and tutorial changes * satisfy mypy * Add latest docstring and tutorial changes * trigger CI * fix question generation test * fix ray. fix Q-generation * fix translator test * satisfy mypy * wip refactor feedback rest api * fix rest api feedback endpoint * fix doc classifier * remove relation of Labels -> Docs in SQL ORM * fix faiss/milvus tests * fix doc classifier test * fix eval test * fixing eval issues * Add latest docstring and tutorial changes * fix mypy * WIP replace dataclasses-json with manual serialization * Add methods to MultimodalRetriever * Add latest docstring and tutorial changes * revert to dataclass-json serialization for now. remove debug prints. * update docstrings * fix extractor. 
fix Answer Span init * fix api test * keep meta data of answers in reader.run() * fix meta handling * adress review feedback * Add latest docstring and tutorial changes * make document=None for open domain labels * add import * fix print utils * fix rest api * Add methods and tests * Add latest docstring and tutorial changes * Fix mypy * Add latest docstring and tutorial changes * Add type hints and doc strings * Make use of initialize_device_settings * Move serialization of pd.DataFrame to schema.py * Fix mypy * Adapt Document's from_dict method * Update docstrings * Add latest docstring and tutorial changes * Fix mypy * Fix mypy * Fix Document's from_dict method * Fix Document's to_dict method * Change handling of table metadata * Add latest docstring and tutorial changes * Change naming from Multimodal to TableText * Turn off tokenizers_parallelism in retriever tests * Add latest docstring and tutorial changes * Remove turning off tokenizers_parallelism in retriever tests * Adapt convert_es_hit_to_document * Change embed_surrounding_context to embed_meta_fields * Add latest docstring and tutorial changes * Add check if torch.distributed is available * Set n_gpu to 0 in training test * Set HIP_LAUNCH_BLOCKING to 1 * Set HIP_LAUNCH_BLOCKING to "1" * Set use_gpu to False * Use DataParallel only if more than one device * Remove --find-links=https://download.pytorch.org/whl/torch_stable.html Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai> Co-authored-by: Markus Paff <markuspaff.mp@gmail.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-10-25 12:27:02 +02:00
Lalit Pagaria	5dbd899a93	Experimental changes to support Milvus 2.x (#1473 ) * Experimental changes to support Milvus 2.x * Milvus 2.0 need other containers hence adding them * Add latest docstring and tutorial changes * Fixing tests * Correcting use of list collections * correcting connection close * Removing connection close logic * removing flush * using collection instead of connection * fixing describe collection * Fixing insert, query and search based on new signature * Making mypy happy * Fixing one test case * Fixing search and embedding fetch based on newer api * Implementing delete vector id function * Wrapping up final changes * Add latest docstring and tutorial changes * Correcting requirements.txt * removing empty line in requirements.txt * add docstring and exception for delete * add docstring. condition import on env var. raise exception for deletion * fix typo * change delete signature * ignore typing for import Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-10-25 10:39:48 +02:00
Julian Risch	6033319cfe	Fix parameter names in tutorial 5 and 12 (#1639 ) * Fix parameter names in tutorial 5 * Update parameters in tutorial notebook * Add latest docstring and tutorial changes * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-10-22 17:22:51 +02:00
Timo Moeller	6da2c73611	Add nltk download, add folder for file upload (#1633 )	2021-10-22 16:03:33 +02:00
Timo Moeller	9dc125df9d	Bugfix Tutorial 5 parameters, adjust default split length (#1635 ) Bugfix parameters, adjust default split length, add sentencetransformers	2021-10-22 16:03:12 +02:00
Sara Zan	f67b213797	Make EntityExtractor work when loaded from YAML (#1636 ) * Add set_config to EntityExtractor * Import EntityExtractor in pipeline.py, or it won't be properly registered as a subclass	2021-10-22 14:41:26 +02:00
Julian Risch	0aba5ca57d	Update jobs link in readme (#1629 )	2021-10-21 12:10:18 +02:00
Julian Risch	52e1fc991e	Update jobs link to personio (#1611 ) * Update jobs link to personio * Add latest docstring and tutorial changes * Change jobs link to main website * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-10-21 11:42:32 +02:00
Julian Risch	9de140110f	Use smaller model for one generator test case (#1622 ) * Use smaller model for one generator test case * Reduce max_length of generated sequences in tests	2021-10-20 17:57:15 +02:00
Sara Zan	bb066c0a2c	Fix for the Streamlit demo (was sending parameters to a non-existing node of the pipeline) (#1620 )	2021-10-20 11:55:29 +02:00
Julian Risch	f2a3f95ab6	add note on gpu runtime to tutorial 13 (#1614 ) * add note on gpu runtime to tutorial 13 * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-10-20 09:55:56 +02:00
Julian Risch	4ed2b90bca	Add delete_labels() except for weaviate doc store (#1604 ) * Add delete_labels() except for weaviate doc store * Add latest docstring and tutorial changes * Add test for delete_labels() * Adapt filter for label deletion to different doc stores in test * Allow delete labels by _id in elasticsearch * Add latest docstring and tutorial changes * Add latest docstring and tutorial changes * re-add bugfix after merge * Add ids as optional parameter * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-10-19 17:20:28 +02:00
Sara Zan	9722bbf1e1	DPR training: Rename `TransformersAdamW` to `AdamW` (#1613 ) * Rename TransformersAdamW into simply AdamW (probably changed in transformers at some point) * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-10-19 16:18:30 +02:00
Sara Zan	96c05c34e4	Pipeline node names validation (#1601 ) * Add node names validation * Add tests * Improve test and test that params exists before validating * Fix the REST API * Use minilm-uncased-squad2 instead of roberta-base-squad2 * Use roberta model for test_pipeline.yaml * Turn off TOKENIZERS_PARALLELISM in generator tests (#1605) * Account for non-targeted parameters * Restore previous parameters handling in the rest api Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Julian Risch <julian.risch@deepset.ai>	2021-10-19 15:22:44 +02:00
Malte Pietsch	3a7d029fdd	Fix Opensearch field type (flattened -> nested) (#1609 ) * fix field type flattened -> nested. change default port from 9201 to 9200 * change port in benchmarks	2021-10-19 14:40:53 +02:00
Girish A Koushik	5a6285f23f	Add checkpointing for reader.train() to allow stopping + resuming training (#1554 ) * adding create checkpoint feature for train function in farm reader * added arguments for create_or_load_checkpoint function * accessing class method inside Trainer class * added default value for checkpoint_root_dir and checkpoint_every, checkpoints_to_keep as arguments for reader.train() * change in default value for checkpoint_root_dir and checkpoint_every * update docstring and add Path conversion Co-authored-by: girish.koushik <girish.koushik@diatoz.com> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-10-19 12:36:32 +02:00
Sara Zan	575e64333c	Delete documents by ID in all document stores (#1606 ) * Modify BaseDocumentStore.delete_documents() signature, implement ElasticSearch, and add tests * Add implementation for InMemory * Implement for SQL, FAISS and Milvus too * Add tests for faiss and milvus * Fix delete_all_documents * Implement deletion by ID for weaviate Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: sarthakj2109 <54064348+sarthakj2109@users.noreply.github.com> Co-authored-by: prafgup <prafulgupta6@gmail.com> Co-authored-by: ankh6 <andynzemokalumu@live.be>	2021-10-19 12:30:15 +02:00
Malte Pietsch	eb95f0e8aa	Add more flexible options for model downloads (Proxies, resume_download, local_files_only...) (#1256 ) * allow passing more options for model/tokenizer download from remote * temporarily change dependency to current farm master * Add latest docstring and tutorial changes * add kwargs * add docstrings * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-10-18 15:47:36 +02:00
Malte Pietsch	3d58e81b5e	Switch from dataclass to pydantic dataclass & Fix Swagger API Docs (#1598 ) * test pydantic dataclasses * Add latest docstring and tutorial changes * enable pydantic mypy plugin * switch to pydentic dataclasses and implement custom to_json from_json * clean up Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-10-18 14:38:14 +02:00
Malte Pietsch	3a4b3cd59d	Update CONTRIBUTING.md	2021-10-18 09:14:03 +02:00
bogdankostic	655d721371	Add Table Reader (#1446 ) * first draft / notes on new primitives * wip label / feedback refactor * rename doc.text -> doc.content. add doc.content_type * add datatype for content * remove faq_question_field from ES and weaviate. rename text_field -> content_field in docstores. update tutorials for content field * update converters for . Add warning for empty * Add first draft of TableReader * renam label.question -> label.query. Allow sorting of Answers. * Add calculation of answer scores * WIP primitives * Adapt input and output to new primitives * Add doc strings * Add tests * update ui/reader for new Answer format * Improve Label. First refactoring of MultiLabel. Adjust eval code * fixed workflow conflict with introducing new one (#1472) * Add latest docstring and tutorial changes * make add_eval_data() work again * fix reader formats. WIP fix _extract_docs_and_labels_from_dict * fix test reader * Add latest docstring and tutorial changes * fix another test case for reader * fix mypy in farm reader.eval() * fix mypy in farm reader.eval() * WIP ORM refactor * Add latest docstring and tutorial changes * fix mypy weaviate * make label and multilabel dataclasses * bump mypy env in CI to python 3.8 * WIP refactor Label ORM * WIP refactor Label ORM * simplify tests for individual doc stores * WIP refactoring markers of tests * test alternative approach for tests with existing parametrization * WIP refactor ORMs * fix skip logic of already parametrized tests * fix weaviate behaviour in tests - not parametrizing it in our general test cases. * Add latest docstring and tutorial changes * fix some tests * remove sql from document_store_types * fix markers for generator and pipeline test * remove inmemory marker * remove unneeded elasticsearch markers * add dataclasses-json dependency. 
adjust ORM to just store JSON repr * ignore type as dataclasses_json seems to miss functionality here * update readme and contributing.md * update contributing * adjust example * fix duplicate doc handling for custom index * Add latest docstring and tutorial changes * fix some ORM issues. fix get_all_labels_aggregated. * update drop flags where get_all_labels_aggregated() was used before * Add latest docstring and tutorial changes * add to_json(). add + fix tests * fix no_answer handling in label / multilabel * fix duplicate docs in memory doc store. change primary key for sql doc table * fix mypy issues * fix mypy issues * haystack/retriever/base.py * fix test_write_document_meta[elastic] * fix test_elasticsearch_custom_fields * fix test_labels[elastic] * fix crawler * fix converter * fix docx converter * fix preprocessor * fix test_utils * fix tfidf retriever. fix selection of docstore in tests with multiple fixtures / parameterizations * Add latest docstring and tutorial changes * fix crawler test. fix ocrconverter attribute * fix test_elasticsearch_custom_query * fix generator pipeline * fix ocr converter * fix ragenerator * Add latest docstring and tutorial changes * fix test_load_and_save_yaml for elasticsearch * fixes for pipeline tests * fix faq pipeline * fix pipeline tests * Add latest docstring and tutorial changes * fix weaviate * Add latest docstring and tutorial changes * trigger CI * satisfy mypy * Add latest docstring and tutorial changes * satisfy mypy * Add latest docstring and tutorial changes * trigger CI * fix question generation test * fix ray. 
fix Q-generation * fix translator test * satisfy mypy * wip refactor feedback rest api * fix rest api feedback endpoint * fix doc classifier * remove relation of Labels -> Docs in SQL ORM * fix faiss/milvus tests * fix doc classifier test * fix eval test * fixing eval issues * Add latest docstring and tutorial changes * fix mypy * WIP replace dataclasses-json with manual serialization * Add latest docstring and tutorial changes * revert to dataclass-json serialization for now. remove debug prints. * update docstrings * fix extractor. fix Answer Span init * fix api test * Adapt answer format * Add latest docstring and tutorial changes * keep meta data of answers in reader.run() * Fix mypy * fix meta handling * adress review feedback * Add latest docstring and tutorial changes * Allow inference on GPU * Remove automatic aggregation * Add automatic aggregation * Add latest docstring and tutorial changes * Add torch-scatter dependency * Add wheel to torch-scatter dependency * Fix requirements * Fix requirements * Fix requirements * Adapt setup.py to allow for wheels * Fix requirements * Fix requirements * Add type hints and code snippet * Add latest docstring and tutorial changes Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai> Co-authored-by: Markus Paff <markuspaff.mp@gmail.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-10-15 16:34:48 +02:00
Julian Risch	5ec29a5283	Limit generator tests to memory doc store; split pipeline tests (#1602 ) * Limit generator tests to memory doc store; split pipeline tests * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-10-15 15:37:46 +02:00
CandiceYu8	5cfdabda2c	[fix] MySQL connection 'check_same_thread' error (#1585 ) * [fix] sql mysql connection 'check_same_thread' error * adjust sql connection if-block logic	2021-10-15 10:29:36 +02:00
Malte Pietsch	451e51a224	Update code snippet in readme	2021-10-14 18:15:20 +02:00
ju-gu	bd823c9a6f	Update Crawler documentation (#1588 ) Typo in crawling the documentation website.	2021-10-14 12:24:36 +02:00
Malte Pietsch	99c8046367	Fix Tutorials (#1594 ) * fix response format of DocumentSearchPipeline * Add latest docstring and tutorial changes * fix typos * change prints in tutorial 4 * Add latest docstring and tutorial changes * fix tutorial 13 * Add latest docstring and tutorial changes * remove unused import Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-10-14 11:49:35 +02:00
Malte Pietsch	d0b71d39e6	adjust startup sequence in docker compose	2021-10-13 19:43:58 +02:00
Malte Pietsch	caba590576	Fix answer format in ui (#1591 ) * fix answer format in ui * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-10-13 16:48:33 +02:00
Malte Pietsch	34de811594	make farm logging less verbose	2021-10-13 14:45:54 +02:00
Malte Pietsch	82c2cdf7cd	Merge branch 'master' of github.com:deepset-ai/haystack	2021-10-13 14:45:17 +02:00
Malte Pietsch	db2b5d913b	Fix param in tutorial 8	2021-10-13 14:45:09 +02:00
Malte Pietsch	4a6c9302b3	Redesign primitives - `Document`, `Answer`, `Label` (#1398 ) * first draft / notes on new primitives * wip label / feedback refactor * rename doc.text -> doc.content. add doc.content_type * add datatype for content * remove faq_question_field from ES and weaviate. rename text_field -> content_field in docstores. update tutorials for content field * update converters for . Add warning for empty * renam label.question -> label.query. Allow sorting of Answers. * WIP primitives * update ui/reader for new Answer format * Improve Label. First refactoring of MultiLabel. Adjust eval code * fixed workflow conflict with introducing new one (#1472) * Add latest docstring and tutorial changes * make add_eval_data() work again * fix reader formats. WIP fix _extract_docs_and_labels_from_dict * fix test reader * Add latest docstring and tutorial changes * fix another test case for reader * fix mypy in farm reader.eval() * fix mypy in farm reader.eval() * WIP ORM refactor * Add latest docstring and tutorial changes * fix mypy weaviate * make label and multilabel dataclasses * bump mypy env in CI to python 3.8 * WIP refactor Label ORM * WIP refactor Label ORM * simplify tests for individual doc stores * WIP refactoring markers of tests * test alternative approach for tests with existing parametrization * WIP refactor ORMs * fix skip logic of already parametrized tests * fix weaviate behaviour in tests - not parametrizing it in our general test cases. * Add latest docstring and tutorial changes * fix some tests * remove sql from document_store_types * fix markers for generator and pipeline test * remove inmemory marker * remove unneeded elasticsearch markers * add dataclasses-json dependency. 
adjust ORM to just store JSON repr * ignore type as dataclasses_json seems to miss functionality here * update readme and contributing.md * update contributing * adjust example * fix duplicate doc handling for custom index * Add latest docstring and tutorial changes * fix some ORM issues. fix get_all_labels_aggregated. * update drop flags where get_all_labels_aggregated() was used before * Add latest docstring and tutorial changes * add to_json(). add + fix tests * fix no_answer handling in label / multilabel * fix duplicate docs in memory doc store. change primary key for sql doc table * fix mypy issues * fix mypy issues * haystack/retriever/base.py * fix test_write_document_meta[elastic] * fix test_elasticsearch_custom_fields * fix test_labels[elastic] * fix crawler * fix converter * fix docx converter * fix preprocessor * fix test_utils * fix tfidf retriever. fix selection of docstore in tests with multiple fixtures / parameterizations * Add latest docstring and tutorial changes * fix crawler test. fix ocrconverter attribute * fix test_elasticsearch_custom_query * fix generator pipeline * fix ocr converter * fix ragenerator * Add latest docstring and tutorial changes * fix test_load_and_save_yaml for elasticsearch * fixes for pipeline tests * fix faq pipeline * fix pipeline tests * Add latest docstring and tutorial changes * fix weaviate * Add latest docstring and tutorial changes * trigger CI * satisfy mypy * Add latest docstring and tutorial changes * satisfy mypy * Add latest docstring and tutorial changes * trigger CI * fix question generation test * fix ray. 
fix Q-generation * fix translator test * satisfy mypy * wip refactor feedback rest api * fix rest api feedback endpoint * fix doc classifier * remove relation of Labels -> Docs in SQL ORM * fix faiss/milvus tests * fix doc classifier test * fix eval test * fixing eval issues * Add latest docstring and tutorial changes * fix mypy * WIP replace dataclasses-json with manual serialization * Add latest docstring and tutorial changes * revert to dataclass-json serialization for now. remove debug prints. * update docstrings * fix extractor. fix Answer Span init * fix api test * keep meta data of answers in reader.run() * fix meta handling * adress review feedback * Add latest docstring and tutorial changes * make document=None for open domain labels * add import * fix print utils * fix rest api * adress review feedback * Add latest docstring and tutorial changes * fix mypy Co-authored-by: Markus Paff <markuspaff.mp@gmail.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-10-13 14:23:23 +02:00
Malte Pietsch	9650f7aed1	Add `debug` and `debug_logs` params to standard pipelines (#1586 ) * add debug and debug_logs to standard pipelines * Add latest docstring and tutorial changes * fix params Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-10-12 16:00:48 +02:00
Sara Zan	6354528336	Add `/documents/get_by_filters` endpoint (#1580 ) * Add endpoint to get documents by filter * Add test for /documents/get_by_filter and extend the delete documents test * Add rest_api/file-upload to .gitignore * Make sure the document store is empty for each test * Improve docstrings of delete_documents_by_filters and get_documents_by_filters Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-10-12 10:53:54 +02:00
Malte Pietsch	bc7167a96c	Fix name	2021-10-12 10:22:41 +02:00
Sara Zan	25d76f508d	Create EntityExtractor (#1573 ) * Create extractor/entity.py * Aggregate NER words into entities * Support indexing * Add doc strings * Add utility for printing * Update signature of run() to match BaseComponent * Add test * Modify simplify_ner_for_qa to return the dictionary and add its test Co-authored-by: brandenchan <brandenchan@icloud.com>	2021-10-11 11:04:11 +02:00
Markus Sagen	69a0c9f2ed	Clarify docs for PDF conversion, languages and encodings (#1570 ) * Clarify PDF conversion, languages and encodings The parameter name `valid_languages` may be a bit miss-leading from reading only the tutorials. Users may, incorrectly assume that it enforces that the conversions only works for those languages, then it's more of a check. - Provided clarifications in the tutorials to highlight what valid_languages does and that changing the encoding may give better results for their language of choice - Updated the command for `pdftotext` to the correct one * Allow encodings for `convert_files_to_dicts` - Set option of passing encoding to the converters. Trying even for some Latin1 languages, the converter does not do it in a good way. Potential issues is that the encoding defaults to None, which is default for the other converters, but not for the PDFToTextConverter. Could add a check and change the ending to Latin1 for pdf if set to None. Was considering adding it to *kwargs, but since it may be a commonly used feature to be documented, I added it as a keyword argument instead. Would love to hear your input and feedback on in. Set back PDF default encoding * Update documentation	2021-10-11 09:30:12 +02:00
Muhammad Hamdan	dbb32c4f79	Adding TfidfRetriever to __init__.py of the retriever package (#1575 ) Adding TfidfRetriever to __init__.py of the retriever package, so people can import it like from haystack.retriever import TfidfRetriever.	2021-10-11 08:05:41 +02:00
Malte Pietsch	38652dd4dd	Enable GPU usage for QuestionGenerator (#1571 ) * enable GPU usage for question generator * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-10-08 12:17:48 +02:00
Sara Zan	54947cb840	Return intermediate nodes output in pipelines (#1558 ) * First rough implementation * Add a flag to dump the debug logs to the console as well * Typing run() and _dispatch_run() * Allow debug and debug_logs to be passed as arguments of run() * Avoid overwriting _debug, later we might want to store other objects in it * Put logs under a separate key of the _debug dictionary and add input and output of the node alongside it * Introduce global arguments for pipeline.run() that get applied to every node when defined * Change default values of debug variables to None, otherwise their default would override the params values * Remove a potential infinite recursion on the overridden __getattr__ * Do not append the output of the last node in the _debug key, it causes infinite recursion * Add tests * Move the input/output collection into _dispatch_run to gather only relevant info * Add partial Pipeline.run() docstring * Add latest docstring and tutorial changes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-10-07 22:13:25 +02:00
Vladimir Blagojevic	72168eddaf	Add BatchEncoding flatten (#1562 ) * Add BatchEncoding flatten * Rename BatchEncoding flatten to flatten_rename * Unit test for BatchEncoding flatten_rename	2021-10-07 15:29:57 +02:00


913 Commits