54 Commits

Author SHA1 Message Date
Julian Risch
845905e418
ignore empty filters parameter (#1783)
* ignore empty filters parameter

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-22 09:36:14 +01:00
Sara Zan
d81897535e
Public demo (#1747)
* Queries now run only when pressing RUN. File upload hidden. Question is not sent if the textbox is empty.

* Add latest docstring and tutorial changes

* Tidy up: remove needless state, add comments, fix minor bugs

* Had to add results to the status to avoid some bugs in eval mode

* Added 'credits'

* Add footers, update requirements, some random questions for the evaluation

* Add requested changes

* Temporarily roll back the UI to the old GoT dataset

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-19 11:34:32 +01:00
Malte Pietsch
c0892717a0
Fix usage of filters in /query endpoint in REST API (#1774)
* WIP filter refactoring

* fix filter formatting

* remove inplace modification of filters
2021-11-18 18:13:03 +01:00
Malte Pietsch
b28dd823ef
Improve open api spec (#1700)
* improve open api spec

* move to automatic generation of better operationIDs
2021-11-11 09:40:58 +01:00
Julian Risch
c8df4763f8
disable file upload for InMemoryDocStore (#1677)
2021-11-01 10:39:13 +01:00
Sara Zan
13510aa753
Refactoring of the haystack package (#1624)
* Files moved, imports all broken

* Fix most imports and docstrings into

* Fix the paths to the modules in the API docs

* Add latest docstring and tutorial changes

* Add a few pipelines that were lost in the imports

* Fix a bunch of mypy warnings

* Add latest docstring and tutorial changes

* Create a file_classifier module

* Add docs for file_classifier

* Fixed most circular imports, now the REST API can start

* Add latest docstring and tutorial changes

* Tackling more mypy issues

* Reintroduce  from FARM and fix last mypy issues hopefully

* Re-enable old-style imports

* Fix some more imports from the top-level  package in an attempt to sort out circular imports

* Fix some imports in tests to new-style to prevent failed class equalities from breaking tests

* Change document_store into document_stores

* Update imports in tutorials

* Add latest docstring and tutorial changes

* Probably fixes summarizer tests

* Improve the old-style import allowing module imports (should work)

* Try to fix the docs

* Remove dedicated KnowledgeGraph page from autodocs

* Remove dedicated GraphRetriever page from autodocs

* Fix generate_docstrings.sh with an updated list of yaml files to look for

* Fix some more modules in the docs

* Fix the document stores docs too

* Fix a small issue on Tutorial14

* Add latest docstring and tutorial changes

* Add deprecation warning to old-style imports

* Remove stray folder and import Dict into dense.py

* Change import path for MLFlowLogger

* Add old loggers path to the import path aliases

* Fix debug output of convert_ipynb.py

* Fix circular import on BaseRetriever

* Missed one merge block

* re-run tutorial 5

* Fix imports in tutorial 5

* Re-enable squad_to_dpr CLI from the root package and move get_batches_from_generator into document_stores.base

* Add latest docstring and tutorial changes

* Fix typo in utils __init__

* Fix a few more imports

* Fix benchmarks too

* New-style imports in test_knowledge_graph

* Rollback setup.py

* Rollback squad_to_dpr too

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-25 15:50:23 +02:00
Sara Zan
96c05c34e4
Pipeline node names validation (#1601)
* Add node names validation

* Add tests

* Improve test and test that params exists before validating

* Fix the REST API

* Use minilm-uncased-squad2 instead of roberta-base-squad2

* Use roberta model for test_pipeline.yaml

* Turn off TOKENIZERS_PARALLELISM in generator tests (#1605)

* Account for non-targeted parameters

* Restore previous parameters handling in the rest api

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2021-10-19 15:22:44 +02:00
Malte Pietsch
3d58e81b5e
Switch from dataclass to pydantic dataclass & Fix Swagger API Docs (#1598)
* test pydantic dataclasses

* Add latest docstring and tutorial changes

* enable pydantic mypy plugin

* switch to pydantic dataclasses and implement custom to_json from_json

* clean up

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-18 14:38:14 +02:00
Malte Pietsch
4a6c9302b3
Redesign primitives - Document, Answer, Label (#1398)
* first draft / notes on new primitives

* wip label / feedback refactor

* rename doc.text -> doc.content. add doc.content_type

* add datatype for content

* remove faq_question_field from ES and weaviate. rename text_field -> content_field in docstores. update tutorials for content field

* update converters for . Add warning for empty

* rename label.question -> label.query. Allow sorting of Answers.

* WIP primitives

* update ui/reader for new Answer format

* Improve Label. First refactoring of MultiLabel. Adjust eval code

* fixed workflow conflict with introducing new one (#1472)

* Add latest docstring and tutorial changes

* make add_eval_data() work again

* fix reader formats. WIP fix _extract_docs_and_labels_from_dict

* fix test reader

* Add latest docstring and tutorial changes

* fix another test case for reader

* fix mypy in farm reader.eval()

* fix mypy in farm reader.eval()

* WIP ORM refactor

* Add latest docstring and tutorial changes

* fix mypy weaviate

* make label and multilabel dataclasses

* bump mypy env in CI to python 3.8

* WIP refactor Label ORM

* WIP refactor Label ORM

* simplify tests for individual doc stores

* WIP refactoring markers of tests

* test alternative approach for tests with existing parametrization

* WIP refactor ORMs

* fix skip logic of already parametrized tests

* fix weaviate behaviour in tests - not parametrizing it in our general test cases.

* Add latest docstring and tutorial changes

* fix some tests

* remove sql from document_store_types

* fix markers for generator and pipeline test

* remove inmemory marker

* remove unneeded elasticsearch markers

* add dataclasses-json dependency. adjust ORM to just store JSON repr

* ignore type as dataclasses_json seems to miss functionality here

* update readme and contributing.md

* update contributing

* adjust example

* fix duplicate doc handling for custom index

* Add latest docstring and tutorial changes

* fix some ORM issues. fix get_all_labels_aggregated.

* update drop flags where get_all_labels_aggregated() was used before

* Add latest docstring and tutorial changes

* add to_json(). add + fix tests

* fix no_answer handling in label / multilabel

* fix duplicate docs in memory doc store. change primary key for sql doc table

* fix mypy issues

* fix mypy issues

* haystack/retriever/base.py

* fix test_write_document_meta[elastic]

* fix test_elasticsearch_custom_fields

* fix test_labels[elastic]

* fix crawler

* fix converter

* fix docx converter

* fix preprocessor

* fix test_utils

* fix tfidf retriever. fix selection of docstore in tests with multiple fixtures / parameterizations

* Add latest docstring and tutorial changes

* fix crawler test. fix ocrconverter attribute

* fix test_elasticsearch_custom_query

* fix generator pipeline

* fix ocr converter

* fix ragenerator

* Add latest docstring and tutorial changes

* fix test_load_and_save_yaml for elasticsearch

* fixes for pipeline tests

* fix faq pipeline

* fix pipeline tests

* Add latest docstring and tutorial changes

* fix weaviate

* Add latest docstring and tutorial changes

* trigger CI

* satisfy mypy

* Add latest docstring and tutorial changes

* satisfy mypy

* Add latest docstring and tutorial changes

* trigger CI

* fix question generation test

* fix ray. fix Q-generation

* fix translator test

* satisfy mypy

* wip refactor feedback rest api

* fix rest api feedback endpoint

* fix doc classifier

* remove relation of Labels -> Docs in SQL ORM

* fix faiss/milvus tests

* fix doc classifier test

* fix eval test

* fixing eval issues

* Add latest docstring and tutorial changes

* fix mypy

* WIP replace dataclasses-json with manual serialization

* Add latest docstring and tutorial changes

* revert to dataclass-json serialization for now. remove debug prints.

* update docstrings

* fix extractor. fix Answer Span init

* fix api test

* keep meta data of answers in reader.run()

* fix meta handling

* address review feedback

* Add latest docstring and tutorial changes

* make document=None for open domain labels

* add import

* fix print utils

* fix rest api

* address review feedback

* Add latest docstring and tutorial changes

* fix mypy

Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-13 14:23:23 +02:00
Sara Zan
6354528336
Add /documents/get_by_filters endpoint (#1580)
* Add endpoint to get documents by filter

* Add test for /documents/get_by_filter and extend the delete documents test

* Add rest_api/file-upload to .gitignore

* Make sure the document store is empty for each test

* Improve docstrings of delete_documents_by_filters and get_documents_by_filters

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-12 10:53:54 +02:00
Sara Zan
3539e6b041
Fix circular import in the REST API (#1556)
* Fix circular import in the REST API

* remove unneeded import in test

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-10-04 21:18:23 +02:00
Sara Zan
af4a44fcbd
WIP Add rest api endpoint to delete documents by filter (#1546)
* Add rest api endpoint to delete documents by filter.

* Remove parametrization of rest api tests

* Make the paths in rest_api/config.py absolute

* Fix path to pipelines.yaml

* Restructuring test_rest_api.py to be able to test only my endpoint (and to make the suite more structured)

* Convert DELETE /documents into POST /documents/delete_by_filters

Co-authored-by: sarthakj2109 <54064348+sarthakj2109@users.noreply.github.com>
2021-10-04 11:21:00 +02:00
Sara Zan
2de5385ac2
Add "API is loading" message in the UI (#1493)
* Create the /initialized endpoint

* Now showing an error message if the connection fails, and a 'Haystack is loading' message while workers are starting up

* Improve the appearance of the various messages

* Newline at the end of file
2021-09-27 16:40:25 +02:00
oryx1729
3deff26b60
Fix Search REST API when filters are None (#1431)
2021-09-10 14:47:34 +02:00
oryx1729
1f859694f1
Add support for Dense Retrievers in REST API Indexing Pipeline (#1430)
2021-09-10 11:53:32 +02:00
oryx1729
9dd7c74f4f
Refactor communication between Pipeline Components (#1321)
2021-09-10 11:41:16 +02:00
Malte Pietsch
2a226daac4
Add simple docs2answer node to allow FAQ style QA / Doc search in API (#1361)
* minimal docs2answer node

* enable logs again
2021-08-20 17:01:55 +02:00
Ikram Ali
29e140196b
[pipeline] Allow for batch indexing when using Pipelines fix #1168 (#1231)
* [pipeline] Allow for batch indexing when using Pipelines fix #1168

* [pipeline] Test case fixed fix #1168

* [file_converter] Path.suffix updated #1168

* [file_converter] meta can be one of these three cases:
                 A single dict that is applied to all files
                 One dict for each file being converted
                 None #1168

* [file_converter] mypy error fixed.

* [file_converter] mypy error fixed.

* [rest_api] batch file upload introduced in indexing API.

* [test_case] Test_api file upload parameter name updated.

* [ui] Streamlit file upload parameter updated.
2021-06-30 14:13:46 +02:00
Guillim
73a4f9825a
Add env var CONCURRENT_REQUEST_PER_WORKER (#1235)
* We create an env var `CONCURRENT_REQUEST_PER_WORKER` following your naming convention (I went back a few commits to find the original name)

* default to 4
2021-06-29 07:44:25 +02:00
Malte Pietsch
2c964db62d
Relax typing for meta data in REST API (#1224)
2021-06-24 12:34:42 +02:00
Malte Pietsch
2caeea000e
Small UI and REST API fixes (#1223)
* small fixes

* change default question
2021-06-24 09:53:08 +02:00
oryx1729
afee4f36ce
Add scaffold for defining custom components for Pipelines (#1205)
2021-06-23 12:01:54 +02:00
Bhadresh Savani
37a72d2f45
Add File Upload Functionality in UI (#995)
2021-04-30 10:46:30 +02:00
oryx1729
8c68699e1c
Refactor REST APIs to use Pipelines (#922)
2021-04-07 17:53:32 +02:00
Malte Pietsch
0eaae3c0dd
Fix UI when API returns fewer answers than expected (#828)
* fix ui for few answers from api. add top_k_per_sample env

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-02-15 14:27:17 +01:00
Malte Pietsch
6798192d40
Add API endpoint to export accuracy metrics from user feedback + created_at timestamp (#803)
* WIP feedback metrics

* fix filters and zero division

* add created_at and model_name fields to labels

* add created_at value

* remove debug log level

* fix attribute init

* move timestamp creation down to docstore / db level

* fix import
2021-02-15 10:48:59 +01:00
Tanay Soni
f95b70df38
Fix file upload API (#808)
2021-02-05 12:17:38 +01:00
Malte Pietsch
e9b5439b00
Rename label id field for elastic & add UPDATE_EXISTING_DOCUMENTS to API config (#728)
* rename label id field for elastic

* add UPDATE_EXISTING_DOCUMENTS param to API config
2021-01-12 13:00:56 +01:00
Malte Pietsch
fcc052b554
Pass custom label index name in api config (#724)
2021-01-11 12:24:09 +01:00
Guillim
65cf9547d2
Allow setting return_no_answers for TransformersReader in REST API (SQuAD 1.0 format) (#609)
* Update config.py

* new option

Allow a new option in the settings: tell if a reader model can return a "no answer" like SQuAD 2.0 models, or if it is only a SQuAD 1.0-like model that always gives an answer.
2020-11-20 14:09:39 +01:00
Lalit Pagaria
23f1058b90
Fixing defaults in config for rest_api (#583)
* Fixing defaults configs for rest_apis

* Reverting change to VALID_LANGUAGES

* Casting EMBEDDING_DIM as int
2020-11-16 06:51:27 +01:00
Tanay Soni
acd088808b
Allow list of filter values in REST API (#568)
2020-11-09 20:41:53 +01:00
Malte Pietsch
46fac41b54
Allow configuration of log level in REST API via ENV (#541)
* configure log level via env. adjust debug messages

* pin faiss version
2020-11-04 09:54:02 +01:00
Lalit Pagaria
63c12371b9
Change arg "model" to "model_name_or_path" in TransformersReader (#510)
* Consistent parameter naming for TransformersReader along with removing unused imports as well.

* Addressing review comments
2020-10-21 17:15:35 +02:00
Malte Pietsch
4a77dc7a02
Allow null filter value in api (#497)
2020-10-16 18:44:15 +02:00
Lalit Pagaria
b9da789475
Add Elasticsearch Query DSL compliant Query API (#471)
2020-10-16 13:25:31 +02:00
Malte Pietsch
5555274170
Make creation of label index optional in feedback and file_upload api
2020-10-15 19:03:58 +02:00
Malte Pietsch
bdbd1b323b
Add create_index and similarity metric to api config (#493)
* make creation of label index optional

* add params for rest api

* reset tutorial flag
2020-10-15 18:41:36 +02:00
Tanay Soni
3399fc784d
Refactor file converter interface (#393)
2020-09-18 10:42:13 +02:00
Tanay Soni
03fa4a8740
Exclude embedding fields from the REST API (#390)
2020-09-17 14:37:01 +02:00
Malte Pietsch
9727829cc6
Rename and restructure modules (database, indexing, schemas) (#379)
* rename database to documentstore

* move document, label, multilabel to haystack/schema.py

* rename documentstore -> document_store

* split indexing modules -> file_converter + preprocessor

* fix order of imports

* Update tutorial notebooks

* fix torch version in tutorial 4
2020-09-16 18:33:23 +02:00
Karim Jana
c7078a36c0
Custom fields for indexing in ElasticsearchDocumentStore (#297)
2020-08-10 11:34:39 +02:00
Karim Jana
89dcfed619
Cast Search REST API logs to JSON (#290)
2020-08-06 10:36:56 +02:00
Tanay Soni
723921475f
Make document ids of str type (#284)
2020-08-03 16:20:17 +02:00
Malte Pietsch
29a15c0d59
Add eval for Dense Passage Retriever & Refactor handling of labels/feedback (#243)
2020-07-31 11:34:06 +02:00
Malte Pietsch
1289cc6fbb
Fix format of /export-doc-qa-feedback to comply with SQuAD (#241)
2020-07-16 13:17:45 +02:00
Malte Pietsch
6bed2f509f
Refactor DPR for latest transformers version & change init arg gpu -> use_gpu for DPR and EmbeddingRetriever (#239)
* fix tokenizer warning in latest transformers

* change dpr arg from gpu to use_gpu

* change gpu arg for EmbeddingRetriever
2020-07-16 10:45:01 +02:00
Tanay Soni
5c1a5fe61d
Add dummy retriever for benchmarking / reader-only settings (#235)
2020-07-15 17:22:17 +02:00
Guillim
8a616dae75
Adjust Docker and REST API to allow TransformersReader Class (#180)
2020-07-07 16:25:36 +02:00
Tanay Soni
ff7e35581b
Add response time in logs (#201)
2020-07-07 12:28:41 +02:00