Malte Pietsch
3d58e81b5e
Switch from dataclass to pydantic dataclass & Fix Swagger API Docs ( #1598 )
...
* test pydantic dataclasses
* Add latest docstring and tutorial changes
* enable pydantic mypy plugin
* switch to pydantic dataclasses and implement custom to_json from_json
* clean up
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-18 14:38:14 +02:00
Malte Pietsch
4a6c9302b3
Redesign primitives - Document, Answer, Label ( #1398 )
...
* first draft / notes on new primitives
* wip label / feedback refactor
* rename doc.text -> doc.content. add doc.content_type
* add datatype for content
* remove faq_question_field from ES and weaviate. rename text_field -> content_field in docstores. update tutorials for content field
* update converters for . Add warning for empty
* rename label.question -> label.query. Allow sorting of Answers.
* WIP primitives
* update ui/reader for new Answer format
* Improve Label. First refactoring of MultiLabel. Adjust eval code
* fixed workflow conflict with introducing new one (#1472 )
* Add latest docstring and tutorial changes
* make add_eval_data() work again
* fix reader formats. WIP fix _extract_docs_and_labels_from_dict
* fix test reader
* Add latest docstring and tutorial changes
* fix another test case for reader
* fix mypy in farm reader.eval()
* fix mypy in farm reader.eval()
* WIP ORM refactor
* Add latest docstring and tutorial changes
* fix mypy weaviate
* make label and multilabel dataclasses
* bump mypy env in CI to python 3.8
* WIP refactor Label ORM
* WIP refactor Label ORM
* simplify tests for individual doc stores
* WIP refactoring markers of tests
* test alternative approach for tests with existing parametrization
* WIP refactor ORMs
* fix skip logic of already parametrized tests
* fix weaviate behaviour in tests - not parametrizing it in our general test cases.
* Add latest docstring and tutorial changes
* fix some tests
* remove sql from document_store_types
* fix markers for generator and pipeline test
* remove inmemory marker
* remove unneeded elasticsearch markers
* add dataclasses-json dependency. adjust ORM to just store JSON repr
* ignore type as dataclasses_json seems to miss functionality here
* update readme and contributing.md
* update contributing
* adjust example
* fix duplicate doc handling for custom index
* Add latest docstring and tutorial changes
* fix some ORM issues. fix get_all_labels_aggregated.
* update drop flags where get_all_labels_aggregated() was used before
* Add latest docstring and tutorial changes
* add to_json(). add + fix tests
* fix no_answer handling in label / multilabel
* fix duplicate docs in memory doc store. change primary key for sql doc table
* fix mypy issues
* fix mypy issues
* haystack/retriever/base.py
* fix test_write_document_meta[elastic]
* fix test_elasticsearch_custom_fields
* fix test_labels[elastic]
* fix crawler
* fix converter
* fix docx converter
* fix preprocessor
* fix test_utils
* fix tfidf retriever. fix selection of docstore in tests with multiple fixtures / parameterizations
* Add latest docstring and tutorial changes
* fix crawler test. fix ocrconverter attribute
* fix test_elasticsearch_custom_query
* fix generator pipeline
* fix ocr converter
* fix ragenerator
* Add latest docstring and tutorial changes
* fix test_load_and_save_yaml for elasticsearch
* fixes for pipeline tests
* fix faq pipeline
* fix pipeline tests
* Add latest docstring and tutorial changes
* fix weaviate
* Add latest docstring and tutorial changes
* trigger CI
* satisfy mypy
* Add latest docstring and tutorial changes
* satisfy mypy
* Add latest docstring and tutorial changes
* trigger CI
* fix question generation test
* fix ray. fix Q-generation
* fix translator test
* satisfy mypy
* wip refactor feedback rest api
* fix rest api feedback endpoint
* fix doc classifier
* remove relation of Labels -> Docs in SQL ORM
* fix faiss/milvus tests
* fix doc classifier test
* fix eval test
* fixing eval issues
* Add latest docstring and tutorial changes
* fix mypy
* WIP replace dataclasses-json with manual serialization
* Add latest docstring and tutorial changes
* revert to dataclasses-json serialization for now. remove debug prints.
* update docstrings
* fix extractor. fix Answer Span init
* fix api test
* keep meta data of answers in reader.run()
* fix meta handling
* address review feedback
* Add latest docstring and tutorial changes
* make document=None for open domain labels
* add import
* fix print utils
* fix rest api
* address review feedback
* Add latest docstring and tutorial changes
* fix mypy
Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-13 14:23:23 +02:00
Sara Zan
6354528336
Add /documents/get_by_filters endpoint ( #1580 )
...
* Add endpoint to get documents by filter
* Add test for /documents/get_by_filter and extend the delete documents test
* Add rest_api/file-upload to .gitignore
* Make sure the document store is empty for each test
* Improve docstrings of delete_documents_by_filters and get_documents_by_filters
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-12 10:53:54 +02:00
Sara Zan
3539e6b041
Fix circular import in the REST API ( #1556 )
...
* Fix circular import in the REST API
* remove unneeded import in test
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-10-04 21:18:23 +02:00
Sara Zan
af4a44fcbd
WIP Add rest api endpoint to delete documents by filter ( #1546 )
...
* Add rest api endpoint to delete documents by filter.
* Remove parametrization of rest api tests
* Make the paths in rest_api/config.py absolute
* Fix path to pipelines.yaml
* Restructuring test_rest_api.py to be able to test only my endpoint (and to make the suite more structured)
* Convert DELETE /documents into POST /documents/delete_by_filters
Co-authored-by: sarthakj2109 <54064348+sarthakj2109@users.noreply.github.com>
2021-10-04 11:21:00 +02:00
Sara Zan
2de5385ac2
Add "API is loading" message in the UI ( #1493 )
...
* Create the /initialized endpoint
* Now showing an error message if the connection fails, and a 'Haystack is loading' message while workers are starting up
* Improve the appearance of the various messages
* Newline at the end of file
2021-09-27 16:40:25 +02:00
oryx1729
3deff26b60
Fix Search REST API when filters are None ( #1431 )
2021-09-10 14:47:34 +02:00
oryx1729
1f859694f1
Add support for Dense Retrievers in REST API Indexing Pipeline ( #1430 )
2021-09-10 11:53:32 +02:00
oryx1729
9dd7c74f4f
Refactor communication between Pipeline Components ( #1321 )
2021-09-10 11:41:16 +02:00
Malte Pietsch
2a226daac4
Add simple docs2answer node to allow FAQ style QA / Doc search in API ( #1361 )
...
* minimal docs2answer node
* enable logs again
2021-08-20 17:01:55 +02:00
Ikram Ali
29e140196b
[pipeline] Allow for batch indexing when using Pipelines fix #1168 ( #1231 )
...
* [pipeline] Allow for batch indexing when using Pipelines fix #1168
* [pipeline] Test case fixed fix #1168
* [file_converter] Path.suffix updated #1168
* [file_converter] meta can be one of these three cases (#1168):
  - A single dict that is applied to all files
  - One dict for each file being converted
  - None
* [file_converter] mypy error fixed.
* [file_converter] mypy error fixed.
* [rest_api] batch file upload introduced in indexing API.
* [test_case] Test_api file upload parameter name updated.
* [ui] Streamlit file upload parameter updated.
2021-06-30 14:13:46 +02:00
Guillim
73a4f9825a
Add env var CONCURRENT_REQUEST_PER_WORKER ( #1235 )
...
* create an env var `CONCURRENT_REQUEST_PER_WORKER`, following the existing naming convention (I went a few commits back to find the original name)
* default to 4
2021-06-29 07:44:25 +02:00
Malte Pietsch
2c964db62d
Relax typing for meta data in REST API ( #1224 )
2021-06-24 12:34:42 +02:00
Malte Pietsch
2caeea000e
Small UI and REST API fixes ( #1223 )
...
* small fixes
* change default question
2021-06-24 09:53:08 +02:00
oryx1729
afee4f36ce
Add scaffold for defining custom components for Pipelines ( #1205 )
2021-06-23 12:01:54 +02:00
Bhadresh Savani
37a72d2f45
Add File Upload Functionality in UI ( #995 )
2021-04-30 10:46:30 +02:00
oryx1729
8c68699e1c
Refactor REST APIs to use Pipelines ( #922 )
2021-04-07 17:53:32 +02:00
Malte Pietsch
0eaae3c0dd
Fix UI when API returns fewer answers than expected ( #828 )
...
* fix ui for few answers from api. add top_k_per_sample env
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-02-15 14:27:17 +01:00
Malte Pietsch
6798192d40
Add API endpoint to export accuracy metrics from user feedback + created_at timestamp ( #803 )
...
* WIP feedback metrics
* fix filters and zero division
* add created_at and model_name fields to labels
* add created_at value
* remove debug log level
* fix attribute init
* move timestamp creation down to docstore / db level
* fix import
2021-02-15 10:48:59 +01:00
Tanay Soni
f95b70df38
Fix file upload API ( #808 )
2021-02-05 12:17:38 +01:00
Malte Pietsch
e9b5439b00
Rename label id field for elastic & add UPDATE_EXISTING_DOCUMENTS to API config ( #728 )
...
* rename label id field for elastic
* add UPDATE_EXISTING_DOCUMENTS param to API config
2021-01-12 13:00:56 +01:00
Malte Pietsch
fcc052b554
Pass custom label index name in api config ( #724 )
2021-01-11 12:24:09 +01:00
Guillim
65cf9547d2
Allow setting return_no_answers for TransformersReader in REST API (SQuAD 1.0 format) ( #609 )
...
* Update config.py
* new option
Allow a new option in the settings: it tells whether a reader model can return a "no answer" like SQuAD 2.0 models, or whether it is a SQuAD 1.0-like model that always gives an answer.
2020-11-20 14:09:39 +01:00
Lalit Pagaria
23f1058b90
Fixing defaults in config for rest_api ( #583 )
...
* Fixing defaults configs for rest_apis
* Reverting change to VALID_LANGUAGES
* Casting EMBEDDING_DIM as int
2020-11-16 06:51:27 +01:00
Tanay Soni
acd088808b
Allow list of filter values in REST API ( #568 )
2020-11-09 20:41:53 +01:00
Malte Pietsch
46fac41b54
Allow configuration of log level in REST API via ENV ( #541 )
...
* configure log level via env. adjust debug messages
* pin faiss version
2020-11-04 09:54:02 +01:00
Lalit Pagaria
63c12371b9
Change arg "model" to "model_name_or_path" in TransformersReader ( #510 )
...
* Consistent parameter naming for TransformersReader along with removing unused imports as well.
* Addressing review comments
2020-10-21 17:15:35 +02:00
Malte Pietsch
4a77dc7a02
Allow null filter value in api ( #497 )
2020-10-16 18:44:15 +02:00
Lalit Pagaria
b9da789475
Add Elasticsearch Query DSL compliant Query API ( #471 )
2020-10-16 13:25:31 +02:00
Malte Pietsch
5555274170
Make creation of label index optional in feedback and file_upload api
2020-10-15 19:03:58 +02:00
Malte Pietsch
bdbd1b323b
Add create_index and similarity metric to api config ( #493 )
...
* make creation of label index optional
* add params for rest api
* reset tutorial flag
2020-10-15 18:41:36 +02:00
Tanay Soni
3399fc784d
Refactor file converter interface ( #393 )
2020-09-18 10:42:13 +02:00
Tanay Soni
03fa4a8740
Exclude embedding fields from the REST API ( #390 )
2020-09-17 14:37:01 +02:00
Malte Pietsch
9727829cc6
Rename and restructure modules (database, indexing, schemas) ( #379 )
...
* rename database to documentstore
* move document, label, multilabel to haystack/schema.py
* rename documentstore -> document_store
* split indexing modules -> file_converter + preprocessor
* fix order of imports
* Update tutorial notebooks
* fix torch version in tutorial 4
2020-09-16 18:33:23 +02:00
Karim Jana
c7078a36c0
Custom fields for indexing in ElasticsearchDocumentStore ( #297 )
2020-08-10 11:34:39 +02:00
Karim Jana
89dcfed619
Cast Search REST API logs to JSON ( #290 )
2020-08-06 10:36:56 +02:00
Tanay Soni
723921475f
Make document ids of str type ( #284 )
2020-08-03 16:20:17 +02:00
Malte Pietsch
29a15c0d59
Add eval for Dense Passage Retriever & Refactor handling of labels/feedback ( #243 )
2020-07-31 11:34:06 +02:00
Malte Pietsch
1289cc6fbb
Fix format of /export-doc-qa-feedback to comply with SQuAD ( #241 )
2020-07-16 13:17:45 +02:00
Malte Pietsch
6bed2f509f
Refactor DPR for latest transformers version & change init arg gpu -> use_gpu for DPR and EmbeddingRetriever ( #239 )
...
* fix tokenizer warning in latest transformers
* change dpr arg from gpu to use_gpu
* change gpu arg for EmbeddingRetriever
2020-07-16 10:45:01 +02:00
Tanay Soni
5c1a5fe61d
Add dummy retriever for benchmarking / reader-only settings ( #235 )
2020-07-15 17:22:17 +02:00
Guillim
8a616dae75
Adjust Docker and REST API to allow TransformersReader class ( #180 )
2020-07-07 16:25:36 +02:00
Tanay Soni
ff7e35581b
Add response time in logs ( #201 )
2020-07-07 12:28:41 +02:00
Tanay Soni
68d604d82b
Add response for successful file upload ( #195 )
2020-07-06 17:35:47 +02:00
Malte Pietsch
07ecfb60b9
Dense Passage Retriever (Inference) ( #167 )
2020-06-30 19:05:45 +02:00
Tanay Soni
0e070d0d7c
Create file upload dir if not exists ( #166 )
2020-06-24 15:05:30 +02:00
Tanay Soni
ec433a5ed6
Move out REST API from PyPI package ( #160 )
2020-06-22 12:07:12 +02:00