29 Commits

Author SHA1 Message Date
Sara Zan
13510aa753
Refactoring of the haystack package (#1624)
* Files moved, imports all broken

* Fix most imports and docstrings into

* Fix the paths to the modules in the API docs

* Add latest docstring and tutorial changes

* Add a few pipelines that were lost in the imports

* Fix a bunch of mypy warnings

* Add latest docstring and tutorial changes

* Create a file_classifier module

* Add docs for file_classifier

* Fixed most circular imports, now the REST API can start

* Add latest docstring and tutorial changes

* Tackling more mypy issues

* Reintroduce  from FARM and fix last mypy issues hopefully

* Re-enable old-style imports

* Fix some more imports from the top-level  package in an attempt to sort out circular imports

* Fix some imports in tests to new-style to prevent failed class equalities from breaking tests

* Change document_store into document_stores

* Update imports in tutorials

* Add latest docstring and tutorial changes

* Probably fixes summarizer tests

* Improve the old-style import allowing module imports (should work)

* Try to fix the docs

* Remove dedicated KnowledgeGraph page from autodocs

* Remove dedicated GraphRetriever page from autodocs

* Fix generate_docstrings.sh with an updated list of yaml files to look for

* Fix some more modules in the docs

* Fix the document stores docs too

* Fix a small issue on Tutorial14

* Add latest docstring and tutorial changes

* Add deprecation warning to old-style imports

* Remove stray folder and import Dict into dense.py

* Change import path for MLFlowLogger

* Add old loggers path to the import path aliases

* Fix debug output of convert_ipynb.py

* Fix circular import on BaseRetriever

* Missed one merge block

* re-run tutorial 5

* Fix imports in tutorial 5

* Re-enable squad_to_dpr CLI from the root package and move get_batches_from_generator into document_stores.base

* Add latest docstring and tutorial changes

* Fix typo in utils __init__

* Fix a few more imports

* Fix benchmarks too

* New-style imports in test_knowledge_graph

* Rollback setup.py

* Rollback squad_to_dpr too

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-25 15:50:23 +02:00
bogdankostic
655d721371
Add Table Reader (#1446)
* first draft / notes on new primitives

* wip label / feedback refactor

* rename doc.text -> doc.content. add doc.content_type

* add datatype for content

* remove faq_question_field from ES and weaviate. rename text_field -> content_field in docstores. update tutorials for content field

* update converters for . Add warning for empty

* Add first draft of TableReader

* rename label.question -> label.query. Allow sorting of Answers.

* Add calculation of answer scores

* WIP primitives

* Adapt input and output to new primitives

* Add doc strings

* Add tests

* update ui/reader for new Answer format

* Improve Label. First refactoring of MultiLabel. Adjust eval code

* fixed workflow conflict with introducing new one (#1472)

* Add latest docstring and tutorial changes

* make add_eval_data() work again

* fix reader formats. WIP fix _extract_docs_and_labels_from_dict

* fix test reader

* Add latest docstring and tutorial changes

* fix another test case for reader

* fix mypy in farm reader.eval()

* fix mypy in farm reader.eval()

* WIP ORM refactor

* Add latest docstring and tutorial changes

* fix mypy weaviate

* make label and multilabel dataclasses

* bump mypy env in CI to python 3.8

* WIP refactor Label ORM

* WIP refactor Label ORM

* simplify tests for individual doc stores

* WIP refactoring markers of tests

* test alternative approach for tests with existing parametrization

* WIP refactor ORMs

* fix skip logic of already parametrized tests

* fix weaviate behaviour in tests - not parametrizing it in our general test cases.

* Add latest docstring and tutorial changes

* fix some tests

* remove sql from document_store_types

* fix markers for generator and pipeline test

* remove inmemory marker

* remove unneeded elasticsearch markers

* add dataclasses-json dependency. adjust ORM to just store JSON repr

* ignore type as dataclasses_json seems to miss functionality here

* update readme and contributing.md

* update contributing

* adjust example

* fix duplicate doc handling for custom index

* Add latest docstring and tutorial changes

* fix some ORM issues. fix get_all_labels_aggregated.

* update drop flags where get_all_labels_aggregated() was used before

* Add latest docstring and tutorial changes

* add to_json(). add + fix tests

* fix no_answer handling in label / multilabel

* fix duplicate docs in memory doc store. change primary key for sql doc table

* fix mypy issues

* fix mypy issues

* haystack/retriever/base.py

* fix test_write_document_meta[elastic]

* fix test_elasticsearch_custom_fields

* fix test_labels[elastic]

* fix crawler

* fix converter

* fix docx converter

* fix preprocessor

* fix test_utils

* fix tfidf retriever. fix selection of docstore in tests with multiple fixtures / parameterizations

* Add latest docstring and tutorial changes

* fix crawler test. fix ocrconverter attribute

* fix test_elasticsearch_custom_query

* fix generator pipeline

* fix ocr converter

* fix ragenerator

* Add latest docstring and tutorial changes

* fix test_load_and_save_yaml for elasticsearch

* fixes for pipeline tests

* fix faq pipeline

* fix pipeline tests

* Add latest docstring and tutorial changes

* fix weaviate

* Add latest docstring and tutorial changes

* trigger CI

* satisfy mypy

* Add latest docstring and tutorial changes

* satisfy mypy

* Add latest docstring and tutorial changes

* trigger CI

* fix question generation test

* fix ray. fix Q-generation

* fix translator test

* satisfy mypy

* wip refactor feedback rest api

* fix rest api feedback endpoint

* fix doc classifier

* remove relation of Labels -> Docs in SQL ORM

* fix faiss/milvus tests

* fix doc classifier test

* fix eval test

* fixing eval issues

* Add latest docstring and tutorial changes

* fix mypy

* WIP replace dataclasses-json with manual serialization

* Add latest docstring and tutorial changes

* revert to dataclasses-json serialization for now. remove debug prints.

* update docstrings

* fix extractor. fix Answer Span init

* fix api test

* Adapt answer format

* Add latest docstring and tutorial changes

* keep meta data of answers in reader.run()

* Fix mypy

* fix meta handling

* address review feedback

* Add latest docstring and tutorial changes

* Allow inference on GPU

* Remove automatic aggregation

* Add automatic aggregation

* Add latest docstring and tutorial changes

* Add torch-scatter dependency

* Add wheel to torch-scatter dependency

* Fix requirements

* Fix requirements

* Fix requirements

* Adapt setup.py to allow for wheels

* Fix requirements

* Fix requirements

* Add type hints and code snippet

* Add latest docstring and tutorial changes

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-15 16:34:48 +02:00
Ikram Ali
d835a9cdc5
[setup] version tag added to Haystack fix #1175 (#1216) 2021-06-22 09:43:26 +02:00
Julian Risch
9e4d7bf9be
Increase Haystack version to 0.9.0 (#1215) 2021-06-21 18:39:00 +02:00
Malte Pietsch
022f8586f6
Remove Python 3.6 support (#1059)
* Remove Python 3.6 support

* change cache key for CI
2021-06-01 15:24:44 +02:00
oryx1729
bba1d80aef Update Haystack version 2021-04-13 16:31:19 +02:00
Malte Pietsch
50815421b0 bump haystack version 2021-01-21 16:02:33 +01:00
Malte Pietsch
5b817387c2 Bump version to 0.6.0 2020-12-17 06:31:22 +01:00
Malte Pietsch
f94603cbe4
Bump haystack version (#559) 2020-11-06 09:53:47 +01:00
Malte Pietsch
f0969d8310
Update setup.py 2020-11-02 20:15:10 +01:00
Malte Pietsch
0c5750fae0 Bump version to 0.4.0 2020-09-18 17:12:29 +02:00
Malte Pietsch
d821e8d260
Bump FARM version to 0.4.7 (#340) 2020-09-04 17:29:14 +02:00
Malte Pietsch
d2d048c9fa Upgrade version number to 0.3.0 2020-07-16 13:21:00 +02:00
Malte Pietsch
eb658d308e Upgrade version to 0.2.2 2020-07-15 17:07:29 +02:00
Tanay Soni
54e85e586e
Fix for installing PyTorch on Windows OS (#159) 2020-06-18 17:43:38 +02:00
Tanay Soni
b4842f2cfb Update version 2020-05-05 15:07:44 +02:00
Tanay Soni
ed133010c6 Upgrade version 2020-05-05 10:45:17 +02:00
Tanay Soni
07df974880 Update FARM version 2020-04-20 15:28:10 +02:00
Malte Pietsch
c37a685e7f
Update FARM version 2020-04-17 15:55:26 +02:00
Malte Pietsch
21d5a42f7e
Update FARM version 2020-04-17 15:36:24 +02:00
Malte Pietsch
76c5c1d6aa
Improve deployment of REST API (Configs, logging, minor bugs) (#40)
* remove env variables from dockerfiles

* add more config options to rest api. make fields optional. change to elasticsearch as default

* skip reader if retriever doesn't return anything

* add more config params to farm reader. fix top_k_per_sample

* update FARM version
2020-03-18 12:26:13 +01:00
Malte Pietsch
eaf42a8c21 upgrade FARM version 2020-02-28 18:23:15 +01:00
Malte Pietsch
041f832eee update FARM version 2020-02-27 12:18:40 +01:00
Malte Pietsch
a0293cc996 update farm version 2020-01-23 17:31:28 +01:00
Malte Pietsch
faef7f70d4 fix setup.py for install from git commit 2020-01-13 19:38:37 +01:00
Tanay Soni
962fb8ffe4 Update version 2019-11-27 18:13:40 +01:00
Tanay Soni
84ce175afe Fix filename for long_description 2019-11-27 17:18:05 +01:00
Tanay Soni
14b8eeb936 Update package name 2019-11-27 16:17:45 +01:00
Malte Pietsch
33f6a77800 add setup.py 2019-11-27 14:02:23 +01:00