Malte Pietsch
|
4a6c9302b3
|
Redesign primitives - Document, Answer, Label (#1398)
* first draft / notes on new primitives
* wip label / feedback refactor
* rename doc.text -> doc.content. add doc.content_type
* add datatype for content
* remove faq_question_field from ES and weaviate. rename text_field -> content_field in docstores. update tutorials for content field
* update converters for . Add warning for empty
* rename label.question -> label.query. Allow sorting of Answers.
* WIP primitives
* update ui/reader for new Answer format
* Improve Label. First refactoring of MultiLabel. Adjust eval code
* fixed workflow conflict with introducing new one (#1472)
* Add latest docstring and tutorial changes
* make add_eval_data() work again
* fix reader formats. WIP fix _extract_docs_and_labels_from_dict
* fix test reader
* Add latest docstring and tutorial changes
* fix another test case for reader
* fix mypy in farm reader.eval()
* fix mypy in farm reader.eval()
* WIP ORM refactor
* Add latest docstring and tutorial changes
* fix mypy weaviate
* make label and multilabel dataclasses
* bump mypy env in CI to python 3.8
* WIP refactor Label ORM
* WIP refactor Label ORM
* simplify tests for individual doc stores
* WIP refactoring markers of tests
* test alternative approach for tests with existing parametrization
* WIP refactor ORMs
* fix skip logic of already parametrized tests
* fix weaviate behaviour in tests - not parametrizing it in our general test cases.
* Add latest docstring and tutorial changes
* fix some tests
* remove sql from document_store_types
* fix markers for generator and pipeline test
* remove inmemory marker
* remove unneeded elasticsearch markers
* add dataclasses-json dependency. adjust ORM to just store JSON repr
* ignore type as dataclasses_json seems to miss functionality here
* update readme and contributing.md
* update contributing
* adjust example
* fix duplicate doc handling for custom index
* Add latest docstring and tutorial changes
* fix some ORM issues. fix get_all_labels_aggregated.
* update drop flags where get_all_labels_aggregated() was used before
* Add latest docstring and tutorial changes
* add to_json(). add + fix tests
* fix no_answer handling in label / multilabel
* fix duplicate docs in memory doc store. change primary key for sql doc table
* fix mypy issues
* fix mypy issues
* haystack/retriever/base.py
* fix test_write_document_meta[elastic]
* fix test_elasticsearch_custom_fields
* fix test_labels[elastic]
* fix crawler
* fix converter
* fix docx converter
* fix preprocessor
* fix test_utils
* fix tfidf retriever. fix selection of docstore in tests with multiple fixtures / parameterizations
* Add latest docstring and tutorial changes
* fix crawler test. fix ocrconverter attribute
* fix test_elasticsearch_custom_query
* fix generator pipeline
* fix ocr converter
* fix ragenerator
* Add latest docstring and tutorial changes
* fix test_load_and_save_yaml for elasticsearch
* fixes for pipeline tests
* fix faq pipeline
* fix pipeline tests
* Add latest docstring and tutorial changes
* fix weaviate
* Add latest docstring and tutorial changes
* trigger CI
* satisfy mypy
* Add latest docstring and tutorial changes
* satisfy mypy
* Add latest docstring and tutorial changes
* trigger CI
* fix question generation test
* fix ray. fix Q-generation
* fix translator test
* satisfy mypy
* wip refactor feedback rest api
* fix rest api feedback endpoint
* fix doc classifier
* remove relation of Labels -> Docs in SQL ORM
* fix faiss/milvus tests
* fix doc classifier test
* fix eval test
* fixing eval issues
* Add latest docstring and tutorial changes
* fix mypy
* WIP replace dataclasses-json with manual serialization
* Add latest docstring and tutorial changes
* revert to dataclass-json serialization for now. remove debug prints.
* update docstrings
* fix extractor. fix Answer Span init
* fix api test
* keep meta data of answers in reader.run()
* fix meta handling
* address review feedback
* Add latest docstring and tutorial changes
* make document=None for open domain labels
* add import
* fix print utils
* fix rest api
* address review feedback
* Add latest docstring and tutorial changes
* fix mypy
Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
|
2021-10-13 14:23:23 +02:00 |
|
oryx1729
|
9dd7c74f4f
|
Refactor communication between Pipeline Components (#1321)
|
2021-09-10 11:41:16 +02:00 |
|
Branden Chan
|
980d88a0f2
|
Update faq model (#1401)
|
2021-09-01 18:39:06 +02:00 |
|
Malte Pietsch
|
be9d19afa5
|
Remove Finder from tutorials (#1329)
|
2021-08-10 11:50:59 +02:00 |
|
Branden Chan
|
783893c3d2
|
Tutorial update (#1166)
* Add header / footer
* Add Milvus example
* Generate md files
* Fix mypy CI
|
2021-06-11 11:09:15 +02:00 |
|
Malte Pietsch
|
e91518ee00
|
Update tutorials (torch versions, ES version, replace Finder with Pipeline) (#814)
* remove manual torch install on colab
* update elasticsearch version everywhere to 7.9.2
* fix FAQPipeline
* update tutorials with new pipelines
* Add latest docstring and tutorial changes
* revert faqpipeline change. fix field names in tutorial 4
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
|
2021-02-09 14:56:54 +01:00 |
|
Julian Risch
|
3331608e03
|
Adding a guard that prevents the tutorial code from being executed in every subprocess when using multiprocessing on Windows (#729)
|
2021-01-13 18:17:54 +01:00 |
|
Branden Chan
|
d8154939fc
|
Scale dot product into probabilities (#667)
* scale dot product
* Add tip in documentation
* Add recommendation boxes
* WIP: Use similarity attribute in all doc stores
* Implement similarity for InMemoryDS
* Add FAISS support
* Clean printout
* Update documentation
* Implement document field map
|
2020-12-11 12:10:24 +01:00 |
|
Tanay Soni
|
db4151bbc0
|
Fix scoring in Elasticsearch for dot product (#517)
|
2020-10-23 17:50:49 +02:00 |
|
Guillim
|
fb5db59590
|
Remove useless line from Tutorial4_FAQ_style_QA (#416)
* Update Tutorial4_FAQ_style_QA.py
Used to be useful when `.apply()` was necessary, but not any longer
* Update Tutorial4_FAQ_style_QA.ipynb
|
2020-09-22 09:01:04 +02:00 |
|
Malte Pietsch
|
271ff30262
|
fix type casting of embeddings for tutorial 4 (#402)
|
2020-09-18 18:10:50 +02:00 |
|
Malte Pietsch
|
9727829cc6
|
Rename and restructure modules (database, indexing, schemas) (#379)
* rename database to documentstore
* move document, label, multilabel to haystack/schema.py
* rename documentstore -> document_store
* split indexing modules -> file_converter + preprocessor
* fix order of imports
* Update tutorial notebooks
* fix torch version in tutorial 4
|
2020-09-16 18:33:23 +02:00 |
|
Malte Pietsch
|
29a15c0d59
|
Add eval for Dense Passage Retriever & Refactor handling of labels/feedback (#243)
|
2020-07-31 11:34:06 +02:00 |
|
Malte Pietsch
|
5b1be233d0
|
Update Tutorial 4
|
2020-07-17 19:31:00 +02:00 |
|
Malte Pietsch
|
6bed2f509f
|
Refactor DPR for latest transformers version & change init arg gpu -> use_gpu for DPR and EmbeddingRetriever (#239)
* fix tokenizer warning in latest transformers
* change dpr arg from gpu to use_gpu
* change gpu arg for EmbeddingRetriever
|
2020-07-16 10:45:01 +02:00 |
|
Malte Pietsch
|
07ecfb60b9
|
Dense Passage Retriever (Inference) (#167)
|
2020-06-30 19:05:45 +02:00 |
|
Malte Pietsch
|
a431a94b04
|
Add basic tutorial for FAQ-based QA & batch comp. of embeddings (#98)
* Add basic tutorial for FAQ-based QA and switch to batch computation of embeddings
* update readme & haystack version in tutorial
|
2020-05-07 10:19:26 +02:00 |
|