537 Commits

Author SHA1 Message Date
Silvano Cerza
181e5474e8
ci: Automate OpenAPI specs upload to Readme.io (#4228)
* Remove OpenAPI specs file

* OpenAPI specs are now automatically uploaded when necessary

* Rename openapi workflow
2023-02-22 18:01:18 +01:00
github-actions[bot]
aaa1522c45
Update unstable version and openapi schema (#4205)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2023-02-20 14:57:45 +01:00
Agnieszka Marzec
e16f1c8935
Docs: Add filter to hide entity post processor (#4160)
* Add filter to hide entity post processor

* Add missing space
2023-02-16 16:40:42 +01:00
bogdankostic
27aaa92800
docs: Remove some classes regarding PromptNode from API reference docs (#4132) 2023-02-10 15:56:38 +01:00
Agnieszka Marzec
8135e75139
Add shaper to api docs (#4083) 2023-02-08 12:15:08 +01:00
tstadel
92c58cfda1
feat: Support multiple document_ids in Answer object (for generative QA) (#4062)
* initial version without shapers

* set document_ids for BaseGenerator

* introduce question-answering-with-references template

* better prompt

* make PromptTemplate control output_variable

* update schema

* fix add_doc_meta_data_to_answer

* Revert "fix add_doc_meta_data_to_answer"

This reverts commit b994db423ad8272c140ce2b785cf359d55383ff9.

* fix add_doc_meta_data_to_answer

* fix eval

* fix pylint

* fix pinecone

* fix other tests

* fix test

* fix flaky test

* Revert "fix flaky test"

This reverts commit 7ab04275ffaaaca96b4477325ba05d5f34d38775.

* adjust docstrings

* make Label loading backward-compatible

* fix Label backward compatibility for pinecone

* fix Label backward compatibility for search engines

* fix Label backward compatibility for deepset Cloud

* fix tests

* fix None issue

* fix test_write_feedback

* add tests for legacy label support

* add document_id test for pinecone

* reduce unnecessary contents

* add comment to pinecone test
2023-02-08 08:37:22 +01:00
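Commit 92c58cfda1 replaces a single answer-to-document link with a list (`document_ids`) and keeps older labels loadable. Below is a minimal sketch of such an upgrade step. It assumes answers are handled as plain dicts, and `upgrade_answer_dict` is a hypothetical helper, not Haystack's actual loading code:

```python
from typing import Any, Dict


def upgrade_answer_dict(answer: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: map a legacy single `document_id` onto `document_ids`."""
    upgraded = dict(answer)  # shallow copy so the caller's dict is left untouched
    legacy_id = upgraded.pop("document_id", None)
    if "document_ids" not in upgraded:
        # Legacy records stored at most one id: wrap it in a list, or keep None.
        upgraded["document_ids"] = [legacy_id] if legacy_id is not None else None
    return upgraded
```

If a record already carries `document_ids`, it is kept as-is and any leftover legacy field is dropped.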
Massimiliano Pippi
8824f3a10a
re-organize pydoc config files (#4042) 2023-02-03 12:51:10 +01:00
Massimiliano Pippi
76bb105388
chore: remove unneeded files (#4036)
* remove unneeded files

* readme file should stay
2023-02-02 15:38:56 +01:00
tstadel
8002cf92d6
fix: extend schema for prompt node results (#3891)
* extend schema for prompt node results

* extend schema

* update openapi

* fix mypy for test module

* added 1.14 specs

* reverted schema for 1.13

---------

Co-authored-by: bogdankostic <bogdankostic@web.de>
Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
Co-authored-by: Sebastian <sjrl@users.noreply.github.com>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-01-31 16:31:33 +01:00
Agnieszka Marzec
f6a99b6ebc
Fix: Fix quotation marks (#3973)
* Fix quotation marks

* Fix the order
2023-01-27 13:32:52 +01:00
Agnieszka Marzec
7937ef8995
Add csvconverter to API docs (#3968) 2023-01-27 11:42:22 +01:00
Agnieszka Marzec
88650c9b0a
Add imgtotext api doc (#3966) 2023-01-27 09:07:53 +01:00
Massimiliano Pippi
7f6ed941d4
chore: bump pydoc-markdown version used in the CI (#3955)
* use latest pydoc-markdown

* make the workflow manually actionable

* Apply suggestions from code review

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-01-26 16:58:43 +01:00
github-actions[bot]
d962bc0bc9
Update unstable version and openapi schema (#3924)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
2023-01-26 01:02:49 +05:30
ZanSara
94f660c56f
feat: store id_hash_keys in Document objects to make documents clonable (#3697)
* store id_hash_keys in Document objects

* fix id_hash_keys calls throughout codebase

* generate schema

* fix es

* fix weaviate

* backward compatible

* openapi schema

* remove unused deprecation warning

* remove unused imports

* openapi

* unused var

* Apply suggestions from code review

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/schema.py

* Apply suggestions from code review

Co-authored-by: bogdankostic <bogdankostic@web.de>

* Update haystack/schema.py

* review feedback

* trailing spaces

* pylint

* add deprecation test

Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-01-23 15:00:52 +01:00
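Commit 94f660c56f stores `id_hash_keys` on each Document so that a copy can reproduce the original id. The sketch below illustrates the idea with a hypothetical `SketchDocument` dataclass. The field names mirror the commit message, but the hashing scheme (SHA-256 over `:`-joined attribute strings) is an assumption chosen for illustration, not Haystack's implementation:

```python
import hashlib
from dataclasses import dataclass, field, replace
from typing import Any, Dict, List


@dataclass
class SketchDocument:
    """Hypothetical stand-in for a Document whose id depends on selected fields."""

    content: str
    meta: Dict[str, Any] = field(default_factory=dict)
    id_hash_keys: List[str] = field(default_factory=lambda: ["content"])
    id: str = ""

    def __post_init__(self) -> None:
        self.id_hash_keys = list(self.id_hash_keys)  # avoid sharing the list with a clone
        if not self.id:
            self.id = self._compute_id()

    def _compute_id(self) -> str:
        # Hash the string form of every attribute named in id_hash_keys (illustrative scheme).
        parts = [str(getattr(self, key)) for key in self.id_hash_keys]
        return hashlib.sha256(":".join(parts).encode("utf-8")).hexdigest()


original = SketchDocument(content="hello", meta={"lang": "en"}, id_hash_keys=["content", "meta"])
clone = replace(original, id="")  # drop the id so the clone recomputes it from the stored keys
assert clone.id == original.id
```

Because the keys travel with the object, the clone does not need to be told which fields produced the id.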
ZanSara
3ffdb0a9a3
chore: fix all EOF (#3852)
* fix all eof

* fix test

* fix test

* fix test

* typo

* fix sample

* fix sample

* add logs

* fix page_dynamic_result.txt
2023-01-16 12:34:50 +01:00
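Commit 3ffdb0a9a3 (and eba518a589 further down) exists to satisfy pre-commit's `end-of-file-fixer` hook, which requires files to end in a newline and only one newline. Below is a minimal sketch of that rule in Python. It is written for illustration and is not the hook's source:

```python
def fix_end_of_file(text: str) -> str:
    """Return `text` ending in exactly one newline; newline-only text becomes empty."""
    # Collapse any run of trailing CR/LF characters into a single "\n".
    stripped = text.rstrip("\r\n")
    if not stripped:
        return ""
    return stripped + "\n"
```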
Sebastian
e84fae2894
Migrating to use native PyTorch AMP (#2827)
* Started making changes to use native PyTorch AMP

* Updated compute_loss functions to use torch.cuda.amp.autocast

* Updating docstrings

* Add use_amp to trainer_checkpoint

* Removed mentions of apex and started to add the necessary warnings

* Removing unused instances of use_amp variable

* Added fast training test for FARMReader. Needed to add max_query_length as a parameter in FARMReader.__init__ and FARMReader.train

* Make max_query_length optional in FARMReader.train

* Update lg

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-01-05 09:14:28 +01:00
Bilge Yücel
ddba75021a
fix: add additional settings to OpenAPI schema (#3788)
* "proxy-enabled": disable CORS proxy
* "samples-languages": display two languages initially
2022-12-30 16:10:37 +03:00
bogdankostic
36cfd41713
Add newline when generating OpenAPI specs (#3782) 2022-12-29 17:55:43 +01:00
Agnieszka Marzec
b8fff837b4
docs: Add info where the feedback is stored (#3772)
* Add info where the feedback is stored

* Fix misplaced line breaks

* Generate OpenAPI Specs

* Generate OpenAPI Specs

* Apply black

* Generate OpenAPI specs

* Add missing whitespace

Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-12-28 14:46:26 +01:00
Bilge Yücel
86ade4817e
bug: fix the docs rest api reference url (#3775)
* bug: fix the docs rest api reference url

* revert openapi json changes

* remove last line on json files

* Add explanation about `servers` and remove `servers` parameter from FastAPI

* generate openapi schema without empty end line
2022-12-28 12:30:58 +03:00
Agnieszka Marzec
367c63ef1d
Update readme (#3744) 2022-12-22 15:53:48 +01:00
Tuana Celik
fe5e0164e8
chore: adding template for prompt node (#3738) 2022-12-21 20:13:57 +01:00
Stefano Fiorucci
e1401f79b6
refactor: improve Multilabel design (#3658)
* first try and new test

* fix test

* fix unused import

* remove comments

* no more dataclass

* add __eq__ and extend test

* better design from review

* Update schema.py

* fix black

* fix openapi

* fix openapi 2

* new try to fix openapi

* remove newline from openapi json
2022-12-13 10:45:56 +01:00
github-actions[bot]
5405d9d7f8
Update unstable version and openapi schema (#3700)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2022-12-13 09:59:52 +01:00
Sara Zan
eba518a589
add trailing newlines to make end-of-file-fixer happy (#3699) 2022-12-12 14:42:25 +01:00
github-actions[bot]
af78f8b431
Update unstable version and openapi schema (#3584)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2022-11-16 10:09:40 +01:00
Massimiliano Pippi
0c1de3745d
fix milvus imports (#3576) 2022-11-15 10:58:51 +01:00
Massimiliano Pippi
da6b0dc66f
feat: introduce proposal design process (#3333)
* add RFC process

* migrate old ADR to the new process

* typo

* review comments

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* [skip ci] review feedback

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* [skip ci] leftover

* rename to proposals

* Adjust naming

* Update 2170-pydantic-dataclasses.md

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2022-11-11 12:49:23 +01:00
Stefano Fiorucci
1a60e21137
refactor: simplify Summarizer, add Document Merger (#3452)
* remove generate_single_summary

* update schemas

* remove unused import

* fix mypy

* fix mypy

* test: summarizer doesn't change content

* other test correction

* move test_summarizer_translation to test_extractor_translation

* fix test

* first try for doc merger

* reintroduce and deprecate generate_single_summary

* progress in document merger

* document merger!

* mypy, pylint fixes

* use generator

* added test that will fail in 1.12

* adapt to review

* extended deprecation docstring

* Update test/nodes/test_extractor_translation.py

* Update test/nodes/test_summarizer.py

* Update test/nodes/test_summarizer.py

* black

* documents fixture

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
2022-11-03 16:04:53 +01:00
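Commit 1a60e21137 introduces a Document Merger. Below is a minimal sketch of one plausible merge rule, which assumes documents are plain dicts with `content` and `meta` keys. Contents are joined with a separator, and only metadata shared by every input survives. The function name `merge_documents` and the metadata rule are assumptions for illustration, not the node's confirmed behavior:

```python
from typing import Any, Dict, List


def merge_documents(docs: List[Dict[str, Any]], separator: str = " ") -> Dict[str, Any]:
    """Hypothetical merger over plain dicts with `content` and optional `meta` keys."""
    if not docs:
        raise ValueError("need at least one document to merge")
    content = separator.join(doc["content"] for doc in docs)
    # Keep only metadata entries whose key and value are identical in every document.
    first_meta = docs[0].get("meta", {})
    common_meta = {
        key: value
        for key, value in first_meta.items()
        if all(key in d.get("meta", {}) and d["meta"][key] == value for d in docs[1:])
    }
    return {"content": content, "meta": common_meta}
```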
Sara Zan
8ddeda811a
generate docs for search.engine.py (#3507) 2022-10-31 16:57:39 +01:00
bogdankostic
4fbe80c098
feat: Extraction of headlines in markdown files (#3445)
* Extract headings from markdown files + adapt PreProcessor

* Add tests

* Fix mypy

* Generate JSON schema

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update haystack/nodes/file_converter/markdown.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply black

* Add PR feedback

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-10-26 11:57:55 +02:00
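Commit 4fbe80c098 extracts headlines from Markdown files. The sketch below shows one way to find ATX-style headings (`#` to `######`) while skipping fenced code blocks. `extract_headlines` and its `(level, text)` return shape are hypothetical. Setext headings, which are underlined with `===` or `---`, are deliberately not handled:

```python
import re
from typing import List, Tuple

# "#"-prefixed heading text, with an optional closing "#" run and trailing whitespace.
ATX_HEADING = re.compile(r"^(#{1,6})[ \t]+(.+?)(?:[ \t]+#+)?[ \t]*$")
FENCE = "`" * 3  # a code-fence marker, built this way to keep this listing fence-safe


def extract_headlines(markdown: str) -> List[Tuple[int, str]]:
    """Return (level, text) for each ATX heading outside fenced code blocks."""
    headlines: List[Tuple[int, str]] = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # an opening or closing fence flips the state
            continue
        if in_fence:
            continue
        match = ATX_HEADING.match(line)
        if match and match.group(2).strip():
            headlines.append((len(match.group(1)), match.group(2).strip()))
    return headlines
```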
Branden Chan
7b15799853
Change slug and title (#3474) 2022-10-25 16:41:27 +01:00
Stefano Fiorucci
54ec13eaf7
refactor: Change no_answer attribute (#3411)
* always run validation

* update schemas

* no_answer as a property. break things!

* forgotten schema

* fix

* update openapi

* removed my unnecessary test

* fix sql document store

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-10-25 13:07:00 +02:00
Branden Chan
03ba07dcb5
docs: Extend utils API docs coverage (#3402)
* Add more utils modules

* Format docstrings

* Incorporate reviewer feedback
2022-10-21 12:51:11 +01:00
Branden Chan
3f956c75f4
Add multimodal retrieval to API docs (#3430) 2022-10-20 15:07:48 +02:00
Massimiliano Pippi
5335e9e4d9
Add new schema for latest unstable (#3415)
* add new schema for latest unstable

* openapi
2022-10-19 13:21:05 +02:00
Sebastian
15a59fd040
feat: Updated EntityExtractor to handle long texts and added better postprocessing (#3154)
* Remove dependence on HuggingFace TokenClassificationPipeline and group all postprocessing functions under one class

* Added copyright notice for HF and deepset to entity file to acknowledge that a lot of the postprocessing parts came from the transformers library.

* Fixed text squishing problem. Added additional unit test for it.

Co-authored-by: ju-gu <julian.gutsch@deepset.ai>
2022-10-17 21:26:44 +02:00
Sara Zan
101d2bc86c
feat: MultiModalRetriever (#2891)
* Adding Data2VecVision and Data2VecText to the supported models and adapt Tokenizers accordingly

* content_types

* Splitting classes into respective folders

* small changes

* Fix EOF

* eof

* black

* API

* EOF

* whitespace

* api

* improve multimodal similarity processor

* tokenizer -> feature extractor

* Making feature vectors come out of the feature extractor in the similarity head

* embed_queries is now self-sufficient

* couple trivial errors

* Implemented separate language model classes for multimodal inference

* Document embedding seems to work

* removing batch_encode_plus, is deprecated anyway

* Realized the base Data2Vec models are not trained on retrieval tasks

* Issue with the generated embeddings

* Add batching

* Try to fit CLIP in

* Stub of CLIP integration

* Retrieval goes through but returns noise only

* Still working on the scores

* Introduce temporary adapter for CLIP models

* Image retrieval now works with sentence-transformers

* Tidying up the code

* Refactoring is now functional

* Add MPNet to the supported sentence transformers models

* Remove unused classes

* pylint

* docs

* docs

* Remove the method renaming

* mypy first pass

* docs

* tutorial

* schema

* mypy

* Move devices setup into get_model

* more mypy

* mypy

* pylint

* Move a few params in HaystackModel's init

* make feature extractor work with squadprocessor

* fix feature_extractor_kwargs forwarding

* Forgotten part of the fix

* Revert unrelated ES change

* Revert unrelated memdocstore changes

* comment

* Small corrections

* mypy and pylint

* mypy

* typo

* mypy

* Refactor the  call

* mypy

* Do not make FARMReader use the new FeatureExtractor

* mypy

* Detach DPR tests from FeatureExtractor too

* Detach processor tests too

* Add end2end marker

* extract end2end feature extractor tests

* temporary disable feature extraction tests

* Introduce end2end tests for tokenizer tests

* pylint

* Fix model loading from folder in FeatureExtractor

* working on end2end

* end2end keeps failing

* Restructuring retriever tests

* Restructuring retriever tests

* remove covert_dataset_to_dataloader

* remove comment

* Better check sentence-transformers models

* Use embed_meta_fields properly

* rename passage into document

* Embedding dims can't be found

* Add check for models that support it

* pylint

* Split all retriever tests into suites, running mostly on InMemory only

* fix mypy

* fix tfidf test

* fix weaviate tests

* Parallelize on every docstore

* Fix schema and specify modality in base retriever suite

* tests

* Add first image tests

* remove comment

* Revert to simpler tests

* Update docs/_src/api/api/primitives.md

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update haystack/modeling/model/multimodal/__init__.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* get_args

* mypy

* Update haystack/modeling/model/multimodal/__init__.py

* Update haystack/modeling/model/multimodal/base.py

* Update haystack/modeling/model/multimodal/base.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update haystack/modeling/model/multimodal/sentence_transformers.py

* Update haystack/modeling/model/multimodal/sentence_transformers.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update haystack/modeling/model/multimodal/transformers.py

* Update haystack/modeling/model/multimodal/transformers.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update haystack/modeling/model/multimodal/transformers.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update haystack/nodes/retriever/multimodal/retriever.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* mypy

* mypy

* removing more ContentTypes

* more ContentTypes

* pylint

* add to __init__

* revert end2end workflow for now

* missing integration markers

* Update haystack/nodes/retriever/multimodal/embedder.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* review feedback, removing HaystackImageTransformerModel

* review feedback part 2

* mypy & pylint

* mypy

* mypy

* fix multimodal docs also for Pinecone

* add note on internal constants

* Fix pinecone write_documents

* schemas

* keep support for sentence-transformers only

* fix pinecone test

* schemas

* fix pinecone again

* temporarily disable some tests, need to understand if they're still relevant

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-10-17 18:58:35 +02:00
Stefano Fiorucci
b579b9d54a
bug: make ElasticSearchDocumentStore use batch_size in get_documents_by_id (#3166)
* use batch_size

* try to fix git mess

* improve docstrings

* fix
2022-09-26 13:21:59 +02:00
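Commit b579b9d54a makes `get_documents_by_id` honor `batch_size`, so a long id list is fetched in several smaller requests instead of one. Below is a minimal sketch of that batching pattern. `fetch_batch` is a hypothetical callable standing in for the actual Elasticsearch query, and both function names are illustrative:

```python
from typing import Callable, Iterator, List, Sequence


def batched_ids(ids: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `batch_size` ids."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(ids), batch_size):
        yield list(ids[start : start + batch_size])


def get_documents_by_id_batched(
    ids: Sequence[str],
    fetch_batch: Callable[[List[str]], List[dict]],
    batch_size: int,
) -> List[dict]:
    """Fetch documents with one request per batch and concatenate the results."""
    documents: List[dict] = []
    for batch in batched_ids(ids, batch_size):
        documents.extend(fetch_batch(batch))  # hypothetical backend call, one per batch
    return documents
```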
tstadel
05a86b9d3d
feat: FAISS in OpenSearch: Support HNSW for cosine (#3217)
* support cosine similarity with faiss

* update docs

* update api docs

* fix tests

* Revert "update api docs"

This reverts commit 6138fdfefb3beaee2d55c5729cd4a2745ea6b143.

* fix api docs

* collapse test

* rename similarity to space_type mappings

* only normalize for faiss

* fix merge

* fix docs normalization

* get rid of List[np.array]

* update docs

* fix tests and tutorials

* fix mypy

* fix mypy

* fix mypy again

* again mypy

* blacken

* update tutorial 4 docs

* fix embeddingretriever

* fix faiss

* move dense specific logic to DenseRetriever

* fix mypy

* cosine tests for all documents stores

* fix pinecone

* add docstring

* docstring corrections

* update docs

* add integration test marker

* docstrings update

* update docs

* fix typo

* update docs

* fix MockDenseRetriever

* run integration tests for all documentstores

* fix test_update_embeddings_cosine_similarity

* fix faiss tests not running

* blacken

* make test_cosine_sanity_check integration test

* split PR

* update docs

* manually revert tutorial doc change

* Fix embedding type

* set integration marker correctly

* make BaseDocumentStore.normalize_embedding static

* format

* fix handling of opensearch_faiss param

* fix merge

* add DenseRetriever typing

* organize imports in conftest.py

* organize imports in conftest.py (2)

* fix DenseRetriever import

* add opensearch-tests-linux
2022-09-23 13:26:49 +02:00
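Commit 05a86b9d3d adds cosine support for the FAISS engine in OpenSearch and normalizes embeddings only in that case. The change presumably relies on a standard identity: the inner product of two L2-normalized vectors equals their cosine similarity. This means an inner-product index can answer cosine queries once vectors are normalized. Below is a small pure-Python check of that identity. The helper names are illustrative and are not Haystack's `normalize_embedding`:

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(dot(vector, vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


query, doc = [1.0, 2.0, 3.0], [4.0, 0.5, -1.0]
# The inner product on normalized vectors reproduces cosine similarity.
assert math.isclose(dot(l2_normalize(query), l2_normalize(doc)), cosine_similarity(query, doc))
```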
tstadel
b10e2c392e
chore: add DenseRetriever abstraction (#3252)
* support cosine similarity with faiss

* update docs

* update api docs

* fix tests

* Revert "update api docs"

This reverts commit 6138fdfefb3beaee2d55c5729cd4a2745ea6b143.

* fix api docs

* collapse test

* rename similarity to space_type mappings

* only normalize for faiss

* fix merge

* fix docs normalization

* get rid of List[np.array]

* update docs

* fix tests and tutorials

* fix mypy

* fix mypy

* fix mypy again

* again mypy

* blacken

* update tutorial 4 docs

* fix embeddingretriever

* fix faiss

* move dense specific logic to DenseRetriever

* fix mypy

* cosine tests for all documents stores

* fix pinecone

* add docstring

* docstring corrections

* update docs

* add integration test marker

* docstrings update

* update docs

* fix typo

* update docs

* fix MockDenseRetriever

* run integration tests for all documentstores

* fix test_update_embeddings_cosine_similarity

* fix faiss tests not running

* blacken

* make test_cosine_sanity_check integration test

* update docs

* fix imports

* import  DenseRetriever normally

* update docs

* fix deepcopy of documents

* update schema

* Revert "update schema"

This reverts commit 83cf8f323648468e1c322d54852bec084d637e3f.

* fix schema for ci manually
2022-09-21 19:08:54 +02:00
Branden Chan
492a8046d8
docs: sync Haystack API with Readme (#3223)
* First pass at syncing Haystack API with Readme

* Reapply changes

* Regularize slugs

* Regularize slugs

* Regularize slugs

* Set category id and regen

* Trigger workflow

* Delete old md files

* Test sync

* Undo test string

* Incorporate reviewer feedback

* Test on the fly API generation and sync

* Test on the fly API generation and sync

* Test on the fly API generation and sync

* Test on the fly API generation and sync

* Test on the fly API generation and sync

* Change name of pydoc-markdown scripts

* Test on the fly API generation and sync

* Remove version tag

* Test version tag

* Test version tag

* Test version tag

* Revert test docstring

* Revert md file changes

* Revert md file changes

* Revert script naming

* Test on the fly generation and sync

* Adjust for on the fly generation and sync

* Revert test string

* Remove old documentation workflow

* Set workflow to work on main

* Change readme version name
2022-09-21 17:18:34 +02:00
Massimiliano Pippi
8f76d64f6f
chore: bump release number for unstable version (#3251)
* bump version for unstable

* allow generation of rc schemas

* update schemas
2022-09-21 16:58:06 +02:00
Vladimir Blagojevic
938e6fda5b
Classify pipeline's type based on its components (#3132)
* Add pipeline get_type method

* Add pipeline uptime

* Add pipeline telemetry event sending

* Send pipeline telemetry once a day (at most)

* Add pipeline invocation counter, change invocation counter logic

* Update allowed telemetry parameters - allow pipeline parameters

* PR review: add unit test
2022-09-21 14:53:42 +02:00
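Commit 938e6fda5b sends pipeline telemetry at most once a day. Below is a minimal sketch of such a rate limit. It assumes the caller passes the current time explicitly, which keeps the logic testable. `DailyGate` is a hypothetical name and is not part of Haystack's telemetry module:

```python
from datetime import datetime, timedelta
from typing import Optional


class DailyGate:
    """Hypothetical gate that lets an event through at most once per `interval`."""

    def __init__(self, interval: timedelta = timedelta(days=1)) -> None:
        self.interval = interval
        self.last_sent: Optional[datetime] = None

    def should_send(self, now: datetime) -> bool:
        # Allow the first event, then only once the full interval has elapsed.
        if self.last_sent is None or now - self.last_sent >= self.interval:
            self.last_sent = now
            return True
        return False
```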
Stefano Fiorucci
89247b804c
refactor: make TransformersDocumentClassifier output consistent between different types of classification (#3224)
* make output consistent

* make output consistent

* added tests for details

* better tests

* Update test_document_classifier.py

* make black happy

* Update test_document_classifier.py

* Update test_document_classifier.py
2022-09-21 13:16:03 +02:00
Tuana Celik
336c144e72
chore: updating colab links in older docs versions (#3250)
* updating colab links to tutorial 1

* remaining tutorials

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2022-09-20 18:15:29 +02:00
Malte Pietsch
7e79a48540
bug: reactivate benchmarks with quick fixes (#2766)
* quick fix benchmark runs to make them work with current haystack version

* fix minor typo

* update readme. fix minor things to make benchmarks run again

* Update Documentation & Code Style

* fix typo in readme

* update result files for reader and retriever querying

* reduce batch size for update embeddings to prevent xlarge bulk_update requests that exceed elastic's limits (happening in dense 500k runs)

* change default memory allocation back to normal. add note to readme

* add first indexing results

* add memory to docker cmd

* full benchmarks results on commit c5a2651fcbbeffca06ffa9036b10e62669bcc1b0

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-09-20 10:22:08 +02:00
Massimiliano Pippi
9399ddf949
fix pydoc-markdown hook (#3238) 2022-09-19 18:20:35 +02:00
Massimiliano Pippi
8fbccbda82
fix: handle Documents containing dataframes in Multilabel constructor (#3237)
* format

* fix docs
2022-09-19 14:59:20 +02:00