1524 Commits

Author SHA1 Message Date
Julian Risch
d0691a4bd5
bug: replace decorator with counter attribute for pipeline event (#3462) 2022-10-26 12:09:04 +02:00
bogdankostic
4fbe80c098
feat: Extraction of headlines in markdown files (#3445)
* Extract headings from markdown files + adapt PreProcessor

* Add tests

* Fix mypy

* Generate JSON schema

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update haystack/nodes/file_converter/markdown.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply black

* Add PR feedback

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-10-26 11:57:55 +02:00
Vladimir Blagojevic
5ca96357ff
feat: Add CohereEmbeddingEncoder to EmbeddingRetriever (#3453) 2022-10-25 17:52:29 +02:00
Stefano Fiorucci
54ec13eaf7
refactor: Change no_answer attribute (#3411)
* always run validation

* update schemas

* no_answer as a property. break things!

* forgotten schema

* fix

* update openapi

* removed my unnecessary test

* fix sql document store

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-10-25 13:07:00 +02:00
Mayank Jobanputra
d48577b4e7
bug: removed duplicated meta "name" field addition to content before embedding in update_embeddings workflow (#3368)
* Removed explicit passage formatting by name field

* passing correct input type for embedding the docs

* Updated test, updated similarity scores and added results

* changed expected input to embed method
2022-10-25 14:52:05 +05:30
Sara Zan
cbf44413d8
feat: add __cointains__ to Span (#3446)
* add __contains__

* add tests
2022-10-21 13:58:17 +02:00
Vladimir Blagojevic
8f31228211
feat: Add exponential backoff decorator; apply it to OpenAI requests (#3398) 2022-10-19 17:47:38 +02:00
Ursin Brunner
5fedfb03b0
fix: Fix the error of wrong page numbers when documents contain empty pages. (#3330)
* Fix the error of wrong page numbers when documents contain empty pages.

* Reformat using git hooks.

* Use a more descriptive placeholder
2022-10-18 17:51:02 +02:00
Sebastian
93817f63b4
feat: Speed up integration tests (nodes) (#3408)
* Changed summarizer model to a smaller one (2GB to 500MB) to save on space and speed up the tests.

* Removed google pegasus from cache
2022-10-18 16:23:57 +02:00
Sebastian
15a59fd040
feat: Updated EntityExtractor to handle long texts and added better postprocessing (#3154)
* Remove dependence on HuggingFace TokenClassificationPipeline and group all postprocessing functions under one class

* Added copyright notice for HF and deepset to entity file to acknowledge that a lot of the postprocessing parts came from the transformers library.

* Fixed text squishing problem. Added additional unit test for it.

Co-authored-by: ju-gu <julian.gutsch@deepset.ai>
2022-10-17 21:26:44 +02:00
Unai Garay Maestre
3a2c8ae3c5
bug: Adds better way of checking query in BaseRetriever and Pipeline.run() (#3304)
* changes how query and queries are checked if they have been passed in BaseRetriever

* Fixes checking query properly in Pipeline run

* Fixes checking query properly in Pipeline run

* Adds test for FilterRetriever using run method when query is empty

* Adds mock filter retriever and adapts test

* Removes old test, adds MockRetriever to test file and test uses document_store

* Logs error when query is not of type string with a new test for run batch

* Update test/nodes/test_retriever.py

* schemas
2022-10-17 19:00:13 +02:00
Sara Zan
101d2bc86c
feat: MultiModalRetriever (#2891)
* Adding Data2VecVision and Data2VecText to the supported models and adapt Tokenizers accordingly

* content_types

* Splitting classes into respective folders

* small changes

* Fix EOF

* eof

* black

* API

* EOF

* whitespace

* api

* improve multimodal similarity processor

* tokenizer -> feature extractor

* Making feature vectors come out of the feature extractor in the similarity head

* embed_queries is now self-sufficient

* couple trivial errors

* Implemented separate language model classes for multimodal inference

* Document embedding seems to work

* removing batch_encode_plus, is deprecated anyway

* Realized the base Data2Vec models are not trained on retrieval tasks

* Issue with the generated embeddings

* Add batching

* Try to fit CLIP in

* Stub of CLIP integration

* Retrieval goes through but returns noise only

* Still working on the scores

* Introduce temporary adapter for CLIP models

* Image retrieval now works with sentence-transformers

* Tidying up the code

* Refactoring is now functional

* Add MPNet to the supported sentence transformers models

* Remove unused classes

* pylint

* docs

* docs

* Remove the method renaming

* mpyp first pass

* docs

* tutorial

* schema

* mypy

* Move devices setup into get_model

* more mypy

* mypy

* pylint

* Move a few params in HaystackModel's init

* make feature extractor work with squadprocessor

* fix feature_extractor_kwargs forwarding

* Forgotten part of the fix

* Revert unrelated ES change

* Revert unrelated memdocstore changes

* comment

* Small corrections

* mypy and pylint

* mypy

* typo

* mypy

* Refactor the  call

* mypy

* Do not make FARMReader use the new FeatureExtractor

* mypy

* Detach DPR tests from FeatureExtractor too

* Detach processor tests too

* Add end2end marker

* extract end2end feature extractor tests

* temporary disable feature extraction tests

* Introduce end2end tests for tokenizer tests

* pylint

* Fix model loading from folder in FeatureExtractor

* working o n end2end

* end2end keeps failing

* Restructuring retriever tests

* Restructuring retriever tests

* remove covert_dataset_to_dataloader

* remove comment

* Better check sentence-transformers models

* Use embed_meta_fields properly

* rename passage into document

* Embedding dims can't be found

* Add check for models that support it

* pylint

* Split all retriever tests into suites, running mostly on InMemory only

* fix mypy

* fix tfidf test

* fix weaviate tests

* Parallelize on every docstore

* Fix schema and specify modality in base retriever suite

* tests

* Add first image tests

* remove comment

* Revert to simpler tests

* Update docs/_src/api/api/primitives.md

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update haystack/modeling/model/multimodal/__init__.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* get_args

* mypy

* Update haystack/modeling/model/multimodal/__init__.py

* Update haystack/modeling/model/multimodal/base.py

* Update haystack/modeling/model/multimodal/base.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update haystack/modeling/model/multimodal/sentence_transformers.py

* Update haystack/modeling/model/multimodal/sentence_transformers.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update haystack/modeling/model/multimodal/transformers.py

* Update haystack/modeling/model/multimodal/transformers.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update haystack/modeling/model/multimodal/transformers.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update haystack/nodes/retriever/multimodal/retriever.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* mypy

* mypy

* removing more ContentTypes

* more contentypes

* pylint

* add to __init__

* revert end2end workflow for now

* missing integration markers

* Update haystack/nodes/retriever/multimodal/embedder.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* review feedback, removing HaystackImageTransformerModel

* review feedback part 2

* mypy & pylint

* mypy

* mypy

* fix multimodal docs also for Pinecone

* add note on internal constants

* Fix pinecone write_documents

* schemas

* keep support for sentence-transformers only

* fix pinecone test

* schemas

* fix pinecone again

* temporarily disable some tests, need to understand if they're still relevant

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-10-17 18:58:35 +02:00
Vladimir Blagojevic
159cd5a666
feat: Add OpenAIEmbeddingEncoder to EmbeddingRetriever (#3356) 2022-10-14 15:01:03 +02:00
Vladimir Blagojevic
5ebe3cb33d
fix: QuestionGenerator generates wrong document questions for non-default num_queries_per_doc parameter (#3381) 2022-10-14 12:08:30 +02:00
Stefano Fiorucci
7290196c32
fix: allow same vector_id in different indexes for SQL-based Document stores (#3383)
* fix_multiple_indexes

* improve test names
2022-10-14 09:55:56 +02:00
Massimiliano Pippi
31fa75e9fd
feat: add support for Elasticsearch 7.16.2 (#3318)
* bump elastic to 7.16.2+

* decouple Elasticsearch and Opensearch

use method override instead of func variables

fix mypy

default value

fix broken tests

update schema

* relax version pin

* rename the base class

* rename module

* fix import order

* do not run the new tests in the old job

* remove outdated TODO
2022-10-13 11:53:27 +02:00
Sebastian
75641dd024
fix: Added checks for DataParallel and WrappedDataParallel (#3366)
* Added checks for DataParallel and WrappedDataParallel

* Update isinstance checks according to pylint recommendation

* Using isinstance over types

* Added test for dpr training
2022-10-13 08:05:56 +02:00
Malte Pietsch
fb02b61e90
Update README.md (#3247) 2022-10-11 10:43:17 +02:00
tstadel
7fe5003c97
fix: eval() with add_isolated_node_eval=True breaks if no node supports it (#3347)
* fix isolated eval for pipelines without a node supporting isolated mode

* reformat

* add test
2022-10-10 20:48:13 +02:00
bogdankostic
84aff5e2b3
fix: Allow less restrictive values for parameters in Pipeline configurations (#3345)
* fix: Allow arbitrary values for parameters in Pipeline configurations

* Add test

* Adapt expected error message in tests

* Fix bug

* Fix bug on checking JSON

* Remove test cases that previously tested if error was thrown

* Change encoding in test

* Restrict possible values

* Re-add tests

* Re-add tests

* Add value flag to list elements
2022-10-10 13:08:45 +02:00
JacdDev
797c20c966
feat: Adding filters param to MostSimilarDocumentsPipeline run and run_batch (#3301)
* Adding filters param to MostSimilarDocumentsPipeline run and run_batch

* Adding index param to MostSimilarDocumentsPipeline run and run_batch

* Adding index param documentation to MostSimilarDocumentsPipeline run and run_batch

* Updated index param documentation to MostSimilarDocumentsPipeline run and run_batch. Updated type: ignore in run_batch

* Adding filters param to MostSimilarDocumentsPipeline run and run_batch

* Adding index param to MostSimilarDocumentsPipeline run and run_batch

* Adding index param documentation to MostSimilarDocumentsPipeline run and run_batch

* Updated index param documentation to MostSimilarDocumentsPipeline run and run_batch. Updated type: ignore in run_batch
2022-10-10 10:22:14 +02:00
tstadel
b84a6b1716
fix: opensearch script score with filters (#3321)
* fix opensearch script score filters

* add comment

* add integration test

* update schema
2022-10-06 15:41:29 +02:00
Vladimir Blagojevic
6cb4e93965
refactor: remove Inferencer multiprocessing (#3283) 2022-10-04 14:08:23 +02:00
Jeff Risberg
ad8fbe56ee
bug: JoinDocuments nodes produce incorrect results if preceded by another JoinDocuments node (#3170)
* don't send the list of inputs back as an output in the running of a node.

* updated documentation

* Update pydoc-markdown.py

* added test case for pipeline join fix

Co-authored-by: JeffRisberg <jrisberg@aol.com>
2022-09-30 13:27:17 +02:00
Stefano Fiorucci
e2e6887ee8
Improve TransformersDocumentClassifier tests (#3270) 2022-09-27 13:25:34 +02:00
Vladimir Blagojevic
9582a423a2
fix: ONNX FARMReader model conversion is broken (#3211) 2022-09-26 09:18:12 -04:00
Vladimir Blagojevic
9ca3ccae98
fix:MostSimilarDocumentsPipeline doesn't have pipeline property (#3265)
* Add comments and a unit test

* More unit tests for MostSimilarDocumentsPipeline
2022-09-23 09:46:48 -04:00
tstadel
05a86b9d3d
feat: FAISS in OpenSearch: Support HNSW for cosine (#3217)
* support cosine similiarity with faiss

* update docs

* update api docs

* fix tests

* Revert "update api docs"

This reverts commit 6138fdfefb3beaee2d55c5729cd4a2745ea6b143.

* fix api docs

* collapse test

* rename similairity to space_type mappings

* only normalize for faiss

* fix merge

* fix docs normalization

* get rid of List[np.array]

* update docs

* fix tests and tutorials

* fix mypy

* fix mypy

* fix mypy again

* again mypy

* blacken

* update tutorial  4 docs

* fix embeddingretriever

* fix faiss

* move dense specific logic to DenseRetriever

* fix mypy

* cosine tests for all documents stores

* fix pinecone

* add docstring

* docstring corrections

* update docs

* add integration test marker

* docstrings update

* update docs

* fix typo

* update docs

* fix MockDenseRetriever

* run integration tests for all documentstores

* fix test_update_embeddings_cosine_similarity

* fix faiss tests not running

* blacken

* make test_cosine_sanity_check integration test

* split PR

* update docs

* manually revert tutorial doc change

* Fix embedding type

* set integration marker correctly

* make BaseDocumentStore.normalize_embedding static

* format

* fix handling of opensearch_faiss param

* fix merge

* add DenseRetriever typing

* organize imports in conftest.py

* organize imports in conftest.py (2)

* fix DenseRetriever import

* add opensearch-tests-linux
2022-09-23 13:26:49 +02:00
tstadel
4fa9d2d8e7
Fix milvus and faiss tests not running (#3263)
* fix milvus and faiss tests not running

* fix schema manually

* fix test_dpr_embedding test for milvus

* pip freeze on milvus tests

* fix milvus1 tests being executed: fix all_doc_stores order

* Revert "pip freeze on milvus tests"

This reverts commit 75ebb6f7e507bb8477e87d9e63b4a294f7946cab.

* make infer_required_doc_store more robust

* don't skip tests without docstore requirements

* use markers for docstore tests
2022-09-22 17:46:49 +02:00
tstadel
b10e2c392e
chore: add DenseRetriever abstraction (#3252)
* support cosine similiarity with faiss

* update docs

* update api docs

* fix tests

* Revert "update api docs"

This reverts commit 6138fdfefb3beaee2d55c5729cd4a2745ea6b143.

* fix api docs

* collapse test

* rename similairity to space_type mappings

* only normalize for faiss

* fix merge

* fix docs normalization

* get rid of List[np.array]

* update docs

* fix tests and tutorials

* fix mypy

* fix mypy

* fix mypy again

* again mypy

* blacken

* update tutorial  4 docs

* fix embeddingretriever

* fix faiss

* move dense specific logic to DenseRetriever

* fix mypy

* cosine tests for all documents stores

* fix pinecone

* add docstring

* docstring corrections

* update docs

* add integration test marker

* docstrings update

* update docs

* fix typo

* update docs

* fix MockDenseRetriever

* run integration tests for all documentstores

* fix test_update_embeddings_cosine_similarity

* fix faiss tests not running

* blacken

* make test_cosine_sanity_check integration test

* update docs

* fix imports

* import  DenseRetriever normally

* update docs

* fix deepcopy of documents

* update schema

* Revert "update schema"

This reverts commit 83cf8f323648468e1c322d54852bec084d637e3f.

* fix schema for ci manually
2022-09-21 19:08:54 +02:00
Vladimir Blagojevic
938e6fda5b
Classify pipeline's type based on its components (#3132)
* Add pipeline get_type mehod

* Add pipeline uptime

* Add pipeline telemetry event sending

* Send pipeline telemetry once a day (at most)

* Add pipeline invocation counter, change invocation counter logic

* Update allowed telemetry parameters - allow pipeline parameters

* PR review: add unit test
2022-09-21 14:53:42 +02:00
Stefano Fiorucci
89247b804c
refactor: make TransformersDocumentClassifier output consistent between different types of classification (#3224)
* make output consistent

* make output consistent

* added tests for details

* better tests

* Update test_document_classifier.py

* make black happy

* Update test_document_classifier.py

* Update test_document_classifier.py
2022-09-21 13:16:03 +02:00
Malte Pietsch
7e79a48540
bug: reactivate benchmarks with quick fixes (#2766)
* quick fix benchmark runs to make them work with current haystack version

* fix minor typo

* update readme. fix minor things to make benchmarks run again

* Update Documentation & Code Style

* fix typo in readme

* update result files for reader and retriever querying

* reduce batch size for update embeddings to prevent xlarge bulk_update requests that exceed elastic's limits (happening in dense 500k runs)

* change default memory allocation back to normal. add note to readme

* add first indexing results

* add memory to docker cmd

* full benchmarks results on commit  c5a2651fcbbeffca06ffa9036b10e62669bcc1b0

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-09-20 10:22:08 +02:00
Sara Zan
dcb132ba59
chore: remove f-strings from logs for performance reasons (#3212)
* Use the %s syntax on all debug messages

* Use the %s syntax on some more debug messages

* Use the %s syntax on info messages

* Use the %s syntax on warning messages

* Use the %s syntax on error and exception messages

* mypy

* pylint

* trogger tutorials execution in CI

* trigger tutorials execution on CI

* black

* remove embeddings from repr

* fix Document `__repr__`

* address feedback

* mypy
2022-09-19 18:18:32 +02:00
Massimiliano Pippi
8fbccbda82
fix: handle Documents containing dataframes in Multilabel constructor (#3237)
* format

* fix docs
2022-09-19 14:59:20 +02:00
Daniel Bichuetti
df1f4205b6
feat: add public layout-base extraction support on PDFToTextConverter (#3137)
* feat(PDFToTextConverter): add option to get text in physical layout order

* test: add physical layout extraction test to PDFToTextConverter

* refactor: change layout parameter attribution places

* docs: manually trigger pre-commits

* docs: generate new docs to comply with pydoc-markdown style
2022-09-13 16:55:21 +02:00
Kristof Herrmann
da1cc577ae
feat: exponential backoff with exp decreasing batch size for opensearch client (#3194)
* Validate custom_mapping properly as an object

* Remove related test

* black

* feat: exponential backoff with exp dec batch size

* added docstring and split doc lsit

* fix

* fix mypy

* fix

* catch generic exception

* added test

* mypy ignore

* fixed no attribute

* added test

* added tests

* revert strange merge conflicts

* revert merge conflict again

* Update haystack/document_stores/elasticsearch.py

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* done

* adjust test

* remove not required caplog

* fixed comments

Co-authored-by: ZanSara <sarazanzo94@gmail.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2022-09-13 14:30:30 +01:00
Sara Zan
96bb9b5905
bug: validate custom_mapping as an object (#3189)
* Validate custom_mapping properly as an object

* Remove related test

* black
2022-09-09 18:03:29 +02:00
Daniel Bichuetti
621e1af74c
refactor: improve support for dataclasses (#3142)
* refactor: improve support for dataclasses

* refactor: refactor class init

* refactor: remove unused import

* refactor: testing 3.7 diffs

* refactor: checking meta where is Optional

* refactor: reverting some changes on 3.7

* refactor: remove unused imports

* build: manual pre-commit run

* doc: run doc pre-commit manually

* refactor: post initialization hack for 3.7-3.10 compat.

TODO: investigate another method to improve 3.7 compatibility.

* doc: force pre-commit

* refactor: refactored for both Python 3.7 and 3.9

* docs: manually run pre-commit hooks

* docs: run api docs manually

* docs: fix wrong comment

* refactor: change no type-checked test code

* docs: update primitives

* docs: api documentation

* docs: api documentation

* refactor: minor test refactoring

* refactor: remova unused enumeration on test

* refactor: remove unneeded dir in gitignore

* refactor: exclude all private fields and change meta def

* refactor: add pydantic comment

* refactor : fix for mypy on Python 3.7

* refactor: revert custom init

* docs: update docs to new pydoc-markdown style

* Update test/nodes/test_generator.py

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
2022-09-09 11:31:37 +02:00
Daniel Bichuetti
e1f399284f
refactor: update dependencies and remove pins (#3147)
* refactor: remove azure-core, pydoc and hf-hub pins

* fix: remove extra-comma

* fix: force minimum version of azure forms recognizer

* refactor: allow newer ocr libs

* refactor: update more dependencies and container versions

* refactor: remove extra comment

* docs: pre-commit manual run

* refactor: remove unnecessary dependency

* tests: update weaviate container image version
2022-09-05 14:30:35 +02:00
Vladimir Blagojevic
66f3f42a46
fix: Replace multiprocessing tokenization with batched fast tokenization (#3089)
* Replace multiprocessing tokenization with batched fast tokenization

* Replace deprecated tokenization method invocations
2022-08-31 07:33:39 -04:00
Sara Zan
e88f1e2577
Add custom_mapping to the list of fields that can contain string-encoded JSON (#3065) 2022-08-29 11:10:24 +02:00
Julian Risch
3e3ff33cdd
feat: add batch evaluation method for pipelines (#2942)
* add basic pipeline.eval_batch for qa without filters

* black formatting

* pydoc-markdown

* remove batch eval tests failing due to bugs

* remove comment

* explain commented out tests

* avoid code duplication

* black

* mypy

* pydoc markdown

* add batch option to execute_eval_run

* pydoc markdown

* Apply documentation suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Apply documentation suggestion from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* add documentation based on review comments

* black

* black

* schema updates

* remove duplicate tests

* add separate method for column reordering

* merge _build_eval_dataframe methods

* pylint ignore in function

* change type annotation of queries to list only

* one-liner addressing review comment on params dict

* markdown files updated

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-25 17:50:57 +02:00
bogdankostic
e2ec0d1c15
feat: FAISS in OpenSearch: check existing index (#3101)
* Add check for mapping for existing indices

* Add test

* Check if "method" field exists
2022-08-25 17:33:26 +02:00
Sara Zan
e92ea4fccb
refactor: rename master into main in documentation and links (#3063)
* master->main

* revert master rename

* Revert change to sphinx link and rename master schema
2022-08-24 19:05:12 +02:00
tstadel
92046ce5b5
feat: FAISS in OpenSearch: Support HNSW for dot product and l2 (#3029)
* support faiss hnsw

* blacken

* update docs

* improve similarity check

* add tests

* update schema

* set ef_search param correctly

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* regenerate docs

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-24 16:43:48 +02:00
James Briggs
9b1b03002f
update to PineconeDocumentStore to remove dependency on SQL db (#2749)
* update to PineconeDocumentStore to remove dependency on SQL db

* Update Documentation & Code Style

* typing fixes

* Update Documentation & Code Style

* fixed embedding generator to yield Documents

* Update Documentation & Code Style

* fixes for final typing issues

* fixes for pylint

* Update Documentation & Code Style

* uncomment pinecone tests

* added new params to docstrings

* Update Documentation & Code Style

* Update Documentation & Code Style

* Update haystack/document_stores/pinecone.py

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>

* Update haystack/document_stores/pinecone.py

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>

* Update Documentation & Code Style

* Update haystack/document_stores/pinecone.py

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>

* Update haystack/document_stores/pinecone.py

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>

* Update haystack/document_stores/pinecone.py

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>

* Update haystack/document_stores/pinecone.py

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>

* changes based on comments, updated errors and install

* Update Documentation & Code Style

* mypy

* implement simple filtering in pinecone mock

* typo

* typo in reverse

* account for missing meta key in filtering

* typo

* added metadata filtering to describe index

* added handling for users switching indexes in same doc store, and handling duplicate docs in write

* syntax tweaks

* added index option to document/embedding count calls

* labels implementation in progress

* added metadata fields to be indexed for pinecone tests

* further changes to mock

* WIP implementation of labels+multilabels

* switched to rely on labels namespace rather than filter

* simpler delete_labels

* label fixes, remove debug code

* Apply dostring fixes

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* mypy

* pylint

* docs

* temporarily un-mock Pinecone

* Small Pinecone test suite

* pylint

* Add fake test key to pass the None check

* Add again fake test key to pass the None check

* Add Pinecone to default docstores and fix filters

* Fix field name

* Change field name

* Change field value

* Remove comments

* forgot to upgrade pyproject.toml

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-24 13:27:15 +02:00
Stefano Fiorucci
891707ecaa
bug: handle Optional params in schema validation (#2980)
* not working draft

* first draft

* fix

* revert json schema

* better schema

* improvements, support different python versions

* little simplification

* improvements and more tests

* Revert "Merge branch 'handle_optional_params' into origin/main"

This reverts commit 0114cba1f72c9bab23a3ce6a24cb4b346834cf34.

* fix git mess

* handle optional params; schema

* test null values

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-08-24 10:40:19 +02:00
bogdankostic
b03de53716
Use random_sample instead of ndarray for random array (#3083) 2022-08-22 13:19:45 +02:00
Massimiliano Pippi
97a8d30512
feat: Allow exact list matching with field in Elasticsearch filtering (#2988)
* ES filtering - allow exact list matching with field

typing fix

Update Documentation & Code Style

remove default hit limit in filtering queries

Update Documentation & Code Style

pytest es list eq filter

Update Documentation & Code Style

* review feedback

* fixed test

Co-authored-by: Krak91 <45461739+Krak91@users.noreply.github.com>
2022-08-22 12:42:37 +02:00