18 Commits

Author SHA1 Message Date
Ivana Zeljkovic
2326f2f9fe
feat: Pinecone document store optimizations (#5902)
* Optimize methods for deleting documents and getting vector count. Enable warning messages when Pinecone limits are exceeded on Starter index type.

* Fix typo

* Add release note

* Fix mypy errors

* Remove unused import. Fix warning logging message.

* Update release note with description about limits for Starter index type in Pinecone

* Improve code base by:
- Adding new test cases for get_embedding_count method
- Fixing get_embedding_count method
- Improving delete documents
- Fix label retrieval
- Increase default batch size
- Improve get_document_count method

* Remove unused variable

* Fix mypy issues
2023-10-16 19:26:24 +02:00
Christian Clauss
bf6d306d68
ci: Simplify Python code with ruff rules SIM (#5833)
* ci: Simplify Python code with ruff rules SIM

* Revert #5828

* ruff --select=I --fix haystack/modeling/infer.py

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-09-20 08:32:44 +02:00
Christian Clauss
1bc03ddc73
ci: Fix all ruff pyflakes errors except unused imports (#5820)
* ci: Fix all ruff pyflakes errors except unused imports

* Delete releasenotes/notes/fix-some-pyflakes-errors-69a1106efa5d0203.yaml
2023-09-15 18:30:33 +02:00
Christian Clauss
9405eb90ee
ci: Fix invalid escape sequences in Python code (#5802)
* ci: Use ruff in pre-commit to further limit complexity

* Fix invalid escape sequences in Python code

* Delete releasenotes/notes/ruff-4d2504d362035166.yaml
2023-09-14 16:42:48 +02:00
Ivana Zeljkovic
4bad202197
feat: Pinecone document store refactoring (#5725)
* Refactor codebase so that doc_type metadata is used instead of namespaces for making distinction between documents without embeddings, documents with embeddings and labels

* Fix parameter name in integration test

* Remove code under comment in add_type_metadata_filter method

* Fix mypy and pylint checks

* Add release note

* Apply minimal changes: rename method, update method docs and remove redundant method

* Mypy fixes

* Fix docstrings

* Revert helper methods for fetching documents when the number of documents exceeds Pinecone limit

* Remove unnecessary attributes in PineconeDocumentStore

* Fix unit test

---------

Co-authored-by: Ivana Zeljkovic <ivana.zeljkovic@smartcat.io>
Co-authored-by: DosticJelena <jelena.dostic@smartcat.io>
2023-09-14 11:46:47 +02:00
Vladimir Blagojevic
1066e959a2
bug: fix for pinecone not working for per document updates (#5110) 2023-07-03 14:07:52 +02:00
bogdankostic
43509c88bf
fix: Add support for _split_overlap meta to Pinecone and dict metadata in general to Weaviate (#4805)
* Add support for dicts to Weaviate

* Add support for _split_overlap to Pinecone

* Add tests

* Fix Pylint

* Fix Pylint

* Fix test

* Implement PR feedback
2023-05-05 11:20:21 +02:00
ZanSara
1b57b96210
refactor!: extract elasticsearch (#4668)
* extract elasticsearch

* update pyproject.toml

* make more import optional

* move MockBaseRetriever in conftest

* install es in the es integration tests
2023-04-26 10:14:20 +02:00
Massimiliano Pippi
83d615a32b
feat: include testing facilities into haystack package (#4182) 2023-02-17 19:38:03 +01:00
Silvano Cerza
274746db07
style: Update black (#4101)
* Update black version

* Format file with new black style

* Update black pre-commit hook version
2023-02-08 15:34:43 +01:00
tstadel
92c58cfda1
feat: Support multiple document_ids in Answer object (for generative QA) (#4062)
* initial version without shapers

* set document_ids for BaseGenerator

* introduce question-answering-with-references template

* better prompt

* make PromptTemplate control output_variable

* update schema

* fix add_doc_meta_data_to_answer

* Revert "fix add_doc_meta_data_to_answer"

This reverts commit b994db423ad8272c140ce2b785cf359d55383ff9.

* fix add_doc_meta_data_to_answer

* fix eval

* fix pylint

* fix pinecone

* fix other tests

* fix test

* fix flaky test

* Revert "fix flaky test"

This reverts commit 7ab04275ffaaaca96b4477325ba05d5f34d38775.

* adjust docstrings

* make Label loading backward-compatible

* fix Label backward compatibility for pinecone

* fix Label backward compatibility for search engines

* fix Label backward compatibility for deepset Cloud

* fix tests

* fix None issue

* fix test_write_feedback

* add tests for legacy label support

* add document_id test for pinecone

* reduce unnecessary contents

* add comment to pinecone test
2023-02-08 08:37:22 +01:00
Sebastian
71de0524de
fix: fixed InMemoryDocumentStore.get_embedding_count to return correct number (#3980)
* Fix the embedding count function of InMemoryDocumentStore

* Adding some doc strings explaining how many docs with embeddings to expect.
2023-01-30 12:38:30 +01:00
Ahmed Nabil
12e057837b
Adding condition to pinecone object. (#3768)
* Adding condition to `pinecone` object.

While you can assign any values to `PineconeDocumentStore`'s parameter `pinecone_index`, it must have another condition to prevent that from happening.

* Added test, and changed the code to make sure the pinecone idx variable has correct instance

* fixed black error

Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
2023-01-19 01:34:44 +05:30
Julian Risch
0c2d13f1b8
bug: skip validating empty embeddings (#3774)
* skip validating empty embeddings

* skip batches without embeddings to update

* add unit test with mocked retriever
2023-01-05 15:13:57 +01:00
James Briggs
520b23ec1b
fix: pinecone metadata format (#3660)
* fix for multilevel metadata dictionaries

* add metadata dict formating to update function

* typing

* added check for labels meta

* added more info to input parameters

* added test for multilayer metadata

* removed todo
2022-12-13 10:11:24 +01:00
Massimiliano Pippi
b20f808119
refactor: move more tests to the base class (#3637)
* move more tests to the base class

* skip tests where unsupported

* do not pass index label explicitly

* skip test for Pinecone
2022-11-29 08:43:27 +01:00
Massimiliano Pippi
057a8c0b4f
refactor: Pinecone tests (#3555)
* add pytest option to unmock pinecone

* first try

* handle missing answer

* fix labels metadata

* more tests

* adapt workflow

* typo

* address review comments
2022-11-14 15:19:15 +01:00
James Briggs
9b1b03002f
update to PineconeDocumentStore to remove dependency on SQL db (#2749)
* update to PineconeDocumentStore to remove dependency on SQL db

* Update Documentation & Code Style

* typing fixes

* Update Documentation & Code Style

* fixed embedding generator to yield Documents

* Update Documentation & Code Style

* fixes for final typing issues

* fixes for pylint

* Update Documentation & Code Style

* uncomment pinecone tests

* added new params to docstrings

* Update Documentation & Code Style

* Update Documentation & Code Style

* Update haystack/document_stores/pinecone.py

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>

* Update haystack/document_stores/pinecone.py

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>

* Update Documentation & Code Style

* Update haystack/document_stores/pinecone.py

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>

* Update haystack/document_stores/pinecone.py

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>

* Update haystack/document_stores/pinecone.py

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>

* Update haystack/document_stores/pinecone.py

Co-authored-by: Sara Zan <sarazanzo94@gmail.com>

* changes based on comments, updated errors and install

* Update Documentation & Code Style

* mypy

* implement simple filtering in pinecone mock

* typo

* typo in reverse

* account for missing meta key in filtering

* typo

* added metadata filtering to describe index

* added handling for users switching indexes in same doc store, and handling duplicate docs in write

* syntax tweaks

* added index option to document/embedding count calls

* labels implementation in progress

* added metadata fields to be indexed for pinecone tests

* further changes to mock

* WIP implementation of labels+multilabels

* switched to rely on labels namespace rather than filter

* simpler delete_labels

* label fixes, remove debug code

* Apply dostring fixes

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* mypy

* pylint

* docs

* temporarily un-mock Pinecone

* Small Pinecone test suite

* pylint

* Add fake test key to pass the None check

* Add again fake test key to pass the None check

* Add Pinecone to default docstores and fix filters

* Fix field name

* Change field name

* Change field value

* Remove comments

* forgot to upgrade pyproject.toml

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-24 13:27:15 +02:00