Tobias Wochinger
fe0ac5c4a2
chore: enforce kwarg logging ( #7207 )
* chore: add logger which eases logging of extras
* chore: start migrating to key value
* fix: import fixes
* tests: temporarily comment out breaking test
* refactor: move to kwarg based logging
* style: fix import order
* chore: implement self-review comments
* test: drop failing test
* chore: fix more import orders
* docs: add changelog
* tests: fix broken tests
* chore: fix getting the frames
* chore: add comment
* chore: cleanup
* chore: adapt remaining `%s` usages
2024-02-29 14:31:20 +01:00
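Note on the entry above: a minimal sketch of kwarg-based logging, using only the standard library. The `KwargLoggerAdapter` name and the `{placeholder}` message style are assumptions for illustration, not the logger this commit adds. The idea is that values are passed as keyword arguments and attached to the log record instead of being interpolated with `%s`.

```python
import logging


class KwargLoggerAdapter(logging.LoggerAdapter):
    """Hypothetical adapter: forwards unknown keyword arguments as `extra` fields."""

    # Keyword arguments that the standard logging methods already understand.
    _RESERVED = {"exc_info", "stack_info", "stacklevel", "extra"}

    def process(self, msg, kwargs):
        extra = dict(kwargs.pop("extra", None) or {})
        for key in list(kwargs):
            if key not in self._RESERVED:
                extra[key] = kwargs.pop(key)
        kwargs["extra"] = extra
        # Fill `{name}` placeholders from the extras; the raw values also
        # stay available as attributes on the resulting LogRecord.
        if extra:
            msg = msg.format(**extra)
        return msg, kwargs


logger = KwargLoggerAdapter(logging.getLogger("kwarg_demo"), {})
logger.warning("Wrote {count} documents", count=3)
```

Keys that collide with built-in `LogRecord` attributes (for example `name` or `message`) are rejected by the `logging` module, so a real implementation would need to guard against them.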
David S. Batista
3fc77979d8
fixing docstrings ( #7225 )
2024-02-27 17:50:36 +01:00
ZanSara
1182c08daf
fix: Don't filter negative scores when using BM25Okapi and scale_score=False ( #6889 )
* don't filter negatives for unscaled Okapi
* change BM25 algorithm default to BM25L
* Update haystack/document_stores/in_memory/document_store.py
* improve comment
2024-02-06 11:07:27 +01:00
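Note on the entry above: a rough sketch of the filtering rule that the title describes, written against plain `(doc_id, score)` tuples. The helper name, the string-based algorithm switch, and the exact threshold are assumptions for illustration and are not taken from the actual document store code.

```python
from typing import List, Tuple


def filter_scored_docs(
    scored: List[Tuple[str, float]], algorithm: str, scale_score: bool
) -> List[Tuple[str, float]]:
    """Hypothetical helper: decide which (doc_id, score) pairs to return."""
    # Assumption: only unscaled BM25Okapi scores may be legitimately negative.
    negatives_are_valid = algorithm == "BM25Okapi" and not scale_score
    if negatives_are_valid:
        return list(scored)
    # Otherwise treat scores of 0.0 or below as "no match" and drop them.
    return [(doc_id, score) for doc_id, score in scored if score > 0.0]


docs = [("a", 1.2), ("b", 0.0), ("c", -0.4)]
kept = filter_scored_docs(docs, "BM25Okapi", scale_score=False)
```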
Madeesh Kannan
a5189dd035
fix!: InMemoryBM25Retriever no longer returns documents that have a score of 0.0 ( #6717 )
* fix!: `InMemoryBM25Retriever` no longer returns documents that have a score of 0.0
Also update tests to accommodate the new behavior.
* Remove superfluous code
2024-01-12 17:50:55 +01:00
Massimiliano Pippi
e1ec4e5e4d
refact!: Remove symbols under the haystack.document_stores namespace ( #6714 )
* remove symbols under the haystack.document_stores namespace
* Update haystack/document_stores/types/protocol.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* fix
* same for retrievers
* leftovers
* more leftovers
* add relnote
* leftovers
* one more
* fix examples
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2024-01-10 21:20:42 +01:00
Massimiliano Pippi
00fed32024
build: depend on haystack_bm25 instead of rank_bm25 ( #6578 )
* use the forked package
* switch package dependency
* relnote
* fix package name
2023-12-18 10:47:15 +01:00
Stefano Fiorucci
4912f7cb58
refactor!: improve the deserialization logic for components that use a Document Store ( #6466 )
* improve deserialization
* rm ds decorator
* improve tests
* fix pylint
* rm decorator from module init
* rm decorator
* rm decorator from factory
* fix tests
* release note
* rm print
2023-12-04 15:17:28 +01:00
Silvano Cerza
5240672088
Rename document_store/protocols.py to document_store/protocol.py ( #6448 )
2023-11-29 16:41:15 +01:00
Silvano Cerza
831d0611d9
feat: Change default DuplicatePolicy in DocumentStore.write_documents() ( #6438 )
* Change default DuplicatePolicy in DocumentStore.write_documents()
* Add release notes
2023-11-28 12:30:17 +01:00
Silvano Cerza
9a7fd6f2ce
refactor: Add new filters tests for Document Store testing ( #6428 )
* Add new filters tests for Document Store testing
* Add release notes
2023-11-28 09:57:08 +01:00
Silvano Cerza
e6637f5ec2
Fix all tests
2023-11-24 14:48:43 +01:00
Massimiliano Pippi
f71e11c717
Removed preview package
---------
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-11-24 11:49:41 +01:00
Massimiliano Pippi
09e7831f60
clean up 1.x code
---------
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-11-24 11:47:47 +01:00
pandasar13
edb40b6c1b
refactor: add batch_size to FAISS __init__ ( #6401 )
* refactor: add batch_size to FAISS __init__
* refactor: add batch_size to FAISS __init__
* add release note to refactor: add batch_size to FAISS __init__
* fix release note
* add batch_size to docstrings
---------
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2023-11-23 17:27:24 +01:00
x110
c4cfe6cb90
fix: Load additional fields from SQUAD-format file to meta field for labels #5978 ( #6301 )
* Load additional fields from SQUAD-format file to meta field for labels
* added a test function
* rewritten test using pytest
* added release notes
* improve release note
* clean up test
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-11-16 10:44:51 +01:00
Ivana Zeljkovic
2326f2f9fe
feat: Pinecone document store optimizations ( #5902 )
* Optimize methods for deleting documents and getting vector count. Enable warning messages when Pinecone limits are exceeded on Starter index type.
* Fix typo
* Add release note
* Fix mypy errors
* Remove unused import. Fix warning logging message.
* Update release note with description about limits for Starter index type in Pinecone
* Improve code base by:
- Adding new test cases for get_embedding_count method
- Fixing get_embedding_count method
- Improving delete documents
- Fixing label retrieval
- Increasing default batch size
- Improving get_document_count method
* Remove unused variable
* Fix mypy issues
2023-10-16 19:26:24 +02:00
Christian Clauss
bf6d306d68
ci: Simplify Python code with ruff rules SIM ( #5833 )
* ci: Simplify Python code with ruff rules SIM
* Revert #5828
* ruff --select=I --fix haystack/modeling/infer.py
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-09-20 08:32:44 +02:00
Christian Clauss
91ab90a256
perf: Python performance improvements with ruff C4 and PERF fixes ( #5803 )
* Python performance improvements with ruff C4 and PERF
* pre-commit fixes
* Revert changes to examples/basic_qa_pipeline.py
* Revert changes to haystack/preview/testing/document_store.py
* revert releasenotes
* Upgrade to ruff v0.0.290
2023-09-16 16:26:07 +02:00
Onur Eren Arpacı
8af0d816e6
bug: fix the date_fields request bottleneck ( #5695 )
* bug: fix the date_fields request bottleneck
I encountered a performance issue while attempting to index 1 million vectors. Despite the Weaviate instance having low utilization, the process was estimated to take around 10 hours.
After some investigation, I identified the bottleneck: the _get_date_properties function was being called for every document, so a request to the Weaviate client was sent and awaited for each one.
To address this, I optimized the code by invoking the _get_date_properties function only when there is a schema change. This modification resulted in a notable performance improvement, reducing the indexing time to approximately 90 minutes for the same 1 million vectors.
* bug: fix the date_fields request bottleneck
* fix: executed the pre-commit hooks for #9341
2023-09-15 18:12:14 +02:00
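Note on the entry above: a minimal sketch of the caching idea the commit message describes, where the date properties are fetched once and refetched only after a schema change. The `DatePropertyCache` class and the `fetch` callable are hypothetical stand-ins; the real change lives inside the Weaviate document store and may be structured differently.

```python
from typing import Callable, List, Optional


class DatePropertyCache:
    """Hypothetical cache: fetch date properties once, refresh only on schema change."""

    def __init__(self, fetch: Callable[[], List[str]]) -> None:
        self._fetch = fetch  # stands in for the per-request schema lookup
        self._cached: Optional[List[str]] = None

    def get(self) -> List[str]:
        # Only the first call (or the first call after invalidate) hits the client.
        if self._cached is None:
            self._cached = self._fetch()
        return self._cached

    def invalidate(self) -> None:
        """Call after the schema changes so the next get() refetches."""
        self._cached = None


cache = DatePropertyCache(lambda: ["created_at"])
for _ in range(3):
    date_fields = cache.get()  # one lookup in total, not one per document
```

With this shape, the number of schema requests depends on how often the schema changes rather than on how many documents are written.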
ZanSara
9056c43240
fix: remove __future__ import from pinecone.py ( #5813 )
* remove future import
* fix forward reference
2023-09-14 16:28:39 +02:00
Ivana Zeljkovic
4bad202197
feat: Pinecone document store refactoring ( #5725 )
* Refactor codebase so that doc_type metadata is used instead of namespaces for making distinction between documents without embeddings, documents with embeddings and labels
* Fix parameter name in integration test
* Remove code under comment in add_type_metadata_filter method
* Fix mypy and pylint checks
* Add release note
* Apply minimal changes: rename method, update method docs and remove redundant method
* Mypy fixes
* Fix docstrings
* Revert helper methods for fetching documents when the number of documents exceeds Pinecone limit
* Remove unnecessary attributes in PineconeDocumentStore
* Fix unit test
---------
Co-authored-by: Ivana Zeljkovic <ivana.zeljkovic@smartcat.io>
Co-authored-by: DosticJelena <jelena.dostic@smartcat.io>
2023-09-14 11:46:47 +02:00
Christian Clauss
6dd52d91b2
ci: Fix typos discovered by codespell ( #5778 )
* Fix typos discovered by codespell
* pylint: max-args = 38
2023-09-13 16:14:45 +02:00
Tuana Çelik
4bb22c9665
Update weaviate.py ( #5469 )
Updating the weaviate docstrings to replace the old URL with the new correct one. The old one now returns a 404.
2023-08-10 15:37:55 +02:00
ZanSara
c27622e1bc
chore: normalize more optional imports ( #5251 )
* docstore filters
* modeling metrics
* doc language classifier
* file converter
* docx converter
* tika
* preprocessor
* context matcher
* pylint
2023-08-09 09:27:53 +02:00
tstadel
d46c84bb61
feat: support dynamic filters in custom_query ( #5427 )
* support filters in custom_query
* better tests
* Update docstrings
---------
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-08-08 15:48:15 +02:00
tstadel
d26d4201fc
feat: support search_fields in DeepsetCloudDocumentStore ( #5455 )
* feat: support search_fields in DeepsetCloudDocumentStore
* add reno file
* make search_fields plain init arg
* Update lg
* Update releasenotes/notes/deepset-cloud-document-store-search-fields-40b2322466f808a3.yaml
* Update haystack/document_stores/deepsetcloud.py
---------
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-08-04 11:13:05 +02:00
bogdankostic
237d67dbfd
feat: Check version of Elasticsearch server and add support for Elasticsearch <= 7.5 ( #5320 )
* Check ES server version + add support for ES <= 7.5
* Adapt comment
* PR feedback
2023-07-13 14:50:43 +02:00
bogdankostic
206b21816c
chore: Adapt import message for Elasticsearch7 ( #5295 )
* Adapt import message for es7.ElasticsearchDocumentStore
* Move import statement
2023-07-10 10:21:26 +02:00
tstadel
9acb275680
fix: avoid conflicts with opensearch / elasticsearch magic attributes during bulk requests ( #5113 )
* use _source on opensearch bulk requests
* fix label bulk requests
* add tests
* fix test
* apply feedback
---------
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2023-07-07 15:12:50 +02:00
Massimiliano Pippi
00efa514ca
refactor: remove Elasticsearch client version 8 deprecation warnings ( #5245 )
* remove deprecation warnings
* remove leftover
2023-07-04 14:17:34 +02:00
Vladimir Blagojevic
1066e959a2
bug: fix for pinecone not working for per document updates ( #5110 )
2023-07-03 14:07:52 +02:00
Stefano Fiorucci
1be39367ac
Fix: FAISSDocumentStore - make write_documents properly work in combination w update_embeddings ( #5221 )
* Update VERSION.txt
* first draft
* simplify method and test
* rm unnecessary pb.close
* integrate feedback
2023-07-03 10:07:36 +02:00
Massimiliano Pippi
6c1d0fbf04
refactor: isolate Elasticsearch client calls ( #5241 )
* isolate client code
* pass headers
* pass headers
* more adjustments
* revert
* revert
* leftover
* fix opensearch
2023-06-30 18:29:01 +02:00
Massimiliano Pippi
cb638af0ff
refactor: fix method type and add comments ( #5235 )
* fix method type and add comments
* fix tests
2023-06-30 11:55:52 +02:00
Massimiliano Pippi
037e4f24ce
refactor: add a new Document Store supporting Elasticsearch 8 ( #5231 )
* introduce es8
* prepare tests
* fix unit tests
* adjust tests
* install elastic_transport package
* make mypy happy
* fix opensearch tests
2023-06-29 16:40:10 +02:00
Massimiliano Pippi
d5c13aa71d
refactor: introduce a base class for ElasticsearchDocumentStore ( #5228 )
* introduce a base class
* forgot
* fix linting
* try
* try
* schema generation doesn't support aliasing, use the same name
2023-06-29 13:28:49 +02:00
bogdankostic
8c63e295f4
fix: Allow filtering on list fields in InMemoryDocumentStore with all operators ( #5208 )
* Add support for list fields
* Unskip tests
2023-06-29 12:10:39 +02:00
Massimiliano Pippi
6373e2ea66
refactor: prepare support to Elasticsearch 8 ( #5226 )
* make a package
* Update haystack/document_stores/elasticsearch/es7.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* do not expose ES types from the package
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-06-29 11:06:20 +02:00
ZanSara
462f3a5c99
feat: globally disable progress bars ( #5207 )
* add SilenceableTqdm and update usage
* pylint
* rename module
* add tests
2023-06-27 11:45:17 +02:00
bogdankostic
82291b56ad
fix: Send batches of query-doc pairs to inference_from_objects ( #5125 )
* Send batches of query-doc pairs to inference_from_objects
* Use absolute import path
* Add separate preprocessing_batch_size parameter
2023-06-26 14:26:26 +02:00
ZanSara
7a9cf30063
chore: remove safe_import and all usages ( #5139 )
* remove safe_import and all usages
* forward references
* fix additional import
* mypy
* mypy
* pylint
* forward reference
* Update haystack/document_stores/opensearch.py
Co-authored-by: bogdankostic <bogdankostic@web.de>
* fix except clause
---------
Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-06-26 12:42:43 +02:00
Julian Risch
30fdf2b5df
feat!: Add extra for inference dependencies such as torch ( #5147 )
* feat!: add extra for inference dependencies such as torch
* add inference extra to 'all' and 'all-gpu' extra
* install inference extra in selected integration tests
* import LazyImport
* review feedback
* add import error messages and update readme
* remove extra dot
2023-06-20 09:54:10 +02:00
Shukri
916e8452f5
feat!: simplify weaviate auth ( #5115 )
* feat!: simplify weaviate auth
* docs: explain param precedence
* refactor: simplify _get_embedded_options
2023-06-19 15:46:58 +02:00
Ben Heckmann
60e5d73424
fix: changing document scores ( #5090 )
* #4653 fix changing scores by returning new document objects from document store queries
* added integration test for InMemoryDocumentStore demonstrating the desired behavior
* Update test/document_stores/test_memory.py
2023-06-14 17:35:46 +02:00
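Note on the entry above: a small sketch of why returning new document objects keeps scores from changing between queries. The `Doc` dataclass and `query` function are hypothetical stand-ins, not the store's real types. If a query wrote its score onto the stored object, a later query would overwrite the score that an earlier caller still holds.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a stored document."""

    content: str
    score: Optional[float] = None


def query(store: List[Doc], scores: List[float]) -> List[Doc]:
    # replace() builds a new object per result, so the stored document
    # (and any result returned earlier) keeps its own score.
    return [replace(doc, score=s) for doc, s in zip(store, scores)]


store = [Doc("a"), Doc("b")]
first = query(store, [0.9, 0.1])
second = query(store, [0.2, 0.8])  # does not alter the scores in `first`
```

`dataclasses.replace` makes a shallow copy, so nested mutable fields would still be shared; a deep copy would be needed if those are modified per query.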
ZanSara
20c1f23fff
feat: optional transformers ( #5101 )
* generalimport -> lazy-imports
* remove generalimport
* fix pdftotextconverter import check
* customize error messages
* pylint
* fix sql.py
* pylint
* Update haystack/document_stores/sql.py
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
* make contextmanager less verbose
* do not catch syntax errors
* review feedback
* make all torch and transformers import lazy
* fix environment.py
* mypy
* merge leftovers
* fix schema
* pylint
* review feedback
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-06-14 12:00:20 +02:00
ZanSara
52e7a77595
feat: introduce lazy_import ( #5084 )
* generalimport -> lazy-imports
* remove generalimport
* fix pdftotextconverter import check
* customize error messages
* pylint
* fix sql.py
* pylint
* Update haystack/document_stores/sql.py
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
* make contextmanager less verbose
* do not catch syntax errors
* review feedback
* Update haystack/nodes/file_converter/pdf.py
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-06-08 12:11:38 +02:00
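Note on the entry above: a minimal standard-library sketch of the lazy-import pattern, in which a failed optional import is recorded inside a context manager and raised only when the feature is actually used. The `DeferredImport` name and its `check()` method are assumptions for illustration, not the helper this commit introduces.

```python
from types import TracebackType
from typing import Optional, Type


class DeferredImport:
    """Hypothetical context manager that defers ImportError until check()."""

    def __init__(self, message: str = "Missing optional dependency.") -> None:
        self._message = message
        self._error: Optional[ImportError] = None

    def __enter__(self) -> "DeferredImport":
        return self

    def __exit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc: Optional[BaseException],
        tb: Optional[TracebackType],
    ) -> bool:
        # Swallow only ImportError; SyntaxError and other errors still propagate.
        if isinstance(exc, ImportError):
            self._error = exc
            return True
        return False

    def check(self) -> None:
        """Raise the deferred error, with a custom message, at first real use."""
        if self._error is not None:
            raise ImportError(self._message) from self._error


with DeferredImport("Run 'pip install example-extra'.") as optional_dep:
    import json  # a real optional dependency would be imported here

optional_dep.check()  # no error: the import succeeded
```

A component would typically call `check()` in its constructor, so importing the module stays cheap and the error appears only when the optional feature is requested.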
bogdankostic
da1f245a84
feat: Add batch_size parameter and cast timeout_config value to tuple for WeaviateDocumentStore ( #5079 )
* Add batch_size parameter and cast timeout_config to tuple
* Add unit test
* Remove debug tqdm
* Remove debug tqdm introduced in #5063
2023-06-06 17:06:10 +02:00
bogdankostic
9cb83402c4
refactor: Use globally defined request timeout in ElasticsearchDocumentStore and OpenSearchDocumentStore ( #5064 )
* Include benchmark config in output
* Use queries from aggregated labels
* Introduce batching for querying in ElasticsearchDocStore and OpenSearchDocStore
* Use globally defined timeout
* Fix mypy
* Use self.batch_size in write_documents
* Use 10_000 as default batch size
* Add unit tests for write documents
2023-06-05 09:47:31 +02:00
bogdankostic
a9a49e2c0a
feat: Add batching for querying in ElasticsearchDocumentStore and OpenSearchDocumentStore ( #5063 )
* Include benchmark config in output
* Use queries from aggregated labels
* Introduce batching for querying in ElasticsearchDocStore and OpenSearchDocStore
* Fix mypy
* Use self.batch_size in write_documents
* Use 10_000 as default batch size
* Add unit tests for write documents
2023-06-01 18:47:24 +02:00
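Note on the entry above: a short sketch of splitting a list of queries into fixed-size batches. The `batched` helper is hypothetical, and the default of 10_000 is borrowed from the bullet about the default batch size; how the document stores actually slice their requests is not shown in this log.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int = 10_000) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start : start + batch_size])


queries = ["q1", "q2", "q3", "q4", "q5"]
for batch in batched(queries, batch_size=2):
    pass  # each batch would be sent as one request to the search backend
```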
Silvano Cerza
ba06bc4805
Unpin typing_extensions and remove all its uses ( #5040 )
2023-05-29 15:31:34 +02:00