Vladimir Blagojevic
92a6221927
feat: Add PyPDFToDocument component (2.0) ( #5850 )
...
* Initial PyPDFToDocument implementation
* Remove progress bar
* Add release note
* Minor fix
* import check and dependency
---------
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-09-21 11:52:26 +02:00
ZanSara
23fdef929e
chore: move GPT35Generator tests in the main test suite ( #5844 )
...
* move tests
* fix no-test-found error from pytest
* missing self
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-09-21 11:42:32 +02:00
ZanSara
28f5c4c780
fix: Whisper integration tests ( #5851 )
...
* fix tests
* add ffmpeg
* apt update for ffmpeg
* not run on windows
2023-09-21 00:14:07 +02:00
bogdankostic
abe2706298
feat: Add MetadataRouter (2.0) ( #5824 )
...
* Move filter utilities
* Add MetadataRouter
* Add tests for MetadataRouter
* Add more tests
* Rename FileExtensionClassifier to FileExtensionRouter
* Add support for dates in filters
* Add tests
* Add release note
* Add release note
* Apply suggestions from code review
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
---------
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-09-20 14:49:17 +02:00
ZanSara
c933bcaa69
chore: move Whisper e2e tests in the main tests suite ( #5845 )
...
* move whisper local tests
* remove e2e file
* move remote tests
* remove e2e file
2023-09-20 14:48:09 +02:00
ZanSara
454988672e
feat: UrlCacheChecker ( #5841 )
...
* add UrlCacheChecker
* rename
* add tests
* reno
* pylint
* review feedback
2023-09-20 14:45:50 +02:00
bogdankostic
719c1c040c
feat: Add support for dates in filters (2.0) ( #5823 )
...
* Add support for dates in filters
* Add tests
* Add release note
* Update haystack/preview/utils/filters.py
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
---------
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-09-20 12:05:56 +02:00
ZanSara
44f0c468ac
move websearch tests back to main tests suite ( #5842 )
2023-09-20 11:55:18 +02:00
Vladimir Blagojevic
0983fb656a
feat: Add LinkContentFetcher Haystack 2.0 component ( #5724 )
...
* Add LinkContentFetcher
* Add release note
* Small fixes
* Fix pydocs
* PR feedback
* Remove handlers registration
* PR feedback
* adjustments
* improve tests
* initial draft
* tests
* add proposal
* proposal number
* reno
* fix tests and usage of content and content_type
* update branch & fix more tests
* mypy
* use the new document
* add docstring
* fix more tests
* mypy
* fix tests
* add e2e
* review feedback
* improve __str__
* Apply suggestions from code review
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/preview/dataclasses/document.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* improve __str__
* fix tests
* fix more tests
* fix test
* Fix end-of-file-fixer
* Post merge fixes
* Move e2e tests back into component
---------
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-09-20 11:03:52 +02:00
Christian Clauss
bf6d306d68
ci: Simplify Python code with ruff rules SIM ( #5833 )
...
* ci: Simplify Python code with ruff rules SIM
* Revert #5828
* ruff --select=I --fix haystack/modeling/infer.py
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-09-20 08:32:44 +02:00
Stefano Fiorucci
de84a95970
separate classes and tests ( #5819 )
...
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-09-19 19:21:49 +02:00
Malte Pietsch
aa3cc3d5ae
feat: Add support for OpenAI's gpt-3.5-turbo-instruct model ( #5837 )
...
* support gpt-3.5-turbo-instruct
* add release note
2023-09-19 16:06:43 +02:00
Christian Clauss
91ab90a256
perf: Python performance improvements with ruff C4 and PERF fixes ( #5803 )
...
* Python performance improvements with ruff C4 and PERF
* pre-commit fixes
* Revert changes to examples/basic_qa_pipeline.py
* Revert changes to haystack/preview/testing/document_store.py
* revert releasenotes
* Upgrade to ruff v0.0.290
2023-09-16 16:26:07 +02:00
Christian Clauss
1bc03ddc73
ci: Fix all ruff pyflakes errors except unused imports ( #5820 )
...
* ci: Fix all ruff pyflakes errors except unused imports
* Delete releasenotes/notes/fix-some-pyflakes-errors-69a1106efa5d0203.yaml
2023-09-15 18:30:33 +02:00
Silvano Cerza
5c04cd6ba2
Fix Document constructor accepting unused id parameter ( #5826 )
2023-09-15 17:03:03 +02:00
Christian Clauss
9405eb90ee
ci: Fix invalid escape sequences in Python code ( #5802 )
...
* ci: Use ruff in pre-commit to further limit complexity
* Fix invalid escape sequences in Python code
* Delete releasenotes/notes/ruff-4d2504d362035166.yaml
2023-09-14 16:42:48 +02:00
Stefano Fiorucci
1c69070db6
make MemoryEmbeddingRetriever act in non-batch mode ( #5809 )
2023-09-14 15:37:20 +02:00
Stefano Fiorucci
ad5b615503
make SentenceTransformersTextEmbedder non batch ( #5811 )
2023-09-14 12:38:24 +02:00
Ivana Zeljkovic
4bad202197
feat: Pinecone document store refactoring ( #5725 )
...
* Refactor codebase so that doc_type metadata is used instead of namespaces for making distinction between documents without embeddings, documents with embeddings and labels
* Fix parameter name in integration test
* Remove code under comment in add_type_metadata_filter method
* Fix mypy and pylint checks
* Add release note
* Apply minimal changes: rename method, update method docs and remove redundant method
* Mypy fixes
* Fix docstrings
* Revert helper methods for fetching documents when the number of documents exceeds Pinecone limit
* Remove unnecessary attributes in PineconeDocumentStore
* Fix unit test
---------
Co-authored-by: Ivana Zeljkovic <ivana.zeljkovic@smartcat.io>
Co-authored-by: DosticJelena <jelena.dostic@smartcat.io>
2023-09-14 11:46:47 +02:00
Christian Clauss
6dd52d91b2
ci: Fix typos discovered by codespell ( #5778 )
...
* Fix typos discovered by codespell
* pylint: max-args = 38
2023-09-13 16:14:45 +02:00
Christian Clauss
30ca042370
ci: Use ruff in pre-commit to further limit code complexity ( #5783 )
...
* ci: Use ruff in pre-commit to further limit complexity
* Delete releasenotes/notes/ruff-4d2504d362035166.yaml
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-09-13 15:18:16 +02:00
ZanSara
5888fb7052
make MemoryBM25Retriever non batch ( #5768 )
2023-09-13 15:11:47 +02:00
Julian Risch
4ae0924ea0
feat!: Remove SklearnQueryClassifier ( #5779 )
...
* remove SklearnQueryClassifier
* reno
2023-09-13 12:55:33 +02:00
Stefano Fiorucci
283ecf2760
feat: add prefix and suffix to SentenceTransformersDocumentEmbedder ( #5745 )
...
* add prefix and suffix
* fix test
2023-09-13 12:55:06 +02:00
ZanSara
335a09bc1d
feat: make AnswerBuilder non batch ( #5766 )
...
* make answerbuilder non batch
* fix mypy
* review feedback
* mypy
---------
Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-09-13 12:01:16 +02:00
ZanSara
2c4d839b64
feat: GPT4Generator ( #5744 )
...
* add gpt4generator
* add e2e
* add tests
* reno
* fix e2e
* Update test/preview/components/generators/openai/test_gpt4_generator.py
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-09-13 10:07:09 +02:00
ZanSara
94c5d6d216
feat: make GPT35Generator non batch ( #5764 )
...
* make gpt35generator not batch
* fix tests
* review feedback
* mypy
2023-09-12 18:19:28 +02:00
ZanSara
6e70d403f8
feat: Improve Document for Haystack 2.0 ( #5738 )
...
* initial draft
* tests
* add proposal
* proposal number
* reno
* fix tests and usage of content and content_type
* update branch & fix more tests
* mypy
* add docstring
* fix more tests
* review feedback
* improve __str__
* Apply suggestions from code review
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Update haystack/preview/dataclasses/document.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* improve __str__
* fix tests
* fix more tests
* Update haystack/preview/document_stores/memory/document_store.py
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-09-11 17:40:00 +02:00
Stefano Fiorucci
2edf85f739
MemoryEmbeddingRetriever (2.0) (#5726 )
...
* MemoryDocumentStore - Embedding retrieval draft
* add release notes
* fix mypy
* better comment
* improve return_embeddings handling
* MemoryEmbeddingRetriever - first draft
* address PR comments
* release note
* update docstrings
* update docstrings
* incorporated feedback
* add return_embedding to __init__
* rm leftover docstring
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-09-08 15:52:48 +02:00
Stefano Fiorucci
d860a5c604
make tests more robust ( #5747 )
2023-09-08 15:50:56 +02:00
Stefano Fiorucci
b7bea3ae9c
MemoryDocumentStore - Embedding retrieval (2.0) (#5715 )
...
* MemoryDocumentStore - Embedding retrieval draft
* add release notes
* fix mypy
* better comment
* improve return_embeddings handling
* address PR comments
* update docstrings
* incorporated feedback
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-09-07 15:44:07 +02:00
bogdankostic
71852c7b06
Fix output of AnswerBuilder ( #5737 )
2023-09-07 12:54:24 +02:00
ZanSara
63cbde7287
feat: GPT35Generator ( #5714 )
...
* chatgpt backend
* fix tests
* reno
* remove print
* helpers tests
* add chatgpt generator
* use openai sdk
* remove backend
* tests are broken
* fix tests
* stray param
* move _check_troncated_answers into the class
* wrong import
* rename function
* typo in test
* add openai deps
* mypy
* improve system prompt docstring
* typos update
* Update haystack/preview/components/generators/openai/chatgpt.py
* pylint
* Update haystack/preview/components/generators/openai/chatgpt.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Update haystack/preview/components/generators/openai/chatgpt.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Update haystack/preview/components/generators/openai/chatgpt.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* review feedback
* fix tests
* review feedback
* reno
* remove tenacity mock
* gpt35generator
* fix naming
* remove stray references to chatgpt
* fix e2e
* Update releasenotes/notes/chatgpt-llm-generator-d043532654efe684.yaml
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* add another test
* test wrong model name
* review feedback
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-09-07 10:06:57 +02:00
Vladimir Blagojevic
c5edb45c10
feat: Add SerperDevWebSearch Haystack 2.0 component ( #5712 )
...
* Add SerperDev
* Add release note
* PR Feedback
* Simplify, remove one-liner
* Update haystack/preview/components/websearch/serper_dev.py
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
* Update haystack/preview/components/websearch/serper_dev.py
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
* Fix formatting
* PR feedback
* Fix tests
* Function rename
* Remove scoring, update tests
* PR feedback
* Fix return
* small adjustments
* fix tests
* add e2e test
* fix release notes
* fix tests
* fix e2e
---------
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-09-06 17:31:42 +02:00
ZanSara
10d6886255
chore: move PromptBuilder in builders ( #5729 )
2023-09-06 11:52:21 +02:00
bogdankostic
639f7cf888
chore: Rename AnswersBuilder to AnswerBuilder ( #5720 )
...
* Add AnswersBuilder
* Add tests for AnswersBuilder
* Add release note
* PR feedback
* Fix mypy
* Remove redundant check for number of groups
* Rename AnswersBuilder to AnswerBuilder
* Update test/preview/components/builders/test_answer_builder.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Rename reno file
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-09-05 14:34:22 +02:00
Silvano Cerza
2acc41ea85
Add PromptBuilder ( #5713 )
...
* Add PromptBuilder
* Update release note
* Add test
2023-09-05 12:22:21 +02:00
bogdankostic
a5b815690e
feat: Add AnswersBuilder component (2.0) ( #5701 )
...
* Add AnswersBuilder
* Add tests for AnswersBuilder
* Add release note
* PR feedback
* Fix mypy
* Remove redundant check for number of groups
* docstrings upd
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-09-04 21:16:20 +02:00
bogdankostic
11440395f4
fix: Set model_max_length in the Tokenizer of DefaultPromptHandler ( #5596 )
...
* Set model_max_length in tokenizer in prompt handler
* Add release note
2023-09-01 11:48:41 +02:00
ZanSara
5f1256ac7e
feat: generators (2.0) ( #5690 )
...
* add generators module
* add tests for module helper
* reno
* add another test
* move into openai
* improve tests
2023-08-31 17:33:12 +02:00
Fanli Lin
40d9f34e68
feat: enable passing use_fast to the underlying transformers' pipeline ( #5655 )
...
* copy instead of deepcopy
* fix pylint
* add use_fast
* add release note
* remove irrelevant changes
* black fix
* fix bug
* black
* bug fix
2023-08-30 10:25:18 +02:00
ZanSara
b1daa7c647
chore: migrate to canals==0.7.0 ( #5647 )
...
* add default_to_dict and default_from_dict placeholders to ease migration to canals 0.7.0
* canals==0.7.0
* whisper components
* add to_dict/from_dict stubs
* import serialization methods in init to hide canals imports
* reno
* export deserializationerror too
* Update haystack/preview/__init__.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* serialization methods for LocalWhisperTranscriber (#5648 )
* chore: serialization methods for `FileExtensionClassifier` (#5651 )
* serialization methods for FileExtensionClassifier
* Update test_file_classifier.py
* chore: serialization methods for `SentenceTransformersDocumentEmbedder` (#5652 )
* serialization methods for SentenceTransformersDocumentEmbedder
* fix device management
* serialization methods for SentenceTransformersTextEmbedder (#5653 )
* serialization methods for TextFileToDocument (#5654 )
* chore: serialization methods for `RemoteWhisperTranscriber` (#5650 )
* serialization methods for RemoteWhisperTranscriber
* remove patches
* Add default to_dict and from_dict in document stores built with factory (#5674 )
* fix tests (#5671 )
* chore: simplify serialization methods for `MemoryDocumentStore` (#5667 )
* simplify serialization for MemoryDocumentStore
* remove redundant tests
* pylint
* chore: serialization methods for `MemoryRetriever` (#5663 )
* serialization method for MemoryRetriever
* more tests
* remove hash from default_document_store_to_dict
* remove diff in factory.py
* chore: serialization methods for `DocumentWriter` (#5661 )
* serialization methods for DocumentWriter
* more tests
* use factory
* black
---------
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-08-29 18:15:07 +02:00
bogdankostic
07c85905f3
fix: Change use_auth_token to token in TransformersQueryClassifier ( #5659 )
2023-08-29 15:21:25 +02:00
Vladimir Blagojevic
f13b37db24
fix: LinkContentFetcher - when no content retrieved (i.e. request blocked), default to snippet text ( #5656 )
...
* When no content retrieved (i.e. request blocked), default to snippet
* Add release note
2023-08-29 10:57:47 +02:00
Stefano Fiorucci
72fe4fc57b
feat: SentenceTransformersDocumentEmbedder ( #5606 )
...
* first draft
* incorporate feedback
* some unit tests
* release notes
* real release notes
* refactored to use a factory class
* allow forcing fresh instances
* first draft
* Update haystack/preview/embedding_backends/sentence_transformers_backend.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* simplify implementation and tests
* add embed_meta_fields implementation
* lg update
* improve meta data embedding; tests
* support non-string metadata
* make factory private
* change return type; improve tests
* warm_up not called in run
* fix typing
* rm unused import
* Remove base test class
* black
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-08-28 16:23:41 +02:00
Stefano Fiorucci
89c1813d9f
feat: SentenceTransformersTextEmbedder ( #5600 )
...
* first draft
* incorporate feedback
* some unit tests
* release notes
* real release notes
* first draft
* refactored to use a factory class
* adapt to new ST Embedding Backend implementation
* allow forcing fresh instances
* add tests
* release notes
* fix typo
* little improvements in tests
* Update haystack/preview/embedding_backends/sentence_transformers_backend.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* simplify implementation and tests
* lg update
* input check
* better error message
* make factory private
* change return type; improve tests
* warm_up not called in run
* warm_up not called in run
* rm unused import; default model
* fix typing
* rm unused import
* Remove BaseTestComponent
* black
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-08-28 16:23:26 +02:00
Stefano Fiorucci
35dfe47186
feat: SentenceTransformersEmbeddingBackend (v2) ( #5572 )
...
* first draft
* incorporate feedback
* some unit tests
* release notes
* real release notes
* refactored to use a factory class
* allow forcing fresh instances
* Update haystack/preview/embedding_backends/sentence_transformers_backend.py
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* simplify implementation and tests
* make factory private
* change return type; improve tests
* fix typing
* rm unused import
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-08-28 12:32:37 +02:00
Silvano Cerza
66f615a3a4
Remove BaseTestComponent ( #5613 )
...
* Remove BaseTestComponent
* Add release notes
2023-08-23 17:03:37 +02:00
Silvano Cerza
4ef813fc8a
Remove specialised Pipeline ( #5584 )
...
* Remove Pipeline
* Add release notes
* Enhance imports
* Update release note
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
* Remove Pipeline tests
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-08-18 17:48:13 +02:00
Silvano Cerza
72e0a588db
Rework DocumentWriter ( #5583 )
...
* Remove DocumentStoreAwareMixin from DocumentWriter
* Add release notes
2023-08-18 17:03:17 +02:00