3803 Commits

Author SHA1 Message Date
Bilge Yücel
ee13125e06
Add information about preview module (#5643)
* Add information about `preview` module

* Add discussion link

* Improve text
2023-08-29 15:57:57 +03:00
Vladimir Blagojevic
1f7c7b716a
Update release note for #5526 (#5664) 2023-08-29 14:25:52 +02:00
Julian Risch
fa81c611e8
build: Upgrade transformers to v4.32.1 (#5658)
* upgrade transformers to 4.32.1

* added release notes

* upgrade transformers version also for inference extra
2023-08-29 13:46:00 +02:00
Vladimir Blagojevic
791f322a94
Unpin safetensors (#5657) 2023-08-29 13:12:11 +02:00
ZanSara
5985b6d358
chore: refactor pipeline tests for e2e testing (#5576)
* enable pipeline folder in e2e

* merge standard pipeline tests with standard pipeline batch tests

* merge summarization tests into standard pipelines tests

* Update test_standard_pipelines.py

* black
2023-08-29 11:22:39 +02:00
Vladimir Blagojevic
f13b37db24
fix: LinkContentFetcher - when no content retrieved (i.e. request blocked), default to snippet text (#5656)
* When no content retrieved (i.e. request blocked), default to snippet

* Add release note
2023-08-29 10:57:47 +02:00
Vladimir Blagojevic
2118f68769
feat: Add domain scoping to WebRetriever (#5587)
* WebSearch: add allowed_domains scoped search

* Add talk to website example

* Add release note

* Add allowed_domains to WebSearch

* Minor fix

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-08-28 20:02:02 +02:00
Massimiliano Pippi
81f3aaf3e5
Add coverage badge (#5634) 2023-08-28 18:30:01 +02:00
ZanSara
55235b09ff
remove self.warm_up() (#5644) 2023-08-28 17:38:56 +02:00
Stefano Fiorucci
72fe4fc57b
feat: SentenceTransformersDocumentEmbedder (#5606)
* first draft

* incorporate feedback

* some unit tests

* release notes

* real release notes

* refactored to use a factory class

* allow forcing fresh instances

* first draft

* Update haystack/preview/embedding_backends/sentence_transformers_backend.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* simplify implementation and tests

* add embed_meta_fields implementation

* lg update

* improve meta data embedding; tests

* support non-string metadata

* make factory private

* change return type; improve tests

* warm_up not called in run

* fix typing

* rm unused import

* Remove base test class

* black

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-08-28 16:23:41 +02:00
Stefano Fiorucci
89c1813d9f
feat: SentenceTransformersTextEmbedder (#5600)
* first draft

* incorporate feedback

* some unit tests

* release notes

* real release notes

* first draft

* refactored to use a factory class

* adapt to new ST Embedding Backend implementation

* allow forcing fresh instances

* add tests

* release notes

* fix typo

* little improvements in tests

* Update haystack/preview/embedding_backends/sentence_transformers_backend.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* simplify implementation and tests

* lg update

* input check

* better error message

* make factory private

* change return type; improve tests

* warm_up not called in run

* warm_up not called in run

* rm unused import; default model

* fix typing

* rm unused import

* Remove BaseTestComponent

* black

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-08-28 16:23:26 +02:00
Stefano Fiorucci
35dfe47186
feat: SentenceTransformersEmbeddingBackend (v2) (#5572)
* first draft

* incorporate feedback

* some unit tests

* release notes

* real release notes

* refactored to use a factory class

* allow forcing fresh instances

* Update haystack/preview/embedding_backends/sentence_transformers_backend.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* simplify implementation and tests

* make factory private

* change return type; improve tests

* fix typing

* rm unused import

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-08-28 12:32:37 +02:00
ZanSara
4dda25d67c
proposal: LLM support in Haystack 2.0 (#5540)
* Add proposal

* add pr number

* file name

* clarify input of LLM component

* promptbuilder is tokenizer-aware

* typo

* feedback

* streaming

* Chat API
2023-08-28 10:33:07 +02:00
Silvano Cerza
444edce126
Add workflow to trigger preview package release (#5631) 2023-08-25 17:10:28 +02:00
Stefano Fiorucci
8342b6a457
upgrade transformers (#5619) 2023-08-25 16:38:34 +02:00
totto
7c7a486014
fix: in a containerized environment (like AWS ECS) there is a file wr… (#5499)
* fix: in a containerized environment (like AWS ECS) there is a file write permission error: PermissionError: [Errno 13] Permission denied: 'feedback_squad_direct.json'. catch this error.
hint: future solution similar to FILE_UPLOAD_PATH to provide a writeable path in a container.

(cherry picked from commit c54ab7ed2d487e4391c0391be7c3e268ae525507)

* fix linter error: don't use f-string in logger message

* reformat

* fix: pylint requires using % in logging message
2023-08-25 13:32:29 +02:00
Silvano Cerza
cb894061f7
Add terminate-runner job in benchmarks.yml (#5611) 2023-08-25 10:14:39 +02:00
Silvano Cerza
66f615a3a4
Remove BaseTestComponent (#5613)
* Remove BaseTestComponent

* Add release notes
2023-08-23 17:03:37 +02:00
Silvano Cerza
d5599df029
Fix release notes (#5599) 2023-08-18 17:59:07 +02:00
Silvano Cerza
b53fad4c4f
Add missing integration tests to catch-all required step in tests.yml (#5598) 2023-08-18 17:58:26 +02:00
Silvano Cerza
03ebef7219
Remove DocumentStoreAwareMixin (#5585)
* Remove Pipeline

* Add release notes

* Enhance imports

* Update release note

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* Remove Pipeline tests

* Remove DocumentStoreAwareMixin

* Add release notes

* Remove DocumentStoreAwareMixin from __all__

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-08-18 17:56:24 +02:00
Silvano Cerza
4ef813fc8a
Remove specialised Pipeline (#5584)
* Remove Pipeline

* Add release notes

* Enhance imports

* Update release note

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* Remove Pipeline tests

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-08-18 17:48:13 +02:00
Silvano Cerza
72e0a588db
Rework DocumentWriter (#5583)
* Remove DocumentStoreAwareMixin from DocumentWriter

* Add release notes
2023-08-18 17:03:17 +02:00
Silvano Cerza
4bc68cbc2f
Rework MemoryRetriever (#5582)
* Remove DocumentStoreAwareMixin from MemoryRetriever

* Add release notes

* Update an article

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-08-18 16:33:35 +02:00
Massimiliano Pippi
011baf492f
leftover from #5580 (#5593) 2023-08-18 12:53:40 +02:00
Massimiliano Pippi
7e633c6b0c
chore: change import paths under preview (#5592)
* fix import paths

* add release notes
2023-08-18 12:53:25 +02:00
Massimiliano Pippi
39a1f61326
chore: improve error message in FileExtensionClassifier (#5590)
* output an actionable error

* add release note

* fix matching in raised error

* fix release note category
2023-08-18 12:28:55 +02:00
Stefano Fiorucci
aa8da40820
chore: add preview section to release notes (#5591)
* add preview section to reno config and update existing notes

* Empty commit to trigger CLA
2023-08-18 09:59:01 +02:00
Vladimir Blagojevic
da67700318
Rename web_lfqa_improved and update questions (#5588) 2023-08-17 17:10:49 +02:00
bogdankostic
ee2745bad8
ci: Add Github workflow to automate benchmark runs (#5399)
* Add config files

* log benchmarks to stdout

* Add top-k and batch size to configs

* Add batch size to configs

* fix: don't download files if they already exist

* Add batch size to configs

* refine script

* Remove configs using 1m docs

* update run script

* update run script

* update run script

* datadog integration

* remove out folder

* gitignore benchmarks output

* test: send benchmarks to datadog

* remove uncommented lines in script

* feat: take branch/tag argument for benchmark setup script

* fix: run.sh should ignore errors

* Add GH workflow to run benchmarks periodically

* Remove unused script

* Adapt cml.yml

* Adapt cml.yml

* Rename cml.yml to benchmarks.yml

* Revert "Rename cml.yml to benchmarks.yml"

This reverts commit 897299433a71a55827124728adff5de918d46d21.

* remove benchmarks.yml

* Use same file extension for all config files

* Use checkout@v3

* Run benchmarks sequentially

* Add timeout-minutes parameter

* Remove changes unrelated to datadog

* Apply black

* use haystack-oss aws account

* Update test/benchmarks/utils.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* PR feedback

* fix aws credentials step

* Fix path

* check docker

* Allow spinning up containers from within container

* Allow spinning up containers from within container

* Separate launching doc stores from benchmarks

* Remove docker related commands

* run only retrievers

* change port

* Revert "change port"

This reverts commit 6e5bcebb1d16e03ba7672be7e8a089084c7fc3a7.

* Run opensearch benchmark only

* Run weaviate benchmark only

* Run bm25 benchmarks only

* Changes host of doc stores

* add step to get docker logs

* Revert "add step to get docker logs"

This reverts commit c10e6faa76bde5df406a027203bd775d18c93c90.

* Install docker

* Launch doc store containers from within runner container

* Remove kill command

* Change host

* dump docker logs

* change port

* Add cloud startup script

* dump docker logs

* add network param

* add network to startup.sh

* check cluster health

* move steps

* change port

* try using services

* check cluster health

* use services

* run only weaviate

* change host

* Upload benchmark results as artifacts

* Update configs

* Delete index after benchmark run

* Use correct index name

* Run only failing config

* Use smaller batch size

* Increase memory for opensearch

* Reduce batch size further

* Provide more storage

* Reduce batch size

* dump docker logs

* add java opts

* Spin up only opensearch container

* Create separate job for each doc store

* Run benchmarks sequentially

* Set working directory

* Account for reader benchmarks not doing indexing

* Change key of reader metrics

* Apply PR feedback

* Remove whitespace

* Adapt workflow to changes in datadog scripts

* Adapt workflow to changes in datadog scripts

* Increase memory for opensearch

* Reduce batch size

* Add preprocessing_batch_size to Readers

* Remove unrelated change

* Move order

* Fix path

* Manually terminate EC2 instance

* Manually terminate EC2 instance

* Manually terminate EC2 instance

* Always terminate runner

* Always terminate runner

* Remove unnecessary terminate-runner job

* Add cron schedule

* Disable telemetry

* Rename cml.yml to benchmarks.yml

---------

Co-authored-by: rjanjua <rohan.janjua@gmail.com>
Co-authored-by: Paul Steppacher <p.steppacher91@gmail.com>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-08-17 12:56:45 +02:00
Vladimir Blagojevic
46c9139caf
refactor: Rework WebRetriever caching, adjust tests (#5566)
* Rework WebRetriever caching, adjust tests

* Add release note

* Better pydocs

* Minor improvements

* Update haystack/nodes/retriever/web.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-08-16 17:41:11 +02:00
ZanSara
a8d4a99db9
feat: copy lazy_imports.py to preview (#5580)
* copy lazy_imports

* reno
2023-08-16 14:27:17 +02:00
Julian Risch
22c7601729
feat: Add DocumentWriter v2 (#5435)
* add draft of WriteToStore and basic test

* add DocumentWriter implementation

* draft unit and integration tests

* add release note

* mock Store in unit tests

* pylint

* Update haystack/preview/components/writers/document_writer.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Remove unnecessary test

* Rework DocumentWriter to support new Component I/O definition

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-08-16 13:48:33 +02:00
Massimiliano Pippi
d4c1a0508a
chore: remove haystack dependencies from preview (#5569)
* provides preview's own implementation of expit

* copy the requests utility over into preview

* remove unnecessary types conversions

* fix mocking paths
2023-08-16 12:45:28 +02:00
MichelBartels
93b3400440
Add Answer class (#5563)
* add answer class

* inheritance instead of composition

* make answer immutable

* Remove probability field for GenerativeAnswer

* rename Answer classes

* fix name change

* add release notes
2023-08-16 11:56:22 +02:00
Vladimir Blagojevic
8652d00b54
feat: Add FileExtensionClassifier to previews (#5514)
* Add FileExtensionClassifier preview component

* Add release note

* PR feedback
2023-08-15 15:58:55 +02:00
Silvano Cerza
bb7af3827d
Update canals to 0.5.0 (#5564)
* Update canals to 0.5.0

* Fix RemoteWhisperTranscriber serialisation
2023-08-14 20:08:34 +02:00
bogdankostic
c26f1e9426
fix: Use correct type for points in datadog (benchmarks) (#5570) 2023-08-14 17:40:36 +02:00
Massimiliano Pippi
f9bd64ba9e
make code layout consistent (#5561) 2023-08-14 16:35:34 +02:00
Tuana Çelik
c38943721f
Update README.md (#5554)
Some minor wording updates to reflect latest use cases and functionality
2023-08-12 08:45:56 +02:00
Massimiliano Pippi
714b944dc2
chore: rename store to document_store for clarity (#5547)
* store -> document_store

* fix leftovers

* fix import name

* moar leftovers

* rebase on main, update MemoryDocumentStore to the new protocol

* Update haystack/preview/pipeline.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-08-12 08:44:36 +02:00
Agnieszka Marzec
e7532c49cf
Add cohere ranker api (#5549) 2023-08-11 17:47:36 +02:00
Silvano Cerza
a7416bcf89
Add to_dict and from_dict methods for Stores (#5541)
* Add to_dict and from_dict methods for Stores

* Add release notes

* Add tests with custom init parameters
2023-08-11 14:45:56 +02:00
Vladimir Blagojevic
094d8578bd
feat: Update Docker readme (#5536)
* Update Docker readme

* Update wording

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-08-11 14:06:12 +02:00
Massimiliano Pippi
d73d443bc0
test: ease testing for 3rd parties (#5539)
* ease testing for 3rd parties

* fix __all__

* uniform error management

* raise the same filter error

* raise the same filter error

* fix circular import
2023-08-10 17:13:15 +02:00
Silvano Cerza
168b7c806c
Add _store_name field to StoreAwareMixin to ease serialisation (#5531) 2023-08-10 15:42:19 +02:00
Tuana Çelik
4bb22c9665
Update weaviate.py (#5469)
Updating the weaviate docstrings to replace the old URL with the new correct one. The old one now gives a 404
2023-08-10 15:37:55 +02:00
Vladimir Blagojevic
a75b9dd4bb
feat: LinkContentFetcher - add content-type resolution, user agent switching, PDF handler (#5374)
* Add content type resolution, pdf handler, user agent switching
---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-08-09 18:14:04 +02:00
Stefano Fiorucci
52133d3a81
proposal: Embedders design (#5390)
* first draft

* rename

* refinements

* added clarifications

* improvements

* improvements

* improvements

* further improvements

* fix typo

* Apply suggestions from code review

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* adapt to new Canals I/O

* fix links to previous proposals

* fix

* add migration example: update_embeddings

* rename EmbeddingService to EmbeddingBackend

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-08-09 17:09:30 +02:00
ZanSara
5ca4874df9
Migrate existing v2 components to Canals 0.4.0 (#5532)
* pin canals==0.4.0

* update audio components

* allow audio components to receive whisper_params in init too

* migrating memoryretriever

* migrate memoryretriever

* migrate TextFileToDocument

* fix TextFileToDocument tests

* fix pipeline tests

* fix defaults management

* reno

* inverted assignments

* Simplify release notes

---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-08-09 15:51:32 +02:00