Massimiliano Pippi
d16ec87afb
bump version
2023-11-24 17:03:40 +01:00
Massimiliano Pippi
aeb5708585
Update README.md ( #6408 )
2023-11-24 16:21:19 +01:00
Silvano Cerza
9338de1790
Add missing tests workflow dependency
2023-11-24 16:00:59 +01:00
Massimiliano Pippi
9a8bef63c9
move snippets up one folder
2023-11-24 15:54:23 +01:00
Massimiliano Pippi
763d2d8e4c
remove rest_api
2023-11-24 15:49:54 +01:00
Massimiliano Pippi
d3ab8afede
clean up labeller
2023-11-24 15:30:06 +01:00
Massimiliano Pippi
4a1fe163b6
fix names in workflows
2023-11-24 14:59:31 +01:00
Silvano Cerza
e6637f5ec2
Fix all tests
2023-11-24 14:48:43 +01:00
Massimiliano Pippi
bbb6025e89
update package name
2023-11-24 12:14:43 +01:00
Massimiliano Pippi
ea1e3f588b
Update dependencies list
...
---------
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-11-24 12:09:47 +01:00
Massimiliano Pippi
8adb8bbab8
Remove preview folder in test/
...
---------
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-11-24 11:52:55 +01:00
Massimiliano Pippi
f71e11c717
Removed preview package
...
---------
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-11-24 11:49:41 +01:00
Massimiliano Pippi
09e7831f60
clean up 1.x code
...
---------
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-11-24 11:47:47 +01:00
Silvano Cerza
fd16ec63cb
refactor: Add support for new filters declaration ( #6397 )
...
* Rework filter logic for InMemoryDocumentStore to support new filters declaration
* Fix legacy filters tests
* Simplify logic and handle dates comparison
* Rework MetadataRouter to support new filters
* Update docstrings
* Add release notes
* Fix linting
* Avoid duplicating filters specifications
* Handle corner case
* Simplify docstring
* Fix filters logic and tests
* Fix Document Store testing legacy filters tests
2023-11-24 11:22:46 +01:00
SebastjanPrachovskij
28c2b09d90
Add SearchApi integration for websearch ( #6400 )
2023-11-24 11:18:43 +01:00
Agnieszka Marzec
27cf8ee4ff
Docs: Update Reader's doc strings ( #6312 )
...
* Update doc strings
* Add warm_up docs, fix margins
---------
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2023-11-24 11:07:02 +01:00
Stefano Fiorucci
b850b36a4b
fix: Cohere - better handling of COHERE_API_URL ( #6407 )
...
* extract API URL from lazy import
* improve solution
2023-11-24 10:58:46 +01:00
ZanSara
c45d8c39c7
fix: make ExtractiveReader handle situations where token_to_chars returns None instead of a (start, end) tuple ( #6382 )
...
* fix reader bug
* add test
* log
* fix logging
* improve error message
2023-11-24 09:08:56 +01:00
ZanSara
f3b73030a1
Fix wrong import in cohere.py and change model to model_name for consistency ( #6405 )
...
* Fix wrong import in `cohere.py`
* model -> model_name
* fix tests too
* black
* typo
* typo
2023-11-23 19:54:50 +01:00
Stefano Fiorucci
fdae81eee8
add pptx to API reference ( #6404 )
2023-11-23 18:02:31 +01:00
pandasar13
edb40b6c1b
refactor: add batch_size to FAISS __init__ ( #6401 )
...
* refactor: add batch_size to FAISS __init__
* refactor: add batch_size to FAISS __init__
* add release note to refactor: add batch_size to FAISS __init__
* fix release note
* add batch_size to docstrings
---------
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2023-11-23 17:27:24 +01:00
ZanSara
4ec6a60a76
feat: CohereGenerator ( #6395 )
...
* added CohereGenerator with unit tests
Signed-off-by: sunilkumardash9 <sunilkumardash9@gmail.com>
* 1. added releasenote
2. removed commented files in test-cohere_generators
3. removed unused imports
Signed-off-by: sunilkumardash9 <sunilkumardash9@gmail.com>
* 1. move client creation to __init__
2. remove dict casting of metadata in run
Signed-off-by: sunilkumardash9 <sunilkumardash9@gmail.com>
* few fixes
Signed-off-by: sunilkumardash9 <sunilkumardash9@gmail.com>
* add cohere to git workflows
Signed-off-by: sunilkumardash9 <sunilkumardash9@gmail.com>
* 1. CohereGenerator as top level import in generators
2. small change in doc string
Signed-off-by: sunilkumardash9 <sunilkumardash9@gmail.com>
* 1. corrected git workflow files for cohere import
2. changed api key env var from CO_API_KEY to COHERE_API_KEY
Signed-off-by: sunilkumardash9 <sunilkumardash9@gmail.com>
* added cohere in missed out workflow installs
Signed-off-by: sunilkumardash9 <sunilkumardash9@gmail.com>
* 1. Removed default_streaming_callback from cohere.py and added in test.
2. Added kwargs doc strings for CohereGenerator
3. removed type hints for metadata and replies
4. use COHERE_API_URL instead of hard coded URL.
Signed-off-by: sunilkumardash9 <sunilkumardash9@gmail.com>
* Update haystack/preview/components/generators/cohere/cohere.py
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
* Update haystack/preview/components/generators/cohere/cohere.py
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
* Update haystack/preview/components/generators/cohere/cohere.py
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
* Update haystack/preview/components/generators/cohere/cohere.py
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
* Update haystack/preview/components/generators/cohere/cohere.py
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
* move out of folder
* black
* fix tests
* feedback
* black
* remove api key from tests
* read api key from env var if missing
* typo
* black
* missing import
---------
Signed-off-by: sunilkumardash9 <sunilkumardash9@gmail.com>
Co-authored-by: sunilkumardash9 <sunilkumardash9@gmail.com>
Co-authored-by: Daria Fokina <daria.f93@gmail.com>
2023-11-23 17:21:07 +01:00
Julian Risch
67780a62d5
test: Add end-to-end test for dense doc search 2.0 ( #6102 )
...
* draft e2e test for dense doc search
* fix import path
* add DocumentJoiner
* update converter import; fix getting filled doc store
* add text embedder
* add sample txt and pdf for preview e2e tests
* run the query pipeline before serializing
* define samples path
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-11-23 16:59:02 +01:00
jlonge4
c44e2cf49b
feat: add microsoft pptx file converter ( #6399 )
...
* Create pptx.py
* feat: pptx converter import __init__.py
* feat: add pptx import __init__.py
* feat: add python-pptx dependency
* feat: add sample pptx for testing
* feat: add pptx file-converter test
* feat: release note pptx-file-converter-3e494d2747637eb2.yaml
* feat: Update releasenotes/notes/pptx-file-converter-3e494d2747637eb2.yaml
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
* feat: refactor haystack/nodes/file_converter/pptx.py
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
* fix imports
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
2023-11-23 16:46:41 +01:00
Silvano Cerza
604b177788
chore: Remove pydoc-markdown from dev dependencies ( #6398 )
...
* Remove pydoc-markdown from dev dependencies
* Remove fastapi pin in rest_api
2023-11-23 15:59:41 +01:00
Stefano Fiorucci
b0b514778d
fix!: make PyPDFToDocument JSON-serializable ( #6396 )
...
* add registry
* release note
* add checks
* rm superfluous check
* fix typo
* rm print :-)
2023-11-23 15:37:20 +01:00
Ben Heckmann
a492771b4d
feat: PreProcessor split by token (tiktoken & Hugging Face) ( #5276 )
...
* #4983 implemented split by token for tiktoken tokenizer
* #4983 added unit test for tiktoken splitting
* #4983 implemented and added a test for splitting documents with HuggingFace tokenizer
* #4983 added support for passing HF model names (instead of objects) and added an example to the HF token splitting test
* mocked HTTP model loading in unit tests, fixed pylint error
* fix lossy tokenizers splitting, use LazyImport, ignore UnicodeEncodeError for tiktoken
* reno
* rename reno file
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-11-23 12:26:37 +01:00
Vladimir Blagojevic
e04a1f16bb
feat: Add DynamicPromptBuilder to Haystack 2.x ( #6328 )
...
* Add DynamicPromptBuilder
* Improve pydocs, add unit tests
* Add release note
* Make expected_runtime_variables optional
* Add pydocs usage example
* Add more pydocs
* Remove test markers
* Update type in unit test
* Update after canals upgrade
* add to api ref
* docstrings updates
* Update test/preview/components/builders/test_dynamic_prompt_builder.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Update haystack/preview/components/builders/dynamic_prompt_builder.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
* Deparametrize init test
* Rename expected_runtime_variables to runtime_variables
* Rephrase docstring so meaning is clearer
---------
Co-authored-by: Darja Fokina <daria.f93@gmail.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-11-23 11:41:57 +01:00
Vladimir Blagojevic
e57a593d2e
fix: Revert back to straightforward PromptBuilder ( #6335 )
...
* Revert back to simple PromptBuilder
* Updating to full typing
2023-11-23 11:34:06 +01:00
Silvano Cerza
3e79de7043
ci: Add workflow to test code snippets ( #6364 )
...
* initial
* Add workflow to test code snippets
---------
Co-authored-by: Timo Möller <timo.moeller@deepset.ai>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-11-23 11:26:53 +01:00
Timo Moeller
b34c35d982
initial ( #6355 )
2023-11-23 10:32:54 +01:00
Vladimir Blagojevic
cfff0d5212
Rename file_converters to converters ( #6390 )
2023-11-23 10:28:40 +01:00
Vladimir Blagojevic
b557f3035e
feat: Add ConditionalRouter Haystack 2.x component ( #6147 )
...
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-11-23 10:28:08 +01:00
Massimiliano Pippi
70e40eae5c
fix: fix type hints on DocumentStore protocol ( #6383 )
...
* fix type hints
* disable specific pylint checker
2023-11-23 09:14:08 +01:00
Stefano Fiorucci
e91f7a8a4d
refactor!: improve the public interface of Generators ( #6374 )
...
* merge lazy import blocks
* refactor generators
* release note
* revert unrelated changes
2023-11-22 10:40:48 +01:00
ZanSara
b751978d65
Extends input types of RemoteWhisperTranscriber ( #6218 )
...
* fix tests
* reno
* tests
* retain file name
* paths are strings for openai sdk
* streams->sources
* feedback
* always add name to file
* mypy
* test placeholder with extension
* fallback
* paths
* path test
* path must be a string
* fix test
2023-11-22 09:57:45 +01:00
Ashwin Mathur
e6c8374562
feat: Add ByteStream metadata and other metadata to Documents created by HTMLToDocument ( #6304 )
...
* Refactor HTMLToDocument
* Add release notes
* Add additional tests
* remove progress bar
* Add additional test for metadata
* remove progress bar from release notes
* Update tests
* Use truthiness checks instead of is not None
2023-11-21 21:44:02 +01:00
Silvano Cerza
76165d024f
Fix corner cases and error handling with filters conversion ( #6376 )
2023-11-21 18:22:48 +01:00
Stefano Fiorucci
456902235a
feat: make DocumentWriter return the actual number of documents written ( #6366 )
...
* make DocumentWriter return the actual number of documents written
* add/improve tests
2023-11-21 15:54:25 +01:00
Silvano Cerza
ec3558021e
Remove Document Store tests with invalid filter ( #6375 )
2023-11-21 15:08:16 +01:00
Silvano Cerza
0a5b37f3d1
Rework legacy filters embedding tests and remove numpy dependency ( #6371 )
2023-11-21 14:02:15 +01:00
Daniel Fleischer
0cef17ac13
feat: embedding instructions for dense retrieval ( #6372 )
...
* Embedding instructions in EmbeddingRetriever
Query and documents embeddings are prefixed with instructions, useful
for retrievers finetuned on specific tasks, such as Q&A.
* Tests
Checking vectors 0th component vs. reference, using different stores.
* Normalizing vectors
* Release notes
2023-11-21 12:56:40 +01:00
Julian Risch
07cda09aa8
docs: Include TextEmbedder in DocumentJoiner usage example ( #6369 )
...
* docs: Include TextEmbedder in DocumentJoiner usage example
* black
2023-11-21 11:27:10 +01:00
Stefano Fiorucci
1fff2bc255
merge lazy import blocks ( #6358 )
2023-11-21 11:15:37 +01:00
Julian Risch
2943b83b31
fix: Add DocumentJoiner to routers' init ( #6368 )
2023-11-21 09:45:00 +01:00
Julian Risch
939e443ee8
docs: Add DocumentJoiner to API docs ( #6365 )
2023-11-20 18:18:06 +01:00
Silvano Cerza
d57760787d
refactor: Rework delete_documents tests ( #6363 )
...
* Rework write_documents tests
* Rework delete_documents tests
* Fix linting
2023-11-20 17:54:42 +01:00
Silvano Cerza
9b0e3f5ed4
Rework write_documents tests ( #6362 )
2023-11-20 17:54:29 +01:00
Silvano Cerza
a7f742fdbd
refactor: Rename docstore fixture to document_store ( #6360 )
...
* Prevent pytest_generate_tests from polluting preview tests
* Rename docstore fixture to document_store
2023-11-20 17:41:48 +01:00
Silvano Cerza
365127dc5b
Prevent pytest_generate_tests from polluting preview tests ( #6361 )
2023-11-20 15:47:06 +01:00