2539 Commits

Author SHA1 Message Date
bogdankostic
91b775bf43
Execute pipelines and utils unit tests in CI (#4749) 2023-04-26 10:00:52 +02:00
recrudesce
38768bffdf
fix: Tiktoken does not support Azure gpt-35-turbo (#4739)
* force support for gpt-35-turbo

Cos Tiktoken doesn't support it yet - see https://github.com/openai/tiktoken/pull/72

* Update openai_utils.py

* Appeasing the linting gods

Why hast thou forsaken me ?

* Remove trailing whitespace

* chg: remove redundant elif block
2023-04-25 16:43:24 +02:00
Wang, Yi
2be1a68fce
fix: Allow to set num_beams in HFInvocationLayer (#4731)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-04-25 16:08:06 +02:00
github-actions[bot]
7fa3591f5f
Update unstable version (#4740)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2023-04-24 14:07:50 +02:00
bogdankostic
7db025a97b
Update weaviate-client (#4715) 2023-04-20 17:54:55 +02:00
recrudesce
473152eb05
feat: Add AzureChatGPT Capability using new InvocationLayer style (#4675) 2023-04-20 16:27:07 +02:00
Tuana Çelik
4cc416236d
chore: Updating readme (#4714)
Adding another name in our 'who uses' section
2023-04-20 11:28:38 +02:00
Zoltan Fedor
49d548ef10
fix: Fixing the Weaviate BM25 query builder bug (#4703) 2023-04-20 09:56:49 +02:00
Tuana Çelik
63f24cb1f3
fix: Log 'Observation' on new line (#4704) 2023-04-20 09:53:08 +02:00
bogdankostic
3d3b79986f
docs: Adapt Shaper docstrings regarding dropping metadata (#4655) 2023-04-19 13:40:53 +02:00
Sebastian
8d9136bad4
feat: Implementation of Table Cell Proposal (#4616)
* Starting adding support for TableCell

* Update tests to use row and col

* Added schema test to check to_dict and from_dict works for Table documents. Also updated Doc.__eq__ to work for tables.

* Update eval test to use TableCell

* Added more schema tests for table docs, labels and answers.

* Add boolean to toggle between Span and TableCell

* Add deprecation message

* Test that table answers work as responses in the rest API

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-04-19 13:14:49 +02:00
Darja Fokina
ec7fc4aa0b
docs: add web retriever to api docs (#4699) 2023-04-18 17:19:57 +02:00
Silvano Cerza
f13cc751c3
Block requests_cache in unit tests (#4696) 2023-04-18 16:15:26 +02:00
Massimiliano Pippi
0c081f19e2
fix: remove warnings from the more recent Elasticsearch client (#4602)
* clean up the ES instance in a more robust way

* do not sleep, refresh the index instead

* remove client warnings

* fix unit tests

* fix opensearch compatibility

* fix unit tests

* update ES version

* bump elasticsearch-py

* adjust docs

* use recreate_index param

* use same fixture strategy for Opensearch

* Update lg

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-04-18 15:40:17 +02:00
Sebastian
8c4176bdb2
feat: More flexible routing for RouteDocuments node (#4690)
* Added warning messages for documents that are skipped by RouteDocuments. Began adding support for the new return_remaining option and list-of-lists support for metadata value splitting.

* Simplify _split_by_content_type

* Added new unit test and updated _calculate_outgoing_edges

* Added some TODOs and turned assert into raising an error.

* Update logging messages and make new fixture in tests

* Update _split_by_metadata_values to work with return_remaining

* Remove unneeded code

* Documentation

* Add proper support for list of lists

* Fix mypy errors

* Added assert to make mypy happy

* Update haystack/nodes/other/route_documents.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* PR comments

* Remove check for logging level

* make mypy happy

* Update docstring of metadata_values

* Removed duplicate check. Make explicit check for metadata_values

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-04-18 15:18:13 +02:00
ZanSara
b06821b311
refactor: node->component (#4687)
* node->component

* fix tests
2023-04-17 12:20:42 +02:00
ZanSara
809ca73649
fix: make langdetect truly optional (#4686)
* make all langdetect imports optional

* add workflow

* fix workflow triggers

* change extra name
2023-04-17 11:35:53 +02:00
Fernando Pereira
a0d1733098
fix: PineconeDocumentStore error when delete_documents right after initialization (#4609) 2023-04-17 10:51:39 +02:00
Massimiliano Pippi
a03e8335aa
Ignore cross-reference properties when loading documents (#4664)
* drop cross-reference properties

* be more defensive

* fix regression
2023-04-17 10:40:30 +02:00
Julian Risch
dbe3049682
docs: Add docstring for PromptNode debug attribute (#4672) 2023-04-14 18:09:02 +02:00
Silvano Cerza
79727ed31f
Add requests blocker fixture (#4671) 2023-04-14 18:01:30 +02:00
Vladimir Blagojevic
1dcac11133
feat: Add Hugging Face inferencing PromptNode layer (#4641) 2023-04-14 17:59:17 +02:00
Vladimir Blagojevic
6a5acaa1e2
feat: Add chatgpt streaming (#4659) 2023-04-14 16:02:28 +02:00
Vladimir Blagojevic
1dd6158244
fix: Add model_max_length model_kwargs parameter to HF PromptNode (#4651) 2023-04-14 15:40:42 +02:00
ZanSara
d8ac30fa47
refactor!: extract preprocessing and file conversion deps (#4605)
* isolate file-conversion deps

* pylint

* add to all extra

* chain was missing

* move langdetect into preprocessing and fix tika

* add file-conversion extra
2023-04-14 11:34:16 +02:00
Tuana Çelik
16091f6ad2
Update README.md (#4661)
fixing links to sections
2023-04-14 10:23:06 +02:00
bogdankostic
cb13a537a9
Add deprecation information to doc string (#4658) 2023-04-14 09:39:34 +02:00
ZanSara
174d80ab41
skip tests (#4654) 2023-04-13 17:56:51 +02:00
Agnieszka Marzec
4aca24c845
Docs: Add max length unit to PromptNode API docs (#4601)
* Add max length unit

* Update to token

* Update invocation layers

---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-04-13 16:48:32 +02:00
bogdankostic
db48773268
docs: Add PDFToTextOCRConverter to API Docs (#4656) 2023-04-13 15:31:45 +02:00
Joseph Smith
e09b3364c7
Check for date fields in weaviate meta update (#4371)
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-04-13 15:18:23 +02:00
Vladimir Blagojevic
e30bc8fe5a
feat: Add GenerationConfig option to PromptNode's HuggingFace invocation layer (#4649) 2023-04-13 12:15:00 +02:00
ZanSara
f2106ab37b
feat: initial implementation of MemoryDocumentStore for new Pipelines (#4447)
* add stub implementation

* reimplementation

* test files

* docstore tests

* tests for document

* better testing

* remove mmh3

* readme

* only store, no retrieval yet

* linting

* review feedback

* initial filters implementation

* working on filters

* linters

* filtering works and is isolated by document store

* simplify filters

* comments

* improve filters matching code

* review feedback

* pylint

* move logic into _create_id

* mypy
2023-04-13 09:36:23 +02:00
Silvano Cerza
db69141642
Fix docstring-labeler.yml not working in PR from forks (#4648) 2023-04-12 21:16:06 +02:00
ZanSara
ba11d1c2a8
refactor!: extract evaluation and statistical dependencies (#4457)
* try-catch sklearn and scipy

* haystack imports

* linting

* mypy

* try to import baseretriever

* remove typing

* unused import

* remove more typing

* pylint

* isolate sql imports for postgres, which we don't use anyway

* remove stats

* replace expit

* als inmemory

* mypy

* feedback

* docker

* expit

* re-add njit
2023-04-12 15:38:56 +02:00
Fernando Pereira
5d41e60d89
fix: ParsrConverter list element added (#4562)
* fix: list element and mapping logic around it added to ParsrConverter convert step + unit test covering the specific mapping of list content from Parsr's to Haystack's

* Code review changes

* changed the samples path after conftest changes

* added samples_path to function arg

---------

Co-authored-by: Namoush <fmpereira22@gmail.com>
Co-authored-by: Fernando Pereira <fernando.pereira@criticalsoftware.com>
Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-04-12 18:38:21 +05:30
ZanSara
1ac9ca7fac
merge (#4620) 2023-04-12 09:38:04 +02:00
Silvano Cerza
3d79174eb8
Add 503 as status code that triggers retry in request_with_retry (#4640) 2023-04-11 11:54:53 +02:00
Silvano Cerza
5baf2f5930
refactor: Rework invocation layers (#4615)
* Move invocation layers into separate package

* Fix circular imports

* Fix import
2023-04-11 11:04:29 +02:00
Ben Heckmann
2d65742443
feat: arbitrary crawler_depth for Crawler class (#4623)
* #3674 implemented iterative crawler depth

* #3674 added two tests for increased crawler depth

* removed old comment
2023-04-11 10:39:17 +02:00
Silvano Cerza
5547e85bd5
feat: Add util method to make HTTP requests with configurable retry (#4627)
* Add util method to make HTTP requests with configurable retry

* Fix pylint

* Remove unnecessary optional parameter
2023-04-11 10:35:39 +02:00
Silvano Cerza
5ac3dffbef
test: Rework conftest (#4614)
* Split root conftest into multiple ones and remove unused fixtures

* Remove some constants and make them fixtures

* Remove unnecessary fixture scoping

* Fix failing whisper tests

* Fix image_file_paths fixture
2023-04-11 10:33:43 +02:00
Tuana Çelik
83d33f2aed
Update README.md (#4625) 2023-04-07 09:09:16 +02:00
Malte Pietsch
fabf77388c
Update readme with new companies using haystack (#4621) 2023-04-06 19:42:25 +02:00
Silvano Cerza
e85dc79eaa
test: Add pytest fixture to block requests in unit tests (#4433)
* Add pytest fixture to block requests in unit tests

* Mark test correctly as integration

* Fix crawler unit test failing cause it tries to install chromedriver
2023-04-06 18:04:57 +02:00
Silvano Cerza
c3abf73332
refactor: Rework prompt tests (#4600)
* Rework some PromptNode and PromptModel tests

* Remove duplicate code in PromptNode

* Fix mypy

* Fix test cause of missing fixture

* Revert "Fix mypy"

This reverts commit e530295a06cb260d9a8bd89679534958cb3d9776.

* Revert "Remove duplicate code in PromptNode"

This reverts commit 4a678ae81504dcc78a737372c061d12dc8799639.
2023-04-06 14:47:44 +02:00
Agnieszka Marzec
f2c6ce39e6
Docs: Fix QuestionGenerator and Summarizer docstrings (#4594)
* Add missing params and fix the docstrings

* Add reviewer's comments
2023-04-06 13:40:56 +02:00
Silvano Cerza
ee7b25b8cf
Remove unnecessary literal_eval (#4570) 2023-04-06 13:30:45 +02:00
Tuana Çelik
e0895f0ac2
Adding missing emoji (#4613) 2023-04-06 11:20:16 +02:00
Tuana Çelik
1a37caad79
feat: Load documents from remote - helper function (#4545)
* first draft of the load documents from remote function

* resolving comments

* pylint fixes

* pylint fixes

* fixed import

* fixed black

* fixing returned instance

* pythonic list comprehension

* Addressed comments

---------

Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
2023-04-06 10:19:35 +02:00