2358 Commits

Author SHA1 Message Date
tstadel
9611b64ec5
fix: document retrieval metrics for non-document_id document_relevance_criteria (#3885)
* fix document retrieval metrics for all document_relevance_criteria

* fix tests

* fix eval_batch metrics

* small refactorings

* evaluate metrics on label level

* document retrieval tests added

* fix pylint

* fix test

* support file retrieval

* add comment about threshold

* rename test
2023-02-02 15:00:07 +01:00
Silvano Cerza
e62d24d0eb
ci: Add linting of workflow and related pre-commit hook (#4032)
* Add actionlint pre-commit hook

* Add workflow to lint workflows

* Remove unused input in Python Cache action

* Move from deprecated set-output syntax to new one

* Add actionlint config to specify self-hosted runners labels
2023-02-02 14:33:23 +01:00
Massimiliano Pippi
2878c57645
Update pyproject.toml (#4035) 2023-02-02 11:59:17 +01:00
Silvano Cerza
d79d39b28a
Bump act10ns/slack from v1 to v2 (#4031) 2023-02-02 09:39:36 +01:00
Silvano Cerza
938cb62144
Fix PyPI release workflow (#4029) 2023-02-02 09:36:23 +01:00
Zoltan Fedor
3aa6522564
fix: Event sending for RayPipeline crashing Haystack (#3971)
* Remove the `send_pipeline_event_if_needed()` to confirm fix

* Suspending event sending for RayPipelines as it is not compatible

* Update base.py

* Updating implementation based on feedback from @masci

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-02-02 08:27:20 +01:00
ZanSara
9009a9ae58
feat: add Shaper (#3880)
* Shaper initial version

* Initial pydoc

* Add more unit tests

* Fix pydoc, expand Shaper pydoc with YAML example

* Minor fix

* Improve pydoc

* More unit tests with prompt node

* Describe Shaper functions in pydoc

* More pydoc

* Use pytest.raises instead of catching errors

* Improve test_function_invocation_order unit test

* pylint fixes

* Improve run_batch handling

* simpler version, initial stub

* stubbing tests

* promptnode compatibility

* add tests

* simplify

* fix promptnode tests

* pylint

* mypy

* fix corner case & mypy

* mypy

* review feedback

* tests

* Add lg updates

* add rename

* pylint

* Add complex unit test with two PNs and ICMs in between (#3921)

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>

* docstring

* fix tests

* add join_lists

* add documents_to_strings

* fix tests

* allow lists of input values

* doc review feedback

* do not use locals()

* Update with minor lg changes

* fix corner case in ICM

* fix merge

* review feedback

* answers conversions

* mypy

* add tests

* generative answers

* forgot to commit

---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-02-01 18:36:13 +01:00
Silvano Cerza
e8ff48094b
Automate release on PyPI (#4015) 2023-02-01 17:40:21 +01:00
Julian Risch
3fcfc8eb23
chore: add discord badge to readme (#4027) 2023-02-01 16:59:22 +01:00
Sebastian
7b3d7ee83a
Reuse tokenizer instead of loading new one. (#4016)
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2023-02-01 10:44:18 +01:00
Sebastian
96706e9e7b
proposal: TableCell (#3875)
* Initial commit for TableSpan proposal

* Updating the proposal

* More updates to the proposal

* More changes

* Rename of file per Proposal instructions

* Update link

* Adding drawbacks

* Fixing typos

* Changed TableSpan to TableCell and updated proposal based on discussions.

* Adding discussion on identified bug.

* Rename proposal to reflect name change made during discussion. Added point to make it clear that we will be able to return a List of TableCells

* Update proposal with discussion about storing table as a list of lists

* Adding some additional code change descriptions.
2023-02-01 09:08:12 +01:00
tstadel
8002cf92d6
fix: extend schema for prompt node results (#3891)
* extend schema for prompt node results

* extend schema

* update openapi

* fix mypy for test module

* added 1.14 specs

* reverted schema for 1.13

---------

Co-authored-by: bogdankostic <bogdankostic@web.de>
Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
Co-authored-by: Sebastian <sjrl@users.noreply.github.com>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-01-31 16:31:33 +01:00
Julian Risch
c855e18d78
fix: prevent posthog from sending errors to stderr (#4008) 2023-01-31 11:02:47 +01:00
Zoltan Fedor
2b1849f525
fix: Add a verbose option to PromptNode to let users understand the prompts being used #2 (#3898)
* fix: Add a verbose option to PromptNode to let users understand the prompts being used #2

* Add comments and refactoring todo note

* Fix logging-fstring-interpolation pylint

* Update haystack/nodes/prompt/prompt_node.py

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-01-31 09:33:47 +01:00
Massimiliano Pippi
378a3fd2e7
chore: add topic:* labels automatically whenever possible (#3997)
* add topics:* labels automatically whenever possible

* address review comments
2023-01-30 20:13:06 +01:00
Silvano Cerza
5f29c83e62
Delete Docker images after testing to prevent workflow failure (#4004) 2023-01-30 17:57:35 +01:00
Sebastian
249398d806
fix: Update telemetry to not serialize Pipeline if disabled. (#4000)
* Update telemetry to not serialize Pipeline if disabled.

* Also disabled telemetry sending event in run_async in the RayPipeline since RayPipeline cannot be serialized currently.
2023-01-30 16:58:43 +01:00
bogdankostic
1a8fe0031d
feat: Add use_prefiltering parameter to DeepsetCloudDocumentStore (#3969)
* Add `use_prefiltering` parameter

* Adapt doc string

* Pass use_prefiltering via API to dC

* Adapt doc string

* Adapt test
2023-01-30 15:12:34 +01:00
Silvano Cerza
b4c5bb7de4
Simplify and fix Docker image tests on release (#3982) 2023-01-30 14:48:47 +01:00
ZanSara
d0d960745d
test: CI on py3.8 (#3926)
* test ci on py3.8

* fix mypy on windows

* typing and default value of "save_to_remote"
2023-01-30 14:41:02 +01:00
Daniel Bichuetti
3009ac2988
feat: Add page range support to PDF converters. (#3965)
* feat: add start and end page to PDF converters

* docs: add missing docstrings

* refactor: change list set up, add docstrings and comment

* fix: add missing parameter

* tests: add page range basic test

* tests: test correct page numbers

* tests: remove OCR page range test (Poppler and Tesseract not installed on CI)

* fix: remove mobile change error
2023-01-30 14:09:22 +01:00
ZanSara
e4c65dff40
Missing import for TransformersImageToText (#3984)
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-01-30 17:36:49 +05:30
Sebastian
71de0524de
fix: fixed InMemoryDocumentStore.get_embedding_count to return correct number (#3980)
* Fix the embedding count function of InMemoryDocumentStore

* Adding some doc strings explaining how many docs with embeddings to expect.
2023-01-30 12:38:30 +01:00
Mayank Jobanputra
fa17f0973e
chore: increased timeout for loading pipelines through API (#3977)
* increased timeout

* Added comment for users to increase timeout while using docker compose file

* changed the comment with appropriate msg

* changed the comment indent

* changed the indent again
2023-01-30 11:30:47 +01:00
hsm207
08ec059b14
refactor: use weaviate client to build BM25 query (#3939)
* refactor: use weaviate client to build BM25 query

* refactor: remove manual BM25 query building

* refactor: apply BM25 to the content_field only

* test: update weaviate BM25 retrieval test case

update to account for lack of stemming

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-01-30 10:07:07 +01:00
Massimiliano Pippi
a0d7817dd5
pin weaviate version (#3983) 2023-01-27 18:14:12 +01:00
Massimiliano Pippi
1ee9f51f27
make the benchmark workflow run only manually (#3962) 2023-01-27 16:50:05 +01:00
Massimiliano Pippi
5e0de4a9ed
do not run launch_es in the CI (#3981) 2023-01-27 16:43:17 +01:00
Silvano Cerza
04342124d0
Update Crawler docstring for correct usage in Google Colab (#3979) 2023-01-27 16:11:28 +01:00
Agnieszka Marzec
8da9bd7088
Align with the docs install guide + correct lg (#3950)
* Align with the docs install guide + correct lg

* Address Tuana's comments

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-01-27 15:44:39 +01:00
Tuana Celik
93312138de
fix: removing code block in MarkdownConverter (#3960)
* first attempt to add frontmatter of markdown to the metadata

* remove bug fix

* running black and pre-commit

* moving the import line

* adding a test

* adding pydoc

* fix to removing code blocks in markdown converter

* adding a test

* fixing a test

* improving tests

* adding language to code block
2023-01-27 15:25:54 +01:00
Vladimir Blagojevic
5678f2b1d9
PromptNode doesn't have run_batch support (yet) (#3972) 2023-01-27 15:13:26 +01:00
Tuana Celik
e1502c8029
Adding Example Scripts to Haystack (#3588)
* add 2 example scripts

* fixing faq script

* updating PR based on comments

* black

* updating s3 buckets

* first attempt at testing

* Add basic tests to two scripts

PR: #3588

* make tests runnable

* reformat files

* only run in PRs touching an example

Co-authored-by: bilgeyucel <bilgeyucel96@gmail.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-01-27 14:54:59 +01:00
Agnieszka Marzec
f6a99b6ebc
Fix: Fix quotation marks (#3973)
* Fix quotation marks

* Fix the order
2023-01-27 13:32:52 +01:00
Agnieszka Marzec
95668df92c
Docs: Csvconverter docstrings update (#3974)
* Add missing docstrings

* Blackify

* Update haystack/nodes/file_converter/csv.py

Co-authored-by: Sebastian <sjrl@users.noreply.github.com>

* mark some fields as unused

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
Co-authored-by: Sebastian <sjrl@users.noreply.github.com>
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2023-01-27 12:10:46 +01:00
Silvano Cerza
7a36ccf3e2
Fix docker image testing on release (#3976) 2023-01-27 12:05:29 +01:00
Agnieszka Marzec
7937ef8995
Add csvconverter to API docs (#3968) 2023-01-27 11:42:22 +01:00
Daniel Bichuetti
8efdac146d
feat: allow remote api timeout setup (#3949) 2023-01-27 11:31:04 +01:00
Silvano Cerza
a05836589b
ci: Add Docker images testing (#3943)
* Fix typo in Dockerfile.base ARG

* Add workflow to test Docker images

* Fix base image name

* Simplified Docker images testing

* Fix wrong command to retrieve current version

Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
2023-01-27 09:48:05 +01:00
Agnieszka Marzec
88650c9b0a
Add imgtotext api doc (#3966) 2023-01-27 09:07:53 +01:00
Agnieszka Marzec
2564e47acf
Docs: Update ImageToText docstrings (#3963)
* Update docstrings

* Add missing full stop
2023-01-27 08:31:29 +01:00
Tuana Celik
66dc7f6739
Fixing twitter badge (#3934)
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-01-26 18:27:54 +01:00
Tuana Celik
790e9acd3e
feat: add frontmatter to meta in MarkdownConverter (#3953)
* first attempt to add frontmatter of markdown to the metadata

* remove bug fix

* running black and pre-commit

* moving the import line

* adding a test

* adding pydoc
2023-01-26 17:15:02 +01:00
Massimiliano Pippi
7f6ed941d4
chore: bump pydoc-markdown version used in the CI (#3955)
* use latest pydoc-markdown

* make the workflow manually actionable

* Apply suggestions from code review

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-01-26 16:58:43 +01:00
Stefano Fiorucci
2bbe11b598
fix: overwrite params with environment variables even if there are no params in the pipeline definition; make mypy ignore REST API tests (#3930)
* fix and new test

* make mypy ignore rest_api tests files

* try to improve mypy action

* retry

* fix

* test new action

* ok

* check python files not in root

* really check files!
2023-01-26 16:14:58 +01:00
Massimiliano Pippi
52b195faf6
increase the timeout for testing (#3957) 2023-01-26 16:04:43 +01:00
Silvano Cerza
44934839a7
ci: Remove mypy deps install step in python_cache action (#3956)
* Remove mypy deps install step in python_cache action

* Remove step caching mypy dependencies

* Add ignore files in changed files retrieval step
2023-01-26 14:17:34 +01:00
Vladimir Blagojevic
ec85207cf7
Remove __eq__ and __hash__ from PromptNode (#3923) 2023-01-26 13:38:35 +01:00
bogdankostic
addebcd256
fix: Fix type in FARMReader's save_to_remote (#3952) 2023-01-26 12:27:35 +01:00
Vladimir Blagojevic
b945eaeabd
PromptNode: expose output_variable, adjust unit tests (#3892) 2023-01-26 11:01:11 +01:00