tstadel
8002cf92d6
fix: extend schema for prompt node results ( #3891 )
...
* extend schema for prompt node results
* extend schema
* update openapi
* fix mypy for test module
* added 1.14 specs
* reverted schema for 1.13
---------
Co-authored-by: bogdankostic <bogdankostic@web.de>
Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
Co-authored-by: Sebastian <sjrl@users.noreply.github.com>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-01-31 16:31:33 +01:00
Julian Risch
c855e18d78
fix: prevent posthog from sending errors to stderr ( #4008 )
2023-01-31 11:02:47 +01:00
Zoltan Fedor
2b1849f525
fix: Add a verbose option to PromptNode to let users understand the prompts being used #2 ( #3898 )
...
* fix: Add a verbose option to PromptNode to let users understand the prompts being used #2
* Add comments and refactoring todo note
* Fix logging-fstring-interpolation pylint
* Update haystack/nodes/prompt/prompt_node.py
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
---------
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-01-31 09:33:47 +01:00
Massimiliano Pippi
378a3fd2e7
chore: add `topic:*` labels automatically whenever possible ( #3997 )
...
* add topics:* labels automatically whenever possible
* address review comments
2023-01-30 20:13:06 +01:00
Silvano Cerza
5f29c83e62
Delete Docker images after testing to prevent workflow failure ( #4004 )
2023-01-30 17:57:35 +01:00
Sebastian
249398d806
fix: Update telemetry to not serialize Pipeline if disabled. ( #4000 )
...
* Update telemetry to not serialize Pipeline if disabled.
* Also disabled telemetry sending event in run_async in the RayPipeline since RayPipeline cannot be serialized currently.
2023-01-30 16:58:43 +01:00
bogdankostic
1a8fe0031d
feat: Add `use_prefiltering` parameter to `DeepsetCloudDocumentStore` ( #3969 )
...
* Add `use_prefiltering` parameter
* Adapt doc string
* Pass use_prefiltering via API to dC
* Adapt doc string
* Adapt test
2023-01-30 15:12:34 +01:00
Silvano Cerza
b4c5bb7de4
Simplify and fix Docker image tests on release ( #3982 )
2023-01-30 14:48:47 +01:00
ZanSara
d0d960745d
test: CI on py3.8 ( #3926 )
...
* test ci on py3.8
* fix mypy on windows
* typing and default value of "save_to_remote"
2023-01-30 14:41:02 +01:00
Daniel Bichuetti
3009ac2988
feat: Add page range support to PDF converters. ( #3965 )
...
* feat: add start and end page to PDF converters
* docs: add missing docstrings
* refactor: change list set up, add docstrings and comment
* fix: add missing parameter
* tests: add page range basic test
* tests: test correct page numbers
* tests: remove OCR page range test
Poppler and Tesseract not installed on CI
* fix: remove mobile change error
2023-01-30 14:09:22 +01:00
ZanSara
e4c65dff40
Missing import for `TransformersImageToText` ( #3984 )
...
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-01-30 17:36:49 +05:30
Sebastian
71de0524de
fix: fixed `InMemoryDocumentStore.get_embedding_count` to return correct number ( #3980 )
...
* Fix the embedding count function of InMemoryDocumentStore
* Adding some doc strings explaining how many docs with embeddings to expect.
2023-01-30 12:38:30 +01:00
Mayank Jobanputra
fa17f0973e
chore: increased timeout for loading pipelines through API ( #3977 )
...
* increased timeout
* Added comment for users to increase timeout while using docker compose file
* changed the comment with appropriate msg
* changed the comment indent
* changed the indent again
2023-01-30 11:30:47 +01:00
hsm207
08ec059b14
refactor: use weaviate client to build BM25 query ( #3939 )
...
* refactor: use weaviate client to build BM25 query
* refactor: remove manual BM25 query building
* refactor: apply BM25 to the content_field only
* test: update weaviate BM25 retrieval test case
update to account for lack of stemming
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-01-30 10:07:07 +01:00
Massimiliano Pippi
a0d7817dd5
pin weaviate version ( #3983 )
2023-01-27 18:14:12 +01:00
Massimiliano Pippi
1ee9f51f27
make the benchmark workflow run only manually ( #3962 )
2023-01-27 16:50:05 +01:00
Massimiliano Pippi
5e0de4a9ed
do not run launch_es in the CI ( #3981 )
2023-01-27 16:43:17 +01:00
Silvano Cerza
04342124d0
Update Crawler docstring for correct usage in Google colab ( #3979 )
2023-01-27 16:11:28 +01:00
Agnieszka Marzec
8da9bd7088
Align with the docs install guide + correct lg ( #3950 )
...
* Align with the docs install guide + correct lg
* Address Tuana's comments
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-01-27 15:44:39 +01:00
Tuana Celik
93312138de
fix: removing code block in `MarkdownConverter` ( #3960 )
...
* first attempt to add frontmatter of markdown to the metadata
* remove bug fix
* running black and pre-commit
* moving the import line
* adding a test
* adding pydoc
* fix to removing code blocks in markdown converter
* adding a test
* fixing a test
* improving tests
* adding language to code block
2023-01-27 15:25:54 +01:00
Vladimir Blagojevic
5678f2b1d9
PromptNode doesn't have run_batch support (yet) ( #3972 )
2023-01-27 15:13:26 +01:00
Tuana Celik
e1502c8029
Adding Example Scripts to Haystack ( #3588 )
...
* add 2 example scripts
* fixing faq script
* updating PR based on comments
* black
* updating s3 buckets
* first attempt at testing
* Add basic tests to two scripts
PR: #3588
* make tests runnable
* reformat files
* only run in PRs touching an example
Co-authored-by: bilgeyucel <bilgeyucel96@gmail.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-01-27 14:54:59 +01:00
Agnieszka Marzec
f6a99b6ebc
Fix: Fix quotation marks ( #3973 )
...
* Fix quotation marks
* Fix the order
2023-01-27 13:32:52 +01:00
Agnieszka Marzec
95668df92c
Docs: Csvconverter docstrings update ( #3974 )
...
* Add missing docstrings
* Blackify
* Update haystack/nodes/file_converter/csv.py
Co-authored-by: Sebastian <sjrl@users.noreply.github.com>
* mark some fields as unused
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
Co-authored-by: Sebastian <sjrl@users.noreply.github.com>
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2023-01-27 12:10:46 +01:00
Silvano Cerza
7a36ccf3e2
Fix docker image testing on release ( #3976 )
2023-01-27 12:05:29 +01:00
Agnieszka Marzec
7937ef8995
Add csvconverter to API docs ( #3968 )
2023-01-27 11:42:22 +01:00
Daniel Bichuetti
8efdac146d
feat: allow remote api timeout setup ( #3949 )
2023-01-27 11:31:04 +01:00
Silvano Cerza
a05836589b
ci: Add Docker images testing ( #3943 )
...
* Fix typo in Dockerfile.base ARG
* Add workflow to test Docker images
* Fix base image name
* Simplified Docker images testing
* Fix wrong command to retrieve current version
Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
2023-01-27 09:48:05 +01:00
Agnieszka Marzec
88650c9b0a
Add imgtotext api doc ( #3966 )
2023-01-27 09:07:53 +01:00
Agnieszka Marzec
2564e47acf
Docs: Update ImageToText docstrings ( #3963 )
...
* Update docstrings
* Add missing full stop
2023-01-27 08:31:29 +01:00
Tuana Celik
66dc7f6739
Fixing twitter badge ( #3934 )
...
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-01-26 18:27:54 +01:00
Tuana Celik
790e9acd3e
feat: add frontmatter to meta in `MarkdownConverter` ( #3953 )
...
* first attempt to add frontmatter of markdown to the metadata
* remove bug fix
* running black and pre-commit
* moving the import line
* adding a test
* adding pydoc
2023-01-26 17:15:02 +01:00
Massimiliano Pippi
7f6ed941d4
chore: bump pydoc-markdown version used in the CI ( #3955 )
...
* use latest pydoc-markdown
* make the workflow manually actionable
* Apply suggestions from code review
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-01-26 16:58:43 +01:00
Stefano Fiorucci
2bbe11b598
fix: overwrite params with environment variables even if there are no params in the pipeline definition; make `mypy` ignore REST API tests ( #3930 )
...
* fix and new test
* make mypy ignore rest_api tests files
* try to improve mypy action
* retry
* fix
* test new action
* ok
* check python files not in root
* really check files!
2023-01-26 16:14:58 +01:00
Massimiliano Pippi
52b195faf6
increase the timeout for testing ( #3957 )
2023-01-26 16:04:43 +01:00
Silvano Cerza
44934839a7
ci: Remove mypy deps install step in python_cache action ( #3956 )
...
* Remove mypy deps install step in python_cache action
* Remove step caching mypy dependencies
* Add ignore files in changed files retrieval step
2023-01-26 14:17:34 +01:00
Vladimir Blagojevic
ec85207cf7
Remove __eq__ and __hash__ from PromptNode ( #3923 )
2023-01-26 13:38:35 +01:00
bogdankostic
addebcd256
fix: Fix type in FARMReader's save_to_remote ( #3952 )
2023-01-26 12:27:35 +01:00
Vladimir Blagojevic
b945eaeabd
PromptNode: expose output_variable, adjust unit tests ( #3892 )
2023-01-26 11:01:11 +01:00
github-actions[bot]
d962bc0bc9
Update unstable version and openapi schema ( #3924 )
...
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
2023-01-26 01:02:49 +05:30
ZanSara
0e471d5e5a
fix: change model in distillation test ( #3944 )
...
* change model
* change layer count
* move promptnode tests in integration
* fix marker
2023-01-25 23:32:11 +05:30
Sebastian
fac72aa14e
Updated schema in rest_api to make returned table format consistent with how it is returned when using Haystack's Document.to_dict(). ( #3872 )
2023-01-25 23:18:12 +05:30
Daniel Bichuetti
afc1e1ccef
fix: add tiktoken fallback mechanism. ( #3929 )
...
* feat: migrate to tiktoken when tokenizing for OpenAI
* refactor: add OpenAI optional egg
* fix: add Python 3.7 fallback support for tiktoken
* refactor: change both tokenization implementations and fix mypy
* refactor: remove dummy-class
* refactor: add tiktoken as core dependency and minor refactoring
* refactor: sort imports
* refactor: remove out-of-scope PR change
* refactor: reintroduce corner case check
* refactor: remove unused egg
* refactor: remove unused exception after tiktoken as core dep
* refactor: reduce ifs and include log warning
* refactor: remove timeout linting ignore
* refactor: revert change due to mypy
* refactor: disable pylint import error
* fix: add arm64 fallback to HF tokenizer
* fix: add aarch64 fallback mechanism
* refactor: improve log message
* fix: change platform selection method
* refactor: consolidate archs
2023-01-25 11:37:29 +01:00
Mayank Jobanputra
5c53b2bd4a
feat: adding secure loading of models by default for haystack ( #3901 )
...
* adding secure loading of models by default
* simplified set function
* testing import effect correctly
* added appropriate log line, adapted the test
* change log string formatting
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
* remove extra closing bracket )
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-01-24 23:01:20 +05:30
Daniel Bichuetti
739fc228c6
feat: support cl100k_base tokenization and increase performance for GPT2 ( #3897 )
...
* feat: migrate to tiktoken when tokenizing for OpenAI
* refactor: add OpenAI optional egg
* fix: add Python 3.7 fallback support for tiktoken
* refactor: change both tokenization implementations and fix mypy
* refactor: remove dummy-class
* refactor: add tiktoken as core dependency and minor refactoring
* refactor: sort imports
* refactor: remove out-of-scope PR change
* refactor: reintroduce corner case check
* refactor: remove unused egg
* refactor: remove unused exception after tiktoken as core dep
* refactor: reduce ifs and include log warning
* refactor: remove timeout linting ignore
* refactor: revert change due to mypy
* refactor: disable pylint import error
2023-01-24 16:15:49 +01:00
Vladimir Blagojevic
4d8b1d0b22
refactor: Improve stop_words handling, add unit test cases ( #3918 )
...
* Improve stop_words handling, add unit test cases
* Update test/nodes/test_prompt_node.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-01-24 12:52:41 +01:00
Fabian
61ebe4b5dc
fix: authenticate with aws4auth if set in OpenSearchDocumentStore ( #3741 )
...
* bug(OpenSearchDocumentStore): fix authenticate with aws4auth if set.
Rearrange check to authenticate with aws4auth before username
and password, as the username is set to "admin" by default.
* Make username check less restrictive
* Fix test, do not use mocked _init_client function
* Add warning for aws4auth and username to ElasticSearchDocumentStore
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-01-24 10:01:39 +01:00
ZanSara
e954230ae7
chore: enable `f-string-without-interpolation` ( #3906 )
...
* f-string-without-interpolation
* remove line
* missed one line
2023-01-23 17:35:52 +01:00
Zoltan Fedor
e447bd728a
feat: adding the ability to use Ray Serve async functionality ( #3769 )
...
* Adding the ability to call the Ray pipeline from concurrent apps with async
This is to fix #2968
* Fixes: mypy + pylint (`invalid-overridden-method`)
* Simplifying - no real need for an `AsyncRayPipeline` anymore
* Moving the new `run_async` method to the `RayPipeline`
* Cleanup
* [EMPTY] Re-trigger CI
2023-01-23 16:23:09 +01:00
Benjamin BERNARD
eed009eddb
feat: Add `CsvTextConverter` ( #3587 )
...
* feat: Add Csv2Documents, EmbedDocuments nodes and FAQ indexing pipeline
Fixes #3550, allow users to build a full FAQ using a YAML pipeline description, with CSV import and indexing.
* feat: Add Csv2Documents, EmbedDocuments nodes and FAQ indexing pipeline
Fix linter issues mypy and pylint.
* feat: Add Csv2Documents, EmbedDocuments nodes and FAQ indexing pipeline
Fix linter issues mypy.
* implement proposal's feedback
* tidy up for merge
* use BaseConverter
* use BaseConverter
* pylint
* black
* Revert "black"
This reverts commit e1c45cb1848408bd52a630328750cb67c8eb7110.
* black
* add check for column names
* add check for column names
* add tests
* fix tests
* address lists of paths
* typo
* remove duplicate line
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2023-01-23 15:56:36 +01:00