Agnieszka Marzec
2564e47acf
Docs: Update ImageToText docstrings ( #3963 )
...
* Update docstrings
* Add missing full stop
2023-01-27 08:31:29 +01:00
Tuana Celik
66dc7f6739
Fixing twitter badge ( #3934 )
...
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-01-26 18:27:54 +01:00
Tuana Celik
790e9acd3e
feat: add frontmatter to meta in MarkdownConverter ( #3953 )
...
* first attempt to add frontmatter of markdown to the metadata
* remove bug fix
* running black and pre-commit
* moving the import line
* adding a test
* adding pydoc
2023-01-26 17:15:02 +01:00
Massimiliano Pippi
7f6ed941d4
chore: bump pydoc-markdown version used in the CI ( #3955 )
...
* use latest pydoc-markdown
* make the workflow manually actionable
* Apply suggestions from code review
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-01-26 16:58:43 +01:00
Stefano Fiorucci
2bbe11b598
fix: overwrite params with environment variables even if there are no params in the pipeline definition; make mypy ignore REST API tests ( #3930 )
...
* fix and new test
* make mypy ignore rest_api tests files
* try to improve mypy action
* retry
* fix
* test new action
* ok
* check python files not in root
* really check files!
2023-01-26 16:14:58 +01:00
Massimiliano Pippi
52b195faf6
increase the timeout for testing ( #3957 )
2023-01-26 16:04:43 +01:00
Silvano Cerza
44934839a7
ci: Remove mypy deps install step in python_cache action ( #3956 )
...
* Remove mypy deps install step in python_cache action
* Remove step caching mypy dependencies
* Add ignore files in changed files retrieval step
2023-01-26 14:17:34 +01:00
Vladimir Blagojevic
ec85207cf7
Remove __eq__ and __hash__ from PromptNode ( #3923 )
2023-01-26 13:38:35 +01:00
bogdankostic
addebcd256
fix: Fix type in FARMReader's save_to_remote ( #3952 )
2023-01-26 12:27:35 +01:00
Vladimir Blagojevic
b945eaeabd
PromptNode: expose output_variable, adjust unit tests ( #3892 )
2023-01-26 11:01:11 +01:00
github-actions[bot]
d962bc0bc9
Update unstable version and openapi schema ( #3924 )
...
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
2023-01-26 01:02:49 +05:30
ZanSara
0e471d5e5a
fix: change model in distillation test ( #3944 )
...
* change model
* change layer count
* move promptnode tests in integration
* fix marker
2023-01-25 23:32:11 +05:30
Sebastian
fac72aa14e
Updated schema in rest_api to make the returned table format consistent with the format returned by Haystack's Document.to_dict(). ( #3872 )
2023-01-25 23:18:12 +05:30
Daniel Bichuetti
afc1e1ccef
fix: add tiktoken fallback mechanism. ( #3929 )
...
* feat: migrate to tiktoken when tokenizing for OpenAI
* refactor: add OpenAI optional egg
* fix: add Python 3.7 fallback support for tiktoken
* refactor: change both tokenization implementations and fix mypy
* refactor: remove dummy-class
* refactor: add tiktoken as core dependency and minor refactoring
* refactor: sort imports
* refactor: remove out-of-scope PR change
* refactor: reintroduce corner case check
* refactor: remove unused egg
* refactor: remove unused exception after tiktoken as core dep
* refactor: reduce ifs and include log warning
* refactor: remove timeout linting ignore
* refactor: revert change due to mypy
* refactor: disable pylint import error
* fix: add arm64 fallback to HF tokenizer
* fix: add aarch64 fallback mechanism
* refactor: improve log message
* fix: change platform selection method
* refactor: consolidate archs
2023-01-25 11:37:29 +01:00
Mayank Jobanputra
5c53b2bd4a
feat: adding secure loading of models by default for haystack ( #3901 )
...
* adding secure loading of models by default
* simplified set function
* testing import effect correctly
* added appropriate log line, adapted the test
* change log string formatting
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
* remove extra closing bracket )
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-01-24 23:01:20 +05:30
Daniel Bichuetti
739fc228c6
feat: support cl100k_base tokenization and increase performance for GPT2 ( #3897 )
...
* feat: migrate to tiktoken when tokenizing for OpenAI
* refactor: add OpenAI optional egg
* fix: add Python 3.7 fallback support for tiktoken
* refactor: change both tokenization implementations and fix mypy
* refactor: remove dummy-class
* refactor: add tiktoken as core dependency and minor refactoring
* refactor: sort imports
* refactor: remove out-of-scope PR change
* refactor: reintroduce corner case check
* refactor: remove unused egg
* refactor: remove unused exception after tiktoken as core dep
* refactor: reduce ifs and include log warning
* refactor: remove timeout linting ignore
* refactor: revert change due to mypy
* refactor: disable pylint import error
2023-01-24 16:15:49 +01:00
Vladimir Blagojevic
4d8b1d0b22
refactor: Improve stop_words handling, add unit test cases ( #3918 )
...
* Improve stop_words handling, add unit test cases
* Update test/nodes/test_prompt_node.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-01-24 12:52:41 +01:00
Fabian
61ebe4b5dc
fix: authenticate with aws4auth if set in OpenSearchDocumentStore ( #3741 )
...
* bug(OpenSearchDocumentStore): fix authenticate with aws4auth if set.
Rearrange check to authenticate with aws4auth before username
and password, as the username is set to "admin" by default.
* Make username check less restrictive
* Fix test, do not use mocked _init_client function
* Add warning for aws4auth and username to ElasticSearchDocumentStore
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-01-24 10:01:39 +01:00
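
The reordering described in #3741 above can be sketched as follows. This is a minimal illustration, not the actual OpenSearchDocumentStore code: the helper `select_auth` and the default password are hypothetical, and only the "admin" default username and the aws4auth-first ordering come from the commit message.

```python
from typing import Any, Optional


def select_auth(
    aws4auth: Optional[Any] = None,
    username: Optional[str] = "admin",  # default username, as stated in the commit message
    password: Optional[str] = "admin",  # assumed default, for illustration only
) -> Optional[Any]:
    """Pick the credentials a client should use (hypothetical helper)."""
    # Check aws4auth first: the username is set by default, so testing it
    # first would always shadow an explicitly configured aws4auth object.
    if aws4auth is not None:
        return aws4auth
    # Fall back to basic auth only when a username is actually present.
    if username:
        return (username, password)
    return None
```

With this ordering, passing any `aws4auth` object wins over the default username, which is the behaviour the fix describes.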
ZanSara
e954230ae7
chore: enable f-string-without-interpolation ( #3906 )
...
* f-string-without-interpolation
* remove line
* missed one line
2023-01-23 17:35:52 +01:00
Zoltan Fedor
e447bd728a
feat: adding the ability to use Ray Serve async functionality ( #3769 )
...
* Adding the ability to call the Ray pipeline from concurrent apps with async
This is to fix #2968
* Fixes: mypy + pylint (`invalid-overridden-method`)
* Simplifying - no real need for an `AsyncRayPipeline` anymore
* Moving the new `run_async` method to the `RayPipeline`
* Cleanup
* [EMPTY] Re-trigger CI
2023-01-23 16:23:09 +01:00
Benjamin BERNARD
eed009eddb
feat: Add CsvTextConverter ( #3587 )
...
* feat: Add Csv2Documents, EmbedDocuments nodes and FAQ indexing pipeline
Fixes #3550 , allowing users to build a full FAQ pipeline from a YAML pipeline description, with CSV import and indexing.
* feat: Add Csv2Documents, EmbedDocuments nodes and FAQ indexing pipeline
Fix linter issues mypy and pylint.
* feat: Add Csv2Documents, EmbedDocuments nodes and FAQ indexing pipeline
Fix linter issues mypy.
* implement proposal's feedback
* tidy up for merge
* use BaseConverter
* use BaseConverter
* pylint
* black
* Revert "black"
This reverts commit e1c45cb1848408bd52a630328750cb67c8eb7110.
* black
* add check for column names
* add check for column names
* add tests
* fix tests
* address lists of paths
* typo
* remove duplicate line
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2023-01-23 15:56:36 +01:00
ZanSara
94f660c56f
feat: store id_hash_keys in Document objects to make documents clonable ( #3697 )
...
* store id_hash_keys in Document objects
* fix id_hash_keys calls throughout codebase
* generate schema
* fix es
* fix weaviate
* backward compatible
* openapi schema
* remove unused deprecation warning
* remove unused imports
* openapi
* unused var
* Apply suggestions from code review
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Update haystack/schema.py
* Apply suggestions from code review
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Update haystack/schema.py
* review feedback
* trailing spaces
* pylint
* add deprecation test
Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-01-23 15:00:52 +01:00
ZanSara
2f15f3c64d
Fix OpensearchDocumentStore docstring ( #3904 )
2023-01-23 19:19:40 +05:30
Silvano Cerza
afa2bb1386
fix: Remove double super class init from ParsrConverter init ( #3896 )
2023-01-23 12:31:27 +01:00
Silvano Cerza
45bea5a838
chore: Add timeouts to external requests calls ( #3895 )
...
* chore: Add timeouts to external requests calls
* Remove :type directives from docstrings
2023-01-23 12:31:13 +01:00
Stefano Fiorucci
b910df7ec7
feat: ImageToText (caption generator) ( #3859 )
...
* first draft
* fix pylint and mypy
* retry w mypy
* mypy :-)
* rem unused import
* incorporate feedback and initial tests
* better tests
* fix import order
* fix docstring
* other fix docstring
* more and better tests
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2023-01-23 11:59:56 +01:00
Sebastian
d2bba4935b
feat: Use truncate option for Cohere.embed ( #3865 )
...
* Use truncate option for cohere request instead of GPT2 tokenizer to truncate texts
* Update max batch size for cohere which is 96
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2023-01-20 09:49:55 +01:00
Vladimir Blagojevic
04deb3b535
feat: Add retry with exponential back-off to PromptNode's OpenAI models ( #3886 )
2023-01-19 21:04:32 +01:00
ZanSara
90c877a559
bug: mypy should ignore files in test/ ( #3894 )
...
* exclude files in test/
* verify that the CI ignores test files
* don't fail in case of no files
2023-01-19 18:12:26 +01:00
Vladimir Blagojevic
4c28253955
feat: PromptNode - implement stop words ( #3884 )
2023-01-19 12:26:15 +01:00
Vladimir Blagojevic
e2fb82b148
refactor: Move invocation_context from meta to own pipeline variable ( #3888 )
2023-01-19 11:17:06 +01:00
ZanSara
34b7db0209
chore: enable singleton-comparison and cleanup ( #3849 )
...
* enable singleton-comparison
* fix triadaptive_model bug
2023-01-19 10:07:41 +01:00
ZanSara
6f5a2fb1da
fix: remove string validation in YAML ( #3854 )
...
* remove string validation in YAML
* unused import
* fix import
* remove tests
* fix tests
2023-01-19 10:06:53 +01:00
Mayank Jobanputra
dad7b12874
fix: Allowing InMemStore and FAISSDocStore for indexing using single worker ( #3868 )
...
* Allowing InMemStore and FAISSDocStore for indexing using single worker YAML config
* unified pipeline & doc store loading
* fix pylint warning
* separated tests
* removed unnecessary caplog
2023-01-19 14:06:00 +05:30
Ahmed Nabil
12e057837b
Adding condition to pinecone object. ( #3768 )
...
* Adding condition to `pinecone` object.
Previously any value could be assigned to `PineconeDocumentStore`'s `pinecone_index` parameter; this adds a condition to prevent that.
* Added test, and changed the code to make sure the pinecone index variable is an instance of the correct type
* fixed black error
Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
2023-01-19 01:34:44 +05:30
Vladimir Blagojevic
c44d67856e
Simplify PromptTemplate substitution in PromptNode ( #3876 )
2023-01-18 18:31:15 +01:00
ZanSara
eb57e1fc09
chore: make Mypy work when Haystack is installed ( #3856 )
...
* add ignore statements to each failing line in haystack/
* simplify workflow
* few typos
* mypy cache directory missing
* mypy cache directory missing
* install types from Haystack only
* install types from rest_api too
* mypy vs literal
* install types at check time
* add mypy cache to python cache
* fix version condition
* fix version condition
* try running mypy only on affected files
* try using explicit hashes
* try another approach
* filter python files
* typo
* quotes
* use action
2023-01-18 15:36:10 +01:00
ZanSara
6af4f14fe0
feat: preprocessor raises warning when doc length exceeds threshold ( #3837 )
...
* add warning for excessive length
* improve test
* review feedback
* fix test
* move into _process_single
2023-01-17 13:48:28 +01:00
ZanSara
c50968dfe5
upgrade es to the version used in the CI ( #3858 )
2023-01-17 13:47:37 +01:00
ZanSara
9e457db2e9
test: add version deprecation fixture ( #3851 )
...
* add fixture
* Update test/conftest.py
* remove +2 and add tests
* few typos
* more cases
* Update test/conftest.py
2023-01-16 15:36:14 +01:00
ZanSara
3ffdb0a9a3
chore: fix all EOF ( #3852 )
...
* fix all eof
* fix test
* fix test
* fix test
* typo
* fix sample
* fix sample
* add logs
* fix page_dynamic_result.txt
2023-01-16 12:34:50 +01:00
ZanSara
62935bde6d
enable unused-variable ( #3846 )
2023-01-12 19:38:45 +01:00
Benjamin BERNARD
15203d864b
docs: Proposal - CSV FAQ indexing feature ( #3638 )
...
* docs(proposal): Add new proposal about CSV FAQ indexing feature
* docs(proposal): Add new proposal about CSV FAQ indexing feature
Introduce PR number.
* Review feedback
* Mixed up the PR numbers
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-01-12 11:07:26 +01:00
Zoltan Fedor
9cf80ee07e
feat: add HA support for Weaviate ( #3764 )
...
* feat: add HA support for Weaviate
Adding the `replicationConfig => factor` parameter to the Weaviate class at the time of class creation, allowing the user to have Haystack create a Weaviate "Class" with a replication factor set above 1.
This enables the use of Weaviate in a HA (High Availability) fashion, where the created class is stored on multiple Weaviate nodes increasing Weaviate's throughput and also ensuring high availability.
* Trying out a recommendation from @masci to fix the CI issue
2023-01-12 10:01:38 +01:00
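
As a rough illustration of the `replicationConfig => factor` setting described in #3764 above, a class definition carrying a replication factor might be built like this. The helper `build_class_schema` and its parameter names are hypothetical, and the overall dictionary shape is an assumption about the Weaviate class schema rather than code taken from Haystack.

```python
from typing import Any, Dict


def build_class_schema(class_name: str, replication_factor: int = 1) -> Dict[str, Any]:
    """Build a class definition with a replication factor (hypothetical helper)."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return {
        "class": class_name,
        # A factor above 1 asks Weaviate to keep the class on several nodes.
        "replicationConfig": {"factor": replication_factor},
    }


schema = build_class_schema("Document", replication_factor=3)
```

A factor of 1 keeps a single copy; values above 1 are what the commit message refers to as the HA setup.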
ZanSara
d157e41c1f
chore: enable logging-fstring-interpolation and cleanup ( #3843 )
...
* enable logging-fstring-interpolation
* remove logging-fstring-interpolation from exclusion list
* remove implicit string interpolations added by black
* remove from rest_api too
* fix % sign
2023-01-12 09:31:21 +01:00
ZanSara
4cbc8550d6
chore: enable trailing-whitespace and cleanup ( #3847 )
...
* enable trailing-whitespace
* remove trailing whitespace on rest api too
2023-01-11 20:08:19 +01:00
Massimiliano Pippi
fa4404baa0
fix: ignore non-serializable params when hashing pipeline objects ( #3842 )
...
* ignore non-serializable params when hashing pipeline objects
* make tests more clear
2023-01-11 17:11:41 +01:00
Vladimir Blagojevic
ccda51fb43
proposal: Shaper pipeline component ( #3784 )
...
* Add InputOutputShaper proposal
* Add security section
* Rename to Shaper, small additions
* Rewording, rename contract_docs to concat
2023-01-11 18:50:12 +05:30
Bilge Yücel
88db75a419
feat: update the docker image for haystack-api service ( #3835 )
2023-01-11 15:35:46 +03:00
Stefano Fiorucci
be31178892
fix: make the crawler runnable and testable on Windows ( #3830 )
...
* fix crawler and try to run CI
* more compact expression
* try to fix
* improve naming regex
* revert regex
* make test_url compatible with Windows
* better conditional expression
2023-01-10 20:27:28 +01:00