Sebastian
01d39df863
feat: Update allowed models to be used with Prompt Node ( #4018 )
...
* Update allowed models to be used with Prompt Node
* Added try except block around the config to skip over OpenAI models.
* Fixing tests
* Adding warning message
* Adding test for different HF models that could be used in prompt node
2023-02-08 12:47:52 +01:00
Stefano Fiorucci
5c009c2a1a
feat: OpenAI - warn users if max_tokens is too short ( #4094 )
...
* warn users if max_tokens is too short
* skip test if not API KEY
* add counters
* correctly run precommit
2023-02-08 10:39:40 +01:00
tstadel
92c58cfda1
feat: Support multiple document_ids in Answer object (for generative QA) ( #4062 )
...
* initial version without shapers
* set document_ids for BaseGenerator
* introduce question-answering-with-references template
* better prompt
* make PromptTemplate control output_variable
* update schema
* fix add_doc_meta_data_to_answer
* Revert "fix add_doc_meta_data_to_answer"
This reverts commit b994db423ad8272c140ce2b785cf359d55383ff9.
* fix add_doc_meta_data_to_answer
* fix eval
* fix pylint
* fix pinecone
* fix other tests
* fix test
* fix flaky test
* Revert "fix flaky test"
This reverts commit 7ab04275ffaaaca96b4477325ba05d5f34d38775.
* adjust docstrings
* make Label loading backward-compatible
* fix Label backward compatibility for pinecone
* fix Label backward compatibility for search engines
* fix Label backward compatibility for deepset Cloud
* fix tests
* fix None issue
* fix test_write_feedback
* add tests for legacy label support
* add document_id test for pinecone
* reduce unnecessary contents
* add comment to pinecone test
2023-02-08 08:37:22 +01:00
Vladimir Blagojevic
3273a2714d
fix: Add PromptTemplate __repr__ method ( #4058 )
...
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2023-02-07 08:14:32 +01:00
Jack Butler
f006eded7d
fix: allow Biadaptive & Triadaptive to work with EarlyStopping ( #4033 )
...
* fix: allow str when saving tri/bi-adaptive models
* fix: make trainer model loading class-agnostic
* test: add test for DPR with EarlyStopping
* refactor: simplify model reloading via classmethod
---------
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-02-03 11:13:18 +01:00
tstadel
9611b64ec5
fix: document retrieval metrics for non-document_id document_relevance_criteria ( #3885 )
...
* fix document retrieval metrics for all document_relevance_criteria
* fix tests
* fix eval_batch metrics
* small refactorings
* evaluate metrics on label level
* document retrieval tests added
* fix pylint
* fix test
* support file retrieval
* add comment about threshold
* rename test
2023-02-02 15:00:07 +01:00
ZanSara
9009a9ae58
feat: add Shaper ( #3880 )
...
* Shaper initial version
* Initial pydoc
* Add more unit tests
* Fix pydoc, expand Shaper pydoc with YAML example
* Minor fix
* Improve pydoc
* More unit tests with prompt node
* Describe Shaper functions in pydoc
* More pydoc
* Use pytest.raises instead of catching errors
* Improve test_function_invocation_order unit test
* pylint fixes
* Improve run_batch handling
* simpler version, initial stub
* stubbing tests
* promptnode compatibility
* add tests
* simplify
* fix promptnode tests
* pylint
* mypy
* fix corner case & mypy
* mypy
* review feedback
* tests
* Add lg updates
* add rename
* pylint
* Add complex unit test with two PNs and ICMs in between (#3921 )
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
* docstring
* fix tests
* add join_lists
* add documents_to_strings
* fix tests
* allow lists of input values
* doc review feedback
* do not use locals()
* Update with minor lg changes
* fix corner case in ICM
* fix merge
* review feedback
* answers conversions
* mypy
* add tests
* generative answers
* forgot to commit
---------
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-02-01 18:36:13 +01:00
Zoltan Fedor
2b1849f525
fix: Add a verbose option to PromptNode to let users understand the prompts being used #2 ( #3898 )
...
* fix: Add a verbose option to PromptNode to let users understand the prompts being used #2
* Add comments and refactoring todo note
* Fix logging-fstring-interpolation pylint
* Update haystack/nodes/prompt/prompt_node.py
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
---------
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-01-31 09:33:47 +01:00
bogdankostic
1a8fe0031d
feat: Add use_prefiltering parameter to DeepsetCloudDocumentStore ( #3969 )
...
* Add `use_prefiltering` parameter
* Adapt doc string
* Pass use_prefiltering via API to dC
* Adapt doc string
* Adapt test
2023-01-30 15:12:34 +01:00
Daniel Bichuetti
3009ac2988
feat: Add page range support to PDF converters. ( #3965 )
...
* feat: add start and end page to PDF converters
* docs: add missing docstrings
* refactor: change list set up, add docstrings and comment
* fix: add missing parameter
* tests: add page range basic test
* tests: test correct page numbers
* tests: remove OCR page range test
*Poppler and Tesseract not installed on CI
* fix: remove mobile change error
2023-01-30 14:09:22 +01:00
Sebastian
71de0524de
fix: fixed InMemoryDocumentStore.get_embedding_count to return correct number ( #3980 )
...
* Fix the embedding count function of InMemoryDocumentStore
* Adding some doc strings explaining how many docs with embeddings to expect.
2023-01-30 12:38:30 +01:00
hsm207
08ec059b14
refactor: use weaviate client to build BM25 query ( #3939 )
...
* refactor: use weaviate client to build BM25 query
* refactor: remove manual BM25 query building
* refactor: apply BM25 to the content_field only
* test: update weaviate BM25 retrieval test case
update to account for lack of stemming
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-01-30 10:07:07 +01:00
Tuana Celik
93312138de
fix: removing code block in MarkdownConverter ( #3960 )
...
* first attempt to add frontmatter of markdown to the metadata
* remove bug fix
* running black and pre-commit
* moving the import line
* adding a test
* adding pydoc
* fix to removing code blocks in markdown converter
* adding a test
* fixing a test
* improving tests
* adding language to code block
2023-01-27 15:25:54 +01:00
Tuana Celik
790e9acd3e
feat: add frontmatter to meta in MarkdownConverter ( #3953 )
...
* first attempt to add frontmatter of markdown to the metadata
* remove bug fix
* running black and pre-commit
* moving the import line
* adding a test
* adding pydoc
2023-01-26 17:15:02 +01:00
Massimiliano Pippi
52b195faf6
increase the timeout for testing ( #3957 )
2023-01-26 16:04:43 +01:00
Vladimir Blagojevic
ec85207cf7
Remove __eq__ and __hash__ from PromptNode ( #3923 )
2023-01-26 13:38:35 +01:00
Vladimir Blagojevic
b945eaeabd
PromptNode: expose output_variable, adjust unit tests ( #3892 )
2023-01-26 11:01:11 +01:00
ZanSara
0e471d5e5a
fix: change model in distillation test ( #3944 )
...
* change model
* change layer count
* move promptnode tests in integration
* fix marker
2023-01-25 23:32:11 +05:30
Mayank Jobanputra
5c53b2bd4a
feat: adding secure loading of models by default for haystack ( #3901 )
...
* adding secure loading of models by default
* simplified set function
* testing import effect correctly
* added appropriate log line, adapted the test
* change log string formatting
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
* remove extra closing bracket )
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-01-24 23:01:20 +05:30
Vladimir Blagojevic
4d8b1d0b22
refactor: Improve stop_words handling, add unit test cases ( #3918 )
...
* Improve stop_words handling, add unit test cases
* Update test/nodes/test_prompt_node.py
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-01-24 12:52:41 +01:00
Fabian
61ebe4b5dc
fix: authenticate with aws4auth if set in OpenSearchDocumentStore ( #3741 )
...
* bug(OpenSearchDocumentStore): fix authenticate with aws4auth if set.
Rearrange check to authenticate with aws4auth before username
and password, as the username is set to "admin" by default.
* Make username check less restrictive
* Fix test, do not use mocked _init_client function
* Add warning for aws4auth and username to ElasticSearchDocumentStore
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2023-01-24 10:01:39 +01:00
Zoltan Fedor
e447bd728a
feat: adding the ability to use Ray Serve async functionality ( #3769 )
...
* Adding the ability to call the Ray pipeline from concurrent apps with async
This is to fix #2968
* Fixes: mypy + pylint (`invalid-overridden-method`)
* Simplifying - no real need for an `AsyncRayPipeline` anymore
* Moving the new `run_async` method to the `RayPipeline`
* Cleanup
* [EMPTY] Re-trigger CI
2023-01-23 16:23:09 +01:00
Benjamin BERNARD
eed009eddb
feat: Add CsvTextConverter ( #3587 )
...
* feat: Add Csv2Documents, EmbedDocuments nodes and FAQ indexing pipeline
Fixes #3550 , allow user to build full FAQ using YAML pipeline description and with CSV import and indexing.
* feat: Add Csv2Documents, EmbedDocuments nodes and FAQ indexing pipeline
Fix linter issues mypy and pylint.
* feat: Add Csv2Documents, EmbedDocuments nodes and FAQ indexing pipeline
Fix linter issues mypy.
* implement proposal's feedback
* tidy up for merge
* use BaseConverter
* use BaseConverter
* pylint
* black
* Revert "black"
This reverts commit e1c45cb1848408bd52a630328750cb67c8eb7110.
* black
* add check for column names
* add check for column names
* add tests
* fix tests
* address lists of paths
* typo
* remove duplicate line
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2023-01-23 15:56:36 +01:00
ZanSara
94f660c56f
feat: store id_hash_keys in Document objects to make documents clonable ( #3697 )
...
* store id_hash_keys in Document objects
* fix id_hash_keys calls throughout codebase
* generate schema
* fix es
* fix weaviate
* backward compatible
* openapi schema
* remove unused deprecation warning
* remove unused imports
* openapi
* unused var
* Apply suggestions from code review
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Update haystack/schema.py
* Apply suggestions from code review
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Update haystack/schema.py
* review feedback
* trailing spaces
* pylint
* add deprecation test
Co-authored-by: bogdankostic <bogdankostic@web.de>
2023-01-23 15:00:52 +01:00
Stefano Fiorucci
b910df7ec7
feat: ImageToText (caption generator) ( #3859 )
...
* first draft
* fix pylint and mypy
* retry w mypy
* mypy :-)
* rem unused import
* incorporate feedback and initial tests
* better tests
* fix import order
* fix docstring
* other fix docstring
* more and better tests
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2023-01-23 11:59:56 +01:00
ZanSara
90c877a559
bug: mypy should ignore files in test/ ( #3894 )
...
* exclude files in test/
* verify that the CI ignores test files
* dont fail in case of no files
2023-01-19 18:12:26 +01:00
Vladimir Blagojevic
4c28253955
feat: PromptNode - implement stop words ( #3884 )
2023-01-19 12:26:15 +01:00
Vladimir Blagojevic
e2fb82b148
refactor: Move invocation_context from meta to own pipeline variable ( #3888 )
2023-01-19 11:17:06 +01:00
ZanSara
6f5a2fb1da
fix: remove string validation in YAML ( #3854 )
...
* remove string validation in YAML
* unused import
* fix import
* remove tests
* fix tests
2023-01-19 10:06:53 +01:00
Ahmed Nabil
12e057837b
Adding condition to pinecone object. ( #3768 )
...
* Adding condition to `pinecone` object.
While you can assign any values to `PineconeDocumentStore`'s parameter `pinecone_index`, it must have another condition to prevent that from happening.
* Added test, and changed the code to make sure the pinecone idx variable has correct instance
* fixed black error
Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
2023-01-19 01:34:44 +05:30
ZanSara
6af4f14fe0
feat: preprocessor raises warning when doc length exceeds threshold ( #3837 )
...
* add warning for excessive length
* improve test
* review feedback
* fix test
* move into _process_single
2023-01-17 13:48:28 +01:00
ZanSara
9e457db2e9
test: add version deprecation fixture ( #3851 )
...
* add fixture
* Update test/conftest.py
* remove +2 and add tests
* few typos
* more cases
* Update test/conftest.py
2023-01-16 15:36:14 +01:00
ZanSara
3ffdb0a9a3
chore: fix all EOF ( #3852 )
...
* fix all eof
* fix test
* fix test
* fix test
* typo
* fix sample
* fix sample
* add logs
* fix page_dynamic_result.txt
2023-01-16 12:34:50 +01:00
Massimiliano Pippi
fa4404baa0
fix: ignore non-serializable params when hashing pipeline objects ( #3842 )
...
* ignore non-serializable params when hashing pipeline objects
* make tests more clear
2023-01-11 17:11:41 +01:00
Stefano Fiorucci
be31178892
fix: make the crawler runnable and testable on Windows ( #3830 )
...
* fix crawler and try to run CI
* more compact expression
* try to fix
* improve naming regex
* revert regex
* make test_url compatible with Windows
* better conditional expression
2023-01-10 20:27:28 +01:00
Tobias Wochinger
dea10a51d3
fix: gracefully handle FileExistsError during Preprocessor resource download ( #3816 )
...
* fix: use temp path for downloading punkt resources
* fix: gracefully handle file exists error during download
2023-01-10 11:22:49 +01:00
Zoltan Fedor
0288e1be76
bug: The PromptNode handles all parameters as lists without checking if they are in fact lists ( #3820 )
2023-01-10 08:08:17 +01:00
tstadel
6ca88bfd23
fix: Despite return_embedding=False SearchEngineDocumentStore.query retrieves embedding_field ( #3662 )
...
* fix: Despite return_embedding=False SearchEngineDocumentStore.query retrieves embedding_field
* fix pylint
* add tests
* fix mypy
* fix merge
* format
* fix pylint
* move tests to SearchEngineDocumentStoreTestAbstract
* move missed constants
* add mocked_document_store fixture to TestElasticsearchDocumentStore
* fix mocked_document_store
* fix get_all_documents tests for elasticsearch>=7.16
* fix tests
* fix tests try 2
2023-01-09 11:58:23 +01:00
Sebastian
5b0b338175
fix: Ensure eval mode for TableReader model for predictions ( #3743 )
...
* Adding model.eval() calls to prediction functions in table reader
* Add unit test to check that inference-time prediction still works if the model is set to train mode.
2023-01-09 11:07:06 +01:00
Sebastian
659020fcac
fix: Convert table cells to strings for compatibility with TableReader ( #3762 )
...
* Add table = table.astype(str) to make sure cells are converted into strings to be compatible with the TableReader
* Turn more strings into ints
* Make sure answer text is always a string.
2023-01-09 10:42:11 +01:00
tstadel
4a0a054164
fix: linefeeds in custom_query ( #3813 )
...
* fix linefeeds in custom_query
* add double quote test case
2023-01-05 17:13:04 +01:00
Julian Risch
0c2d13f1b8
bug: skip validating empty embeddings ( #3774 )
...
* skip validating empty embeddings
* skip batches without embeddings to update
* add unit test with mocked retriever
2023-01-05 15:13:57 +01:00
Sebastian
e84fae2894
Migrating to use native Pytorch AMP ( #2827 )
...
* Started making changes to use native Pytorch AMP
* Updated compute_loss functions to use torch.cuda.amp.autocast
* Updating docstrings
* Add use_amp to trainer_checkpoint
* Removed mentions of apex and started to add the necessary warnings
* Removing unused instances of use_amp variable
* Added fast training test for FARMReader. Needed to add max_query_length as a parameter in FARMReader.__init__ and FARMReader.train
* Make max_query_length optional in FARMReader.train
* Update lg
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-01-05 09:14:28 +01:00
Julian Risch
a2c160e7d8
bug: skip empty documents in reader ( #3773 )
...
* skip empty documents
* test eval_batch and account for tables
2023-01-03 15:50:14 +01:00
Julian Risch
b155297a06
feat: change PipelineConfigError to DocumentStoreError with more details ( #3783 )
2023-01-02 19:40:45 +01:00
Vladimir Blagojevic
bebd6b26ec
Improve robustness of PromptNode unit tests ( #3747 )
2023-01-02 16:28:56 +01:00
bogdankostic
594d2a10f8
fix: Fix predict_batch in TransformersReader for single nested Document list ( #3748 )
...
* Fix restoring of list structure
* Add tests
2022-12-29 11:48:18 +01:00
Stefano Fiorucci
136928714c
refactor: remove deprecated parameters from Summarizer ( #3740 )
...
* remove deprecated parameters
* remove deprecation/removal test
2022-12-29 15:37:47 +05:30
tstadel
6c067b2b4f
feat: make score_script first class citizen via knn_engine param ( #3284 )
...
* OpenSearchDocumentStore: make score_script accessible via knn_engine
* blacken
* fix tests
* fix format
* fix naming of 'score_script' consistently
* fix tests
* fix test
* fix ef_search tests
* always validate index
* improve clone_embedding_field
* fix pylint
* reformat
* remove port
* update tests
* set no_implicit_optional = false
* fix mypy
* fix test
* refactorings
* reformat
* fix and refactor tests
* better tests
* create search_field mappings
* remove no_implicit_optional = false
* skip validation for custom mapping
* format
* Apply suggestions from docs code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Apply tougher suggestions from code review
* fix messages
* fix typos
* update tests
* Update haystack/document_stores/opensearch.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* fix tests
* fix ef_search validation
* add test for ef_search nmslib
* fix assert_not_called
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-12-27 15:24:31 +01:00
Mayank Jobanputra
76a16807d5
fix: Fixed local reader model loading ( #3663 )
...
* Fixed local loading issue
2022-12-24 03:46:36 +05:30