1977 Commits

Author SHA1 Message Date
Silvano Cerza
18e83b3ed4
Pin requests-cache test dependency to <1.0.0 (#4325) 2023-03-03 12:47:15 +01:00
bogdankostic
f33829fabf
Remove xpdf dependencies (#4314) 2023-03-02 11:12:03 +01:00
Vladimir Blagojevic
79bf25aaea
feat: Add Azure as OpenAI endpoint (#4170)
* Add Azure as OpenAI endpoint
---------

Co-authored-by: Sebastian Lee <sebastian.lee@deepset.ai>
2023-03-02 09:55:09 +01:00
Daniel Bichuetti
7c49fffc71
feat: Enable PDFToTextConverter multiprocessing, increase general performance and simplify installation (#4226)
* refactor: isolate PDF converters

* refactor: remove xpdf dependency and fix tests

* refactor: add min. version

* feat: enable multiprocessing and add tests

* fix: remove unused imports

* fix: regression when moved code

* refactor: use itertools

* fix: mypy claims

* refactor: double tool support

* refactor: add fallback to xpdf

* refactor: black formatting

* refactor: make superclass signature compatible

* refactor: complete removal of xPdf

* refactor: regroup Haystack imports and fix regression

* refactor: remove original declaration

* docs: fix docstrings

* tests: add [pdf] to [all]

* refactor: remove redundant checks, avoid extra processes

* refactor: add deprecation warning

* refactor: add pytest mark

* tests: change PDF test file

* fix: correct pytest mark

* refactor: deprecate parameter and add new

* tests: change pdf sample

* Add minor lg changes to docstrings

* Fix default value in doc strings

* Update test/nodes/test_file_converter.py

Co-authored-by: bogdankostic <bogdankostic@web.de>

* tests: fix page count

* refactor: add imported function

* refactor: change default value

* tests: change parameters and fix typo

* Unify sort_by_position parameter names

---------

Co-authored-by: bogdankostic <bogdankostic@web.de>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-03-01 22:34:38 +01:00
Silvano Cerza
90da7bf4f8
Fix docstring-labeler.yml workflow (#4307) 2023-03-01 17:49:04 +01:00
ZanSara
ae04ce3c6a
test: mock all Summarizer tests and move a few into e2e (#4299)
* stub e2e folders

* simplify pipeline test

* mocking

* unit tests fixed

* clean up e2e

* pipeline tests work

* pylint

* leftover

* small fix from #2994 and additional tests

* review feedback

* change summaries

* black

* revert models and summaries
2023-03-01 17:30:55 +01:00
bogdankostic
583d2d8244
Fix search path for Shaper API docs (#4306) 2023-03-01 16:10:39 +01:00
ZanSara
165a0a5faa
test: mock all Translator tests and move one to e2e (#4290)
* mock all translator tests and move one to e2e

* typo

* extract pipeline tests using translator

* remove duplicate test

* move generator test in e2e

* Update e2e/pipelines/test_extractive_qa.py

* pytest.mark.unit

* black

* remove model name as well

* remove unused fixture

* rename original and improve pipeline tests

* fixes

* pylint
2023-03-01 14:52:05 +01:00
Agnieszka Marzec
7e0f9715ba
Docs: Add shaper API (#4288)
* Add shaper and update category id

* Fix the category id

* Update category
2023-03-01 14:02:47 +01:00
Stefano Fiorucci
e8f9b1b65d
test: replace ElasticsearchDS with InMemoryDS when it makes sense; support scale_score in InMemoryDS (#4283)
* replace elasticds with imds - first draft

* fix

* fix tests and implement scale_score in imds bm25

* add docstrings for scale_score
2023-03-01 11:35:10 +01:00
Silvano Cerza
ee74421212
ci: Refactor docs config and generation (#4280)
* Change docs yml category config

* Update docs renderers to fetch categories from Readme.io

* Update readme_sync.yml to handle new docs rendering

* Remove unnecessary script and related workflow step

* Fix sys.exits
2023-03-01 09:51:02 +01:00
Silvano Cerza
6e241262ad
ci: Change docker_release.yml workflow to run after successful PyPi release (#4293)
* Change docker_release.yml workflow to run after successful PyPi release

* Add warning on name change in pypi_release.yml
2023-03-01 09:50:47 +01:00
tstadel
d1c9407a25
fix opensearch delete_index (#4295) 2023-03-01 08:40:38 +01:00
Malte Pietsch
2a1d73e16d
refactor: Make extraction of "Tool" and "Tool input" for Agent more robust and user-friendly (#4269)
* adjust [] in prompt template. Add error+docs for Tool name.

* fix test

* update error message
2023-02-28 20:01:34 +01:00
Massimiliano Pippi
c3a38a59c0
Update test_prompt_node.py (#4281) 2023-02-28 09:37:40 +01:00
Julian Risch
662441a62b
fix: FARMReader produces Answers with negative start and end position (#4248) 2023-02-28 09:27:42 +01:00
Sebastian
040d806b42
test: Added integration test for using EntityExtractor in query pipeline (#4117)
* Added new test for using EntityExtractor in query node and made some fixtures to reduce code duplication.

* Reuse ner_node fixture

* Added pytest unit markings and swapped over to in memory doc store.

* Change to integration tests
2023-02-28 09:20:44 +01:00
Silvano Cerza
5678bb6375
Parallelize Docker build job (#4268) 2023-02-27 16:03:24 +01:00
Massimiliano Pippi
4b8d195288
refact: mark unit tests under the test/nodes/** path (#4235)
* document merger

* mark unit tests

* revert
2023-02-27 15:00:19 +01:00
Sebastian
efe46b1214
Fix: Allow torch_dtype="auto" in PromptNode (#4166)
* Fix for allowing torch_dtype="auto"

* Fix to logic of torch_dtype detection

* separate test for dtype
2023-02-27 09:59:27 +01:00
Silvano Cerza
4a93517eb4
test: Fix deprecation fixture (#4219)
* Fix deprecation fixture

* Update docstring

* Update docstring

---------

Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
2023-02-27 09:55:03 +01:00
Kshitij Pawar
3d3e9c9b32
Fix: Issue of failure to initialize input_converter in Seq2SeqGenerator when model_file_path is given as folder path on local disk after manual model download (#4213)
* test

* test documentation commit:

* added original return statement for linting

* removed empty lines

* formatted code using black

* made changes based on suggestions
2023-02-26 18:13:26 +01:00
Silvano Cerza
2c9e4c5ff9
Remove unnecessary operations in minor_version_release.yml (#4267) 2023-02-24 14:29:42 +01:00
Silvano Cerza
280414e5c6
Fix OpenAPI specs upload (#4266) 2023-02-24 10:50:59 +01:00
ZanSara
13c4ff1b52
refactor: remove direct logging without a logger (#4253)
* remove direct logging without a logger

* add custom pylint checker

* add test

* pylint

* improve checker message

* mypy

* remove test

* add checker for basicConfig

* more logging missed

* ignore basicConfig

* move out logger

* move out statement

* remove logging configuration
2023-02-23 20:42:42 +01:00
Vladimir Blagojevic
4b189c0b40
proposal: Implement Agent demo (#4085)
* Agent demo proposal

* Replace on-the-fly module with WebRetriever

* Update proposal with ideas from discussion with Julian

* Replace SerpAPI references with SearchEngine

* Add Agent memory

* Update Agent memory
2023-02-23 19:56:38 +01:00
Silvano Cerza
d594ab800b
ci: Fix OpenAPI spec sync (#4254)
* Attempt to fix OpenAPI sync

* Dry run

* Add step to get OpenAPI specs id

* Remove dryRun and branch trigger
2023-02-23 19:02:46 +01:00
ZanSara
c0c09f1287
Fix typo in google.colab package detection (#4238) 2023-02-23 17:53:23 +01:00
Stefano Fiorucci
5e85f33bd3
refactor: Remove deprecated nodes EvalDocuments and EvalAnswers (#4194)
* remove deprecated classes and update test

* remove deprecated classes and update test

* remove unused code

* remove unused import

* remove empty evaluator node

* unused import :-)

* move sas to metrics
2023-02-23 15:26:17 +01:00
Massimiliano Pippi
722dead1b2
fix agents tests (#4237) 2023-02-23 13:03:45 +01:00
ZanSara
b193e08a64
set env var (#4239)
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-02-23 11:59:46 +01:00
Silvano Cerza
c3bf62d4b0
Add a simple way to skip required tests checks (#4245) 2023-02-23 11:00:20 +01:00
Massimiliano Pippi
764eaa035f
skip summarizer tests to reduce pressure (#4241) 2023-02-23 09:50:24 +01:00
Massimiliano Pippi
dd37b4c29f
fix: apply black formatting (#4240)
* fix black formatting

* try
2023-02-23 08:59:40 +01:00
Agnieszka Marzec
1dc7f6215e
Update top_k description (#4224) 2023-02-22 23:05:41 +02:00
Silvano Cerza
b6371c95a8
Add missing dependencies in openapi upload workflow (#4236) 2023-02-22 19:34:22 +01:00
ZanSara
f816efa50c
feat: reduce and focus telemetry (#4087)
* simplified telemetry and docker containers detection

* pylint

* mypy

* mypy

* Add new credentials and metadata

* remove prints

* mypy

* remove comment

* simplify inout len measurement

* black

* removed old telemetry, to revert

* reintroduce env function

* reintroduce old telemetry

* fix telemetry selection

* telemetry for promptnode

* telemetry for some training methods

* telemetry for eval and distillation

* mypy & pylint

* review

* Update lg

* mypy

* improve docstrings

* pylint

* mypy

* fix test

* linting

* remove old tests

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-02-22 19:02:47 +01:00
Silvano Cerza
181e5474e8
ci: Automate OpenAPI specs upload to Readme.io (#4228)
* Remove OpenAPI specs file

* OpenAPI specs are now automatically uploaded when necessary

* Rename openapi workflow
2023-02-22 18:01:18 +01:00
Daniel Bichuetti
e0b0fe1bc3
feat!: Increase Crawler standardization regarding Pipelines (#4122)
* feat!(Crawler): Integrate Crawler in the Pipeline.

+Output Documents
+Optional file saving
+Optional Document meta about file path

* refactor: add Optional decl.

* chore: dummy commit

* chore: dummy commit

* refactor: improve overwrite flow

* refactor: change custom file path meta logic + add test

* Update haystack/nodes/connector/crawler.py

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* Update haystack/nodes/connector/crawler.py

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* Update haystack/nodes/connector/crawler.py

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* Update haystack/nodes/connector/crawler.py

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* Update haystack/nodes/connector/crawler.py

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-02-22 17:34:19 +01:00
Balamurugan Periyasamy
49ed21b82d
fix: Better error messages for OCR requirement (#3767) (#3900)
Add pip install requirement in the error message for missing dependency.
2023-02-22 14:57:28 +01:00
tstadel
32b2abf9d5
fix: add option to not override results by Shaper (#4231)
* add option to shaper and support answers

* remove publish restrictions on outputs

* support list
2023-02-22 14:36:58 +01:00
Massimiliano Pippi
262c9771f4
relax test assertion (#4229) 2023-02-22 12:37:09 +01:00
Daniel Bichuetti
1e4ef24ae9
refactor: isolate PDF converters (#4193) 2023-02-22 08:50:18 +01:00
Massimiliano Pippi
40f772a9b0
refact: move the first batch of unit tests into the proper job (#4216)
* move the first batch of unit tests into the proper job

* leftover
2023-02-21 17:00:02 +01:00
Silvano Cerza
87a02d9372
Fix Dockerfile.base failing because of missing dependencies (#4215) 2023-02-21 16:37:33 +01:00
Julian Risch
5ce7a404ac
feat: Add Agent (#4148)
* initial Agent implementation

* mypy and pylint fixes

* add missing ABC import

* improved prompt template

* refactor and shorten run method

* refactor and shorten run method

* add tests for extracting

* fix mixed up tool_input/observation & make tests more robust

* fix bug with max_iterations and update prompt template

* allow setting prompt_template in Agent init

* remove example yml for agent

* add final prediction to transcript

* add transcript to errors and accept PromptTemplate in init

* simplify if else to elif

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* add checks for max_iter<2 and empty list returned by prompt node

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-02-21 14:27:40 +01:00
Sebastian
bde01cbf1f
Checking if output keys and output_values are same length and fix bug in storing output keys (#4223) 2023-02-21 13:36:15 +01:00
Sebastian
2bedb80ba5
Fix for custom template in OpenAIAnswerGenerator (#4220) 2023-02-21 13:35:17 +01:00
Mayank Jobanputra
c4b98fcccc
allowing file-upload api to work with write permission (#4221) 2023-02-21 16:48:02 +05:30
Bijay Gurung
d4b822646e
feat: Add JsonConverter node (#4130)
* Add JsonConverter node

* Update language

* JsonConverter: Remove id_hash_keys overwrite when it's None

Also, changes in docstring based on review

* Update docstring for JsonConverter

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
Co-authored-by: Sebastian Lee <sebastian.lee@deepset.ai>
2023-02-21 09:23:42 +01:00