3803 Commits

Author SHA1 Message Date
ZanSara
13c4ff1b52
refactor: remove direct logging without a logger (#4253)
* remove direct logging without a logger

* add custom pylint checker

* add test

* pylint

* improve checker message

* mypy

* remove test

* add checker for basicConfig

* more logging missed

* ignore basicConfig

* move out logger

* move out statement

* remove logging configuration
2023-02-23 20:42:42 +01:00
Vladimir Blagojevic
4b189c0b40
proposal: Implement Agent demo (#4085)
* Agent demo proposal

* Replace on-the-fly module with WebRetriever

* Update proposal with ideas from discussion with Julian

* Replace SerpAPI references with SearchEngine

* Add Agent memory

* Update Agent memory
2023-02-23 19:56:38 +01:00
Silvano Cerza
d594ab800b
ci: Fix OpenAPI spec sync (#4254)
* Attempt to fix OpenAPI sync

* Dry run

* Add step to get OpenAPI specs id

* Remove dryRun and branch trigger
2023-02-23 19:02:46 +01:00
ZanSara
c0c09f1287
Fix typo in google.colab package detection (#4238) 2023-02-23 17:53:23 +01:00
Stefano Fiorucci
5e85f33bd3
refactor: Remove deprecated nodes EvalDocuments and EvalAnswers (#4194)
* remove deprecated classes and update test

* remove deprecated classes and update test

* remove unused code

* remove unused import

* remove empty evaluator node

* unused import :-)

* move sas to metrics
2023-02-23 15:26:17 +01:00
Massimiliano Pippi
722dead1b2
fix agents tests (#4237) 2023-02-23 13:03:45 +01:00
ZanSara
b193e08a64
set env var (#4239)
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-02-23 11:59:46 +01:00
Silvano Cerza
c3bf62d4b0
Add a simple way to skip required tests checks (#4245) 2023-02-23 11:00:20 +01:00
Massimiliano Pippi
764eaa035f
skip summarizer tests to reduce pressure (#4241) 2023-02-23 09:50:24 +01:00
Massimiliano Pippi
dd37b4c29f
fix: apply black formatting (#4240)
* fix black formatting

* try
2023-02-23 08:59:40 +01:00
Agnieszka Marzec
1dc7f6215e
Update top_k description (#4224) 2023-02-22 23:05:41 +02:00
Silvano Cerza
b6371c95a8
Add missing dependencies in openapi upload workflow (#4236) 2023-02-22 19:34:22 +01:00
ZanSara
f816efa50c
feat: reduce and focus telemetry (#4087)
* simplified telemetry and docker containers detection

* pylint

* mypy

* mypy

* Add new credentials and metadata

* remove prints

* mypy

* remove comment

* simplify inout len measurement

* black

* removed old telemetry, to revert

* reintroduce env function

* reintroduce old telemetry

* fix telemetry selection

* telemetry for promptnode

* telemetry for some training methods

* telemetry for eval and distillation

* mypy & pylint

* review

* Update lg

* mypy

* improve docstrings

* pylint

* mypy

* fix test

* linting

* remove old tests

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-02-22 19:02:47 +01:00
Silvano Cerza
181e5474e8
ci: Automate OpenAPI specs upload to Readme.io (#4228)
* Remove OpenAPI specs file

* OpenAPI specs are now automatically uploaded when necessary

* Rename openapi workflow
2023-02-22 18:01:18 +01:00
Daniel Bichuetti
e0b0fe1bc3
feat!: Increase Crawler standardization regarding Pipelines (#4122)
* feat!(Crawler): Integrate Crawler in the Pipeline.

+Output Documents
+Optional file saving
+Optional Document meta about file path

* refactor: add Optional decl.

* chore: dummy commit

* chore: dummy commit

* refactor: improve overwrite flow

* refactor: change custom file path meta logic + add test

* Update haystack/nodes/connector/crawler.py

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* Update haystack/nodes/connector/crawler.py

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* Update haystack/nodes/connector/crawler.py

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* Update haystack/nodes/connector/crawler.py

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

* Update haystack/nodes/connector/crawler.py

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-02-22 17:34:19 +01:00
Balamurugan Periyasamy
49ed21b82d
fix: Better error messages for OCR requirement (#3767) (#3900)
Add pip install requirement in the error message for missing dependency.
2023-02-22 14:57:28 +01:00
tstadel
32b2abf9d5
fix: add option to not override results by Shaper (#4231)
* add option to shaper and support answers

* remove publish restrictions on outputs

* support list
2023-02-22 14:36:58 +01:00
Massimiliano Pippi
262c9771f4
relax test assertion (#4229) 2023-02-22 12:37:09 +01:00
Daniel Bichuetti
1e4ef24ae9
refactor: isolate PDF converters (#4193) 2023-02-22 08:50:18 +01:00
Massimiliano Pippi
40f772a9b0
refact: move the first batch of unit tests into the proper job (#4216)
* move the first batch of unit tests into the proper job

* leftover
2023-02-21 17:00:02 +01:00
Silvano Cerza
87a02d9372
Fix Dockerfile.base failing cause of missing dependencies (#4215) 2023-02-21 16:37:33 +01:00
Julian Risch
5ce7a404ac
feat: Add Agent (#4148)
* initial Agent implementation

* mypy and pylint fixes

* add missing ABC import

* improved prompt template

* refactor and shorten run method

* refactor and shorten run method

* add tests for extracting

* fix mixed up tool_input/observation & make tests more robust

* fix bug with max_iterations and update prompt template

* allow setting prompt_template in Agent init

* remove example yml for agent

* add final prediction to transcript

* add transcript to errors and accept PromptTemplate in init

* simplify if else to elif

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* add checks for max_iter<2 and empty list returned by prompt node

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2023-02-21 14:27:40 +01:00
Sebastian
bde01cbf1f
Checking if output keys and output_values are same length and fix bug in storing output keys (#4223) 2023-02-21 13:36:15 +01:00
Sebastian
2bedb80ba5
Fix for custom template in OpenAIAnswerGenerator (#4220) 2023-02-21 13:35:17 +01:00
Mayank Jobanputra
c4b98fcccc
allowing file-upload api to work with write permission (#4221) 2023-02-21 16:48:02 +05:30
Bijay Gurung
d4b822646e
feat: Add JsonConverter node (#4130)
* Add JsonConverter node

* Update language

* JsonConverter: Remove id_hash_keys overwrite when it's None

Also, changes in docstring based on review

* Update docstring for JsonConverter

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
Co-authored-by: Sebastian Lee <sebastian.lee@deepset.ai>
2023-02-21 09:23:42 +01:00
Silvano Cerza
f5b8835e2c
ci: Fix Dockerfile.base failing cause of missing git (#4210) 2023-02-20 18:40:30 +01:00
Silvano Cerza
e6af353530
ci: Add ca-certificates installation to xpdf container (#4206) 2023-02-20 17:47:10 +01:00
abwiersma
7aae4293d7
Check cuda availability before calling (#4174)
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-02-20 17:37:56 +01:00
bogdankostic
18e7b8399b
refactor: Remove id_hash_keys parameter in from_dict method (#4207)
* Remove id_hash_keys parameter in from_dict method

* Remove unused import

* Adapt `from_dict` of `SpeechDocument`

* Revert "Adapt `from_dict` of `SpeechDocument`"

This reverts commit 309cbeb7fbb3094c43be76d9e431db9391913144.

* Adapt `from_dict` of `SpeechDocument`
2023-02-20 17:37:35 +01:00
Silvano Cerza
30cdb81f19
ci: Move xpdf build into separate container (#4199)
* Create Dockerfile and hcl config to build Xpdf

* Create workflow to build Xpdf Docker image

* Update Dockerfile.base to not build Xpdf

* Fix CWD removal and arg casing

* Fix ARG setting
2023-02-20 14:58:11 +01:00
github-actions[bot]
aaa1522c45
Update unstable version and openapi schema (#4205)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2023-02-20 14:57:45 +01:00
tstadel
14578aa54f
feat: add top_k to PromptNode (#4159)
* add top_k to PromptNode

* fix OpenAI

* fix openai test
2023-02-20 14:51:45 +01:00
Sebastian
d129598203
Prompt node/run batch (#4072)
* Starting to implement first pass at run_batch

* Started to add _flatten_input function

* First pass at run_batch method.

* Fixed bug

* Adding tests for run_batch

* Update doc strings

* Pylint and mypy

* Pylint

* Fixing mypy

* Restructuring of run_batch tests

* Add minor lg updates

* Adding more tests

* Update dev comments and call static method differently

* Fixed the setting of output variable

* Set output_variable in __init__ of PromptNode

* Make a one-liner

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-02-20 11:58:13 +01:00
Massimiliano Pippi
83d615a32b
feat: include testing facilities into haystack package (#4182) 2023-02-17 19:38:03 +01:00
Sebastian
44509cd6a1
feat: Add OpenAIError to retry mechanism (#4178)
* Add OpenAIError to retry mechanism. Use env variable for timeout for OpenAI request in PromptNode.

* Updated retry in OpenAI embedding encoder as well.

* Empty commit
2023-02-17 13:17:44 +01:00
bogdankostic
7eeb3e07bf
feat: Add IVF and Product Quantization support for OpenSearchDocumentStore (#3850)
* Add IVF and Product Quantization support for OpenSearchDocumentStore

* Remove unused import statement

* Fix mypy

* Adapt doc strings and error messages to account for PQ

* Adapt validation of indices

* Adapt existing tests

* Fix pylint

* Add tests

* Update lg

* Adapt based on PR review comments

* Fix Pylint

* Adapt based on PR review

* Add request_timeout

* Adapt based on PR review

* Adapt based on PR review

* Adapt tests

* Pin tenacity

* Unpin tenacity

* Adapt based on PR comments

* Add match to tests

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-02-17 10:28:36 +01:00
Tuana Celik
8370715e7c
chore: de-couple the telemetry events for each tutorial from the dataset on AWS that is used (#4155)
* removing old dataset telemetry events

* changing function name

* adding the datasets back for old tutorials

* fixing mini bug

* resolving comments

* quick bug fix

* re-adding docstrings

* removing unnecessary import

* re-adding the telemetry event call for datasets

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-02-17 00:21:46 +01:00
tstadel
e7bb2487eb
make all OpenAI API params controllable via model_kwargs (#4183)
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2023-02-16 19:56:08 +01:00
Daniel Bichuetti
9f5a3344d5
fix: Windows amd64 platform repr (#4175) 2023-02-16 19:46:34 +01:00
Tuana Celik
cdb05f0f9a
chore: Fixing PromptNode .prompt() docstring to include the PromptTemplate object as an option (#4135)
* fix to include the PromptTemplate object as an option

* small fix
2023-02-16 19:05:04 +01:00
Silvano Cerza
a4407f8f98
Use larger runner for Docker release workflow (#4185) 2023-02-16 18:59:13 +01:00
bogdankostic
fe650b2a3a
fix: Remove logging statement of setting ID manually in Document (#4129)
* Remove logging statement

* update lg

---------

Co-authored-by: agnieszka-m <amarzec13@gmail.com>
2023-02-16 18:58:21 +01:00
Daniel Bichuetti
5187cc1801
refactor: Remove the pin from the espnet module and fix the audio node tests. (#4128)
* fix: fix audio tests + unbound some dependencies

* fix: update for Python 3.8

* refactor: change numpy assertion

* feat: add voice recog. support on audio tests

* fix: fix var assignment

* chore: dummy commit

* fix: fix sndfile error

* refactor: change skip reason

* refactor: hardcode variable

* refactor: unpin numpy

* fix: pin numpy only for audio
2023-02-16 22:12:17 +05:30
Agnieszka Marzec
e7c32da8d7
Fix code block formatting (#4162) 2023-02-16 16:55:41 +01:00
Agnieszka Marzec
e16f1c8935
Docs: Add filter to hide entity post processor (#4160)
* Add filter to hide entity post processor

* Add missing space
2023-02-16 16:40:42 +01:00
Silvano Cerza
689f2cd250
Update docstring-labeler.yml workflow to safely run in PRs from forks (#4146) 2023-02-16 16:02:41 +01:00
Mayank Jobanputra
d27f372b67
build: cache nltk models into the docker image (#4118)
* separated nltk cache

* separated nltk caching

* fixed pylint lazy log error

* using model name as default value
2023-02-16 16:56:16 +05:30
Massimiliano Pippi
ec72dd73fc
refactor: complete the document stores test refactoring (#4125)
* add e2e tests

* move tests to their own module

* add e2e workflow

* pylint

* remove from job

* fix index field name

* skip test on sql

* removed unused code

* fix embedding tests

* adjust test for pinecone

* adjust assertions to the new documents

* bad copypasta

* test

* fix tests

* fix tests

* fix test

* fix tests

* pylint

* update milvus version

* remove debug

* move graphdb tests under e2e
2023-02-16 09:43:25 +01:00
Sebastian
9a26942952
feat: Add model_kwargs option to PromptNode (#4151)
* Add input option to PromptNode to allow the passing of default kwargs

* Add yaml test for model_kwargs parameter
2023-02-15 18:46:26 +01:00