542 Commits

Author SHA1 Message Date
Sara Zan
2d65c380f1
pre-commit hooks (#2819)
* Add pre-commit config

* update contributing guidelines

* try failing the workflow

* add pre-commit to the deps

* updating uninstall instructions

* separate jobs in CI

* make tutorials check fail

* make black check fail

* make openapi check fail

* make yaml schema and api docs checks fail

* highlight the instructions

* Update .pre-commit-config.yaml

Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de>

* Update CONTRIBUTING.md

Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de>

* Update CONTRIBUTING.md

Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de>

* Use black --check

* Add images of the CI

* title level

* feedback

Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de>
2022-07-26 15:02:15 +02:00
Sara Zan
5d8476eb58
Restart containers in tutorials.sh (#2858)
* restart tutorials in the loop

* remove container steps in tutorials.yml

* forgotten quotes

* unmatched bracket

* give names to containers

* try to limit the log size

* make the containers restart on the scripts as well

* feedback

* Raise integration tests timeout

* raising limit again
2022-07-25 17:35:36 +02:00
Sara Zan
5119acb260
Raise timeout on integration tests (#2880) 2022-07-25 06:43:20 -04:00
Massimiliano Pippi
8ee2b6b403
Add a custom pydoc renderer for Readme.io (#2825)
* add custom pydoc renderer

* create an example

* revert example code
2022-07-22 10:43:51 +02:00
Sara Zan
48644b23fb
Enable CI on tutorials (#2801)
* enable ci on tutorials

* Disable all path restrictions for safety

* actually comment out the paths block

* remove comment
2022-07-18 17:59:55 +02:00
Sara Zan
6b39fbd39c
Mocking Pinecone tests (#2778)
* Integrating the mock into conftest.py

* re-enable workflow

* delete_all

* Update Documentation & Code Style

* remove ValueError

* Add empty response

* wrong condition

* return response

* revert removal of delete_all

* change mock

* Update Documentation & Code Style

* test for rest api, to revert

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-14 20:03:33 +02:00
Massimiliano Pippi
82df677ebf
API tests (#2738)
* clean up tests and run earlier

* use change detection

* better naming, skip ES

* more cleanup

* fix job name

* dummy commit to trigger the CI

* mock away the PDF converter

* make the test compatible with 3.7

* removed leftover

* always run the api tests, use a matrix for the OS

* refactor all the tests

* remove outdated dependency

* pylint

* new abstract method

* adjust for older python versions

* rename pipeline file

* address PR comments
2022-07-14 15:36:28 +02:00
Sara Zan
091711b8c4
Fix Tutorials and Tutorials (nightly) (#2737)
* Remove caching and install audio deps

* Fix `Tutorials` as well

* Run all tutorials even though some fail

* Forgot fi

* fix failure condition

* proper bash string equality

* Enable debug logs

* remove audio files

* Update Documentation & Code Style

* Use the setup action in the Tutorial CI as well

* Try with a file that exists

* Update Documentation & Code Style

* Fix the comments in the tutorials

* Update Documentation & Code Style

* Fix tutorials.sh

* Remove debug logging

* import pprint and try editable install

* Update Documentation & Code Style

* extract no run list

* Add tutorial18 to no run list nightly

* import pprint correctly

* Update Documentation & Code Style

* try making site-packages editable

* Make pythonpath editable every time Tut17 is run on CI

* typo

* fix imports in tut5

* add git clean

* Update Documentation & Code Style

* add comments and remove` -e`

* accidentally deleted a line

* Update .github/utils/tutorials.sh

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2022-07-12 11:22:17 +02:00
Malte Pietsch
ba08fc86f5
Add node to use OpenAI's GPT-3 for QA (#2605)
* first draft of openai node for QA

* Update Documentation & Code Style

* fix mypy. add node to inits

* Update Documentation & Code Style

* fix linter

* Adapt OpenAIGenerator to completions endpoint

* Update Documentation & Code Style

* Fix pylint

* Fix doc strings

* Make use of temperature

* Make use of api key in tests

* Adapt doc strings

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-07-08 13:59:27 +02:00
tstadel
2a7c0139f5
double max heap size for elasticsearch in CI (#2756) 2022-07-05 13:53:32 +02:00
Vladimir Blagojevic
ffb7e4e4bd
GPL tutorial - add GPU header and open in colab button (#2736)
* GPL tutorial - add GPU header and open in colab button

* Add GPL tutorial to run exclusion list
2022-07-04 05:23:39 -04:00
Julian Risch
1781e88802
Upgrade torch to 1.12 (#2741)
* Upgrade torch to 1.12

* upgrade torch-scatter

* add explicit torch-scatter installation

* set torch dependency to range >1.9,<1.13
2022-07-01 20:23:32 +02:00
Vladimir Blagojevic
b08c5f81d1
Add GPL adaptation tutorial (#2632)
* Add GPL adaptation tutorial

* Latest round of Aga's corrections

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-26 02:44:57 -04:00
Sara Zan
426f49979b
Change repo with repository in python_cache (#2731)
* Change repo with repository

* remove name

* using owner and name

* use owner name

* replace name with login

* Trying with the PR context instead
2022-06-24 18:36:19 +02:00
Sara Zan
6a7152044e
add repo name as well (#2729) 2022-06-24 17:08:28 +02:00
Sara Zan
13514f960d
Speficy ref in action (#2727) 2022-06-24 15:56:17 +02:00
Sara Zan
400d2cdf77
Fix audio tests on CI (#2718)
* Update Documentation & Code Style

* fix huggingface-hub version

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-24 11:36:31 +02:00
Rob Pasternak
b87c0c950b
Tutorial 14 edit (#2663)
* Rewrite Tutorial 14 for increased user-friendliness

* Update Tutorial14 .py file to match .ipynb file

* Update Documentation & Code Style

* unblock the ci

* ignore error in jitterbit/get-changed-files

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
2022-06-22 13:03:07 +02:00
Sara Zan
505ababf43
Skip Pinecone tests (#2696)
* comment out Pinecone tests block

* Add comment
2022-06-21 14:49:36 +02:00
Massimiliano Pippi
5d255f0e4a
replace question issue with link to discussions (#2697) 2022-06-21 14:10:11 +02:00
Sara Zan
a6c06ee376
Update contributor's checklists in PR template (#2659)
* Split contributor's and reviewer's checklists

* contributor-centric checklist

* Move issues at the top and split entry

* phrasing
2022-06-21 10:11:18 +02:00
Sara Zan
a26c042994
Fix typo in code_and_docs.sh (#2662)
* Fix typo in code_and_docs.sh & install ffmpeg in autoformat.yml

* apt update to get ffmpeg

* Update Documentation & Code Style

* Add header and better error message

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-15 13:50:55 +02:00
Sara Zan
776eba0cd1
Remove pull_request from triggers (#2661) 2022-06-15 10:14:22 +02:00
Sara Zan
584e046642
AnswerToSpeech (#2584)
* Add new audio answer primitives

* Add AnswerToSpeech

* Add dependency group

* Update Documentation & Code Style

* Extract TextToSpeech in a helper class, create DocumentToSpeech and primitives

* Add tests

* Update Documentation & Code Style

* Add ability to compress audio and more tests

* Add audio group to test, all and all-gpu

* fix pylint

* Update Documentation & Code Style

* Accidental git tag

* Try pleasing mypy

* Update Documentation & Code Style

* fix pylint

* Add warning for missing OS library and support in CI

* Try fixing mypy

* Update Documentation & Code Style

* Add docs, simplify args for audio nodes and add tutorials

* Fix mypy

* Fix run_batch

* Feedback on tutorials

* fix mypy and pylint

* Fix mypy again

* Fix mypy yet again

* Fix the ci

* Fix dicts merge and install ffmpeg on CI

* Make the audio nodes import safe

* Trying to increase tolerance in audio test

* Fix import paths

* fix linter

* Update Documentation & Code Style

* Add audio libs in unit tests

* Update _text_to_speech.py

* Update answer_to_speech.py

* Use dedicated dataset & update telemetry

* Remove  and use distilled roberta

* Revert special primitives so that the nodes run in indexing

* Improve tutorials and fix smaller bugs

* Update Documentation & Code Style

* Fix serialization issue

* Update Documentation & Code Style

* Improve tutorial

* Update Documentation & Code Style

* Update _text_to_speech.py

* Minor lg updates

* Minor lg updates to tutorial

* Making indexing work in tutorials

* Update Documentation & Code Style

* Improve docstrings

* Try to use GPU when available

* Update Documentation & Code Style

* Fixi mypy and pylint

* Try to pass the device correctly

* Update Documentation & Code Style

* Use type of device

* use .cpu()

* Improve .ipynb

* update apt index to be able to download libsndfile1

* Fix SpeechDocument.from_dict()

* Change pip URL

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-06-15 10:13:18 +02:00
Sara Zan
735ffa635b
[CI refactoring] Tutorials on CI (#2547)
* Experimental Ci workflow for running tutorials

* Run on every push for now

* Not starting?

* Disabling paths temporarily

* Sort tutorials in natural order

* Install ipython

* remove ipython install

* Try running ipython with sudo

* env.pythonLocation

* Skipping tutorial2 and 9 for speed

* typo

* Use one runner per tutorial, for now

* Typo in dependend job

* Missing quotes broke scripts matrix

* Simplify setup for the tutorials, try to prevent containers conflict

* Remove needless job dependencies

* Try prevent cache issues, fix small Tut10 bug

* Missing deps for running notebook tutorials

* Create three groups of tutorials excluding the longest among them

* remove deps

* use proper bash loop

* Try with a single string

* Fix typo in echo

* Forgot do

* Typo

* Try to make the GraphDB tutorial without launching its own container

* Run notebook and script together

* Whitespace

* separate scrpits and notebooks execution

* Run notebooks first

* Try caching the GoT data before running the scripts

* add note

* fix mkdir

* Fix path

* Update Documentation & Code Style

* missing -r

* Fix folder numbering

* Run notebooks as well

* Typo in notebook command

* complete path in notebook command

* Try with TIKA_LOG_PATH

* Fix folder naming

* Do not use cached data in Tut9

* extracting the number better

* Small tweaks

* Same fix on Tut10 on the notebook

* Exclude GoT cache for tut5 too

* Remove faiss files after tutorial run

* Layout

* fix remove command

* Fix path in tut10 notebook

* Fix typo in node name in tut14

* Third block was too long, rebancing

* Reduce GoT dataset even more, why wasting time after all...

* Fix paths in tut10 again

* do git clean to make sure to cleanup everything (breaks post Python)

* Remove ES file with bad permission at the end of the run

* Split first block, takes >30mins

* take out tut15 for a moment, has an actual bug

* typo

* Forgot rm option

* Simply remove all ES files

* Improve logs of GoT reduction

* Exclude also tut16 from cache to try fix bug

* Replace ll with ls

* Reintroduce 15_TableQA

* Small regrouping

* regrouping to make the min num of runners go for about 30mins

* Add cron schedule and PR paths conditions

* Add some timing information

* Separate tutorials by diff and tutorials by cron

* temp add pull_request to tutorials nightly

* Add badge in README to keep track of the nightly tutorials run

* Remove prefixes from data folder names

* Add fetch depth to get diff with master

* Fix paths again

* typo

* Exclude long-running ones

* Typo

* Fix tutorials.yml as well

* Use head_ref

* Using an action for now

* exclude other files

* Use only the correct command to run the tutorial

* Add long running tutorials in separate runners, just for experiment

* Factor out the complex bash script

* Pass the python path to the bash script

* Fix paths

* adding log statement

* Missing dollarsign

* Resetting variable in loop

* using mini GoT dataset and improving bash script

* change dataset name

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-15 09:53:36 +02:00
Sara Zan
8d7439c623
Move autoformat-check.yml into tests.yml (#2635) 2022-06-10 18:22:16 +02:00
Sara Zan
9968c373d2
make 'ready for review' an event that triggers the tests (#2643) 2022-06-09 09:23:38 +02:00
Sara Zan
c2d2faf31e
Add directive in tests.yml (#2637) 2022-06-07 13:31:19 +02:00
Sara Zan
59608ca474
[CI Refactoring] Workflow refactoring (#2576)
* Unify CI tests (from #2466)

* Update Documentation & Code Style

* Change folder names

* Fix markers list

* Remove marker 'slow', replaced with 'integration'

* Soften children check

* Start ES first so it has time to boot while Python is setup

* Run the full workflow

* Try to make pip upgrade on Windows

* Set KG tests as integration

* Update Documentation & Code Style

* typo

* faster pylint

* Make Pylint use the cache

* filter diff files for pylint

* debug pylint statement

* revert pylint changes

* Remove path from asserted log (fails on Windows)

* Skip preprocessor test on Windows

* Tackling Windows specific failures

* Fix pytest command for windows suites

* Remove \ from command

* Move poppler test into integration

* Skip opensearch test on windows

* Add tolerance in reader sas score for Windows

* Another pytorch approx

* Raise time limit for unit tests :(

* Skip poppler test on Windows CI

* Specify to pull with FF only in docs check

* temporarily run the docs check immediately

* Allow merge commit for now

* Try without fetch depth

* Accelerating test

* Accelerating test

* Add repository and ref alongside fetch-depth

* Separate out code&docs check from tests

* Use setup-python cache

* Delete custom action

* Remove the pull step in the docs check, will find a way to run on bot commits

* Add requirements.txt in .github for caching

* Actually install dependencies

* Change deps group for pylint

* Unclear why the requirements.txt is still required :/

* Fix the code check python setup

* Install all deps for pylint

* Make the autoformat check depend on tests and doc updates workflows

* Try installing dependencies in another order

* Try again to install the deps

* quoting the paths

* Ad back the requirements

* Try again to install rest_api and ui

* Change deps group

* Duplicate haystack install line

* See if the cache is the problem

* Disable also in mypy, who knows

* split the install step

* Split install step everywhere

* Revert "Separate out code&docs check from tests"

This reverts commit 1cd59b15ffc5b984e1d642dcbf4c8ccc2bb6c9bd.

* Add back the action

* Proactive support for audio (see text2speech branch)

* Fix label generator tests

* Remove install of libsndfile1 on win temporarily

* exclude audio tests on win

* install ffmpeg for integration tests

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-07 09:23:03 +02:00
Sara Zan
89bb1ca139
[CI refactoring] Improve autoformat.yml (#2556)
* Restructure autoformat to run a single script

* Reduce diff for autoforma.yml

* Reduce diff on linux_ci.yml
2022-05-18 20:02:43 +02:00
Julian Risch
70ca1e9fc6
Smaller demo instance type (#2564)
This PR changes the instance type of the public Haystack demo from p3.2xlarge to g4dn.2xlarge.
g4dn.2xlarge has 1 GPU, 8 vCPUs, 32 GiB of memory
p3.2xlarge had 1 GPU, 8 vCPUs, 61 GiB of memory
which results in 75% lower costs with g4dn.2xlarge.

I also tried out the even smaller g4dn.xlarge, which has 1 GPU, 4 vCPUs, 16 GiB of memory. However, the memory was not enough to run the demo. I tried out multiple requests at the same time and it worked well with g4dn.2xlarge. Requests are slightly slower as with the more powerful instance type but it's hard to notice.
2022-05-17 12:47:15 +02:00
Sara Zan
ff4303c51b
[CI refactoring] Categorize tests into folders (#2554)
* Categorize tests into folders

* Fix linux_ci.yml and an import

* Wrong path
2022-05-17 09:55:53 +01:00
Ivan Lopez
a2a99f79b1
Fix docker image tag with semantic version for releases (#2548)
* Fix docker tag with semantic version for releases

* Prepend latest docker tag with tagprefix in cache-from
2022-05-16 13:26:33 +02:00
bogdankostic
300ee1ac83
Upgrade torch version to 1.11 (#2538)
* Bump torch version

* Upgrade torch version in torch-scatter
2022-05-13 14:45:53 +02:00
Sara Zan
15a9ff6f67
PR template mention of enabling Actions (#2523)
* Update version to 1.4.1rc0

* Add hint of enabling action on the fork in the PR template

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-10 09:46:09 +02:00
Sara Zan
f3e0ba4be9
Fix OpenSearchDocumentStore's __init__ (#2498)
* Move super in OpenSearchDocumentStore and add small test

* Update Documentation & Code Style

* Add Opensearch container to the CI

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-05 10:38:09 +02:00
tstadel
7498c7c6fb
Fix and use delete_index instead of delete_documents in tests (#2453)
* use delete_index instead of delete_documents in tests

* fix delete_index

* fix  delete_index() in memory and milvus

* fix imports

* fix memory keyerrors

* Update Documentation & Code Style

* increase timeout for pinecone tests to 60 minutes

* clean get_document_store()

* use recreate_index in tests

* Update Documentation & Code Style

* fix tests

* fix remaining tests

* log index deleted

* fix test_eval_pipeline

* simplify existing index detection in weaviate

* delete label_index on recreate_index for pinecone and milvus

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-26 19:06:30 +02:00
Sara Zan
4eec2dc45e
Change YAML version exception into a warning (#2385)
* Change exception into warning, add strict_version param, and remove compatibility between schemas

* Simplify update_json_schema

* Rename unstable into master

* Prevent validate_config from changing the config to validate

* Fix version validation and add tests

* Rename master into ignore

* Complete parameter rename

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-19 16:08:08 +02:00
Sara Zan
8abf11fbd3
Update pdftotext also on pinecone and milvus1 CI jobs (#2433)
* Upgrade pdftotext also on pinecone and milvus1 jobs

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-19 16:06:27 +02:00
Sara Zan
ba9c976bfe
Update pdftotext link (#2432)
* Update pdftotext link

* Update Documentation & Code Style

* Update Tutorial8_Preprocessing.ipynb

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-19 14:30:18 +02:00
Sara Zan
1a81080e8a
Add apt update in Linux CI (#2415)
* Update linux_ci.yml
2022-04-13 15:35:56 +02:00
Sara Zan
96a538b182
Pylint (import related warnings) and REST API improvements (#2326)
* remove duplicate imports

* fix ungrouped-imports

* Fix wrong-import-position

* Fix unused-import

* pyproject.toml

* Working on wrong-import-order

* Solve wrong-import-order

* fix Pool import

* Move open_search_index_to_document_store and elasticsearch_index_to_document_store in elasticsearch.py

* remove Converter from modeling

* Fix mypy issues on adaptive_model.py

* create es_converter.py

* remove converter import

* change import path in tests

* Restructure REST API to not rely on global vars from search.apy and improve tests

* Fix openapi generator

* Move variable initialization

* Change type of FilterRequest.filters

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-04-12 16:41:05 +02:00
tstadel
ab8ba75664
Set ci job timeout to 45 minutes (#2401) 2022-04-11 16:28:26 +02:00
Sara Zan
ae712fe6bf
Upgrade weaviate-client to 3.3.3 and fix get_all_documents (#1895)
* Fix 'bug' on Weaviate only returning max. 100 docs on get_all_documents

* Add type

* Update Weaviate version on the CI

* Fix bug on get_document_count where there are no documents

* Add more info in the docstrings of get_all_documents and get_all_documents_generator

* Add latest docstring and tutorial changes

* Apply Black

* Update Documentation & Code Style

* Trigger pipeline

* Update Documentation & Code Style

* Include StefanBogdan feedback

* Fix mypy issues and LogicalFilterClause

* Add more types

* Update Documentation & Code Style

* update setup.cfg

* Upgrade weaviate containers too

* Allow to filter for content field in Weaviate

* Use convert_to_weaviate instead of convert_to_pinecone

* Fix _get_all_documents_in_index

* Update docstrings and docs

* Catching an exception in get_document(s)_by_id

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-04-01 15:37:34 +03:00
bogdankostic
834f8c4902
Change return types of indexing pipeline nodes (#2342)
* Change return types of file converters

* Change return types of preprocessor

* Change return types of crawler

* Adapt utils to functions to new return types

* Adapt __init__.py to new method names

* Prevent circular imports

* Update Documentation & Code Style

* Let DocStores' run method accept Documents

* Adapt tests to new return types

* Update Documentation & Code Style

* Put "# type: ignore" to right place

* Remove id_hash_keys property from Document primitive

* Update Documentation & Code Style

* Adapt tests to new return types and missing id_hash_keys property

* Fix mypy

* Fix mypy

* Adapt PDFToTextOCRConverter

* Remove id_hash_keys from RestAPI tests

* Update Documentation & Code Style

* Rename tests

* Remove redundant setting of content_type="text"

* Add DeprecationWarning

* Add id_hash_keys to elasticsearch_index_to_document_store

* Change document type from dict to Docuemnt in PreProcessor test

* Fix file path in Tutorial 5

* Remove added output in Tutorial 5

* Update Documentation & Code Style

* Fix file_paths in Tutorial 9 + fix gz files in fetch_archive_from_http

* Adapt tutorials to new return types

* Adapt tutorial 14 to new return types

* Update Documentation & Code Style

* Change assertions to HaystackErrors

* Import HaystackError correctly

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-29 13:53:35 +02:00
Julian Risch
bf71f03ff2
release v1.3.0 and re-add Makefile (#2354)
* release v1.3.0 and re-add Makefile

* Update Documentation & Code Style

* make BaseKnowledgeGraph abstract to remove it from the JSON schema

* Logging paths for JSON schema generation

* Add debug command in autoforma.yml

* Typo

* Update Documentation & Code Style

* Fix schema path in CI

* Update Documentation & Code Style

* Remove debug statement from autoformat.yml

* Reintroduce compatibility between 1.3.0 and 1.2.1rc0 schema

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2022-03-23 17:22:06 +01:00
bogdankostic
7e6ff8a205
Run Pinecone tests only if files related to Pinecone changed (#2343)
* Run Pinecone tests only if files related to Pinecone changed

* Change in pinecone.py that will be reverted

* Revert change in pinecone.py

* Test Pinecone also when filter_utils.py changes
2022-03-22 15:58:12 +01:00
Julian Risch
98fa48cc4c
rename label: ignore-for-release-notes (#2320)
* rename label: ignore-for-release-notes

* add document store topics
2022-03-21 17:26:09 +01:00
James Briggs
8cd73a9d20
Add PineconeDocumentStore (#2254)
* added core install and functionality of pinecone doc store (init, upsert, query, delete)

* implemented core functionality of Pinecone doc store

* Update Documentation & Code Style

* updated filtering to use Haystack filtering and reduced default batch_size

* Update Documentation & Code Style

* removed debugging code

* updated Pinecone filtering to use filter_utils

* removed uneeded methods and minor tweaks to current methods

* fixed typing issues

* Update Documentation & Code Style

* Allow filters in al methods except get_embedding_count

* Fix skipping document store tests

* Update Documentation & Code Style

* Fix handling of Milvus1 and Milvus2 in tests

* Update Documentation & Code Style

* Fix handling of Milvus1 and Milvus2 in tests

* Update Documentation & Code Style

* Remove SQL from tests requiring embeddings

* Update Documentation & Code Style

* Fix get_embedding_count of Milvus2

* Make sure to start Milvus2 tests with a new collection

* Add pinecone to test suite

* Update Documentation & Code Style

* Fix typing

* Update Documentation & Code Style

* Add pinecone to docstores dependendcy

* Add PineconeDocStore to API Documentation

* Add missing comma

* Update Documentation & Code Style

* Adapt format of doc strings

* Update Documentation & Code Style

* Set API key as environment variable

* Skip Pinecone tests in forks

* Add sleep after deleting index

* Add sleep after deleting index

* Add sleep after creating index

* Add check if index ready

* Remove printing of index stats

* Create new index for each pinecone test

* Use RestAPI instead of Python API for describe_index_stats

* Fix accessing describe_index_stats

* Remove usages of describe_index_stats

* Run pinecone tests separately

* Update Documentation & Code Style

* Add pdftotext to pinecone tests

* Remove sleep from doc store fixture

* Add describe_index_stats

* Remove unused imports

* Use pull_request_target trigger

* Revert use pull_request_target trigger

* Remove set_config

* Add os to conftest

* Integrate review comments

* Set include_values to False

* Remove quotation marks from pinecone.Index type

* Update Documentation & Code Style

* Update Documentation & Code Style

* Fix number of args in error messages

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-03-21 16:24:09 +01:00
Julian Risch
ac5617e757
Add basic telemetry features (#2314)
* add basic telemetry features

* change pipeline_config to _component_config

* Update Documentation & Code Style

* add super().__init__() calls to error classes

* make posthog mock work with python 3.7

* Update Documentation & Code Style

* update link to docs web page

* log exceptions, send event for raised HaystackErrors, refactor Path(CONFIG_PATH)

* add comment on send_event in BaseComponent.init() and fix mypy

* mock NonPrivateParameters and fix pylint undefined-variable

* Update Documentation & Code Style

* check model path contains multiple /

* add test for writing to file

* add test for en-/disable telemetry

* Update Documentation & Code Style

* merge file deletion methods and ignore pylint global statement

* Update Documentation & Code Style

* set env variable in demo to activate telemetry

* fix mock of HAYSTACK_TELEMETRY_ENABLED

* fix mypy and linter

* add CI as env variable to execution contexts

* remove threading, add test for custom error event

* Update Documentation & Code Style

* simplify config/log file deletion

* add test for final event being sent

* force writing config file in test

* make test compatible with python 3.7

* switch to posthog production server

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-21 11:58:51 +01:00