12 Commits

Author SHA1 Message Date
bogdankostic
5c3bfad078
feat: Add page number to Documents coming from PDFConverters and PreProcessor (#2932)
* Add page number to Documents coming from PDFConverters and PreProcessor

* Fix mypy

* Update API Docs

* Update API Docs

* Remove unused imports

* Generate JSON schema

* Generate JSON schema

* Make test variable shorter

* Make regex a separate function

* Move counting of page breaks to a function

* Generate JSON schema

* Apply suggestions from code review

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

* Update API Documentation

* Don't create instance for testing staticmethod

* Update haystack/nodes/preprocessor/preprocessor.py

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>

Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
2022-08-09 15:55:27 +02:00
Branden Chan
dfeb171686
Add API page for util functions (#2863)
* Clean OpenAIAnswerGenerator docstrings

* Incorporate reviewer feedback

* Update Documentation & Code Style

* Improve id_hash_keys description

* Simplify id_hash_keys description

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-08-09 14:53:45 +02:00
Vladimir Blagojevic
b13c32eb9c
Add GPL API docs, unit tests update (#2634)
* Update test_label_generator.py

* GPL increase default batch size to 16

* GPL - API docs

* GPL - split unit tests

* Make devs aware of multilingual GPL

* Create separate train/save test

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-10 05:25:28 -04:00
Massimiliano Pippi
374155fd5c
Move Opensearch document store in its own module (#2603)
* move OpenSearchDocumentStore into its own Python module

* Update Documentation & Code Style

* mark test with (sigh) elasticsearch

* skip opensearch tests on windows

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-08 16:37:23 +02:00
Branden Chan
caf1336424
Adjust pydoc markdown config so methods shown with classes (#2511)
* add_member_class_prefix: true

* Update Documentation & Code Style

* Trigger redeploy

* Trigger redeploy

* Fix pydoc param

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-05-06 16:00:08 +02:00
Sara Zan
f8e02310bf
Validate YAML files without loading the nodes (#2438)
* Remove BasePipeline and make a module for RayPipeline

* Can load pipelines from yaml, plenty of issues left

* Extract graph validation logic into _add_node_to_pipeline_graph & refactor load_from_config and add_node to use it

* Fix pipeline tests

* Move some tests out of test_pipeline.py and create MockDenseRetriever

* mypy and pylint (silencing too-many-public-methods)

* Fix issue found in some yaml files and in schema files

* Fix paths to YAML and fix some typos in Ray

* Fix eval tests

* Simplify MockDenseRetriever

* Fix Ray test

* Accidentally pushed merge conflict, fixed

* Typo in schemas

* Typo in _json_schema.py

* Slightly reduce noisiness of version validation warnings

* Fix version logs tests

* Fix version logs tests again

* remove seemingly unused file

* Add check and test to avoid adding the same node to the pipeline twice

* Update Documentation & Code Style

* Revert config to pipeline_config

* Remove unused import

* Complete reverting to pipeline_config

* Some more stray config=

* Update Documentation & Code Style

* Feedback

* Move back other_nodes tests into pipeline tests temporarily

* Update Documentation & Code Style

* Fixing tests

* Update Documentation & Code Style

* Fixing ray and standard pipeline tests

* Rename colliding load() methods in dense retrievers and faiss

* Update Documentation & Code Style

* Fix mypy on ray.py as well

* Add check for no root node

* Fix tests to use load_from_directory and load_index

* Try to workaround the disabled add_node of RayPipeline

* Update Documentation & Code Style

* Fix Ray test

* Fix FAISS tests

* Relax class check in _add_node_to_pipeline_graph

* Update Documentation & Code Style

* Try to fix mypy in ray.py

* unused import

* Try another fix for Ray

* Fix connector tests

* Update Documentation & Code Style

* Fix ray

* Update Documentation & Code Style

* use BaseComponent.load() in pipelines/base.py

* another round of feedback

* stray BaseComponent.load()

* Update Documentation & Code Style

* Fix FAISS tests too

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
2022-05-04 17:39:06 +02:00
James Briggs
8cd73a9d20
Add PineconeDocumentStore (#2254)
* added core install and functionality of pinecone doc store (init, upsert, query, delete)

* implemented core functionality of Pinecone doc store

* Update Documentation & Code Style

* updated filtering to use Haystack filtering and reduced default batch_size

* Update Documentation & Code Style

* removed debugging code

* updated Pinecone filtering to use filter_utils

* removed unneeded methods and minor tweaks to current methods

* fixed typing issues

* Update Documentation & Code Style

* Allow filters in all methods except get_embedding_count

* Fix skipping document store tests

* Update Documentation & Code Style

* Fix handling of Milvus1 and Milvus2 in tests

* Update Documentation & Code Style

* Fix handling of Milvus1 and Milvus2 in tests

* Update Documentation & Code Style

* Remove SQL from tests requiring embeddings

* Update Documentation & Code Style

* Fix get_embedding_count of Milvus2

* Make sure to start Milvus2 tests with a new collection

* Add pinecone to test suite

* Update Documentation & Code Style

* Fix typing

* Update Documentation & Code Style

* Add pinecone to docstores dependency

* Add PineconeDocStore to API Documentation

* Add missing comma

* Update Documentation & Code Style

* Adapt format of doc strings

* Update Documentation & Code Style

* Set API key as environment variable

* Skip Pinecone tests in forks

* Add sleep after deleting index

* Add sleep after deleting index

* Add sleep after creating index

* Add check if index ready

* Remove printing of index stats

* Create new index for each pinecone test

* Use RestAPI instead of Python API for describe_index_stats

* Fix accessing describe_index_stats

* Remove usages of describe_index_stats

* Run pinecone tests separately

* Update Documentation & Code Style

* Add pdftotext to pinecone tests

* Remove sleep from doc store fixture

* Add describe_index_stats

* Remove unused imports

* Use pull_request_target trigger

* Revert use pull_request_target trigger

* Remove set_config

* Add os to conftest

* Integrate review comments

* Set include_values to False

* Remove quotation marks from pinecone.Index type

* Update Documentation & Code Style

* Update Documentation & Code Style

* Fix number of args in error messages

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-03-21 16:24:09 +01:00
Branden Chan
e4c979990c
Bring back init defs to api in v1.2 and latest (#2296)
* Bring back init defs to api in v1.2

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-10 15:01:05 +01:00
agnieszka-m
255226f9fa
Update other.yml with new node names (#2286)
* Update other.yml with new node names

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-08 12:29:49 +01:00
Branden Chan
b563b6622c
Api pages (#2248)
* Update Readme WIP

* Update Documentation & Code Style

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-25 13:53:46 +01:00
MichelBartels
2c423ba063
Introduce support for pymilvus>=2.0.0 (#2126)
* update remaining occurrences of get_connection

* fix milvus2 import and fix wrong extra references

* change MilvusDocumentStore to Milvus1DocumentStore

* update milvus docstrings to reflect updated dependency management

* enable milvus 2 tests

* fix milvus2 env variable processing

* fix dropping collections for each milvus 2 test

* make Milvus 2 doc store tests work

* allow user to specify consistency level

* First attempt at running Milvus2 in the CI

* Install the correct pymilvus

* add batch deletion for milvus2

* change default from milvus 1 to milvus 2

* make milvus2 the default in the docstores extra

* Switch milvus1 and milvus2 in base test run on CI

* Rename docstore flags for pytest: 'milvus'->'milvus1', 'milvus2'->'milvus'

* Rename milvus.py->milvus1.py and milvus2x.py->milvus2.py

* Enable autogenerated docs for Milvus1 and 2 separately

* Partial fix to docstring of Milvus2DocumentStore

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michel Bartels <kontakt@michelbartels.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
2022-02-24 17:43:38 +01:00
Sara Zan
957e78ed9e
Upgrade pydoc-markdown & refactor GitHub Actions (#2117)
* Upgrade pydoc-markdown and fix the YAMLs to work with it

* Pin pydoc-markdown to major version

* Generalize pydoc-markdown workflow

* Make a single Action to perform all tasks that require committing into the local branch

* Merge the code updates and the docs in the Linux CI to prevent the bot from always showing the pipeline as green

* Installing Jupyter deps for Black

* Build cache before running generation tasks

* Add check not to run the code generation on master

* Simplify push action

* Add more test deps in setup.cfg and remove from GH Action workflow

* Remove forced upgrades on pip install

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-04 15:45:09 +01:00