📒 Looking for the docs?

You can find them here:

https://haystack.deepset.ai/overview/intro

💻 How to update docs?

Overview and Usage

We moved the Overview and Usage docs to the haystack-website repository. You will find the docs in the folder docs. Please make sure to edit only the newest version of the docs. We release the docs together with each Haystack version. We are open to contributions to our documentation. Please make sure to check our Contribution Guidelines. You will find a step-by-step introduction to our docs here.

Tutorials

The tutorials live in the folder tutorials. They are created as Colab notebooks that users can use to explore new Haystack features. To include tutorials in the docs website, Markdown files need to be generated from the notebooks. This can be done by running the script /docs/_src/tutorials/tutorials/convert_ipynb.py. Just run python convert_ipynb.py and the script will update all existing notebooks. Furthermore, please make sure to update the headers.py file with headers for the new tutorials. These headers are important for the docs website workflow. After the Markdown files are generated successfully, you can raise a PR. We will review it, and as soon as the Markdown file is merged to master, it can be added to our website. Please follow the steps described here under Tutorial & Reference Docs.
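To give a feel for what generating Markdown from a notebook involves, here is a minimal sketch using only the Python standard library. It is not the code in convert_ipynb.py. It assumes notebooks in the nbformat 4 JSON layout (a top-level "cells" list whose entries carry "cell_type" and "source"), and the function name and the front-matter style header are hypothetical stand-ins for whatever headers.py actually provides.

```python
import json
from pathlib import Path

# Build the fence marker programmatically so this example nests cleanly in Markdown.
FENCE = "`" * 3


def cells_to_markdown(notebook: dict, header: str = "") -> str:
    """Render a parsed nbformat-4 notebook as Markdown, with an optional header on top.

    `header` is a hypothetical stand-in for an entry from headers.py.
    """
    parts = [header.strip()] if header.strip() else []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # nbformat stores a cell's source either as one string or as a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        if not text.strip():
            continue  # skip empty cells
        if cell.get("cell_type") == "markdown":
            parts.append(text.strip())
        elif cell.get("cell_type") == "code":
            parts.append(f"{FENCE}python\n{text.strip()}\n{FENCE}")
    return "\n\n".join(parts) + "\n"


def convert_file(notebook_path: Path, header: str = "") -> Path:
    """Write the Markdown next to the notebook and return the new file's path."""
    notebook = json.loads(notebook_path.read_text(encoding="utf-8"))
    target = notebook_path.with_suffix(".md")
    target.write_text(cells_to_markdown(notebook, header), encoding="utf-8")
    return target
```

The sketch shows why a missing header entry matters: the header is the only part of the generated file that does not come from the notebook itself, so a tutorial without one would reach the website workflow without its metadata.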

API Reference

We use Pydoc-Markdown to create markdown files from the docstrings in our code.

Update docstrings

Execute the following commands in /haystack/docs/_src/api/api:

pip install 'pydoc-markdown==3.11.0'
./generate_docstrings.sh

If you want to generate a new markdown file for a new Haystack module, please create a .yml file that is in line with the following configuration and add a new line to generate_docstrings.sh for the module. After you run generate_docstrings.sh again, there should be a new markdown file for the module. To include it in the docs website, push it to master and follow the steps described here under Tutorial & Reference Docs.

Configuration

Pydoc-Markdown reads the configuration from a .yml file located in the current working directory. Our files contain three main sections:

  • loader: A list of plugins that load API objects from Python source files.
    • type: Loader for Python source files
    • search_path: Location of source files
    • modules: Modules used for generating the markdown file
    • ignore_when_discovered: Define which files should be ignored
  • processor: A list of plugins that process API objects to modify their docstrings (e.g. to adapt them from a documentation format to Markdown or to remove items that should not be rendered into the documentation).
    • type: filter: Filter for specific modules
    • documented_only: Only documented API objects
    • do_not_filter_modules: Do not filter module objects
    • skip_empty_modules: Skip modules without content
  • renderer: A plugin that produces the output files.
    • type: Define the renderer you want to use. We use the Markdown renderer because it can be configured in great detail.
    • descriptive_class_title: Remove the word "Object" from class titles.
    • descriptive_module_title: Add the word "Module" before the module name
    • add_method_class_prefix: Add the class name as a prefix to method names
    • add_member_class_prefix: Add the class name as a prefix to member names
    • filename: file name of the generated file
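
The options listed above can be put together into a small helper that renders a starting-point .yml for a new module. This is a sketch under assumptions, not a copy of an existing config: the top-level keys are written as loaders and processors, which is our reading of the pydoc-markdown 3.x schema, so copy the exact spelling from an existing .yml in this folder if it differs. The boolean values, the function name render_config and the example path are illustrative placeholders.

```python
# Template covering the three sections described above.
# ASSUMPTION: key spellings (loaders/processors) and all boolean values are
# illustrative; compare with an existing .yml in this folder before use.
CONFIG_TEMPLATE = """\
loaders:
  - type: python
    search_path: [{search_path}]
    modules: [{modules}]
    ignore_when_discovered: ['__init__']
processors:
  - type: filter
    documented_only: true
    do_not_filter_modules: false
    skip_empty_modules: true
renderer:
  type: markdown
  descriptive_class_title: false
  descriptive_module_title: true
  add_method_class_prefix: false
  add_member_class_prefix: false
  filename: {filename}
"""


def render_config(search_path: str, modules: list, filename: str) -> str:
    """Fill the template for one generated markdown file (hypothetical helper)."""
    module_list = ", ".join(f"'{name}'" for name in modules)
    return CONFIG_TEMPLATE.format(
        search_path=search_path, modules=module_list, filename=filename
    )


if __name__ == "__main__":
    # Placeholder path and module names; replace with the real ones.
    print(render_config("../path/to/package", ["base", "impl"], "example.md"))
```

Keeping the per-module differences down to three values (search path, module list, output file name) mirrors how the existing files vary from one another only in what they load and where they write.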