haystack/docs/_src/api/api/question_generator.md
Sara Zan 957e78ed9e
Upgrade pydoc-markdown & refactor GitHub Actions (#2117)
* Upgrade pydoc-markdown and fix the YAMLs to work with it

* Pin pydoc-markdown to major version

* Generalize pydoc-markdown workflow

* Make a single Action to perform all tasks that require committing into the local branch

* Merge the code updates and the docs in the Linux CI to prevent the bot from always showing the pipeline as green

* Installing Jupyter deps for Black

* Build cache before running generation tasks

* Add check not to run the code generation on master

* Simplify push action

* Add more test deps in setup.cfg and remove from GH Action workflow

* Remove forced upgrades on pip install

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-04 15:45:09 +01:00

Module question_generator

QuestionGenerator

class QuestionGenerator(BaseComponent)

The Question Generator takes only a document as input and outputs questions that it thinks can be answered by this document. In our current implementation, input texts are split into chunks of 50 words with a 10-word overlap. This is because the default model, valhalla/t5-base-e2e-qg, seems to generate only about 3 questions per passage regardless of the passage's length. Our approach prioritizes the creation of more questions over processing efficiency (T5 is able to digest much more than 50 words at once). The returned questions generally follow the order of their answers, i.e. questions early in the list generally come from earlier in the document.
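
The chunking described above can be illustrated with a minimal sketch. It is not the Haystack implementation: the function name `split_into_chunks` and the parameter names `split_length` and `split_overlap` are chosen here for illustration, and splitting on whitespace is an assumption about what counts as a "word".

```python
from typing import List


def split_into_chunks(text: str, split_length: int = 50, split_overlap: int = 10) -> List[str]:
    """Split text into word chunks of `split_length` words, where each chunk
    repeats the last `split_overlap` words of the previous one.

    Illustrative only: names and whitespace tokenization are assumptions.
    """
    if split_length <= 0:
        raise ValueError("split_length must be positive")
    if not 0 <= split_overlap < split_length:
        raise ValueError("split_overlap must be in [0, split_length)")

    words = text.split()
    step = split_length - split_overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + split_length]))
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk made up purely of overlap is emitted.
        if start + split_length >= len(words):
            break
    return chunks


# A 120-word text yields windows starting at words 0, 40 and 80.
sample = " ".join(f"w{i}" for i in range(120))
chunks = split_into_chunks(sample)
print(len(chunks))  # 3
```

With the defaults, the window advances 40 words at a time, so a longer document produces more passages (and therefore more questions from a model that emits roughly a fixed number per passage) at the cost of running the model more often.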