mirror of
https://github.com/deepset-ai/haystack.git
synced 2025-09-02 12:53:52 +00:00

Module question_generator
QuestionGenerator
class QuestionGenerator(BaseComponent)
The Question Generator takes only a document as input and outputs questions that it thinks can be
answered by this document. In the current implementation, input texts are split into chunks of 50 words
with a 10-word overlap. This is because the default model, valhalla/t5-base-e2e-qg, seems to generate only
about 3 questions per passage regardless of the passage's length. This approach prioritizes generating more
questions over processing efficiency (T5 can digest much more than 50 words at once). The returned questions
generally follow the order of their answers, i.e. early questions in the list generally
come from earlier in the document.
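
The chunking described above can be illustrated with a minimal sketch. This is not the library's own splitting code; the function name `split_into_chunks` and its parameters `split_length` and `split_overlap` are hypothetical names chosen for illustration, and the sketch assumes words are simply separated by whitespace.

```python
from typing import List


def split_into_chunks(text: str, split_length: int = 50, split_overlap: int = 10) -> List[str]:
    """Split text into word chunks of `split_length` words, where each chunk
    shares `split_overlap` words with the previous one.

    Hypothetical helper for illustration only; the real implementation may differ.
    """
    if not 0 <= split_overlap < split_length:
        raise ValueError("split_overlap must be >= 0 and smaller than split_length")

    words = text.split()  # assumption: whitespace tokenization
    step = split_length - split_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + split_length]))
        # Stop once the current window already reaches the end of the text,
        # so no trailing chunk consisting only of overlap words is produced.
        if start + split_length >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(120))
    for chunk in split_into_chunks(sample):
        print(len(chunk.split()), chunk.split()[0], chunk.split()[-1])
```

With the defaults, the window advances 40 words at a time, so a longer document yields more chunks and, at roughly 3 questions per chunk, more questions overall. That is the trade-off against efficiency noted above.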