33 Commits

Author SHA1 Message Date
Sara Zan
957e78ed9e
Upgrade pydoc-markdown & refactor GitHub Actions (#2117)
* Upgrade pydoc-markdown and fix the YAMLs to work with it

* Pin pydoc-markdown to major version

* Generalize pydoc-markdown workflow

* Make a single Action to perform all tasks that require committing into the local branch

* Merge the code updates and the docs in the Linux CI to prevent the bot from always show the pipeline as green

* Installing Jupyter deps for Black

* Build cache before running generation tasks

* Add check not to run the code generation on master

* Simplify push action

* Add more test deps in setup.cfg and remove from GH Action workflow

* Remove forced upgrades on pip install

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-04 15:45:09 +01:00
Sara Zan
d470b9d0bd
Improve dependency management (#1994)
* Fist attempt at using setup.cfg for dependency management

* Trying the new package on the CI and in Docker too

* Add composite extras_require

* Add the safe_import function for document store imports and add some try-catch statements on rest_api and ui imports

* Fix bug on class import and rephrase error message

* Introduce typing for optional modules and add type: ignore in sparse.py

* Include importlib_metadata backport for py3.7

* Add colab group to extra_requires

* Fix pillow version

* Fix grpcio

* Separate out the crawler as another extra

* Make paths relative in rest_api and ui

* Update the test matrix in the CI

* Add try catch statements around the optional imports too to account for direct imports

* Never mix direct deps with self-references and add ES deps to the base install

* Refactor several paths in tests to make them insensitive to the execution path

* Include tstadel review and re-introduce Milvus1 in the tests suite, to fix

* Wrap pdf conversion utils into safe_import

* Update some tutorials and rever Milvus1 as default for now, see #2067

* Fix mypy config


Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-01-26 18:12:55 +01:00
Sara Zan
91cafb49bb
Improve tutorials' output (#1694)
* Modify __str__ and __repr__ for Document and Answer

* Rename QueryClassifier in Tutorial11

* Improve the output of tutorial1

* Make the output of Tutorial8 a bit less dense

* Add a print_questions util to print the output of question generating pipelines

* Replace custom printing with the new utility in Tutorial13

* Ensure all output is printed with minimal details in Tutorial14 and add some titles

* Minor change to print_answers

* Make tutorial3's output the same as tutorial1

* Add __repr__ to Answer and fix to_dict()

* Fix a bug in the Document and Answer's __str__ method

* Improve print_answers, print_documents and print_questions

* Using print_answers in Tutorial7 and fixing typo in the utils

* Remove duplicate line in Tutorial12

* Use print_answers in Tutorial4

* Add explanation of what the documents in the output of the basic QA pipeline are

* Move the fields constant into print_answers

* Normalize all 'minimal' to 'minimum' (they were mixed up)

* Improve the sample output to include all fields from Document and Answer

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-09 15:09:26 +01:00
Sara Zan
13510aa753
Refactoring of the haystack package (#1624)
* Files moved, imports all broken

* Fix most imports and docstrings into

* Fix the paths to the modules in the API docs

* Add latest docstring and tutorial changes

* Add a few pipelines that were lost in the inports

* Fix a bunch of mypy warnings

* Add latest docstring and tutorial changes

* Create a file_classifier module

* Add docs for file_classifier

* Fixed most circular imports, now the REST API can start

* Add latest docstring and tutorial changes

* Tackling more mypy issues

* Reintroduce  from FARM and fix last mypy issues hopefully

* Re-enable old-style imports

* Fix some more import from the top-level  package in an attempt to sort out circular imports

* Fix some imports in tests to new-style to prevent failed class equalities from breaking tests

* Change document_store into document_stores

* Update imports in tutorials

* Add latest docstring and tutorial changes

* Probably fixes summarizer tests

* Improve the old-style import allowing module imports (should work)

* Try to fix the docs

* Remove dedicated KnowledgeGraph page from autodocs

* Remove dedicated GraphRetriever page from autodocs

* Fix generate_docstrings.sh with an updated list of yaml files to look for

* Fix some more modules in the docs

* Fix the document stores docs too

* Fix a small issue on Tutorial14

* Add latest docstring and tutorial changes

* Add deprecation warning to old-style imports

* Remove stray folder and import Dict into dense.py

* Change import path for MLFlowLogger

* Add old loggers path to the import path aliases

* Fix debug output of convert_ipynb.py

* Fix circular import on BaseRetriever

* Missed one merge block

* re-run tutorial 5

* Fix imports in tutorial 5

* Re-enable squad_to_dpr CLI from the root package and move get_batches_from_generator into document_stores.base

* Add latest docstring and tutorial changes

* Fix typo in utils __init__

* Fix a few more imports

* Fix benchmarks too

* New-style imports in test_knowledge_graph

* Rollback setup.py

* Rollback squad_to_dpr too

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-25 15:50:23 +02:00
Julian Risch
52e1fc991e
Update jobs link to personio (#1611)
* Update jobs link to personio

* Add latest docstring and tutorial changes

* Change jobs link to main website

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-21 11:42:32 +02:00
Malte Pietsch
99c8046367
Fix Tutorials (#1594)
* fix response format of DocumentSearchPipeline

* Add latest docstring and tutorial changes

* fix typos

* change prints in tutorial 4

* Add latest docstring and tutorial changes

* fix tutorial 13

* Add latest docstring and tutorial changes

* remove unused import

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-14 11:49:35 +02:00
bogdankostic
c644e2b4d0
Add comment to tutorial notebooks about restarting runtime in colab (#1486)
* Add comment to tutorial notebooks about restarting runtime in colab

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-23 14:36:20 +02:00
oryx1729
9dd7c74f4f
Refactor communication between Pipeline Components (#1321) 2021-09-10 11:41:16 +02:00
Branden Chan
980d88a0f2
Update faq model (#1401) 2021-09-01 18:39:06 +02:00
Malte Pietsch
be9d19afa5
Remove Finder from tutorials (#1329) 2021-08-10 11:50:59 +02:00
Branden Chan
783893c3d2
Tutorial update (#1166)
* Add header / footer

* Add Milvus example

* Generate md files

* Fix mypy CI
2021-06-11 11:09:15 +02:00
Julian Risch
a7ba146246
Removed comma from last item in json list (#1114) 2021-06-01 12:32:21 +02:00
Julian Risch
40ceaf418a
Fixing grpcio-tools to version of colab's pre-installed grpcio (#1113) 2021-05-31 19:09:10 +02:00
Malte Pietsch
e91518ee00
Update tutorials (torch versions, ES version, replace Finder with Pipeline) (#814)
* remove manual torch install on colab

* update elasticsearch version everywhere to 7.9.2

* fix FAQPipeline

* update tutorials with new pipelines

* Add latest docstring and tutorial changes

* revert faqpipeline change. fix field names in tutorial 4

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-02-09 14:56:54 +01:00
Branden Chan
7376185b65
Create DPR training tutorial (#708)
* WIP: Start DPR training tutorial

* Create basics of DPR Train tutorial

* Update documentation

* Allow DPR to be initialized without document store

* WIP: Add param descriptions to DPR notebook

* Clean tutorial

* Improve loading

* Make doc store optional when loading DPR

* Satisfy mypy type check

* Add links

* Add tutorial header

* Add colab badge

* Clear outputs

* Incorporate reviewer feedback

* WIP: Start DPR training tutorial

* Create basics of DPR Train tutorial

* Update documentation

* Allow DPR to be initialized without document store

* WIP: Add param descriptions to DPR notebook

* Clean tutorial

* Improve loading

* Make doc store optional when loading DPR

* Satisfy mypy type check

* Add links

* Add tutorial header

* Add colab badge

* Clear outputs

* Incorporate reviewer feedback

* Add readme links

* Regenerate tutorials

* Add excitement

* Fix typo

* Fix hard negatives comment

* Wrap tutorial for windows users

* Fix mypy issue
2021-01-13 10:33:55 +01:00
Malte Pietsch
94b7345505
Make use_gpu=True the default in tutorials (#692)
* enable gpu args in tutorials

* add info box for gpu runtime on colab
2020-12-22 07:58:12 +01:00
Branden Chan
e72f4f4299
Update Colab Torch Version (#576)
* Update torch version

* Update torch version
2020-11-11 13:55:10 +01:00
Tanay Soni
db4151bbc0
Fix scoring in Elasticsearch for dot product (#517) 2020-10-23 17:50:49 +02:00
bogdankostic
f62117c232
Add urllib version requirement to colab notebooks (#509) 2020-10-23 10:43:58 +02:00
Guillim
fb5db59590
Remove useless line from Tutorial4_FAQ_style_QA (#416)
* Update Tutorial4_FAQ_style_QA.py

Used to be useful when `.apply()` was necessary, but not any longer

* Update Tutorial4_FAQ_style_QA.ipynb
2020-09-22 09:01:04 +02:00
Malte Pietsch
747e0c0046
Bump FARM to 0.4.9. Remove custom torch installation from colab tutorials (#404) 2020-09-21 10:26:12 +02:00
Malte Pietsch
271ff30262
fix type casting of embeddings for tutorial 4 (#402) 2020-09-18 18:10:50 +02:00
Branden Chan
7fdb85d63a
Create documentation website (#272)
* Skeleton of doc website

* Flesh out documentation pages

* Split concepts into their own rst files

* add tutorial rsts

* Consistent level 1 markdown headers in tutorials

* Change theme to readthedocs

* Turn bullet points into prose

* Populate sections

* Add more text

* Add more sphinx files

* Add more retriever documentation

* combined all documenations in one structure

* rename of src to _src as it was ignored by git

* Incorporate MP2's changes

* add benchmark bar charts

* Adapt docstrings in Readers

* Improvements to intro, creation of glossary

* Adapt docstrings in Retrievers

* Adapt docstrings in Finder

* Adapt Docstrings of Finder

* Updates to text

* Edit text

* update doc strings

* proof read tutorials

* Edit text

* Edit text

* Add stacked chart

* populate graph with data

* Switch Documentation to markdown (#386)

* add way to generate markdown files to sphinx

* changed from rst to markdown and extended sphinx for it

* fix spelling

* Clean titles

* delete file

* change spelling

* add sections to document store usage

* add basic rest api docs

* fix readme in setup.py

* Update Tutorials

* Change section names

* add windows note to pip install

* update intro

* new renderer for markdown files

* Fix typos

* delete dpr_utils.py

* fix windows note in get started

* Fix docstrings

* deleted rest api docs in api

* fixed typo

* Fix docstring

* revert readme to rst

* Fix readme

* Update setup.py

Co-authored-by: deepset <deepset@Crenolape.localdomain>
Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>
Co-authored-by: Bogdan Kostić <bogdankostic@web.de>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2020-09-18 12:57:32 +02:00
Malte Pietsch
9727829cc6
Rename and restructure modules (database, indexing, schemas) (#379)
* rename database to documentstore

* move document, label, multilabel to haystack/schema.py

* rename documentstore -> document_store

* split indexing modules -> file_converter + preprocessor

* fix order of imports

* Update tutorial notebooks

* fix torch version in tutorial 4
2020-09-16 18:33:23 +02:00
Malte Pietsch
bde33ddaaa
Bump FARM version to 0.4.8 and PyTorch >=1.5.1, <= 1.6.0 (#376)
* bump farm version to 0.4.8

* move back to original transformers pipeline

* remove dpr_utils and use transformers implementation

* update tutorial notebooks
2020-09-16 17:24:40 +02:00
Branden Chan
a54d6a5bd7
Make Tutorials Work on Colab GPUs (#322)
* Add pip install torch+cu
2020-08-19 14:52:50 +02:00
Malte Pietsch
29a15c0d59
Add eval for Dense Passage Retriever & Refactor handling of labels/feedback (#243) 2020-07-31 11:34:06 +02:00
Malte Pietsch
5b1be233d0 Update Tutorial 4 2020-07-17 19:31:00 +02:00
Malte Pietsch
6bed2f509f
Refactor DPR for latest transformers version & change init arg gpu -> use_gpu for DPR and EmbeddingRetriever (#239)
* fix tokenizer warning in latest transformers

* change dpr arg from gpu to use_gpu

* change gpu arg for EmbeddingRetriever
2020-07-16 10:45:01 +02:00
Tanay Soni
4c21556a79
Fix embedding method for Retriever (#220) 2020-07-13 12:38:01 +02:00
Malte Pietsch
fe33a481ad
Update tutorials (#200)
* fix link in readme. update installation in tutorials

* update haystack version to latest master

* add basic documentation for input to write_documents()

* add docstring for sqldocumentstore

* comment out docker in notebook
2020-07-07 14:59:01 +02:00
Tanay Soni
71e15a5a11
Update Haystack version in tutorials (#136) 2020-06-08 11:31:12 +02:00
Malte Pietsch
a431a94b04
Add basic tutorial for FAQ-based QA & batch comp. of embeddings (#98)
* Add basic tutorial for FAQ-based QA and switch to bach computation of embeddings

* update readme & haystack version in tutorial
2020-05-07 10:19:26 +02:00