Julian Risch
f2a3f95ab6
add note on gpu runtime to tutorial 13 ( #1614 )
...
* add note on gpu runtime to tutorial 13
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-20 09:55:56 +02:00
Malte Pietsch
99c8046367
Fix Tutorials ( #1594 )
...
* fix response format of DocumentSearchPipeline
* Add latest docstring and tutorial changes
* fix typos
* change prints in tutorial 4
* Add latest docstring and tutorial changes
* fix tutorial 13
* Add latest docstring and tutorial changes
* remove unused import
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-14 11:49:35 +02:00
Malte Pietsch
82c2cdf7cd
Merge branch 'master' of github.com:deepset-ai/haystack
2021-10-13 14:45:17 +02:00
Malte Pietsch
db2b5d913b
Fix param in tutorial 8
2021-10-13 14:45:09 +02:00
Malte Pietsch
4a6c9302b3
Redesign primitives - Document, Answer, Label ( #1398 )
...
* first draft / notes on new primitives
* wip label / feedback refactor
* rename doc.text -> doc.content. add doc.content_type
* add datatype for content
* remove faq_question_field from ES and weaviate. rename text_field -> content_field in docstores. update tutorials for content field
* update converters for . Add warning for empty
* rename label.question -> label.query. Allow sorting of Answers.
* WIP primitives
* update ui/reader for new Answer format
* Improve Label. First refactoring of MultiLabel. Adjust eval code
* fixed workflow conflict with introducing new one (#1472 )
* Add latest docstring and tutorial changes
* make add_eval_data() work again
* fix reader formats. WIP fix _extract_docs_and_labels_from_dict
* fix test reader
* Add latest docstring and tutorial changes
* fix another test case for reader
* fix mypy in farm reader.eval()
* fix mypy in farm reader.eval()
* WIP ORM refactor
* Add latest docstring and tutorial changes
* fix mypy weaviate
* make label and multilabel dataclasses
* bump mypy env in CI to python 3.8
* WIP refactor Label ORM
* WIP refactor Label ORM
* simplify tests for individual doc stores
* WIP refactoring markers of tests
* test alternative approach for tests with existing parametrization
* WIP refactor ORMs
* fix skip logic of already parametrized tests
* fix weaviate behaviour in tests - not parametrizing it in our general test cases.
* Add latest docstring and tutorial changes
* fix some tests
* remove sql from document_store_types
* fix markers for generator and pipeline test
* remove inmemory marker
* remove unneeded elasticsearch markers
* add dataclasses-json dependency. adjust ORM to just store JSON repr
* ignore type as dataclasses_json seems to miss functionality here
* update readme and contributing.md
* update contributing
* adjust example
* fix duplicate doc handling for custom index
* Add latest docstring and tutorial changes
* fix some ORM issues. fix get_all_labels_aggregated.
* update drop flags where get_all_labels_aggregated() was used before
* Add latest docstring and tutorial changes
* add to_json(). add + fix tests
* fix no_answer handling in label / multilabel
* fix duplicate docs in memory doc store. change primary key for sql doc table
* fix mypy issues
* fix mypy issues
* haystack/retriever/base.py
* fix test_write_document_meta[elastic]
* fix test_elasticsearch_custom_fields
* fix test_labels[elastic]
* fix crawler
* fix converter
* fix docx converter
* fix preprocessor
* fix test_utils
* fix tfidf retriever. fix selection of docstore in tests with multiple fixtures / parameterizations
* Add latest docstring and tutorial changes
* fix crawler test. fix ocrconverter attribute
* fix test_elasticsearch_custom_query
* fix generator pipeline
* fix ocr converter
* fix ragenerator
* Add latest docstring and tutorial changes
* fix test_load_and_save_yaml for elasticsearch
* fixes for pipeline tests
* fix faq pipeline
* fix pipeline tests
* Add latest docstring and tutorial changes
* fix weaviate
* Add latest docstring and tutorial changes
* trigger CI
* satisfy mypy
* Add latest docstring and tutorial changes
* satisfy mypy
* Add latest docstring and tutorial changes
* trigger CI
* fix question generation test
* fix ray. fix Q-generation
* fix translator test
* satisfy mypy
* wip refactor feedback rest api
* fix rest api feedback endpoint
* fix doc classifier
* remove relation of Labels -> Docs in SQL ORM
* fix faiss/milvus tests
* fix doc classifier test
* fix eval test
* fixing eval issues
* Add latest docstring and tutorial changes
* fix mypy
* WIP replace dataclasses-json with manual serialization
* Add latest docstring and tutorial changes
* revert to dataclass-json serialization for now. remove debug prints.
* update docstrings
* fix extractor. fix Answer Span init
* fix api test
* keep meta data of answers in reader.run()
* fix meta handling
* address review feedback
* Add latest docstring and tutorial changes
* make document=None for open domain labels
* add import
* fix print utils
* fix rest api
* address review feedback
* Add latest docstring and tutorial changes
* fix mypy
Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-13 14:23:23 +02:00
Markus Sagen
69a0c9f2ed
Clarify docs for PDF conversion, languages and encodings ( #1570 )
...
* Clarify PDF conversion, languages and encodings
The parameter name `valid_languages` may be a bit misleading when
reading only the tutorials. Users may incorrectly assume that it
enforces that the conversion only works for those languages, when it is
more of a check.
- Provided clarifications in the tutorials to highlight what
valid_languages does and that changing the encoding may give better
results for their language of choice
- Updated the command for `pdftotext` to the correct one
* Allow encodings for `convert_files_to_dicts`
- Set option of passing encoding to the converters. Even when trying some
Latin1 languages, the converter does not handle them well.
A potential issue is that the encoding defaults to None, which is the default
for the other converters, but not for the PDFToTextConverter. Could add
a check and change the encoding to Latin1 for pdf if set to None.
Was considering adding it to **kwargs, but since it may be a commonly
used feature that should be documented, I added it as a keyword argument instead.
Would love to hear your input and feedback on it.
* Set back PDF default encoding
* Update documentation
2021-10-11 09:30:12 +02:00
Julian Risch
f9d2f786ca
Replace FARM import statements; add dependencies ( #1492 )
...
* Replace FARM import statements; add dependencies
* Add InferenceProc., TextCl.Proc., TextPairCl.Proc.
* Remove FARMRanker, add type annotations, rename max_sample
* Add sample_to_features_text for InferenceProc.
* Fix type annotations: model_name_or_path is str not Path
* Fix mypy errors: implement _create_dataset in TextCl.Proc.
* Add task_type "embeddings" in Inferencer
* Allow loading AdaptiveModel for embedding task
* Add SQuAD eval metrics; enable InferenceProc for embedding task
* Add baskets as param to log_samples and handle empty basket list in log_samples
* Remove unused dependencies
* Remove FARMClassifier (doc classifier) due to ref to TextClassificationHead
* Remove FARMRanker and Classifier from doc generation scripts
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-28 16:34:24 +02:00
bogdankostic
c644e2b4d0
Add comment to tutorial notebooks about restarting runtime in colab ( #1486 )
...
* Add comment to tutorial notebooks about restarting runtime in colab
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-23 14:36:20 +02:00
Julian Risch
d569e66bc7
Update Tutorial1_Basic_QA_Pipeline.ipynb ( #1489 )
...
* Update Tutorial1_Basic_QA_Pipeline.ipynb
passing params to pipeline as dict
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-22 16:35:20 +02:00
Branden Chan
bddee2def4
Define SAS model in notebook ( #1485 )
...
* Define SAS model in notebook
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-21 17:05:16 +02:00
Branden Chan
2c4baa7f4e
Regenerate API and Tutorial md files ( #1480 )
...
* Change punctuation
* Add latest docstring and tutorial changes
* Change punctuation
* Add documentation for Docs2Answer
* Add latest docstring and tutorial changes
* Generate new API docs
* Replace Finder with Pipeline
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-09-21 14:42:18 +02:00
ju-gu
05da7f71dd
changed delete_all_documents to delete_documents ( #1477 )
2021-09-20 14:29:33 +02:00
oryx1729
9dd7c74f4f
Refactor communication between Pipeline Components ( #1321 )
2021-09-10 11:41:16 +02:00
Branden Chan
980d88a0f2
Update faq model ( #1401 )
2021-09-01 18:39:06 +02:00
Branden Chan
1938fb001b
Add support for no Docker envs in Tutorial 13 ( #1365 )
...
* Add support for no docker envs e.g. colab
* Generate md
2021-08-31 15:22:51 +02:00
Jeff Hammerbacher
1c8a03aaa2
Rag tutorial fixes ( #1375 )
...
* Update Tutorial7_RAG_Generator.ipynb
`delete_all_documents` --> `delete_documents` (cf. #1045 )
* Update Tutorial7_RAG_Generator.py
`delete_all_documents` --> `delete_documents` (cf. #1045 )
2021-08-30 15:27:18 +02:00
ramgarg102
51f0a56e5d
delete_all_documents() replaced by delete_documents() ( #1377 )
...
* [UPDT] delete_all_documents() replaced by delete_documents()
* [UPDT] warning logs to be fixed
* [UPDT] delete_all_documents() renamed and the same method added
Co-authored-by: Ram Garg <ramgarg102@gmai.com>
2021-08-30 15:18:28 +02:00
Timo Moeller
07bd3c50ea
Add new QA eval metric: Semantic Answer Similarity (SAS) ( #1338 )
...
* init
* Add type annotation
* Add test case, fix mypy
* Add german model to docstring
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-08-12 14:31:48 +02:00
Malte Pietsch
be9d19afa5
Remove Finder from tutorials ( #1329 )
2021-08-10 11:50:59 +02:00
Ikram Ali
d94674c5b6
Remove finder class from tutorial 1 ( #1328 )
2021-08-10 11:41:07 +02:00
Malte Pietsch
5e16ec4d76
Fix installation in Colab Tutorial 11
2021-08-10 08:50:04 +02:00
Shahrukh Khan
cc43502e7e
Add Tutorial about Query Classifier ( #1324 )
...
* add query classifier colab and jupyter notebook
* Delete Tutorial13_Query_Classifier.ipynb
* add query classifier tutorial with updated number
* add query classifier tutorial script
* Rename tutorial14_query_classifier.py to Tutorial14_Query_Classifier.py
2021-08-09 10:43:50 +02:00
Branden Chan
937247d628
Add QuestionGenerator ( #1267 )
...
* Create basic Question Generation
* Split texts into 50 word chunks
* Allow prompt to be changed
* Implement iteration functionality in DS
* Add docstrings, create pipelines
* Make pipelines work
* Add comments
* Add tests
* Add tutorials and docs
* Add doc string
2021-07-26 17:20:43 +02:00
Branden Chan
da97d81305
Change variable names ( #1286 )
2021-07-14 14:03:34 +02:00
Julian Risch
2a90471c73
Encapsulate tutorial code in method ( #1266 )
2021-07-09 17:08:19 +02:00
Branden Chan
efc03f72db
Make PreProcessor.process() work on lists of documents ( #1163 )
...
* Add process_batch method
* Rename methods
* Fix doc string, satisfy mypy
* Fix mypy CI
* Fix typo
* Update tutorial
* Fix argument name
* Change arg name
* Incorporate reviewer feedback
2021-06-23 18:13:51 +02:00
Branden Chan
7dbd58f6be
Add about sections ( #1195 )
2021-06-14 18:37:00 +02:00
vblagoje
2a5882578a
Add Longform-QA (LFQA), Seq2SeqGenerator for generative QA and Retribert Retriever ( #1086 )
...
* Integrate LFQA with Haystack
* Integrate LFQA with Haystack - unit tests
* Properly initialize conftest default value for vector_dim
* Update PR after initial feedback
* Fix conftest.py import
* Seq2SeqGenerator uses Callables instead of subclasses for custom model input
* Update docstring
* Fix Callable use
* Add LFQA tutorials
* Improve type error reporting for invalid input converter Callable
* Generate docstrings
* Format comments in tutorial script
* Generate tutorial md
* Add usage page
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: brandenchan <brandenchan@icloud.com>
2021-06-14 17:53:43 +02:00
Branden Chan
783893c3d2
Tutorial update ( #1166 )
...
* Add header / footer
* Add Milvus example
* Generate md files
* Fix mypy CI
2021-06-11 11:09:15 +02:00
Branden Chan
aa6f768efa
Prevent merge of same questions on different documents during evaluation ( #1119 )
...
* Fix duplicate question in Reader.eval()
* Add duplicate question support in document store
* Support duplicate questions in retriever eval
* Update tutorial
* Rename key_tuple
* Change error message
* Add warning when more than 6 labels
* Allow for label grouping options
* Add support for aggregating by label meta
* Satisfy mypy
* Make label field flexible, add docstrings
* Satisfy mypy
* Fix failing tests
* Adjust docstring
* Fix tutorial
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-06-02 12:09:03 +02:00
Julian Risch
a7ba146246
Removed comma from last item in json list ( #1114 )
2021-06-01 12:32:21 +02:00
Julian Risch
40ceaf418a
Fixing grpcio-tools to version of colab's pre-installed grpcio ( #1113 )
2021-05-31 19:09:10 +02:00
Julian Risch
84c34295a1
Re-ranking component for document search without QA ( #1025 )
...
* Adding ranker similar to retriever and reader
* Sort documents according to query-document similarity scores
* Reranking and model training runs for small example
* Added EvalRanker node
* Calculate recall@k in EvalRetriever and EvalRanker nodes
* Renaming EvalRetriever to EvalDocuments and EvalReader to EvalAnswers
* Added mean reciprocal rank as metric for EvalDocuments
* Fix bug that appeared when ranking documents with same score
* Remove commented code for unimplemented eval() of Ranker node
* Add documentation of k parameter in EvalDocuments
* Add Ranker docu and renaming top_k param
2021-05-31 15:31:36 +02:00
Branden Chan
9827b3652e
Pipelines tutorial ( #991 )
...
* Start Pipelines tutorial
* Make Tutorial 11 run locally
* Add colab compatibility
* Fix pip install
* Add ES install from source
* Add ES install from source
* Add pygraphviz installation
* Incorporate reviewer feedback
* Ensure print_answers() works for Generator output
* Fix typo
2021-04-29 17:31:28 +02:00
Branden Chan
9626c0d65e
Update Documentation ( #976 )
...
* Add api pages
* Add latest docstring and tutorial changes
* First sweep of usage docs
* Add link to conversion script
* Add import statements
* Add summarization page
* Add web crawler documentation
* Add confidence scores usage
* Add crawler api docs
* Regenerate api docs
* Update summarizer and translator api
* Add indentation (pydoc-markdown 3.10.1)
* Comment out metadata
* Remove Finder deprecation message
* Remove Finder in FAQ
* Update tutorial link
* Incorporate reviewer feedback
* Regen api docs
* Add type annotations
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-22 16:45:29 +02:00
Julian Risch
d38c07e0ee
knowledge graph example ( #934 )
...
* Add knowledge graph module
* Fix type hint
* Add graph retriever module
* Change type annotations, change return format
* Add graph retriever that executes questions as sparql queries
* Linking only those entities that are in the knowledge graph
* Added logging and using relations extracted from Knowledge graph for linking
* Preventing entity linking from linking the same token to multiple entities
* Pruning triples that have no variables for select and count queries
* Support knowledge graphs with Pipelines
* Add text2sparql
* Entity linking and relation linking consider more special cases now based on evaluation on labelled data
* Separating example code from KGQA implementation
* Add eval on combined extractive and kg questions
* Remove references to hp-test
* Add fields sparql_query and long_answer_list to metadata
* Removing modular Question2SPARQL approach
* Removing additional classes used for modular kgqa approach
* preparing lcquad data
* change graph db
* Translating namespaces in knowledge graph queries
* Creating graphdb index and loading triples from .ttl file
* Fetching graph config files, triples and model from S3
* Fix incompatibility issues with BaseGraphRetriever and BaseComponent
* Removing unused utility functions
* Adding doc strings and tutorial header
* Adding sparqlwrapper dependency
* Moving tutorial header
* Sorting tutorials by number within name of notebook
* Add latest docstring and tutorial changes
* Creating test cases for knowledge graph
* Changing knowledge graph example to harry potter
* Add latest docstring and tutorial changes
* Adapting the tutorial notebook to harry potter example
* Add GraphDB fixture for tests
* Add latest docstring and tutorial changes
* Added GraphDB docker launch to CI
* Use correct GraphDB fixture
* Check if GraphDB instance is already running
* Renaming question/query and incorporating other feedback from Timo and Tanay
* Removed type annotation
* Add latest docstring and tutorial changes
Co-authored-by: oryx1729 <oryx1729@protonmail.com>
Co-authored-by: Timo Moeller <timo.moeller@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-08 14:05:33 +02:00
Branden Chan
d77152c469
WIP: Add evaluation nodes for Pipelines ( #904 )
...
* Add main eval fns
* WIP: make pipeline_eval.py run
* Fix typo
* Add support for no_answers
* Add latest docstring and tutorial changes
* Working pipeline eval
* Add timing of nodes
* Add latest docstring and tutorial changes
* Refactor and clean
* Update tutorial script
* Set default params
* Update tutorials
* Fix indent
* Add latest docstring and tutorial changes
* Address mypy issues
* Add test
* Fix mypy error
* Clear outputs
* Add doc strings
* Incorporate reviewer feedback
* Add latest docstring and tutorial changes
* Revert query counting
* Fix typo
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-01 17:35:18 +02:00
Timo Moeller
f954f0db38
Fix top_k param in RAG tutorials ( #906 )
...
* Fix top_k param
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-03-18 18:00:21 +01:00
Branden Chan
24d0c4d42d
Fix DPR training batch size ( #898 )
...
* Adjust batch size
* Add latest docstring and tutorial changes
* Update training results
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-03-17 18:33:59 +01:00
brandenchan
03cda26d85
Fix link in Tutorial 8
2021-02-15 10:45:27 +01:00
Malte Pietsch
e91518ee00
Update tutorials (torch versions, ES version, replace Finder with Pipeline) ( #814 )
...
* remove manual torch install on colab
* update elasticsearch version everywhere to 7.9.2
* fix FAQPipeline
* update tutorials with new pipelines
* Add latest docstring and tutorial changes
* revert faqpipeline change. fix field names in tutorial 4
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-02-09 14:56:54 +01:00
Branden Chan
8d47a71b00
Fix Tutorial 9 ( #734 )
...
* Add package download
* Change dev to train file
2021-01-14 10:56:58 +01:00
Julian Risch
3331608e03
Adding a guard that prevents the tutorial code from being executed in every subprocess when using multiprocessing on windows ( #729 )
2021-01-13 18:17:54 +01:00
Branden Chan
7376185b65
Create DPR training tutorial ( #708 )
...
* WIP: Start DPR training tutorial
* Create basics of DPR Train tutorial
* Update documentation
* Allow DPR to be initialized without document store
* WIP: Add param descriptions to DPR notebook
* Clean tutorial
* Improve loading
* Make doc store optional when loading DPR
* Satisfy mypy type check
* Add links
* Add tutorial header
* Add colab badge
* Clear outputs
* Incorporate reviewer feedback
* Add readme links
* Regenerate tutorials
* Add excitement
* Fix typo
* Fix hard negatives comment
* Wrap tutorial for windows users
* Fix mypy issue
2021-01-13 10:33:55 +01:00
Branden Chan
bb8aba18e0
Create Preprocessing Tutorial ( #706 )
...
* WIP: First version of preprocessing tutorial
* stride renamed overlap, ipynb and py files created
* rename split_stride in test
* Update preprocessor api documentation
* define order for markdown files
* define order of modules in api docs
* Add colab links
* Incorporate review feedback
Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>
2021-01-06 15:54:05 +01:00
Malte Pietsch
94b7345505
Make use_gpu=True the default in tutorials ( #692 )
...
* enable gpu args in tutorials
* add info box for gpu runtime on colab
2020-12-22 07:58:12 +01:00
Branden Chan
d8154939fc
Scale dot product into probabilities ( #667 )
...
* scale dot product
* Add tip in documentation
* Add recommendation boxes
* WIP: Use similarity attribute in all doc stores
* Implement similarity for InMemoryDS
* Add FAISS support
* Clean printout
* Update documentation
* Implement document field map
2020-12-11 12:10:24 +01:00
Branden Chan
8c904d79d6
Fix links ( #663 )
2020-12-08 10:28:31 +01:00
Tanay Soni
8e52b48e1d
Add pipelines for GenerativeQA & FAQs ( #645 )
2020-12-03 10:27:06 +01:00
Branden Chan
79555148ac
Add link to FAISS Info in documentation ( #643 )
...
* Add link to FAISS info
* Clean link
2020-12-02 15:24:22 +01:00