haystack/tutorials at 69a0c9f2edc48e25e4633d6154ad1af4035996e6

yujunjun/haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-12-31 00:57:37 +00:00


Markus Sagen 69a0c9f2ed

Clarify docs for PDF conversion, languages and encodings (#1570)

* Clarify PDF conversion, languages and encodings

The parameter name `valid_languages` may be a bit misleading when
reading only the tutorials. Users may incorrectly assume that it
restricts conversion to those languages, when it is really more of a
check.

- Provided clarifications in the tutorials to highlight what
valid_languages does and that changing the encoding may give better
results for their language of choice
- Updated the command for `pdftotext` to the correct one

* Allow encodings for `convert_files_to_dicts`

- Added the option of passing an encoding to the converters. Even for
some Latin1 languages, the converter did not give good results.

A potential issue is that the encoding defaults to None, which is the
default for the other converters but not for the PDFToTextConverter. We
could add a check that changes the encoding to Latin1 for PDFs if it is
set to None.

I considered adding it to **kwargs, but since it may be a commonly used
feature that should be documented, I added it as an explicit keyword
argument instead. I would love to hear your input and feedback on it.

* Set back PDF default encoding

* Update documentation

2021-10-11 09:30:12 +02:00
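The commit above describes two behaviours. First, `valid_languages` only checks the converted text instead of restricting which languages can be converted. Second, the chosen encoding affects how well non-English characters survive conversion. The following is a minimal plain-Python sketch of both ideas, not Haystack's implementation. The helpers `decode_converter_output`, `check_language` and `convert_sketch` are hypothetical. The language detector is passed in as a function that is assumed to return a language code such as `"en"`. In practice, a language-detection library would fill that role.

```python
import logging
from typing import Callable, List, Optional

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("converter_sketch")


def decode_converter_output(raw: bytes, encoding: str = "Latin1") -> str:
    """Hypothetical helper: decode raw bytes produced by an external tool.

    The Latin1 default only mirrors the commit's note about PDFs. A wrong
    encoding usually does not raise an error; it silently garbles
    non-ASCII characters.
    """
    return raw.decode(encoding, errors="replace")


def check_language(
    text: str,
    valid_languages: Optional[List[str]],
    detect: Callable[[str], str],
) -> bool:
    """Hypothetical helper: return True if the detected language is allowed.

    A mismatch only logs a warning. The text is neither rejected nor altered.
    """
    if not valid_languages:
        return True  # nothing to check against
    detected = detect(text)
    if detected not in valid_languages:
        logger.warning(
            "Detected language %r is not in valid_languages %s; "
            "the conversion may be poor, consider trying another encoding.",
            detected,
            valid_languages,
        )
        return False
    return True


def convert_sketch(
    raw: bytes,
    valid_languages: Optional[List[str]],
    detect: Callable[[str], str],
    encoding: str = "Latin1",
) -> str:
    """Hypothetical converter: decode first, then run the language check."""
    text = decode_converter_output(raw, encoding)
    check_language(text, valid_languages, detect)  # informs only, never blocks
    return text


if __name__ == "__main__":
    raw = "Müller".encode("utf-8")  # bytes as a UTF-8 emitting tool would produce them
    print(decode_converter_output(raw, "Latin1"))  # MÃ¼ller (garbled)
    print(decode_converter_output(raw, "UTF-8"))   # Müller
    # Stand-in detector for the example; a real one would analyse the text.
    text = convert_sketch(raw, ["en"], detect=lambda t: "de", encoding="UTF-8")
    print(text)  # Müller, returned despite the logged language warning
```

Decoding UTF-8 bytes as Latin1 does not fail. Instead, it turns `ü` into `Ã¼`. This shows why trying a different encoding can improve results for a given language, while a `valid_languages` mismatch only produces a warning.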


small_faq_covid.csv

Add eval for Dense Passage Retriever & Refactor handling of labels/feedback (#243)

2020-07-31 11:34:06 +02:00

small_generator_dataset.csv

[RAG] Integrate "Retrieval-Augmented Generation" with Haystack (#484)

2020-10-30 18:06:02 +01:00

Tutorial1_Basic_QA_Pipeline.ipynb

Replace FARM import statements; add dependencies (#1492)

2021-09-28 16:34:24 +02:00

Tutorial1_Basic_QA_Pipeline.py

Regenerate API and Tutorial md files (#1480)

2021-09-21 14:42:18 +02:00

Tutorial2_Finetune_a_model_on_your_data.ipynb

Add comment to tutorial notebooks about restarting runtime in colab (#1486)

2021-09-23 14:36:20 +02:00

Tutorial2_Finetune_a_model_on_your_data.py

Change variable names (#1286)

2021-07-14 14:03:34 +02:00

Tutorial3_Basic_QA_Pipeline_without_Elasticsearch.ipynb

Add comment to tutorial notebooks about restarting runtime in colab (#1486)

2021-09-23 14:36:20 +02:00

Tutorial3_Basic_QA_Pipeline_without_Elasticsearch.py

Refactor communication between Pipeline Components (#1321)

2021-09-10 11:41:16 +02:00

Tutorial4_FAQ_style_QA.ipynb

Add comment to tutorial notebooks about restarting runtime in colab (#1486)

2021-09-23 14:36:20 +02:00

Tutorial4_FAQ_style_QA.py

Refactor communication between Pipeline Components (#1321)

2021-09-10 11:41:16 +02:00

Tutorial5_Evaluation.ipynb

Replace FARM import statements; add dependencies (#1492)

2021-09-28 16:34:24 +02:00

Tutorial5_Evaluation.py

Replace FARM import statements; add dependencies (#1492)

2021-09-28 16:34:24 +02:00

Tutorial6_Better_Retrieval_via_DPR.ipynb

Add comment to tutorial notebooks about restarting runtime in colab (#1486)

2021-09-23 14:36:20 +02:00

Tutorial6_Better_Retrieval_via_DPR.py

Refactor communication between Pipeline Components (#1321)

2021-09-10 11:41:16 +02:00

Tutorial7_RAG_Generator.ipynb

Add comment to tutorial notebooks about restarting runtime in colab (#1486)

2021-09-23 14:36:20 +02:00

Tutorial7_RAG_Generator.py

Refactor communication between Pipeline Components (#1321)

2021-09-10 11:41:16 +02:00

Tutorial8_Preprocessing.ipynb

Clarify docs for PDF conversion, languages and encodings (#1570)

2021-10-11 09:30:12 +02:00

Tutorial8_Preprocessing.py

Clarify docs for PDF conversion, languages and encodings (#1570)

2021-10-11 09:30:12 +02:00

Tutorial9_DPR_training.ipynb

Add comment to tutorial notebooks about restarting runtime in colab (#1486)

2021-09-23 14:36:20 +02:00

Tutorial9_DPR_training.py

Tutorial update (#1166)

2021-06-11 11:09:15 +02:00

Tutorial10_Knowledge_Graph.ipynb

Add comment to tutorial notebooks about restarting runtime in colab (#1486)

2021-09-23 14:36:20 +02:00

Tutorial10_Knowledge_Graph.py

Tutorial update (#1166)

2021-06-11 11:09:15 +02:00

Tutorial11_Pipelines.ipynb

Add comment to tutorial notebooks about restarting runtime in colab (#1486)

2021-09-23 14:36:20 +02:00

Tutorial11_Pipelines.py

Refactor communication between Pipeline Components (#1321)

2021-09-10 11:41:16 +02:00

Tutorial12_LFQA.ipynb

Add comment to tutorial notebooks about restarting runtime in colab (#1486)

2021-09-23 14:36:20 +02:00

Tutorial12_LFQA.py

Refactor communication between Pipeline Components (#1321)

2021-09-10 11:41:16 +02:00

Tutorial13_Question_generation.ipynb

Add comment to tutorial notebooks about restarting runtime in colab (#1486)

2021-09-23 14:36:20 +02:00

Tutorial13_Question_generation.py

Add QuestionGenerator (#1267)

2021-07-26 17:20:43 +02:00

Tutorial14_Query_Classifier.ipynb

Add comment to tutorial notebooks about restarting runtime in colab (#1486)

2021-09-23 14:36:20 +02:00

Tutorial14_Query_Classifier.py

Refactor communication between Pipeline Components (#1321)

2021-09-10 11:41:16 +02:00