Julian Risch
f599ce9458
Change "text" to "content" as dict key ( #2800 )
...
* change "text" to "content" as dict key
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-13 16:36:06 +02:00
Sara Zan
d8e7aaeacc
API key check in OpenAIAnswerGenerator ( #2791 )
...
* api key check in node and tests
* Clarify skip message
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-12 14:05:47 +02:00
Sara Zan
4d2a06989d
Fix YAML validation for ElasticsearchDocumentStore.custom_query ( #2789 )
...
* Add exception for in the validation code
* Update Documentation & Code Style
* Add tests
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-12 13:49:06 +02:00
Sara Zan
091711b8c4
Fix Tutorials and Tutorials (nightly) ( #2737 )
...
* Remove caching and install audio deps
* Fix `Tutorials` as well
* Run all tutorials even though some fail
* Forgot fi
* fix failure condition
* proper bash string equality
* Enable debug logs
* remove audio files
* Update Documentation & Code Style
* Use the setup action in the Tutorial CI as well
* Try with a file that exists
* Update Documentation & Code Style
* Fix the comments in the tutorials
* Update Documentation & Code Style
* Fix tutorials.sh
* Remove debug logging
* import pprint and try editable install
* Update Documentation & Code Style
* extract no run list
* Add tutorial18 to no run list nightly
* import pprint correctly
* Update Documentation & Code Style
* try making site-packages editable
* Make pythonpath editable every time Tut17 is run on CI
* typo
* fix imports in tut5
* add git clean
* Update Documentation & Code Style
* add comments and remove `-e`
* accidentally deleted a line
* Update .github/utils/tutorials.sh
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-12 11:22:17 +02:00
Sowmiya Jaganathan
4d8f40425b
Passing the meta-data in the summarizer response ( #2179 )
...
* Passing all the meta-data in the summarizer
* Disable metadata forwarding if `generate_single_summary` is `True`
* Update Documentation & Code Style
* simplify tests
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-11 17:28:36 +02:00
Daniel Augustus Bichuetti Silva
1706729e26
Prevent PDFToTextConverter from failing on PDFs with spaces in their names ( #2786 )
...
* Change split logic to list
* Fix wrong parameter for run
* Fix mypy error
* Fix layout/raw parameter
* Add test for filename with whitespaces on PDFToText
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-11 13:30:33 +02:00
Daniel Augustus Bichuetti Silva
77a513fe49
Fix crawler long file names ( #2723 )
...
* Change the name under which a crawled page is saved, to avoid long file name errors on some file systems
* Custom naming function for saving crawled files
* Update Documentation & Code Style
* Remove bad characters on file name and prefix
* Add test for naming function
* Update Documentation & Code Style
* Fix expensive regex recalculation and linter warns
* Check for exceptions on file dump
* Remove param_naming variable
* Fix file paths on Windows, Linux and Mac
* Update Documentation & Code Style
* Test using one of the docstrings examples
* Change default naming function
Update docstrings
* Applying formatting rules
* Update Documentation & Code Style
* Fix mypy incompatible assignment error
* Remove unused type declaration
* Fix typo
* Update tests for naming function
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-11 12:16:32 +02:00
Malte Pietsch
ba08fc86f5
Add node to use OpenAI's GPT-3 for QA ( #2605 )
...
* first draft of openai node for QA
* Update Documentation & Code Style
* fix mypy. add node to inits
* Update Documentation & Code Style
* fix linter
* Adapt OpenAIGenerator to completions endpoint
* Update Documentation & Code Style
* Fix pylint
* Fix doc strings
* Make use of temperature
* Make use of api key in tests
* Adapt doc strings
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
2022-07-08 13:59:27 +02:00
Agnieszka Marzec
425da1fd31
Fix load_from_yaml example in the Pipelines tutorial ( #2774 )
...
* Fix load from yaml example and image
* Update Documentation & Code Style
* Fixed pipeline example
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-08 11:22:11 +02:00
James Briggs
ea40387b97
added mock pinecone client ( #2770 )
2022-07-07 19:51:30 +02:00
tstadel
d21b066fc7
fix pipeline run loop on joined pipelines without debug flag ( #2777 )
...
* fix pipeline run loop on joined pipelines without debug flag
* use .keys() consistently
2022-07-07 16:47:59 +02:00
bogdankostic
195aed942f
Add update_document_meta to InMemoryDocumentStore ( #2689 )
...
* Add update_document_meta to InMemoryDocumentStore
* Fix typo
* Update Documentation & Code Style
* Add update_document_meta to BaseDocumentStore
* Update Documentation & Code Style
* Fix mypy
* Update Documentation & Code Style
* Add update_document_meta to MockDocumentStore
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-07 15:44:07 +02:00
tstadel
45136badfe
Fix _debug info getting lost for previous nodes when using join nodes ( #2776 )
...
* fix debug output for pipelines with join nodes
* add test
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-07 15:10:13 +02:00
Vladimir Blagojevic
a766b70a8f
Tutorial 18: Open in Colab doesn't work in Firefox ( #2767 )
...
* Tutorial 18: Open in Colab doesn't work in Firefox
* Tutorial 18: Open in Colab doesn't work in Firefox v2
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-06 10:51:09 -04:00
Tuana Celik
917afb1530
Trying out some smaller images for docs ( #2772 )
2022-07-06 16:11:23 +02:00
tstadel
e9219f4dc2
Fix confusing elasticsearch exception ( #2763 )
...
* convert confusing exception to warning and add no docs case.
* blacken
* fix test
2022-07-06 15:40:51 +02:00
Vladimir Blagojevic
a2905d05f7
Bump version to next release candidate ( #2765 )
...
* Bump version to next release candidate
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-06 11:26:42 +02:00
Vladimir Blagojevic
c80336c424
Upgrade to v1.6.0 and copy docs folder ( #2764 )
...
* Upgrade to v1.6.0 and copy docs folder
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
v1.6.0
2022-07-06 10:25:15 +02:00
tstadel
2a7c0139f5
double max heap size for elasticsearch in CI ( #2756 )
2022-07-05 13:53:32 +02:00
bogdankostic
353da8b1c1
Add Tutorials 16, 17 and 18 to README ( #2758 )
2022-07-05 12:04:58 +02:00
Julian Risch
f70f4e90fd
correct docstring parameter name ( #2757 )
2022-07-05 12:00:40 +02:00
Patrick Deutschmann
1db3fd0942
Add support for Multi-Hop Dense Retrieval ( #2571 )
...
* Implement MDR
* Adapt conftest to new MDR signature
* Update Documentation & Code Style
* Change signature of queries param in batch methods of MDR like in #2575
* Update Documentation & Code Style
* Rename MultihopDenseRetriever to MultihopEmbeddingRetriever
* Fix filters in retrieve_batch
* Add docstring for MultihopEmbeddingRetriever.__init__
* Update Documentation & Code Style
* Revert forward signature of TextSimilarityHead
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-05 11:31:11 +02:00
bogdankostic
dc48c444d4
Fix loading of tokenizers in DPR ( #2755 )
2022-07-04 18:18:14 +02:00
Tuana Celik
2a8b129bae
first version of save_to_remote for HF from FarmReader ( #2618 )
...
* first version of save_to_remote for HF from FarmReader
* Update Documentation & Code Style
* Changes based on comments
* Update Documentation & Code Style
* imports order
* making small changes to pydoc
* indent fix
* Update Documentation & Code Style
* keyword arguments instead of positional
* Changing to repo_id
huggingface-hub package would have to be v0.5 or higher - checking how to handle with Thomas
* Update Documentation & Code Style
* adding huggingface-hub dependency 0.5 or above
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
2022-07-04 15:39:56 +02:00
Julian Risch
f7d00476f9
Reduce logging messages and simplify logging ( #2682 )
...
* change log levels to debug and use torch.div
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-04 14:02:55 +02:00
tstadel
322d964679
Remove rapidfuzz version pin ( #2730 )
...
* remove rapidfuzz version pin
* exclude malicious version 2.0.14
* update rapidfuzz version restrictions
2022-07-04 13:53:39 +02:00
Francesco Castelli
31dcd55c24
Validate max_seq_length in SquadProcessor ( #2740 )
...
* added max_len_seq validation in SquadProcessor
* fixed string formatting
* added tests for invalid max_seq_len
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-04 13:35:45 +02:00
Vladimir Blagojevic
ffb7e4e4bd
GPL tutorial - add GPU header and open in colab button ( #2736 )
...
* GPL tutorial - add GPU header and open in colab button
* Add GPL tutorial to run exclusion list
2022-07-04 05:23:39 -04:00
Julian Risch
1c1faa4742
Make check of document & embedding count optional in FAISS and Pinecone ( #2677 )
...
* make validation optional & add method call in pinecone init
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-04 10:12:31 +02:00
Julian Risch
1781e88802
Upgrade torch to 1.12 ( #2741 )
...
* Upgrade torch to 1.12
* upgrade torch-scatter
* add explicit torch-scatter installation
* set torch dependency to range >1.9,<1.13
2022-07-01 20:23:32 +02:00
Daniel Augustus Bichuetti Silva
e3b2ee956a
Improved crawler support for dynamically loaded pages ( #2710 )
...
* Improved crawler support for dynamically loaded pages
* Reduced scope of StaleElementReferenceException and removed deprecated code from WebDriver initialization
* Improvements on crawler testing code
* Code format and style applied on f028331948c170448613e86dfdfa222f7c2043fd
* Update Documentation & Code Style
* Remove unused imports/parameters
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-01 10:47:33 +02:00
Massimiliano Pippi
1e01cd0efb
pin es client to include bugfixes ( #2735 )
2022-06-27 15:13:34 +02:00
mathislucka
8d65bc5f9b
Update document scores based on ranker node ( #2048 )
...
* ranker should return scores for later usage
* fix wrong tuple order
* adjust ranker scores; add tests
* Update Documentation & Code Style
* fix mypy
* Update Documentation & Code Style
* fix mypy
* Update Documentation & Code Style
* relax ranker test tolerance
* update ranker test score
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2022-06-27 12:17:18 +02:00
Julian Risch
46c9c8c562
Upgrade transformers to 4.20.1 ( #2702 )
...
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
2022-06-27 11:56:58 +02:00
Vladimir Blagojevic
b08c5f81d1
Add GPL adaptation tutorial ( #2632 )
...
* Add GPL adaptation tutorial
* Latest round of Aga's corrections
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-26 02:44:57 -04:00
Sara Zan
426f49979b
Change repo with repository in python_cache ( #2731 )
...
* Change repo with repository
* remove name
* using owner and name
* use owner name
* replace name with login
* Trying with the PR context instead
2022-06-24 18:36:19 +02:00
Sara Zan
6a7152044e
add repo name as well ( #2729 )
2022-06-24 17:08:28 +02:00
Stefano Fiorucci
42b1a5c3a4
fix error in log message ( #2719 )
...
* fix error in log message
* Update Documentation & Code Style
* pass index to _drop_duplicate_documents
* make the use of index in logging more explicit
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-24 16:53:52 +02:00
Sara Zan
13514f960d
Specify ref in action ( #2727 )
2022-06-24 15:56:17 +02:00
Massimiliano Pippi
3207f372ee
Fix bugs in loading code from yaml ( #2705 )
...
* fix bug in loading code from yaml
2022-06-24 14:52:13 +02:00
tstadel
ab443aab28
Fix match_context tests in test_utils.py ( #2725 )
...
* fix match_context tests
* fix naming of test
* pin rapidfuzz to 2.0.13
2022-06-24 13:23:00 +02:00
Sara Zan
e8546e2124
Replace deprecated Selenium methods ( #2724 )
...
* Fix crawler.py
* Fix test_connector.py
* unused import
Co-authored-by: danielbichuetti <daniel.bichuetti@gmail.com>
2022-06-24 12:05:32 +02:00
Sara Zan
400d2cdf77
Fix audio tests on CI ( #2718 )
...
* Update Documentation & Code Style
* fix huggingface-hub version
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-24 11:36:31 +02:00
tstadel
1168f6365d
Fix using id_hash_keys as pipeline params ( #2717 )
...
* Fix using id_hash_keys as pipeline params
* Update Documentation & Code Style
* add tests
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-24 09:55:09 +02:00
tstadel
a084a982c4
Show warning in reader.eval() about differences compared to pipeline.eval() ( #2477 )
...
* deprecate reader.eval
* Update Documentation & Code Style
* update warning to describe differences between pipeline.eval()
* remove empty lines
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-23 18:40:17 +02:00
Sara Zan
e69492a28f
Tutorial 14 doc changes ( #2714 )
...
* let the bot apply changes in this pr
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-23 12:36:12 +02:00
Stefano Fiorucci
b01a7c2259
Add InMemoryKnowledgeGraph ( #2678 )
...
* draft for InMemoryKnowledgeGraph
* remove comments
* Update Documentation & Code Style
* fix import and signature
* Fix dependencies for in_memory_knowlede_graph
* updated tutorials
* Update Documentation & Code Style
* fix bug in notebook
* fix other notebook bug
* Update Documentation & Code Style
* improved tutorial notebook
* Update Documentation & Code Style
* better implementation of InMemoryKnowledgeGraph
* fix
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-22 19:16:33 +02:00
Rob Pasternak
b87c0c950b
Tutorial 14 edit ( #2663 )
...
* Rewrite Tutorial 14 for increased user-friendliness
* Update Tutorial14 .py file to match .ipynb file
* Update Documentation & Code Style
* unblock the ci
* ignore error in jitterbit/get-changed-files
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
2022-06-22 13:03:07 +02:00
Julian Risch
325bc5466a
Revert "Upgrade transformers to 4.20.0 ( #2694 )" ( #2700 )
...
This reverts commit 4a63707f1a177123c13929eb316d3ecaa7fd6c5f.
2022-06-21 21:17:21 +02:00
Julian Risch
4a63707f1a
Upgrade transformers to 4.20.0 ( #2694 )
2022-06-21 17:23:31 +02:00