Update docstring in DPR for embed_title (#459)

Malte Pietsch 2020-10-02 13:41:33 +02:00 committed by GitHub
parent 9b58374b7c
commit 029d1b75f2

@@ -50,7 +50,12 @@ class DensePassageRetriever(BaseRetriever):
:param max_seq_len: Longest length of each sequence
:param use_gpu: Whether to use gpu or not
:param batch_size: Number of questions or passages to encode at once
-:param embed_title: Whether to concatenate title and passage to a text pair that is then used to create the embedding
+:param embed_title: Whether to concatenate title and passage to a text pair that is then used to create the embedding.
+This is the approach used in the original paper and is likely to improve performance if your
+titles contain meaningful information for retrieval (topic, entities, etc.).
+The title is expected to be present in doc.meta["name"] and can be supplied in the documents
+before writing them to the DocumentStore like this:
+{"text": "my text", "meta": {"name": "my title"}}.
:param remove_sep_tok_from_untitled_passages: If embed_title is ``True``, there are different strategies to deal with documents that don't have a title.
If this param is ``True`` => Embed passage as single text, similar to embed_title = False (i.e. [CLS] passage_tok1 ... [SEP]).
If this param is ``False`` => Embed passage as text pair with empty title (i.e. [CLS] [SEP] passage_tok1 ... [SEP])
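
The three input layouts described in the docstring can be illustrated with a small standalone sketch. It is not the retriever's code: `passage_layout` is a hypothetical helper, whitespace splitting stands in for the real tokenizer, and the only details taken from the docstring are the document shape (`{"text": ..., "meta": {"name": ...}}`), the two parameter names, and the `[CLS]`/`[SEP]` placement.

```python
from typing import Dict, List


def passage_layout(
    doc: Dict,
    embed_title: bool = True,
    remove_sep_tok_from_untitled_passages: bool = True,
) -> List[str]:
    """Hypothetical helper: symbolic token layout for one document.

    Whitespace splitting is an assumption made for illustration only;
    the real model uses a subword tokenizer.
    """
    passage_tokens = doc["text"].split()
    # The docstring says the title is expected under meta["name"].
    title = (doc.get("meta") or {}).get("name")

    if not embed_title:
        # Title ignored: passage is embedded as a single text.
        return ["[CLS]", *passage_tokens, "[SEP]"]
    if title:
        # Title and passage form a text pair.
        return ["[CLS]", *title.split(), "[SEP]", *passage_tokens, "[SEP]"]
    if remove_sep_tok_from_untitled_passages:
        # Untitled document, embedded like a single text.
        return ["[CLS]", *passage_tokens, "[SEP]"]
    # Untitled document, embedded as a pair with an empty title.
    return ["[CLS]", "[SEP]", *passage_tokens, "[SEP]"]


titled = {"text": "my text", "meta": {"name": "my title"}}
untitled = {"text": "my text", "meta": {}}

print(passage_layout(titled))
print(passage_layout(untitled, remove_sep_tok_from_untitled_passages=True))
print(passage_layout(untitled, remove_sep_tok_from_untitled_passages=False))
```

With `embed_title` enabled, the second flag only affects documents that have no title; titled documents always produce the pair layout.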