mirror of https://github.com/deepset-ai/haystack.git, synced 2026-01-06 12:07:04 +00:00
add DocumentLanguageClassifier API (#4401)
parent 98256ecf57
commit 7d17ca7391
26
docs/pydoc/config/doc-language-classifier.yml
Normal file
@@ -0,0 +1,26 @@
+loaders:
+  - type: python
+    search_path: [../../../haystack/nodes/doc_language_classifier]
+    modules: ["base", "langdetect", "transformers"]
+    ignore_when_discovered: ["__init__"]
+processors:
+  - type: filter
+    expression:
+    documented_only: true
+    do_not_filter_modules: false
+    skip_empty_modules: true
+  - type: smart
+  - type: crossref
+renderer:
+  type: renderers.ReadmeRenderer
+  excerpt: Detects the language of the Documents
+  category_slug: haystack-classes
+  title: Document Language Classifier API
+  slug: doc-language-classifier-api
+  order: 25
+  markdown:
+    descriptive_class_title: false
+    descriptive_module_title: true
+    add_method_class_prefix: true
+    add_member_class_prefix: false
+    filename: doc_language_classifier_api.md
@@ -12,7 +12,7 @@ logger = logging.getLogger(__name__)
 class LangdetectDocumentLanguageClassifier(BaseDocumentLanguageClassifier):
     """
     Node based on the lightweight and fast [langdetect library](https://github.com/Mimino666/langdetect) for document language classification.
-    This node detects the languge of Documents and adds the output to the Documents metadata.
+    This node detects the language of Documents and adds the output to the Documents metadata.
     The meta field of the Document is a dictionary with the following format:
     ``'meta': {'name': '450_Baelor.txt', 'language': 'en'}``
     - Using the document language classifier, you can directly get predictions via predict()
@@ -18,7 +18,7 @@ class TransformersDocumentLanguageClassifier(BaseDocumentLanguageClassifier):
     Transformer based model for document language classification using the HuggingFace's transformers framework
     (https://github.com/huggingface/transformers).
     While the underlying model can vary (BERT, Roberta, DistilBERT ...), the interface remains the same.
-    This node detects the languge of Documents and adds the output to the Documents metadata.
+    This node detects the language of Documents and adds the output to the Documents metadata.
     The meta field of the Document is a dictionary with the following format:
     ``'meta': {'name': '450_Baelor.txt', 'language': 'en'}``
     - Using the document language classifier, you can directly get predictions via predict()
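Both docstrings describe the same contract: the detected language is written into each Document's meta dictionary under the 'language' key. A minimal, self-contained sketch of that behaviour follows. It does not use Haystack, langdetect, or transformers; the documents are plain dicts, and toy_detect, add_language, and STOPWORDS are hypothetical names invented for illustration only.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a real detector such as langdetect. It only counts
# a few common stopwords and is NOT how the Haystack nodes work.
STOPWORDS = {
    "en": {"the", "and", "is", "of"},
    "de": {"der", "und", "ist", "das"},
}


def toy_detect(text: str) -> str:
    """Return the language code whose stopwords occur most often in text."""
    tokens = text.lower().split()
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOPWORDS.items()
    }
    return max(scores, key=scores.get)


def add_language(
    documents: List[Dict], detect: Callable[[str], str] = toy_detect
) -> List[Dict]:
    """Write the detected language into each document's meta dictionary."""
    for doc in documents:
        # Create the meta dict if it is missing, then set the 'language' key.
        doc.setdefault("meta", {})["language"] = detect(doc["content"])
    return documents


docs = [
    {"content": "The king is the lord of the realm", "meta": {"name": "450_Baelor.txt"}},
    {"content": "Der König ist der Herr und das Reich ist groß", "meta": {}},
]
for doc in add_language(docs):
    print(doc["meta"])
```

The detector is passed in as a callable so that a real one can be swapped in without changing how the metadata is written, which mirrors the docstrings' point that the interface stays the same while the underlying model varies.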