haystack/docs/_src/api/api/extractor.md
Branden Chan 81f82b1b95
Update API Reference Pages for v1.0 (#1729)
* Create new API pages and update existing ones

* Create query classifier page

* Remove Objects suffix
2021-11-11 12:44:29 +01:00


<a name="entity"></a>
# Module entity
<a name="entity.EntityExtractor"></a>
## EntityExtractor
```python
class EntityExtractor(BaseComponent)
```
This node extracts entities from documents.
Its most common use is as a named entity recognition (NER) component.
The default model is `dslim/bert-base-NER`.
Place this node in a querying pipeline to extract entities from the retrieved documents only,
or in an indexing pipeline so that every document in the document store has its entities extracted.
The entities extracted by this node populate `Document.entities`.
<a name="entity.EntityExtractor.run"></a>
#### run
```python
| run(documents: Optional[Union[List[Document], List[dict]]] = None) -> Tuple[Dict, str]
```
This method is called when the node is used in a pipeline.
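To make the `Tuple[Dict, str]` return shape concrete, here is a minimal sketch that mimics the contract with a stand-in class. `ToyEntityNode`, its keyword-based `extract`, the plain-dict documents, the `"entities"` key, and the `"output_1"` edge name are all assumptions made for illustration; this is not the library's implementation.

```python
from typing import Dict, List, Optional, Tuple


class ToyEntityNode:
    """Hypothetical stand-in for a pipeline node; not Haystack's EntityExtractor."""

    def __init__(self, known_names: List[str]):
        self.known_names = known_names

    def extract(self, text: str) -> List[dict]:
        # Toy lookup instead of a real NER model: report each known name found in the text.
        found = []
        for name in self.known_names:
            start = text.find(name)
            if start != -1:
                found.append({"word": name, "start": start, "end": start + len(name)})
        return found

    def run(self, documents: Optional[List[dict]] = None) -> Tuple[Dict, str]:
        # Attach entities to every document, then return (output dict, outgoing edge name).
        for doc in documents or []:
            doc["entities"] = self.extract(doc["text"])
        return {"documents": documents}, "output_1"


node = ToyEntityNode(known_names=["Alice", "Paris"])
output, edge = node.run(documents=[{"text": "Alice moved to Paris."}])
```

The first element of the tuple carries the (now annotated) documents onward; the second names the edge the pipeline should follow next.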
<a name="entity.EntityExtractor.extract"></a>
#### extract
```python
| extract(text)
```
This function can be called to perform entity extraction when using the node in isolation.
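The result of a standalone call can be post-processed like any list of entity records. The sketch below assumes each record is a dict with `entity_group`, `word`, `start`, and `end` keys, the shape produced by Hugging Face token-classification pipelines when aggregation is enabled; whether `extract` returns exactly this shape is an assumption. The sample records are hand-written rather than real model output, and `group_by_type` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_type(entities: List[dict]) -> Dict[str, List[str]]:
    """Group entity surface forms by their predicted type (hypothetical helper)."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


# Hand-written records for the text "Alice moved to Paris with Bob."
sample_entities = [
    {"entity_group": "PER", "word": "Alice", "start": 0, "end": 5},
    {"entity_group": "LOC", "word": "Paris", "start": 15, "end": 20},
    {"entity_group": "PER", "word": "Bob", "start": 26, "end": 29},
]

grouped = group_by_type(sample_entities)
```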
<a name="entity.simplify_ner_for_qa"></a>
#### simplify\_ner\_for\_qa
```python
simplify_ner_for_qa(output)
```
Returns a simplified version of the output dictionary
with the following structure:

    [
        {
            answer: { ... }
            entities: [ { ... }, {} ]
        }
    ]

The entities included are only the ones that overlap with
the answer itself.
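The overlap rule can be sketched on plain dictionaries. This is an illustrative re-implementation under stated assumptions, not the library's code: the `simplify` and `spans_overlap` names, the `text`/`start`/`end` fields, and the half-open character spans are all hypothetical choices.

```python
from typing import List


def spans_overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    # Half-open spans [start, end) overlap when each one starts before the other ends.
    return a_start < b_end and b_start < a_end


def simplify(answers: List[dict]) -> List[dict]:
    """Keep, for each answer, only the entities whose span overlaps the answer span."""
    simplified = []
    for answer in answers:
        kept = [
            entity
            for entity in answer["entities"]
            if spans_overlap(entity["start"], entity["end"], answer["start"], answer["end"])
        ]
        simplified.append({"answer": answer["text"], "entities": kept})
    return simplified


# Hand-written input for the text "Alice moved to Paris with Bob."
answers = [
    {
        "text": "Paris",
        "start": 15,
        "end": 20,
        "entities": [
            {"word": "Alice", "start": 0, "end": 5},
            {"word": "Paris", "start": 15, "end": 20},
            {"word": "Bob", "start": 26, "end": 29},
        ],
    }
]

result = simplify(answers)
```

Here only the `Paris` entity survives, because the other two spans lie entirely outside the answer span.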