<a id="file_type"></a>

# Module file\_type
<a id="file_type.FileTypeClassifier"></a>

## FileTypeClassifier
```python
class FileTypeClassifier(BaseComponent)
```

Routes files in an indexing pipeline to their corresponding file converters.

<a id="file_type.FileTypeClassifier.__init__"></a>

#### FileTypeClassifier.\_\_init\_\_
```python
def __init__(supported_types: List[str] = DEFAULT_TYPES)
```
Node that sends out files on a different output edge depending on their extension.

**Arguments**:

- `supported_types`: The file types that this node can distinguish.
  The node is limited to a maximum of 10 outgoing edges, each of which
  corresponds to one file extension. The default extensions are
  `txt`, `pdf`, `md`, `docx`, and `html`. Lists containing more than 10
  elements are not allowed, and lists with duplicate elements are
  also rejected.

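
The two constraints on `supported_types` described above can be sketched as a standalone check. This is only an illustration, not the library's code: `validate_supported_types` and `MAX_EDGES` are hypothetical names, the contents of `DEFAULT_TYPES` are assumed to match the defaults listed above, and raising `ValueError` is an assumption about how a rejected list might be reported.

```python
from typing import List

# Assumed to hold the default extensions listed in the argument description.
DEFAULT_TYPES = ["txt", "pdf", "md", "docx", "html"]

# Hypothetical constant name for the documented limit of 10 outgoing edges.
MAX_EDGES = 10


def validate_supported_types(supported_types: List[str] = DEFAULT_TYPES) -> List[str]:
    """Hypothetical helper that applies the two documented constraints."""
    # Constraint 1: one outgoing edge per extension, at most 10 edges.
    if len(supported_types) > MAX_EDGES:
        raise ValueError(
            f"At most {MAX_EDGES} file types are supported, got {len(supported_types)}."
        )
    # Constraint 2: duplicates would make the extension-to-edge mapping ambiguous.
    if len(set(supported_types)) != len(supported_types):
        raise ValueError("The list of supported types contains duplicate elements.")
    # Return a copy so the caller's list (or the default) is never shared.
    return list(supported_types)
```

Under these assumptions, `validate_supported_types(["txt", "pdf"])` returns the list unchanged, while `["txt", "txt"]` or any list with 11 or more entries raises an error.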
<a id="file_type.FileTypeClassifier.run"></a>

#### FileTypeClassifier.run
```python
def run(file_paths: Union[Path, List[Path], str, List[str], List[Union[Path, str]]])
```
Sends out files on a different output edge depending on their extension.

**Arguments**:

- `file_paths`: The paths of the files to route to the different output edges.
  Accepts a single path or a list of paths, given as `str` or `Path`.

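
To make the idea of routing by extension concrete, here is a minimal standalone sketch. It does not reproduce the node's actual `run` implementation: `route_by_extension` is a hypothetical function, the `output_N` edge names and the rule that the N-th supported type maps to the N-th edge are assumptions, and so are the case-insensitive matching and the `ValueError` for unsupported extensions. Grouping mixed extensions in one call is also a simplification made for illustration.

```python
from pathlib import Path
from typing import Dict, List, Union

# Assumed to match the default extensions documented for `supported_types`.
SUPPORTED_TYPES = ["txt", "pdf", "md", "docx", "html"]

PathLike = Union[Path, str]


def route_by_extension(
    file_paths: Union[PathLike, List[PathLike]],
    supported_types: List[str] = SUPPORTED_TYPES,
) -> Dict[str, List[Path]]:
    """Hypothetical helper: group paths by the edge their extension maps to."""
    # Accept a single path as well as a list, as the `run` signature does.
    if isinstance(file_paths, (str, Path)):
        file_paths = [file_paths]

    routed: Dict[str, List[Path]] = {}
    for raw_path in file_paths:
        path = Path(raw_path)
        # Only the last suffix counts: "a.tar.gz" yields "gz".
        extension = path.suffix.lstrip(".").lower()
        if extension not in supported_types:
            raise ValueError(f"Unsupported file extension: {path}")
        # Assumption: edges are numbered from 1 in the order of `supported_types`.
        edge = f"output_{supported_types.index(extension) + 1}"
        routed.setdefault(edge, []).append(path)
    return routed
```

With the default list, a call such as `route_by_extension(["a.txt", "b.pdf"])` would place `a.txt` under `output_1` and `b.pdf` under `output_2`.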