# NiFi

For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).

## Setup

To install this plugin, run `pip install 'acryl-datahub[nifi]'`.

## Capabilities

This plugin extracts the following:

- NiFi flow as a `DataFlow` entity
- Ingress and egress processors, and remote input and output ports, as `DataJob` entities
- Input and output ports receiving remote connections as `Dataset` entities
- Lineage information between external datasets and ingress/egress processors, obtained by analyzing provenance events

Current limitations:

- Only a limited set of ingress/egress processors is supported:
  - S3: `ListS3`, `FetchS3Object`, `PutS3Object`
  - SFTP: `ListSFTP`, `FetchSFTP`, `GetSFTP`, `PutSFTP`

## Quickstart recipe

Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.

For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).

```yml
source:
  type: "nifi"
  config:
    # Coordinates
    site_url: "https://localhost:8443/nifi/"

    # Credentials
    auth: SINGLE_USER
    username: admin
    password: password

sink:
  # sink configs
```

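
The quickstart recipe authenticates with `SINGLE_USER`. For a site that uses client certificates, a recipe might look roughly like the sketch below. The field names are the ones listed under [Config details](#config-details); the file paths and the key password are hypothetical placeholders to replace with your own values.

```yml
source:
  type: "nifi"
  config:
    # Coordinates
    site_url: "https://localhost:8443/nifi/"

    # Credentials (all paths and the password below are placeholders)
    auth: CLIENT_CERT
    client_cert_file: "/path/to/client-cert.pem"
    client_key_file: "/path/to/client-key.pem"
    # Only relevant if the key file is encrypted
    client_key_password: "changeme"
    # Root CA certificate(s) used to verify the NiFi server
    ca_file: "/path/to/ca-certs.pem"

sink:
  # sink configs
```
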
## Config details

Note that a `.` is used to denote nested fields in the YAML recipe.

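
As a sketch of how a dotted field name maps onto the recipe, `process_group_pattern.allow`, `process_group_pattern.deny`, and `process_group_pattern.ignoreCase` would be written as keys nested under `process_group_pattern`. The regex values here are hypothetical placeholders, not names from a real flow.

```yml
source:
  type: "nifi"
  config:
    site_url: "https://localhost:8443/nifi/"

    # `process_group_pattern.allow` and friends become nested keys
    process_group_pattern:
      allow:
        - "ingest_.*" # hypothetical process group names to include
      deny:
        - ".*_test" # hypothetical process group names to exclude
      ignoreCase: true
```
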

| Field | Required | Default | Description |
| -------------------------- | -------- | -------------------------- | ------------------------------------------------------- |
| `site_url` | ✅ | `"https://localhost:8443/nifi/"` | URI to connect to. |
| `site_name` | | `"default"` | Site name to identify this site with. Useful when using input and output ports receiving remote connections. |
| `auth` | | `"NO_AUTH"` | NiFi authentication method. Must be one of `NO_AUTH`, `SINGLE_USER`, `CLIENT_CERT`. |
| `username` | | | NiFi username. Must be set for `auth` = `"SINGLE_USER"`. |
| `password` | | | NiFi password. Must be set for `auth` = `"SINGLE_USER"`. |
| `client_cert_file` | | | Path to PEM file containing the public certificates for the user/client identity. Must be set for `auth` = `"CLIENT_CERT"`. |
| `client_key_file` | | | Path to PEM file containing the client's secret key. |
| `client_key_password` | | | Password to decrypt `client_key_file`. |
| `ca_file` | | | Path to PEM file containing certs for the root CA(s) for the NiFi instance. |
| `provenance_days` | | | Time window, in days, to analyze provenance events for external datasets. |
| `site_url_to_site_name` | | | Lookup to find `site_name` for `site_url`. Required if using remote process groups in the NiFi flow. |
| `process_group_pattern.allow` | | | List of regex patterns for process groups to include in ingestion. |
| `process_group_pattern.deny` | | | List of regex patterns for process groups to exclude from ingestion. |
| `process_group_pattern.ignoreCase` | | `True` | Whether to ignore case during pattern matching. |
| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |

## Compatibility

Coming soon!

## Questions

If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!