# NiFi

For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).

## Setup

To install this plugin, run `pip install 'acryl-datahub[nifi]'`.

## Capabilities

This plugin extracts the following:

- NiFi flow as a `DataFlow` entity
- Ingress and egress processors, and remote input and output ports, as `DataJob` entities
- Input and output ports receiving remote connections as `Dataset` entities
- Lineage between external datasets and ingress/egress processors, derived by analyzing provenance events

Current limitations:

- Only a limited set of ingress/egress processors is supported:
  - S3: `ListS3`, `FetchS3Object`, `PutS3Object`
  - SFTP: `ListSFTP`, `FetchSFTP`, `GetSFTP`, `PutSFTP`

## Quickstart recipe

Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.

For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).

```yml
source:
  type: "nifi"
  config:
    # Coordinates
    site_url: "https://localhost:8443/nifi/"

    # Credentials
    auth: SINGLE_USER
    username: admin
    password: password

sink:
  # sink configs
```

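The `sink` section of the quickstart recipe is left as a placeholder. As a minimal sketch, assuming a DataHub instance whose REST endpoint is reachable at `http://localhost:8080` (an assumed address, not taken from this guide), the recipe could be completed with the `datahub-rest` sink:

```yml
source:
  type: "nifi"
  config:
    # Coordinates
    site_url: "https://localhost:8443/nifi/"

    # Credentials
    auth: SINGLE_USER
    username: admin
    password: password

sink:
  # Assumption: DataHub's REST endpoint is served at this address
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
```

Saved as, say, `nifi_recipe.yml` (a hypothetical file name), the recipe can then be run with `datahub ingest -c nifi_recipe.yml`.
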
## Config details

Note that a `.` is used to denote nested fields in the YAML recipe.

| Field                              | Required | Default     | Description                                                                                                                          |
| ---------------------------------- | -------- | ----------- | ------------------------------------------------------------------------------------------------------------------------------------ |
| `site_url`                         | ✅       |             | URI of the NiFi instance to connect to.                                                                                              |
| `site_name`                        |          | `"default"` | Site name to identify this site with. Useful when using input and output ports that receive remote connections.                      |
| `auth`                             |          | `"NO_AUTH"` | NiFi authentication method. Must be one of `NO_AUTH`, `SINGLE_USER`, `CLIENT_CERT`.                                                  |
| `username`                         |          |             | NiFi username. Must be set when `auth` = `"SINGLE_USER"`.                                                                            |
| `password`                         |          |             | NiFi password. Must be set when `auth` = `"SINGLE_USER"`.                                                                            |
| `client_cert_file`                 |          |             | Path to a PEM file containing the public certificates for the user/client identity. Must be set when `auth` = `"CLIENT_CERT"`.       |
| `client_key_file`                  |          |             | Path to a PEM file containing the client's secret key.                                                                               |
| `client_key_password`              |          |             | Password to decrypt `client_key_file`.                                                                                               |
| `ca_file`                          |          |             | Path to a PEM file containing certificates for the root CA(s) of the NiFi instance.                                                  |
| `provenance_days`                  |          |             | Time window, in days, over which provenance events are analyzed for external datasets.                                               |
| `site_url_to_site_name`            |          |             | Lookup from `site_url` to `site_name`. Required if the NiFi flow uses remote process groups.                                         |
| `process_group_pattern.allow`      |          |             | List of regex patterns for process groups to include in ingestion.                                                                   |
| `process_group_pattern.deny`       |          |             | List of regex patterns for process groups to exclude from ingestion.                                                                 |
| `process_group_pattern.ignoreCase` |          | `True`      | Whether to ignore case during pattern matching.                                                                                      |
| `env`                              |          | `"PROD"`    | Environment to use in namespace when constructing URNs.                                                                              |

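To illustrate how the dotted field names map onto a recipe, here is a hedged sketch of a source config that uses certificate-based authentication and process group filtering. All file paths, the remote site URL, the site names, the regex patterns, and the `provenance_days` value are hypothetical placeholders. The shape of `site_url_to_site_name` (a mapping from URL to name) is an assumption based on its description, not something this guide spells out.

```yml
source:
  type: "nifi"
  config:
    site_url: "https://localhost:8443/nifi/"
    site_name: "site-a" # hypothetical name for this site

    # Certificate-based authentication (hypothetical paths)
    auth: CLIENT_CERT
    client_cert_file: "/path/to/client-cert.pem"
    client_key_file: "/path/to/client-key.pem"
    ca_file: "/path/to/root-ca.pem"

    # Example value only; no default is stated in this guide
    provenance_days: 7

    # `process_group_pattern.allow` / `.deny` / `.ignoreCase` from the table
    # are written as nested keys under `process_group_pattern`
    process_group_pattern:
      allow:
        - "ingest_.*" # hypothetical pattern
      deny:
        - ".*_test" # hypothetical pattern
      ignoreCase: true

    # Assumed shape: remote site URL mapped to its site name
    site_url_to_site_name:
      "https://remote-nifi.example.com:8443/nifi/": "site-b"

sink:
  # sink configs
```
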
## Compatibility

Coming soon!

## Questions

If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!