75 lines
4.4 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Nifi
For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
## Setup
To install this plugin, run `pip install 'acryl-datahub[nifi]'`.
## Capabilities
This plugin extracts the following:
- Nifi flow as `DataFlow` entity
- Ingress, egress processors, remote input and output ports as `DataJob` entity
- Input and output ports receiving remote connections as `Dataset` entity
- Lineage information between external datasets and ingress/egress processors by analyzing provenance events
Current limitations:
- Limited ingress/egress processors are supported
- S3: `ListS3`, `FetchS3Object`, `PutS3Object`
- SFTP: `ListSFTP`, `FetchSFTP`, `GetSFTP`, `PutSFTP`
## Quickstart recipe
Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).
```yml
source:
type: "nifi"
config:
# Coordinates
site_url: "https://localhost:8443/nifi/"
# Credentials
auth: SINGLE_USER
username: admin
password: password
sink:
# sink configs
```
## Config details
Note that a `.` is used to denote nested fields in the YAML recipe.
| Field | Required | Default | Description |
| -------------------------- | -------- | -------------------------- | ------------------------------------------------------- |
| `site_url` | ✅ | | URI to connect |
| `site_name` | | `"default"` | Site name to identify this site with, useful when using input and output ports receiving remote connections |
| `auth` | | `"NO_AUTH"` | Nifi authentication. must be one of : NO_AUTH, SINGLE_USER, CLIENT_CERT |
| `username` | | | Nifi username, must be set for `auth` = `"SINGLE_USER"` |
| `password` | | | Nifi password, must be set for `auth` = `"SINGLE_USER"` |
| `client_cert_file` | | | Path to PEM file containing the public certificates for the user/client identity, must be set for `auth` = `"CLIENT_CERT"` |
| `client_key_file` | | | Path to PEM file containing the clients secret key |
| `client_key_password` | | | The password to decrypt the client_key_file |
| `ca_file` | | | Path to PEM file containing certs for the root CA(s) for the NiFi |
| `provenance_days` | | | time window to analyze provenance events for external datasets |
| `site_url_to_site_name` | | | Lookup to find site_name for site_url, required if using remote process groups in nifi flow |
|`process_group_pattern.allow`| | | List of regex patterns for process groups to include in ingestion. |
| `process_group_pattern.deny`| | | List of regex patterns for process groups to exclude from ingestion. |
| `process_group_pattern.ignoreCase` | | `True` | Whether to ignore case sensitivity during pattern matching. |
| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
## Compatibility
Coming soon!
## Questions
If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!