75 lines
4.4 KiB
Markdown
Raw Normal View History

2021-12-09 04:26:31 +05:30
# Nifi
For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
## Setup
To install this plugin, run `pip install 'acryl-datahub[nifi]'`.
## Capabilities
This plugin extracts the following:
- Nifi flow as `DataFlow` entity
- Ingress, egress processors, remote input and output ports as `DataJob` entity
- Input and output ports receiving remote connections as `Dataset` entity
- Lineage information between external datasets and ingress/egress processors by analyzing provenance events
Current limitations:
- Limited ingress/egress processors are supported
- S3: `ListS3`, `FetchS3Object`, `PutS3Object`
- SFTP: `ListSFTP`, `FetchSFTP`, `GetSFTP`, `PutSFTP`
## Quickstart recipe
Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).
```yml
source:
type: "nifi"
config:
# Coordinates
site_url: "https://localhost:8443/nifi/"
# Credentials
auth: SINGLE_USER
username: admin
password: password
sink:
# sink configs
```
## Config details
Note that a `.` is used to denote nested fields in the YAML recipe.
| Field | Required | Default | Description |
| -------------------------- | -------- | -------------------------- | ------------------------------------------------------- |
| `site_url` | ✅ | | URI to connect |
2021-12-09 04:26:31 +05:30
| `site_name` | | `"default"` | Site name to identify this site with, useful when using input and output ports receiving remote connections |
| `auth` | | `"NO_AUTH"` | Nifi authentication. must be one of : NO_AUTH, SINGLE_USER, CLIENT_CERT |
| `username` | | | Nifi username, must be set for `auth` = `"SINGLE_USER"` |
| `password` | | | Nifi password, must be set for `auth` = `"SINGLE_USER"` |
| `client_cert_file` | | | Path to PEM file containing the public certificates for the user/client identity, must be set for `auth` = `"CLIENT_CERT"` |
| `client_key_file` | | | Path to PEM file containing the clients secret key |
| `client_key_password` | | | The password to decrypt the client_key_file |
| `ca_file` | | | Path to PEM file containing certs for the root CA(s) for the NiFi |
| `provenance_days` | | | time window to analyze provenance events for external datasets |
| `site_url_to_site_name` | | | Lookup to find site_name for site_url, required if using remote process groups in nifi flow |
|`process_group_pattern.allow`| | | List of regex patterns for process groups to include in ingestion. |
| `process_group_pattern.deny`| | | List of regex patterns for process groups to exclude from ingestion. |
| `process_group_pattern.ignoreCase` | | `True` | Whether to ignore case sensitivity during pattern matching. |
| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
## Compatibility
Coming soon!
## Questions
If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!