NiFi
For context on getting started with ingestion, check out our metadata ingestion guide.
Setup
To install this plugin, run `pip install 'acryl-datahub[nifi]'`.
Capabilities
This plugin extracts the following:
- NiFi flow as `DataFlow` entity
- Ingress and egress processors, remote input ports, and remote output ports as `DataJob` entities
- Input and output ports receiving remote connections as `Dataset` entities
- Lineage information between external datasets and ingress/egress processors, derived by analyzing provenance events
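The entity mapping above can be sketched by building DataHub URN strings for each kind of NiFi component. This is a minimal sketch that assumes DataHub's usual URN shapes for `dataFlow`, `dataJob`, and `dataset`. The ids used here (`root-process-group-id`, `fetch-s3-processor-id`, `default.remote-input-port-id`) and the `nifi` platform for port datasets are hypothetical. They are not necessarily what the NiFi source actually emits. In real code, DataHub's `datahub.emitter.mce_builder` module provides helpers such as `make_dataset_urn` for this.

```python
# Hypothetical sketch: mapping NiFi components to DataHub URN strings.
# URN shapes follow DataHub's dataFlow/dataJob/dataset formats; the ids are
# illustrative assumptions, not the values the NiFi source necessarily uses.


def data_flow_urn(flow_id: str, cluster: str = "prod") -> str:
    # DataFlow URN: urn:li:dataFlow:(orchestrator,flow_id,cluster)
    return f"urn:li:dataFlow:(nifi,{flow_id},{cluster})"


def data_job_urn(flow_urn: str, job_id: str) -> str:
    # A DataJob URN nests the URN of the DataFlow it belongs to.
    return f"urn:li:dataJob:({flow_urn},{job_id})"


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    # Dataset URN: urn:li:dataset:(urn:li:dataPlatform:platform,name,env)
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# The whole NiFi flow becomes one DataFlow.
flow = data_flow_urn("root-process-group-id")
# An ingress processor (for example FetchS3Object) becomes a DataJob in that flow.
job = data_job_urn(flow, "fetch-s3-processor-id")
# A port receiving remote connections becomes a Dataset.
port = dataset_urn("nifi", "default.remote-input-port-id")
```

Nesting the flow URN inside each job URN ties every `DataJob` to its parent `DataFlow` without a separate lookup.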
Current limitations:
- Only a limited set of ingress/egress processors is supported:
  - S3: `ListS3`, `FetchS3Object`, `PutS3Object`
  - SFTP: `ListSFTP`, `FetchSFTP`, `GetSFTP`, `PutSFTP`
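To make the limitation concrete, here is a sketch of provenance-based lineage restricted to the supported processor types listed above. It makes several assumptions:

- Ingress processors report NiFi `RECEIVE` or `FETCH` provenance events.
- Egress processors report `SEND` events.
- Each event's transit URI identifies the external dataset.

The `ProvenanceEvent` shape and `derive_lineage` function are hypothetical simplifications for illustration, not the plugin's internals. The `provenance_days` parameter mirrors the config option of the same name.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Processor types listed as supported (simple class names).
SUPPORTED_TYPES = {
    "ListS3", "FetchS3Object", "PutS3Object",
    "ListSFTP", "FetchSFTP", "GetSFTP", "PutSFTP",
}
# Assumption: data entering NiFi shows up as RECEIVE/FETCH, leaving as SEND.
INGRESS_EVENTS = {"RECEIVE", "FETCH"}
EGRESS_EVENTS = {"SEND"}


@dataclass(frozen=True)
class ProvenanceEvent:
    """Hypothetical, simplified view of a NiFi provenance event."""
    processor_id: str
    processor_type: str
    event_type: str
    transit_uri: str  # external location, e.g. an s3:// or sftp:// URI
    event_time: datetime


def derive_lineage(events, provenance_days: int, now: datetime):
    """Return (upstream, downstream): processor id -> set of external URIs."""
    cutoff = now - timedelta(days=provenance_days)
    upstream: dict = {}
    downstream: dict = {}
    for ev in events:
        # Ignore events outside the time window and unsupported processors.
        if ev.event_time < cutoff or ev.processor_type not in SUPPORTED_TYPES:
            continue
        if ev.event_type in INGRESS_EVENTS:
            upstream.setdefault(ev.processor_id, set()).add(ev.transit_uri)
        elif ev.event_type in EGRESS_EVENTS:
            downstream.setdefault(ev.processor_id, set()).add(ev.transit_uri)
    return upstream, downstream
```

Two kinds of events produce no lineage in this sketch:

- Events from processors outside the supported list (for example an HTTP processor).
- Events older than the window.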
Quickstart recipe
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
```yml
source:
  type: "nifi"
  config:
    # Coordinates
    site_url: "https://localhost:8443/nifi/"

    # Credentials
    auth: SINGLE_USER
    username: admin
    password: password

sink:
  # sink configs
```
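The same recipe can also be run from Python. This is a minimal sketch that assumes the `Pipeline` class from `datahub.ingestion.run.pipeline` (shipped with `acryl-datahub`). DataHub's `console` sink stands in for the elided sink configuration. The environment variable names `NIFI_USERNAME`, `NIFI_PASSWORD`, and `RUN_NIFI_INGESTION` are hypothetical.

```python
import os

# The quickstart recipe, expressed as a Python dict.
# Assumption: credentials come from hypothetical NIFI_USERNAME / NIFI_PASSWORD
# environment variables, falling back to the sample values.
recipe = {
    "source": {
        "type": "nifi",
        "config": {
            "site_url": "https://localhost:8443/nifi/",
            "auth": "SINGLE_USER",
            "username": os.environ.get("NIFI_USERNAME", "admin"),
            "password": os.environ.get("NIFI_PASSWORD", "password"),
        },
    },
    # Assumption: the "console" sink is only a placeholder; use your real sink.
    "sink": {"type": "console"},
}


def run_ingestion(recipe: dict) -> None:
    """Run the recipe through DataHub's Python pipeline API."""
    # Imported lazily so this module loads even without acryl-datahub installed.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()  # raise if the run reported failures


# Only contact NiFi when explicitly requested (hypothetical opt-in flag).
if __name__ == "__main__" and os.environ.get("RUN_NIFI_INGESTION") == "1":
    run_ingestion(recipe)
```

Reading credentials from the environment keeps the password out of the recipe itself.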
Config details
Note that a `.` is used to denote nested fields in the YAML recipe.
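For example, `process_group_pattern.allow` in the table means an `allow` key nested under `process_group_pattern`. Below is a small sketch of that expansion; the `nest` helper and the sample regex `^ingest.*` are hypothetical.

```python
def nest(flat: dict) -> dict:
    """Expand dotted keys such as 'process_group_pattern.allow' into nested dicts."""
    nested: dict = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            # Walk (or create) one level of nesting per dotted segment.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


config = nest({
    "site_url": "https://localhost:8443/nifi/",
    "process_group_pattern.allow": ["^ingest.*"],
    "process_group_pattern.ignoreCase": True,
})
```

The resulting `config` has the same shape as a recipe with `allow` and `ignoreCase` indented under `process_group_pattern`.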
Field | Required | Default | Description |
---|---|---|---|
`site_url` | ✅ | | URI of the NiFi site to connect to |
`site_name` | | `"default"` | Site name used to identify this site; useful when using input and output ports that receive remote connections |
`auth` | | `"NO_AUTH"` | NiFi authentication mode. Must be one of: `NO_AUTH`, `SINGLE_USER`, `CLIENT_CERT` |
`username` | | | NiFi username; must be set when `auth` is `"SINGLE_USER"` |
`password` | | | NiFi password; must be set when `auth` is `"SINGLE_USER"` |
`client_cert_file` | | | Path to the PEM file containing the public certificates for the user/client identity; must be set when `auth` is `"CLIENT_CERT"` |
`client_key_file` | | | Path to the PEM file containing the client's secret key |
`client_key_password` | | | Password used to decrypt `client_key_file` |
`ca_file` | | | Path to the PEM file containing certificates for the root CA(s) of the NiFi server |
`provenance_days` | | | Time window, in days, over which provenance events are analyzed to find external datasets |
`site_url_to_site_name` | | | Mapping from `site_url` to `site_name`; required if the NiFi flow uses remote process groups |
`process_group_pattern.allow` | | | List of regex patterns for process groups to include in ingestion |
`process_group_pattern.deny` | | | List of regex patterns for process groups to exclude from ingestion |
`process_group_pattern.ignoreCase` | | `True` | Whether to ignore case during pattern matching |
`env` | | `"PROD"` | Environment to use in the namespace when constructing URNs |
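The rules in the table can be summarized in a small pre-flight check. This sketch applies the documented defaults and enforces the `auth`-dependent required fields. It also filters process group names, assuming the following allow/deny semantics:

- A name is excluded if any `deny` regex matches it.
- Otherwise it is included if any `allow` regex matches, where the default `allow` is `[".*"]`. This default is an assumption, since the table lists none.

`resolve_config` and `process_group_allowed` are hypothetical helpers, not part of the plugin.

```python
import re

AUTH_MODES = {"NO_AUTH", "SINGLE_USER", "CLIENT_CERT"}
# Defaults as listed in the config table.
DEFAULTS = {"site_name": "default", "auth": "NO_AUTH", "env": "PROD"}


def resolve_config(config: dict) -> dict:
    """Apply documented defaults and check auth-dependent required fields."""
    resolved = {**DEFAULTS, **config}
    if not resolved.get("site_url"):
        raise ValueError("site_url is required")
    auth = resolved["auth"]
    if auth not in AUTH_MODES:
        raise ValueError(f"auth must be one of {sorted(AUTH_MODES)}, got {auth!r}")
    if auth == "SINGLE_USER":
        missing = [f for f in ("username", "password") if not resolved.get(f)]
        if missing:
            raise ValueError(f"SINGLE_USER auth requires {missing}")
    if auth == "CLIENT_CERT" and not resolved.get("client_cert_file"):
        raise ValueError("CLIENT_CERT auth requires client_cert_file")
    return resolved


def process_group_allowed(name: str, pattern: dict) -> bool:
    """Assumed semantics: any deny match excludes; otherwise any allow match includes."""
    flags = re.IGNORECASE if pattern.get("ignoreCase", True) else 0
    if any(re.match(p, name, flags) for p in pattern.get("deny", [])):
        return False
    # Assumption: allow defaults to [".*"] (everything) when not given.
    return any(re.match(p, name, flags) for p in pattern.get("allow", [".*"]))
```

Here is a usage example. `resolve_config({"site_url": "https://localhost:8443/nifi/"})` returns the config with `auth` set to `"NO_AUTH"` and `env` set to `"PROD"`. Setting `auth: SINGLE_USER` without a password raises an error before any connection is attempted.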
Compatibility
Coming soon!
Questions
If you've got any questions on configuring this source, feel free to ping us on our Slack!