diff --git a/openmetadata-docs-v1/content/v1.0.0/connectors/database/hive/index.md b/openmetadata-docs-v1/content/v1.0.0/connectors/database/hive/index.md
index 78e95c01005..40f9f751fe1 100644
--- a/openmetadata-docs-v1/content/v1.0.0/connectors/database/hive/index.md
+++ b/openmetadata-docs-v1/content/v1.0.0/connectors/database/hive/index.md
@@ -60,9 +60,13 @@ the following docs to connect using Airflow SDK or with the CLI.
To deploy OpenMetadata, check the Deployment guides.
{%/inlineCallout%}
-To run the Ingestion via the UI you'll need to use the OpenMetadata Ingestion Container, which comes shipped with
-custom Airflow plugins to handle the workflow deployment.
+To run the Ingestion via the UI you'll need to use the OpenMetadata Ingestion Container, which comes shipped with custom Airflow plugins to handle the workflow deployment.
+### Metadata
+To extract metadata, the user used in the connection needs to be able to perform `SELECT`, `SHOW`, and `DESCRIBE` operations in the database/schema from which the metadata needs to be extracted.
+
+### Profiler & Data Quality
+Executing the profiler workflow or data quality tests requires the user to have `SELECT` permission on the tables/schemas where the profiler/tests will be executed. More information on the profiler workflow setup can be found [here](/connectors/ingestion/workflows/profiler) and on data quality tests [here](/connectors/ingestion/workflows/data-quality).
## Metadata Ingestion
diff --git a/openmetadata-docs-v1/content/v1.0.0/connectors/database/mariadb/index.md b/openmetadata-docs-v1/content/v1.0.0/connectors/database/mariadb/index.md
index 7bbcc717369..3210cab8c70 100644
--- a/openmetadata-docs-v1/content/v1.0.0/connectors/database/mariadb/index.md
+++ b/openmetadata-docs-v1/content/v1.0.0/connectors/database/mariadb/index.md
@@ -60,8 +60,27 @@ the following docs to connect using Airflow SDK or with the CLI.
To deploy OpenMetadata, check the Deployment guides.
{%/inlineCallout%}
-To run the Ingestion via the UI you'll need to use the OpenMetadata Ingestion Container, which comes shipped with
-custom Airflow plugins to handle the workflow deployment.
+To run the Ingestion via the UI you'll need to use the OpenMetadata Ingestion Container, which comes shipped with custom Airflow plugins to handle the workflow deployment.
+
+### Metadata
+To extract metadata, the user used in the connection needs to have access to the `INFORMATION_SCHEMA`. By default, a user can see only the rows in the `INFORMATION_SCHEMA` that correspond to objects for which the user has the proper access privileges.
+
+```SQL
+-- Create user. More details https://mariadb.com/kb/en/create-user/
+CREATE USER <username>[@<hostName>] IDENTIFIED BY '<password>';
+
+-- Grant select on a database
+GRANT SELECT ON world.* TO '<username>';
+
+-- Grant select on a specific object
+GRANT SELECT ON world.hello TO '<username>';
+```
+
+### Profiler & Data Quality
+Executing the profiler workflow or data quality tests requires the user to have `SELECT` permission on the tables/schemas where the profiler/tests will be executed. More information on the profiler workflow setup can be found [here](/connectors/ingestion/workflows/profiler) and on data quality tests [here](/connectors/ingestion/workflows/data-quality).
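+For example, if the profiler or tests will also run against a schema other than `world` (say, a hypothetical `sales` schema), the same grant pattern applies:
+
+```SQL
+-- Hypothetical example: allow the connection user to profile every table in the sales schema
+GRANT SELECT ON sales.* TO '<username>';
+```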
## Metadata Ingestion
diff --git a/openmetadata-docs-v1/content/v1.0.0/connectors/database/mysql/index.md b/openmetadata-docs-v1/content/v1.0.0/connectors/database/mysql/index.md
index ef5b3d333f0..420a1f0fb48 100644
--- a/openmetadata-docs-v1/content/v1.0.0/connectors/database/mysql/index.md
+++ b/openmetadata-docs-v1/content/v1.0.0/connectors/database/mysql/index.md
@@ -63,7 +63,26 @@ To deploy OpenMetadata, check the Deployment guides.
To run the Ingestion via the UI you'll need to use the OpenMetadata Ingestion Container, which comes shipped with custom Airflow plugins to handle the workflow deployment.
-Note that We support MySQL (version 8.0.0 or greater) and the user should have access to the `INFORMATION_SCHEMA` table.
+### Metadata
+Note that we support MySQL (version 8.0.0 or greater) and the user should have access to the `INFORMATION_SCHEMA` table. By default, a user can see only the rows in the `INFORMATION_SCHEMA` that correspond to objects for which the user has the proper access privileges.
+
+```SQL
+-- Create user. If <hostName> is omitted, it defaults to '%'
+-- More details https://dev.mysql.com/doc/refman/8.0/en/create-user.html
+CREATE USER '<username>'[@'<hostName>'] IDENTIFIED BY '<password>';
+
+-- Grant select on a database
+GRANT SELECT ON world.* TO '<username>';
+
+-- Grant select on a specific object
+GRANT SELECT ON world.hello TO '<username>';
+```
+
+### Profiler & Data Quality
+Executing the profiler workflow or data quality tests requires the user to have `SELECT` permission on the tables/schemas where the profiler/tests will be executed. More information on the profiler workflow setup can be found [here](/connectors/ingestion/workflows/profiler) and on data quality tests [here](/connectors/ingestion/workflows/data-quality).
## Metadata Ingestion
diff --git a/openmetadata-docs-v1/content/v1.0.0/connectors/database/presto/index.md b/openmetadata-docs-v1/content/v1.0.0/connectors/database/presto/index.md
index d8fef1a8997..b8d16e70f4b 100644
--- a/openmetadata-docs-v1/content/v1.0.0/connectors/database/presto/index.md
+++ b/openmetadata-docs-v1/content/v1.0.0/connectors/database/presto/index.md
@@ -60,8 +60,14 @@ the following docs to connect using Airflow SDK or with the CLI.
To deploy OpenMetadata, check the Deployment guides.
{%/inlineCallout%}
-To run the Ingestion via the UI you'll need to use the OpenMetadata Ingestion Container, which comes shipped with
-custom Airflow plugins to handle the workflow deployment.
+To run the Ingestion via the UI you'll need to use the OpenMetadata Ingestion Container, which comes shipped with custom Airflow plugins to handle the workflow deployment.
+
+### Metadata
+To extract metadata, the user needs to be able to perform `SHOW CATALOGS`, `SHOW TABLES`, and `SHOW COLUMNS FROM` on the catalogs/tables from which you wish to extract metadata, and needs `SELECT` permission on the `INFORMATION_SCHEMA`. Access to resources will differ based on the connector used. You can find more details on the Presto documentation website [here](https://prestodb.io/docs/current/connector.html) and more information regarding system access control in Presto [here](https://prestodb.io/docs/current/security/built-in-system-access-control.html).
+
+### Profiler & Data Quality
+Executing the profiler workflow or data quality tests requires the user to have `SELECT` permission on the tables/schemas where the profiler/tests will be executed.
More information on the profiler workflow setup can be found [here](https://docs.open-metadata.org/connectors/ingestion/workflows/profiler) and on data quality tests [here](https://docs.open-metadata.org/connectors/ingestion/workflows/data-quality).
+
## Metadata Ingestion
diff --git a/openmetadata-docs-v1/content/v1.0.0/connectors/database/redshift/index.md b/openmetadata-docs-v1/content/v1.0.0/connectors/database/redshift/index.md
index e889584c6a5..4b569c1aa74 100644
--- a/openmetadata-docs-v1/content/v1.0.0/connectors/database/redshift/index.md
+++ b/openmetadata-docs-v1/content/v1.0.0/connectors/database/redshift/index.md
@@ -62,17 +62,26 @@ the following docs to connect using Airflow SDK or with the CLI.
To deploy OpenMetadata, check the Deployment guides.
{%/inlineCallout%}
-To run the Ingestion via the UI you'll need to use the OpenMetadata Ingestion Container, which comes shipped with
-custom Airflow plugins to handle the workflow deployment.
+To run the Ingestion via the UI you'll need to use the OpenMetadata Ingestion Container, which comes shipped with custom Airflow plugins to handle the workflow deployment.
+### Metadata
Redshift user must grant `SELECT` privilege on table [SVV_TABLE_INFO](https://docs.aws.amazon.com/redshift/latest/dg/r_SVV_TABLE_INFO.html) to fetch the metadata of tables and views. For more information visit [here](https://docs.aws.amazon.com/redshift/latest/dg/c_visibility-of-data.html).
```sql
-
+-- Create a new user
+-- More details https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_USER.html
CREATE USER test_user with PASSWORD 'password';
-GRANT SELECT ON TABLE svv_table_info to test_user;
+-- Grant SELECT on table
+GRANT SELECT ON TABLE svv_table_info to test_user;
```
+
+### Profiler & Data Quality
+Executing the profiler workflow or data quality tests requires the user to have `SELECT` permission on the tables/schemas where the profiler/tests will be executed. More information on the profiler workflow setup can be found [here](https://docs.open-metadata.org/connectors/ingestion/workflows/profiler) and on data quality tests [here](https://docs.open-metadata.org/connectors/ingestion/workflows/data-quality).
+
+### Usage & Lineage
+For the usage and lineage workflows, the user will need `SELECT` privilege on the `STL_QUERY` table. You can find more information on the usage workflow [here](https://docs.open-metadata.org/connectors/ingestion/workflows/usage) and the lineage workflow [here](https://docs.open-metadata.org/connectors/ingestion/workflows/lineage).
+
## Metadata Ingestion
{% stepsContainer %}
diff --git a/openmetadata-docs-v1/content/v1.0.0/connectors/database/singlestore/index.md b/openmetadata-docs-v1/content/v1.0.0/connectors/database/singlestore/index.md
index 6345649dd18..53bc1c7e37e 100644
--- a/openmetadata-docs-v1/content/v1.0.0/connectors/database/singlestore/index.md
+++ b/openmetadata-docs-v1/content/v1.0.0/connectors/database/singlestore/index.md
@@ -63,6 +63,27 @@ To deploy OpenMetadata, check the Deployment guides.
To run the Ingestion via the UI you'll need to use the OpenMetadata Ingestion Container, which comes shipped with custom Airflow plugins to handle the workflow deployment.
+### Metadata
+To extract metadata, the user used in the connection needs to have access to the `INFORMATION_SCHEMA`. By default, a user can see only the rows in the `INFORMATION_SCHEMA` that correspond to objects for which the user has the proper access privileges.
+
+```SQL
+-- Create user.
+-- More details https://docs.singlestore.com/managed-service/en/reference/sql-reference/security-management-commands/create-user.html +CREATE USER [@] IDENTIFIED BY ''; + +-- Grant select on a database +GRANT SELECT ON world.* TO ''; + +-- Grant select on a database +GRANT SELECT ON world.* TO ''; + +-- Grant select on a specific object +GRANT SELECT ON world.hello TO ''; +``` + +### Profiler & Data Quality +Executing the profiler worflow or data quality tests, will require the user to have `SELECT` permission on the tables/schemas where the profiler/tests will be executed. More information on the profiler workflow setup can be found [here](/connectors/ingestion/workflows/profiler) and data quality tests [here](/connectors/ingestion/workflows/data-quality). + ## Metadata Ingestion {% stepsContainer %} diff --git a/openmetadata-docs-v1/content/v1.0.0/connectors/database/trino/index.md b/openmetadata-docs-v1/content/v1.0.0/connectors/database/trino/index.md index b537670325c..e46c3264dcd 100644 --- a/openmetadata-docs-v1/content/v1.0.0/connectors/database/trino/index.md +++ b/openmetadata-docs-v1/content/v1.0.0/connectors/database/trino/index.md @@ -60,18 +60,23 @@ the following docs to connect using Airflow SDK or with the CLI. To deploy OpenMetadata, check the Deployment guides. {%/inlineCallout%} -To run the Ingestion via the UI you'll need to use the OpenMetadata Ingestion Container, which comes shipped with -custom Airflow plugins to handle the workflow deployment. +To run the Ingestion via the UI you'll need to use the OpenMetadata Ingestion Container, which comes shipped with custom Airflow plugins to handle the workflow deployment. {% tilesContainer %} -To ingest metadata from the Trino source, the user must have select privileges for the following tables. +### Metadata +To extract metadata, the user needs to be able to have `SELECT` permission to the following tables: - `information_schema.schemata` - `information_schema.columns` - `information_schema.tables` - `information_schema.views` - `system.metadata.table_comments` +Access to resources will be based on the user access permission to access specific data sources. More information regarding access and security can be found in the Trino documentation [here](https://trino.io/docs/current/security.html). + +### Profiler & Data Quality +Executing the profiler worflow or data quality tests, will require the user to have `SELECT` permission on the tables/schemas where the profiler/tests will be executed. More information on the profiler workflow setup can be found [here](https://docs.open-metadata.org/connectors/ingestion/workflows/profiler) and data quality tests [here](https://docs.open-metadata.org/connectors/ingestion/workflows/data-quality). + {% /tilesContainer %} diff --git a/openmetadata-docs-v1/content/v1.0.0/connectors/ingestion/workflows/data-quality/tests.md b/openmetadata-docs-v1/content/v1.0.0/connectors/ingestion/workflows/data-quality/tests.md index 82dce4b7170..b8b17fbadf3 100644 --- a/openmetadata-docs-v1/content/v1.0.0/connectors/ingestion/workflows/data-quality/tests.md +++ b/openmetadata-docs-v1/content/v1.0.0/connectors/ingestion/workflows/data-quality/tests.md @@ -20,6 +20,7 @@ Tests applied on top of a Table. 
Here is the list of all table tests: - [Table Column Name to Exist](#table-column-name-to-exist) - [Table Column to Match Set](#table-column-to-match-set) - [Table Custom SQL Test](#table-custom-sql-test) +- [Table Row Inserted Count To Be Between](#table-row-inserted-count-to-be-between) ### Table Row Count to Equal Validate the total row count in the table is equal to the given value. @@ -326,6 +327,71 @@ parameterValues: } ``` +### Table Row Inserted Count To Be Between +Validate the number of rows inserted for the defined period is between the expected range + +**Properties** + +* `Min Row Count`: Lower bound +* `Max Row Count`: Upper bound +* `Column Name`: The name of the column used to apply the range filter +* `Range Type`: One of `HOUR`, `DAY`, `MONTH`, `YEAR` +* `Interval`: The range interval (e.g. 1,2,3,4,5, etc) + +**Behavior** + +| Condition | Status | +| ----------- | ----------- | +|Number of rows **is between** `Min Row Count` and `Max Row Count`| Success ✅| +|Number of rows **is not between** `Min Row Count` and `Max Row Count|Failed ❌| + +**YAML Config** + +```yaml +testDefinitionName: tableRowInsertedCountToBeBetween +parameterValues: + - name: min + value: 10 + - name: max + value: 100 + - name: columnName + value: colA + - name: rangeType + value: DAY + - name: rangeInterval + value: 1 +``` + +**JSON Config** + +```json +{ + "testDefinitionName": "tableRowInsertedCountToBeBetween", + "parameterValues": [ + { + "name": "min", + "value": 10 + }, + { + "name": "max", + "value": 100 + }, + { + "name": "columnName", + "value": "colA" + }, + { + "name": "rangeType", + "value": "DAY" + }, + { + "name": "rangeInterval", + "value": 1 + } + ] +} +``` + ## Column Tests Tests applied on top of Column metrics. Here is the list of all column tests: - [Column Values to Be Unique](#column-values-to-be-unique) diff --git a/openmetadata-docs-v1/content/v1.0.0/connectors/ingestion/workflows/profiler/metrics.md b/openmetadata-docs-v1/content/v1.0.0/connectors/ingestion/workflows/profiler/metrics.md index b1a182e6e58..71aafa701fc 100644 --- a/openmetadata-docs-v1/content/v1.0.0/connectors/ingestion/workflows/profiler/metrics.md +++ b/openmetadata-docs-v1/content/v1.0.0/connectors/ingestion/workflows/profiler/metrics.md @@ -127,8 +127,30 @@ Only for numerical values. Returns the sum of all values in a column. Only for numerical values. Returns the standard deviation. ### Histogram +The histogram returns a dictionary of the different bins and the number of values found for that bin. It will be computed only if the Inter Quartile Range value is available -The histogram returns a dictionary of the different bins and the number of values found for that bin. +### First Quartile +Only for numerical values. Middle number between the smallest value and the median + +### Third Quartile +Only for numerical values. Middle number between the median and the greatest value + +### Inter Quartile Range +Only for numerical values. Difference between the third quartile and the first quartile + +### Nonparametric Skew +Measure of skewness of the column distribution. Nonparametric skew is computed as follow +$$ + S = \frac{\mu-\tilde{\mu}}{\sigma} +$$ + +Where + +$$ +\mu = mean\\ +\tilde{\mu} = median\\ +\sigma = standard deviation\\ +$$ ## Grant Access to User for System Metrics OpenMetadata uses system tables to compute system metrics. You can find the required access as well as more details for your database engine below. 
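+As a hedged sketch for Redshift (covered in more detail below): regular users can only see their own rows in the STL system tables, so a superuser may need to widen the visibility of the user running the profiler. Whether this is appropriate depends on your security policy, and the user name below is only an illustration:
+
+```sql
+-- Hypothetical example: let the profiler user read all rows in Redshift STL/SVL system tables
+ALTER USER profiler_user SYSLOG ACCESS UNRESTRICTED;
+```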
@@ -138,12 +160,12 @@ OpenMetadata uses the `QUERY_HISTORY_BY_WAREHOUSE` view of the `INFORMATION_SCHE
OpenMetadata will look at the past 24-hours to fetch the operations that were performed against a table.
### Redshift
-OpenMetadata uses `stl_insert`, `stl_delete`, `svv_table_info`, and `stl_querytext` to fecth DNL operations as well as the number of rows affected by these operations. You need to make sure the user running the profiler workflow has access to these views and tables.
+OpenMetadata uses `stl_insert`, `stl_delete`, `svv_table_info`, and `stl_querytext` to fetch DML operations as well as the number of rows affected by these operations. You need to make sure the user running the profiler workflow has access to these views and tables.
OpenMetadata will look at the previous day to fetch the operations that were performed against a table.
### BigQuery
-Bigquery uses the `JOBS` table of the `INFORMATION_SCHEMA` to fecth DNL operations as well as the number of rows affected by these operations. You will need to make sure your data location is properly set when creating your BigQuery service connection in OpenMetadata.
+BigQuery uses the `JOBS` table of the `INFORMATION_SCHEMA` to fetch DML operations as well as the number of rows affected by these operations. You will need to make sure your data location is properly set when creating your BigQuery service connection in OpenMetadata.
OpenMetadata will look at the previous day to fetch the operations that were performed against a table filter on the `creation_time` partition field to limit the size of data scanned.
diff --git a/openmetadata-docs-v1/content/v1.0.0/connectors/ml-model/mlflow/index.md b/openmetadata-docs-v1/content/v1.0.0/connectors/ml-model/mlflow/index.md
index ce7babcadca..6cccd79cce9 100644
--- a/openmetadata-docs-v1/content/v1.0.0/connectors/ml-model/mlflow/index.md
+++ b/openmetadata-docs-v1/content/v1.0.0/connectors/ml-model/mlflow/index.md
@@ -12,8 +12,12 @@ Configure and schedule Mlflow metadata and profiler workflows from the OpenMetad
- [Requirements](#requirements)
- [Metadata Ingestion](#metadata-ingestion)
-If you don't want to use the OpenMetadata Ingestion container to configure the workflows via the UI, then you can check
-the following docs to connect using Airflow SDK or with the CLI.
+If you don't want to use the OpenMetadata Ingestion container to configure the workflows via the UI, then you can check the following docs to connect using the Airflow SDK or the CLI.
+
+### Metadata
+To extract metadata, OpenMetadata needs two elements:
+- **Tracking URI**: Address of the local or remote tracking server. More information is available in the MLflow documentation [here](https://www.mlflow.org/docs/latest/tracking.html#where-runs-are-recorded).
+- **Registry URI**: Address of the local or remote model registry server.
{% tilesContainer %}
diff --git a/openmetadata-docs-v1/content/v1.0.0/connectors/pipeline/nifi/airflow.md b/openmetadata-docs-v1/content/v1.0.0/connectors/pipeline/nifi/airflow.md
new file mode 100644
index 00000000000..c2c9e038f3e
--- /dev/null
+++ b/openmetadata-docs-v1/content/v1.0.0/connectors/pipeline/nifi/airflow.md
@@ -0,0 +1,313 @@
+---
+title: Run Nifi Connector using Airflow SDK
+slug: /connectors/pipeline/nifi/airflow
+---
+
+# Run Nifi using the Airflow SDK
+
+In this section, we provide guides and references to use the Nifi connector.
+ +Configure and schedule Nifi metadata and profiler workflows from the OpenMetadata UI: + +- [Requirements](#requirements) +- [Metadata Ingestion](#metadata-ingestion) + +## Requirements + +{%inlineCallout icon="description" bold="OpenMetadata 0.12 or later" href="/deployment"%} +To deploy OpenMetadata, check the Deployment guides. +{% /inlineCallout %} + +To run the Ingestion via the UI you'll need to use the OpenMetadata Ingestion Container, which comes shipped with custom Airflow plugins to handle the workflow deployment. + +### Python Requirements + +To run the Nifi ingestion, you will need to install: + +```bash +pip3 install "openmetadata-ingestion[nifi]" +``` + +## Metadata Ingestion + +All connectors are defined as JSON Schemas. +[Here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/services/connections/pipeline/nifiConnection.json) +you can find the structure to create a connection to Nifi. + +In order to create and run a Metadata Ingestion workflow, we will follow +the steps to create a YAML configuration able to connect to the source, +process the Entities if needed, and reach the OpenMetadata server. + +The workflow is modeled around the following +[JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/workflow.json) + +### 1. Define the YAML Config + +This is a sample config for Nifi: + +{% codePreview %} + +{% codeInfoContainer %} + +#### Source Configuration - Service Connection + +{% codeInfo srNumber=1 %} + +**hostPort**: Pipeline Service Management UI URL +**nifiConfig**: one of + **1.** Using Basic authentication + - **username**: Username to connect to Nifi. This user should be able to send request to the Nifi API and access the `Resources` endpoint. + - **password**: Password to connect to Nifi. + - **verifySSL**: Whether SSL verification should be perform when authenticating. + **2.** Using client certificate authentication + - **certificateAuthorityPath**: Path to the certificate authority (CA) file. This is the certificate used to store and issue your digital certificate. This is an optional parameter. If omitted SSL verification will be skipped; this can present some sever security issue. + **important**: This file should be accessible from where the ingestion workflow is running. For example, if you are using OpenMetadata Ingestion Docker container, this file should be in this container. + - **clientCertificatePath**: Path to the certificate client file. + **important**: This file should be accessible from where the ingestion workflow is running. For example, if you are using OpenMetadata Ingestion Docker container, this file should be in this container. + - **clientkeyPath**: Path to the client key file. + **important**: This file should be accessible from where the ingestion workflow is running. For example, if you are using OpenMetadata Ingestion Docker container, this file should be in this container. + +{% /codeInfo %} + + +#### Source Configuration - Source Config + +{% codeInfo srNumber=2 %} + +The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/pipelineServiceMetadataPipeline.json): + +**dbServiceNames**: Database Service Name for the creation of lineage, if the source supports it. + +**includeTags**: Set the Include tags toggle to control whether or not to include tags as part of metadata ingestion. 
+ +**markDeletedPipelines**: Set the Mark Deleted Pipelines toggle to flag pipelines as soft-deleted if they are not present anymore in the source system. + +**pipelineFilterPattern** and **chartFilterPattern**: Note that the `pipelineFilterPattern` and `chartFilterPattern` both support regex as include or exclude. + +{% /codeInfo %} + + +#### Sink Configuration + +{% codeInfo srNumber=3 %} + +To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. + +{% /codeInfo %} + +#### Workflow Configuration + +{% codeInfo srNumber=4 %} + +The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. + +For a simple, local installation using our docker containers, this looks like: + +{% /codeInfo %} + +{% /codeInfoContainer %} + +{% codeBlock fileName="filename.yaml" %} + + +```yaml +source: + type: nifi + serviceName: nifi_source + serviceConnection: + config: + type: Nifi + hostPort: my_host:8443 + nifiConfig: + username: my_username + password: my_password + verifySSL: + ## client certificate authentication + # certificateAuthorityPath: path/to/CA + # clientCertificatePath: path/to/clientCertificate + # clientkeyPath: path/to/clientKey + +``` +```yaml {% srNumber=1 %} + hostPort: http://localhost:8000 +``` +```yaml {% srNumber=2 %} + sourceConfig: + config: + type: PipelineMetadata + # markDeletedPipelines: True + # includeTags: True + # includeLineage: true + # pipelineFilterPattern: + # includes: + # - pipeline1 + # - pipeline2 + # excludes: + # - pipeline3 + # - pipeline4 +``` +```yaml {% srNumber=3 %} +sink: + type: metadata-rest + config: {} +``` + +```yaml {% srNumber=4 %} +workflowConfig: + openMetadataServerConfig: + hostPort: "http://localhost:8585/api" + authProvider: openmetadata + securityConfig: + jwtToken: "{bot_jwt_token}" +``` + +{% /codeBlock %} + +{% /codePreview %} + +### Workflow Configs for Security Provider + +We support different security providers. You can find their definitions [here](https://github.com/open-metadata/OpenMetadata/tree/main/openmetadata-spec/src/main/resources/json/schema/security/client). + +## Openmetadata JWT Auth + +- JWT tokens will allow your clients to authenticate against the OpenMetadata server. To enable JWT Tokens, you will get more details [here](/deployment/security/enable-jwt-tokens). + +```yaml +workflowConfig: + openMetadataServerConfig: + hostPort: "http://localhost:8585/api" + authProvider: openmetadata + securityConfig: + jwtToken: "{bot_jwt_token}" +``` + +- You can refer to the JWT Troubleshooting section [link](/deployment/security/jwt-troubleshooting) for any issues in your JWT configuration. If you need information on configuring the ingestion with other security providers in your bots, you can follow this doc [link](/deployment/security/workflow-config-auth). + + +### 2. Prepare the Ingestion DAG + +Create a Python file in your Airflow DAGs directory with the following contents: + +{% codePreview %} + +{% codeInfoContainer %} + + +{% codeInfo srNumber=5 %} + +#### Import necessary modules + +The `Workflow` class that is being imported is a part of a metadata ingestion framework, which defines a process of getting data from different sources and ingesting it into a central metadata repository. + +Here we are also importing all the basic requirements to parse YAMLs, handle dates and build our DAG. 
+ +{% /codeInfo %} + +{% codeInfo srNumber=6 %} + +**Default arguments for all tasks in the Airflow DAG.** + +- Default arguments dictionary contains default arguments for tasks in the DAG, including the owner's name, email address, number of retries, retry delay, and execution timeout. + +{% /codeInfo %} + +{% codeInfo srNumber=7 %} + +- **config**: Specifies config for the metadata ingestion as we prepare above. + +{% /codeInfo %} + +{% codeInfo srNumber=8 %} + +- **metadata_ingestion_workflow()**: This code defines a function `metadata_ingestion_workflow()` that loads a YAML configuration, creates a `Workflow` object, executes the workflow, checks its status, prints the status to the console, and stops the workflow. + +{% /codeInfo %} + +{% codeInfo srNumber=9 %} + +- **DAG**: creates a DAG using the Airflow framework, and tune the DAG configurations to whatever fits with your requirements +- For more Airflow DAGs creation details visit [here](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html#declaring-a-dag). + +{% /codeInfo %} + +Note that from connector to connector, this recipe will always be the same. +By updating the `YAML configuration`, you will be able to extract metadata from different sources. + +{% /codeInfoContainer %} + +{% codeBlock fileName="filename.py" %} + +```python {% srNumber=5 %} +import pathlib +import yaml +from datetime import timedelta +from airflow import DAG +from metadata.config.common import load_config_file +from metadata.ingestion.api.workflow import Workflow +from airflow.utils.dates import days_ago + +try: + from airflow.operators.python import PythonOperator +except ModuleNotFoundError: + from airflow.operators.python_operator import PythonOperator + + +``` + +```python {% srNumber=6 %} +default_args = { + "owner": "user_name", + "email": ["username@org.com"], + "email_on_failure": False, + "retries": 3, + "retry_delay": timedelta(minutes=5), + "execution_timeout": timedelta(minutes=60) +} + + +``` + +```python {% srNumber=7 %} +config = """ + +""" + + +``` + +```python {% srNumber=8 %} +def metadata_ingestion_workflow(): + workflow_config = yaml.safe_load(config) + workflow = Workflow.create(workflow_config) + workflow.execute() + workflow.raise_from_status() + workflow.print_status() + workflow.stop() + + +``` + +```python {% srNumber=9 %} +with DAG( + "sample_data", + default_args=default_args, + description="An example DAG which runs a OpenMetadata ingestion workflow", + start_date=days_ago(1), + is_paused_upon_creation=False, + schedule_interval='*/5 * * * *', + catchup=False, +) as dag: + ingest_task = PythonOperator( + task_id="ingest_using_recipe", + python_callable=metadata_ingestion_workflow, + ) + + +``` + +{% /codeBlock %} + +{% /codePreview %} + + diff --git a/openmetadata-docs-v1/content/v1.0.0/connectors/pipeline/nifi/cli.md b/openmetadata-docs-v1/content/v1.0.0/connectors/pipeline/nifi/cli.md new file mode 100644 index 00000000000..70323fa5f54 --- /dev/null +++ b/openmetadata-docs-v1/content/v1.0.0/connectors/pipeline/nifi/cli.md @@ -0,0 +1,197 @@ +--- +title: Run Nifi Connector using the CLI +slug: /connectors/pipeline/nifi/cli +--- + +# Run Nifi using the metadata CLI + +In this section, we provide guides and references to use the Nifi connector. 
+ +Configure and schedule Nifi metadata and profiler workflows from the OpenMetadata UI: + +- [Requirements](#requirements) +- [Metadata Ingestion](#metadata-ingestion) + +## Requirements + +{%inlineCallout icon="description" bold="OpenMetadata 0.12 or later" href="/deployment"%} +To deploy OpenMetadata, check the Deployment guides. +{% /inlineCallout %} + +To run the Ingestion via the UI you'll need to use the OpenMetadata Ingestion Container, which comes shipped with +custom Airflow plugins to handle the workflow deployment. + +### Python Requirements + +To run the Nifi ingestion, you will need to install: + +```bash +pip3 install "openmetadata-ingestion[nifi]" +``` + +## Metadata Ingestion + +All connectors are defined as JSON Schemas. +[Here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/services/connections/pipeline/nifiConnection.json) +you can find the structure to create a connection to Nifi. + +In order to create and run a Metadata Ingestion workflow, we will follow +the steps to create a YAML configuration able to connect to the source, +process the Entities if needed, and reach the OpenMetadata server. + +The workflow is modeled around the following +[JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/workflow.json) + +### 1. Define the YAML Config + +This is a sample config for Nifi: + +{% codePreview %} + +{% codeInfoContainer %} + +#### Source Configuration - Service Connection + +{% codeInfo srNumber=1 %} + +**hostPort**: Pipeline Service Management UI URL +**nifiConfig**: one of + **1.** Using Basic authentication + - **username**: Username to connect to Nifi. This user should be able to send request to the Nifi API and access the `Resources` endpoint. + - **password**: Password to connect to Nifi. + - **verifySSL**: Whether SSL verification should be perform when authenticating. + **2.** Using client certificate authentication + - **certificateAuthorityPath**: Path to the certificate authority (CA) file. This is the certificate used to store and issue your digital certificate. This is an optional parameter. If omitted SSL verification will be skipped; this can present some sever security issue. + **important**: This file should be accessible from where the ingestion workflow is running. For example, if you are using OpenMetadata Ingestion Docker container, this file should be in this container. + - **clientCertificatePath**: Path to the certificate client file. + **important**: This file should be accessible from where the ingestion workflow is running. For example, if you are using OpenMetadata Ingestion Docker container, this file should be in this container. + - **clientkeyPath**: Path to the client key file. + **important**: This file should be accessible from where the ingestion workflow is running. For example, if you are using OpenMetadata Ingestion Docker container, this file should be in this container. + + +{% /codeInfo %} + + +#### Source Configuration - Source Config + +{% codeInfo srNumber=2 %} + +The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/pipelineServiceMetadataPipeline.json): + +**dbServiceNames**: Database Service Name for the creation of lineage, if the source supports it. + +**includeTags**: Set the Include tags toggle to control whether or not to include tags as part of metadata ingestion. 
+ +**markDeletedPipelines**: Set the Mark Deleted Pipelines toggle to flag pipelines as soft-deleted if they are not present anymore in the source system. + +**pipelineFilterPattern** and **chartFilterPattern**: Note that the `pipelineFilterPattern` and `chartFilterPattern` both support regex as include or exclude. + +{% /codeInfo %} + + +#### Sink Configuration + +{% codeInfo srNumber=3 %} + +To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`. + +{% /codeInfo %} + +#### Workflow Configuration + +{% codeInfo srNumber=4 %} + +The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. + +For a simple, local installation using our docker containers, this looks like: + +{% /codeInfo %} + +{% /codeInfoContainer %} + +{% codeBlock fileName="filename.yaml" %} + + +```yaml +source: + type: nifi + serviceName: nifi_source + serviceConnection: + config: + type: Nifi + hostPort: my_host:8433 + nifiConfig: + username: my_username + password: my_password + verifySSL: + ## client certificate authentication + # certificateAuthorityPath: path/to/CA + # clientCertificatePath: path/to/clientCertificate + # clientkeyPath: path/to/clientKey +``` +```yaml {% srNumber=1 %} + hostPort: http://localhost:8000 +``` +```yaml {% srNumber=2 %} + sourceConfig: + config: + type: PipelineMetadata + # markDeletedPipelines: True + # includeTags: True + # includeLineage: true + # pipelineFilterPattern: + # includes: + # - pipeline1 + # - pipeline2 + # excludes: + # - pipeline3 + # - pipeline4 +``` +```yaml {% srNumber=3 %} +sink: + type: metadata-rest + config: {} +``` + +```yaml {% srNumber=4 %} +workflowConfig: + openMetadataServerConfig: + hostPort: "http://localhost:8585/api" + authProvider: openmetadata + securityConfig: + jwtToken: "{bot_jwt_token}" +``` + +{% /codeBlock %} + +{% /codePreview %} + +### Workflow Configs for Security Provider + +We support different security providers. You can find their definitions [here](https://github.com/open-metadata/OpenMetadata/tree/main/openmetadata-spec/src/main/resources/json/schema/security/client). + +## Openmetadata JWT Auth + +- JWT tokens will allow your clients to authenticate against the OpenMetadata server. To enable JWT Tokens, you will get more details [here](/deployment/security/enable-jwt-tokens). + +```yaml +workflowConfig: + openMetadataServerConfig: + hostPort: "http://localhost:8585/api" + authProvider: openmetadata + securityConfig: + jwtToken: "{bot_jwt_token}" +``` + +- You can refer to the JWT Troubleshooting section [link](/deployment/security/jwt-troubleshooting) for any issues in your JWT configuration. If you need information on configuring the ingestion with other security providers in your bots, you can follow this doc [link](/deployment/security/workflow-config-auth). + +### 2. Run with the CLI + +First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: + +```bash +metadata ingest -c +``` + +Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, +you will be able to extract metadata from different sources. 
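+For example, assuming the configuration above was saved as `nifi.yaml` (the file name is arbitrary), the run would look like:
+
+```bash
+# Run the Nifi ingestion with the saved workflow configuration
+metadata ingest -c nifi.yaml
+```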
diff --git a/openmetadata-docs-v1/content/v1.0.0/connectors/pipeline/nifi/index.md b/openmetadata-docs-v1/content/v1.0.0/connectors/pipeline/nifi/index.md new file mode 100644 index 00000000000..0bacffa6aa9 --- /dev/null +++ b/openmetadata-docs-v1/content/v1.0.0/connectors/pipeline/nifi/index.md @@ -0,0 +1,306 @@ +--- +title: Nifi +slug: /connectors/pipeline/nifi +--- + +# Nifi + +In this section, we provide guides and references to use the Nifi connector. + +Configure and schedule Nifi metadata workflows from the OpenMetadata UI: + +- [Requirements](#requirements) +- [Metadata Ingestion](#metadata-ingestion) + +If you don't want to use the OpenMetadata Ingestion container to configure the workflows via the UI, then you can check the following docs to connect using Airflow SDK or with the CLI. + +{% tilesContainer %} +{% tile + title="Ingest with Airflow" + description="Configure the ingestion using Airflow SDK" + link="/connectors/dashboard/nifi/airflow" + / %} +{% tile + title="Ingest with the CLI" + description="Run a one-time ingestion using the metadata CLI" + link="/connectors/dashboard/nifi/cli" + / %} + +{% /tilesContainer %} + +## Requirements + +{%inlineCallout icon="description" bold="OpenMetadata 0.12 or later" href="/deployment"%} +To deploy OpenMetadata, check the Deployment guides. +{% /inlineCallout %} + +To run the Ingestion via the UI you'll need to use the OpenMetadata Ingestion Container, which comes shipped with custom Airflow plugins to handle the workflow deployment. + +### Metadata +OpenMetadata supports 2 types of connection for the Nifi connector: +- **basic authentication**: use username/password to authenticate to Nifi. +- **client certificate authentication**: use CA, client certificate and client key files to authenticate. + +The user should be able to send request to the Nifi API and access the `Resources` endpoint. + +## Metadata Ingestion + +{% stepsContainer %} + +{% step srNumber=1 %} + +{% stepDescription title="1. Visit the Services Page" %} + +The first step is ingesting the metadata from your sources. Under +Settings, you will find a Services link an external source system to +OpenMetadata. Once a service is created, it can be used to configure +metadata, usage, and profiler workflows. + +To visit the Services page, select Services from the Settings menu. + +{% /stepDescription %} + +{% stepVisualInfo %} + +{% image +src="/images/v1.0.0/openmetadata/connectors/visit-services.png" +alt="Visit Services Page" +caption="Find Pipeline option on left panel of the settings page" /%} + +{% /stepVisualInfo %} + +{% /step %} + +{% step srNumber=2 %} + +{% stepDescription title="2. Create a New Service" %} + +Click on the 'Add New Service' button to start the Service creation. + +{% /stepDescription %} + +{% stepVisualInfo %} + +{% image +src="/images/v1.0.0/openmetadata/connectors/create-service.png" +alt="Create a new service" +caption="Add a new Service from the Dashboard Services page" /%} + +{% /stepVisualInfo %} + +{% /step %} + + + +{% step srNumber=3 %} + +{% stepDescription title="3. Select the Service Type" %} + +Select Nifi as the service type and click Next. + +{% /stepDescription %} + +{% stepVisualInfo %} + +{% image + src="/images/v1.0.0/openmetadata/connectors/nifi/select-service.png" + alt="Select Service" + caption="Select your service from the list" /%} + +{% /stepVisualInfo %} + +{% /step %} + +{% step srNumber=4 %} + +{% stepDescription title="4. 
Name and Describe your Service" %} + +Provide a name and description for your service as illustrated below. + +#### Service Name + +OpenMetadata uniquely identifies services by their Service Name. Provide +a name that distinguishes your deployment from other services, including +the other {connector} services that you might be ingesting metadata +from. + +{% /stepDescription %} + +{% stepVisualInfo %} + +{% image + src="/images/v1.0.0/openmetadata/connectors/nifi/add-new-service.png" + alt="Add New Service" + caption="Provide a Name and description for your Service" /%} + +{% /stepVisualInfo %} + +{% /step %} + +{% step srNumber=5 %} + +{% stepDescription title="5. Configure the Service Connection" %} + +In this step, we will configure the connection settings required for +this connector. Please follow the instructions below to ensure that +you've configured the connector to read from your nifi service as +desired. + +{% /stepDescription %} + +{% stepVisualInfo %} + +{% image + src="/images/v1.0.0/openmetadata/connectors/nifi/service-connection.png" + alt="Configure service connection" + caption="Configure the service connection by filling the form" /%} + +{% /stepVisualInfo %} + +{% /step %} + +{% extraContent parentTagName="stepsContainer" %} + +#### Connection Options + +- **Host and Port**: Pipeline Service Management/UI URI. This should be specified as a string in the format 'hostname:port'. + +- **Nifi Config**: OpenMetadata supports username/password or client certificate authentication. + 1. Basic Authentication + - Username: Username to connect to Nifi. This user should be able to send request to the Nifi API and access the `Resources` endpoint. + - Password: Password to connect to Nifi. + - Verify SSL: Whether SSL verification should be perform when authenticating. + 2. Client Certificate Authentication + - Certificate Authority Path: Path to the certificate authority (CA) file. This is the certificate used to store and issue your digital certificate. This is an optional parameter. If omitted SSL verification will be skipped; this can present some sever security issue. + **important**: This file should be accessible from where the ingestion workflow is running. For example, if you are using OpenMetadata Ingestion Docker container, this file should be in this container. + - Client Certificate Path: Path to the certificate client file. + **important**: This file should be accessible from where the ingestion workflow is running. For example, if you are using OpenMetadata Ingestion Docker container, this file should be in this container. + - Client Key Path: Path to the client key file. + **important**: This file should be accessible from where the ingestion workflow is running. For example, if you are using OpenMetadata Ingestion Docker container, this file should be in this container. + +{% /extraContent %} + +{% step srNumber=6 %} + +{% stepDescription title="6. Test the Connection" %} + +Once the credentials have been added, click on `Test Connection` and Save +the changes. + +{% /stepDescription %} + +{% stepVisualInfo %} + +{% image + src="/images/v1.0.0/openmetadata/connectors/test-connection.png" + alt="Test Connection" + caption="Test the connection and save the Service" /%} + +{% /stepVisualInfo %} + +{% /step %} + +{% step srNumber=7 %} + +{% stepDescription title="7. 
Configure Metadata Ingestion" %} + +In this step we will configure the metadata ingestion pipeline, +Please follow the instructions below + +{% /stepDescription %} + +{% stepVisualInfo %} + +{% image +src="/images/v1.0.0/openmetadata/connectors/configure-metadata-ingestion-dashboard.png" +alt="Configure Metadata Ingestion" +caption="Configure Metadata Ingestion Page" /%} + +{% /stepVisualInfo %} + +{% /step %} + +{% extraContent parentTagName="stepsContainer" %} + +#### Metadata Ingestion Options + +- **Name**: This field refers to the name of ingestion pipeline, you can customize the name or use the generated name. +- **Pipeline Filter Pattern (Optional)**: Use to pipeline filter patterns to control whether or not to include pipeline as part of metadata ingestion. + - **Include**: Explicitly include pipeline by adding a list of comma-separated regular expressions to the Include field. OpenMetadata will include all pipeline with names matching one or more of the supplied regular expressions. All other schemas will be excluded. + - **Exclude**: Explicitly exclude pipeline by adding a list of comma-separated regular expressions to the Exclude field. OpenMetadata will exclude all pipeline with names matching one or more of the supplied regular expressions. All other schemas will be included. +- **Include lineage (toggle)**: Set the Include lineage toggle to control whether or not to include lineage between pipelines and data sources as part of metadata ingestion. +- **Enable Debug Log (toggle)**: Set the Enable Debug Log toggle to set the default log level to debug, these logs can be viewed later in Airflow. +- **Mark Deleted Pipelines (toggle)**: Set the Mark Deleted Pipelines toggle to flag pipelines as soft-deleted if they are not present anymore in the source system. + +{% /extraContent %} + +{% step srNumber=8 %} + +{% stepDescription title="8. Schedule the Ingestion and Deploy" %} + +Scheduling can be set up at an hourly, daily, or weekly cadence. The +timezone is in UTC. Select a Start Date to schedule for ingestion. It is +optional to add an End Date. + +Review your configuration settings. If they match what you intended, +click Deploy to create the service and schedule metadata ingestion. + +If something doesn't look right, click the Back button to return to the +appropriate step and change the settings as needed. + +After configuring the workflow, you can click on Deploy to create the +pipeline. + +{% /stepDescription %} + +{% stepVisualInfo %} + +{% image +src="/images/v1.0.0/openmetadata/connectors/schedule.png" +alt="Schedule the Workflow" +caption="Schedule the Ingestion Pipeline and Deploy" /%} + +{% /stepVisualInfo %} + +{% /step %} + + +{% step srNumber=9 %} + +{% stepDescription title="9. View the Ingestion Pipeline" %} + +Once the workflow has been successfully deployed, you can view the +Ingestion Pipeline running from the Service Page. + +{% /stepDescription %} + +{% stepVisualInfo %} + +{% image +src="/images/v1.0.0/openmetadata/connectors/view-ingestion-pipeline.png" +alt="View Ingestion Pipeline" +caption="View the Ingestion Pipeline from the Service Page" /%} + +{% /stepVisualInfo %} + +{% /step %} + +{% /stepsContainer %} + +## Troubleshooting + + ### Workflow Deployment Error + +If there were any errors during the workflow deployment process, the +Ingestion Pipeline Entity will still be created, but no workflow will be +present in the Ingestion container. + +- You can then edit the Ingestion Pipeline and Deploy it again. 
+ +- From the Connection tab, you can also Edit the Service if needed. + +{% image +src="/images/v1.0.0/openmetadata/connectors/workflow-deployment-error.png" +alt="Workflow Deployment Error" +caption="Edit and Deploy the Ingestion Pipeline" /%} + diff --git a/openmetadata-docs-v1/content/v1.0.0/menu.md b/openmetadata-docs-v1/content/v1.0.0/menu.md index bb7cd0742c6..51280d52a96 100644 --- a/openmetadata-docs-v1/content/v1.0.0/menu.md +++ b/openmetadata-docs-v1/content/v1.0.0/menu.md @@ -457,6 +457,12 @@ site_menu: url: /connectors/pipeline/airbyte/airflow - category: Connectors / Pipeline / Airbyte / CLI url: /connectors/pipeline/airbyte/cli + - category: Connectors / Pipeline / Nifi + url: /connectors/pipeline/nifi + - category: Connectors / Pipeline / Nifi / Airflow + url: /connectors/pipeline/nifi/airflow + - category: Connectors / Pipeline / Nifi / CLI + url: /connectors/pipeline/nifi/cli - category: Connectors / Pipeline / Glue Pipeline url: /connectors/pipeline/glue-pipeline - category: Connectors / Pipeline / Glue Pipeline / Airflow diff --git a/openmetadata-docs-v1/images/v1.0.0/openmetadata/connectors/nifi/add-new-service.png b/openmetadata-docs-v1/images/v1.0.0/openmetadata/connectors/nifi/add-new-service.png new file mode 100644 index 00000000000..41f4548dffe Binary files /dev/null and b/openmetadata-docs-v1/images/v1.0.0/openmetadata/connectors/nifi/add-new-service.png differ diff --git a/openmetadata-docs-v1/images/v1.0.0/openmetadata/connectors/nifi/select-service.png b/openmetadata-docs-v1/images/v1.0.0/openmetadata/connectors/nifi/select-service.png new file mode 100644 index 00000000000..66e580fb9bd Binary files /dev/null and b/openmetadata-docs-v1/images/v1.0.0/openmetadata/connectors/nifi/select-service.png differ diff --git a/openmetadata-docs-v1/images/v1.0.0/openmetadata/connectors/nifi/service-connection.png b/openmetadata-docs-v1/images/v1.0.0/openmetadata/connectors/nifi/service-connection.png new file mode 100644 index 00000000000..7415706c995 Binary files /dev/null and b/openmetadata-docs-v1/images/v1.0.0/openmetadata/connectors/nifi/service-connection.png differ diff --git a/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Database/Hive.md b/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Database/Hive.md index e5a4b6aaaf9..adb9d73deb1 100644 --- a/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Database/Hive.md +++ b/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Database/Hive.md @@ -1,78 +1,67 @@ # Hive +In this section, we provide guides and references to use the Hive connector. You can view the full documentation for Hive [here](https://docs.open-metadata.org/connectors/database/hive). -In this section, we provide guides and references to use the Hive connector. +## Requirements +To extract metadata, the user used in the connection needs to be able to perform `SELECT`, `SHOW`, and `DESCRIBE` operations in the database/schema where the metadata needs to be extracted from. -# Requirements - -To run the Ingestion via the UI you'll need to use the OpenMetadata Ingestion Container, which comes shipped with -custom Airflow plugins to handle the workflow deployment. +### Profiler & Data Quality +Executing the profiler worflow or data quality tests, will require the user to have `SELECT` permission on the tables/schemas where the profiler/tests will be executed. 
More information on the profiler workflow setup can be found [here](https://docs.open-metadata.org/connectors/ingestion/workflows/profiler) and data quality tests [here](https://docs.open-metadata.org/connectors/ingestion/workflows/data-quality). ## Connection Details $$section ### Scheme $(id="scheme") - -SQLAlchemy driver scheme options. -$$ - -$$section +SQLAlchemy driver scheme options. If you are unsure about this setting, you can use the default value. OpenMetadata supports both `Hive` and `Impala`. ### Username $(id="username") - -Username to connect to Hive. This user should have privileges to read all the metadata in Hive. +Username to connect to Hive. This user should have the necessary privileges described in the section above. $$ $$section ### Password $(id="password") - Password to connect to Hive. $$ $$section ### Host Port $(id="hostPort") - -The hostPort parameter specifies the host and port of the Hive server. This should be specified as a string in the format 'hostname:port'. For example, you might set the hostPort parameter to `myhivehost:10000`. +The hostPort parameter specifies the host and port of the Hive server. This should be specified as a string in the format `hostname:port`. +**Example**: `myhivehost:10000`. $$ $$section ### Auth $(id="auth") - - The auth parameter specifies the authentication method to use when connecting to the Hive server. Possible values are 'LDAP', 'NONE', 'CUSTOM', or 'KERBEROS'. If you are using Kerberos authentication, you should set auth to 'KERBEROS'. If you are using custom authentication, you should set auth to 'CUSTOM' and provide additional options in the authOptions parameter. +The auth parameter specifies the authentication method to use when connecting to the Hive server. Possible values are `LDAP`, `NONE`, `CUSTOM`, or `KERBEROS`. If you are using Kerberos authentication, you should set auth to `KERBEROS`. If you are using custom authentication, you should set auth to `CUSTOM` and provide additional options in the `authOptions` parameter. $$ $$section ### Kerberos Service Name $(id="kerberosServiceName") - -The kerberosServiceName parameter specifies the Kerberos service name to use for authentication. This should only be specified if using Kerberos authentication. The default value is 'hive'. +The kerberosServiceName parameter specifies the Kerberos service name to use for authentication. This should only be specified if using Kerberos authentication. The default value is `hive`. $$ $$section ### Database Schema $(id="databaseSchema") - databaseSchema of the data source. This is optional parameter, if you would like to restrict the metadata reading to a single databaseSchema. When left blank, OpenMetadata Ingestion attempts to scan all the databaseSchema. $$ $$section ### Database Name $(id="databaseName") - -Optional name to give to the database in OpenMetadata. If left blank, we will use default as the database name. +In OpenMetadata, the Database Service hierarchy works as follow: +``` +Database Service > Database > Schema > Table +``` +In the case of Hive, we won't have a Database as such. If you'd like to see your data in a database named something other than `default`, you can specify the name in this field. $$ $$section ### Auth Options $(id="authOptions") - Authentication options to pass to Hive connector. These options are based on SQLAlchemy. $$ $$section ### Connection Options $(id="connectionOptions") - Additional connection options to build the URL that can be sent to service during the connection. 
The connectionOptions parameter is specific to the connection method being used. For example, if you are using SSL encryption, you might set the connectionOptions parameter to {'ssl': 'true', 'sslTrustStore': '/path/to/truststore'}. - $$ $$section ### Connection Arguments $(id="connectionArguments") - Additional connection arguments such as security or protocol configs that can be sent to service during connection. - -$$ +$$ \ No newline at end of file diff --git a/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Database/MariaDB.md b/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Database/MariaDB.md index 0ba877ed3cf..7053522ec6f 100644 --- a/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Database/MariaDB.md +++ b/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Database/MariaDB.md @@ -1,65 +1,68 @@ # MariaDB +In this section, we provide guides and references to use the MariaDB connector. You can view the full documentation for MariaDB [here](https://docs.open-metadata.org/connectors/database/mariadb). -In this section, we provide guides and references to use the MariaDB connector. +## Requirements +To extract metadata the user used in the connection needs to have access to the `INFORMATION_SCHEMA`. By default a user can see only the rows in the `INFORMATION_SCHEMA` that correspond to objects for which the user has the proper access privileges. -# Requirements - -You can find further information on the Kafka connector in the [docs](https://docs.open-metadata.org/connectors/database/mariadb). +```SQL +-- Create user. More details https://mariadb.com/kb/en/create-user/ +CREATE USER [@] IDENTIFIED BY ''; + +-- Grant select on a database +GRANT SELECT ON world.* TO ''; + +-- Grant select on a database +GRANT SELECT ON world.* TO ''; + +-- Grant select on a specific object +GRANT SELECT ON world.hello TO ''; +``` + +### Profiler & Data Quality +Executing the profiler worflow or data quality tests, will require the user to have `SELECT` permission on the tables/schemas where the profiler/tests will be executed. More information on the profiler workflow setup can be found [here](https://docs.open-metadata.org/connectors/ingestion/workflows/profiler) and data quality tests [here](https://docs.open-metadata.org/connectors/ingestion/workflows/data-quality). ## Connection Details - $$section ### Scheme $(id="scheme") - -SQLAlchemy driver scheme options. - +SQLAlchemy driver scheme options. If you are unsure about this setting, you can use the default value. $$ $$section ### Username $(id="username") - -Username to connect to MariaDB. This user should have privileges to read all the metadata in MariaDB. - +Username to connect to MariaDB. This user should have access to the `INFORMATION_SCHEMA` to extract metadata. Other workflows may require different permissions -- refer to the section above for more information. $$ $$section ### Password $(id="password") - Password to connect to MariaDB. - $$ $$section ### Host Port $(id="hostPort") - -Host and port of the MariaDB service. - +Host and port of the MariaDB service. This should be specified as a string in the format 'hostname:port'. +**Example**: `localhost:3306`, `host.docker.internal:3306` $$ $$section ### Database Name $(id="databaseName") - -Optional name to give to the database in OpenMetadata. If left blank, we will use default as the database name. 
- +In OpenMetadata, the Database Service hierarchy works as follows: +``` +Database Service > Database > Schema > Table +``` +In the case of MariaDB, we won't have a Database as such. If you'd like to see your data in a database named something other than `default`, you can specify the name in this field. $$ $$section ### Database Schema $(id="databaseSchema") - -databaseSchema of the data source. This is optional parameter, if you would like to restrict the metadata reading to a single databaseSchema. When left blank, OpenMetadata Ingestion attempts to scan all the databaseSchema. - +This is an optional parameter. When set, the value will be used to restrict the metadata reading to a single database (corresponding to the value passed in this field). When left blank, OpenMetadata will scan all the databases. $$ $$section ### Connection Options $(id="connectionOptions") - Additional connection options to build the URL that can be sent to service during the connection. - $$ $$section ### Connection Arguments $(id="connectionArguments") - Additional connection arguments such as security or protocol configs that can be sent to service during connection. - $$ diff --git a/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Database/Mysql.md b/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Database/Mysql.md index 79c2bf3c595..5417086dffb 100644 --- a/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Database/Mysql.md +++ b/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Database/Mysql.md @@ -1,96 +1,88 @@ # Mysql +In this section, we provide guides and references to use the Mysql connector. You can view the full documentation for MySQL [here](https://docs.open-metadata.org/connectors/database/mysql). -In this section, we provide guides and references to use the Mysql connector. +## Requirements +To extract metadata, the user used in the connection needs to have access to the `INFORMATION_SCHEMA`. By default, a user can see only the rows in the `INFORMATION_SCHEMA` that correspond to objects for which the user has the proper access privileges. -# Requirements -To run the Ingestion via the UI you'll need to use the OpenMetadata Ingestion Container, which comes shipped with custom Airflow plugins to handle the workflow deployment. +```SQL +-- Create user. If <hostName> is omitted, defaults to '%' +-- More details https://dev.mysql.com/doc/refman/8.0/en/create-user.html +CREATE USER '<username>'[@'<hostName>'] IDENTIFIED BY '<password>'; -Note that We support MySQL (version 8.0.0 or greater) and the user should have access to the `INFORMATION_SCHEMA` table. +-- Grant select on a database +GRANT SELECT ON world.* TO '<username>'; -You can find further information on the Athena connector in the [docs](https://docs.open-metadata.org/connectors/database/mysql). + +-- Grant select on a specific object +GRANT SELECT ON world.hello TO '<username>'; +``` + +$$note +OpenMetadata supports MySQL version 8.0.0 and up. +$$ + +### Profiler & Data Quality +Executing the profiler workflow or data quality tests will require the user to have `SELECT` permission on the tables/schemas where the profiler/tests will be executed. More information on the profiler workflow setup can be found [here](https://docs.open-metadata.org/connectors/ingestion/workflows/profiler). ## Connection Details - $$section ### Scheme $(id="scheme") - -SQLAlchemy driver scheme options. +SQLAlchemy driver scheme options. If you are unsure about this setting, you can use the default value.
$$ $$section ### Username $(id="username") - -Username to connect to Mysql. This user should have privileges to read all the metadata in Mysql. +Username to connect to Mysql. This user should have access to the `INFORMATION_SCHEMA` to extract metadata. Other workflows may require different permissions; refer to the section above for more information. $$ $$section ### Password $(id="password") - Password to connect to Mysql. $$ $$section ### Host Port $(id="hostPort") - -Host and port of the Mysql service. - -**Example**: `localhost:3306` or `host.docker.internal:3306` +Host and port of the Mysql service. This should be specified as a string in the format 'hostname:port'. +**Example**: `localhost:3306`, `host.docker.internal:3306` $$ $$section ### Database Name $(id="databaseName") - -In OpenMetadata, the Database Service hierarchy works as follows: - +In OpenMetadata, the Database Service hierarchy works as follows: ``` Database Service > Database > Schema > Table ``` - -In the case of Mysql, we won't have a Database as such. If you'd like to see your data in a database -named something other than `default`, you can specify the name in this field. +In the case of Mysql, we won't have a Database as such. If you'd like to see your data in a database named something other than `default`, you can specify the name in this field. $$ $$section ### Database Schema $(id="databaseSchema") - -In OpenMetadata, the Database Service hierarchy works as follows: - -``` -Database Service > Database > Schema > Table -``` - -In the case of MySQL, we won't have a DatabaseSchema as such. If you'd like to see your data in a databaseSchema named something other than `default`, you can specify the name in this field. +This is an optional parameter. When set, the value will be used to restrict the metadata reading to a single database (corresponding to the value passed in this field). When left blank, OpenMetadata will scan all the databases. $$ $$section -### Ssl CA $(id="sslCA") - -Provide the path to ssl ca file -Provide the path to ssl client certificate file (ssl_cert) +### SSL CA $(id="sslCA") +Provide the path to the SSL CA file $$ $$section -### Ssl Cert $(id="sslCert") - -Provide the path to ssl client certificate file (ssl_cert) +### SSL Cert $(id="sslCert") +Provide the path to the SSL client certificate file (ssl_cert) $$ $$section -### Ssl Key $(id="sslKey") - -Provide the path to ssl client certificate file (ssl_key) +### SSL Key $(id="sslKey") +Provide the path to the SSL key file (ssl_key) $$ $$section ### Connection Options $(id="connectionOptions") - -Additional connection options to build the URL that can be sent to service during the connection. - +Additional connection options to build the URL that can be sent to the service during the connection. $$ $$section ### Connection Arguments $(id="connectionArguments") - -Additional connection arguments such as security or protocol configs that can be sent to service during connection. - +Additional connection arguments such as security or protocol configs that can be sent to the service during connection.
+$$ \ No newline at end of file diff --git a/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Database/Presto.md b/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Database/Presto.md index dd3569c37cc..37db1ef6e3e 100644 --- a/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Database/Presto.md +++ b/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Database/Presto.md @@ -1,72 +1,51 @@ # Presto +In this section, we provide guides and references to use the Presto connector. You can view the full documentation for Presto [here](https://docs.open-metadata.org/connectors/database/presto). -In this section, we provide guides and references to use the Presto connector. +## Requirements +To extract metadata, the user needs to be able to perform `SHOW CATALOGS`, `SHOW TABLES`, and `SHOW COLUMNS FROM` on the catalogs/tables you wish to extract metadata from and have `SELECT` permission on the `INFORMATION_SCHEMA`. Access to resources will be different based on the connector used. You can find more details in the Presto documentation website [here](https://prestodb.io/docs/current/connector.html). You can also get more information regarding system access control in Presto [here](https://prestodb.io/docs/current/security/built-in-system-access-control.html). -# Requirements - -You can find further information on the Kafka connector in the [docs](https://docs.open-metadata.org/connectors/database/presto). + +### Profiler & Data Quality +Executing the profiler workflow or data quality tests will require the user to have `SELECT` permission on the tables/schemas where the profiler/tests will be executed. More information on the profiler workflow setup can be found [here](https://docs.open-metadata.org/connectors/ingestion/workflows/profiler) and data quality tests [here](https://docs.open-metadata.org/connectors/ingestion/workflows/data-quality). ## Connection Details - $$section ### Scheme $(id="scheme") - -SQLAlchemy driver scheme options. - +SQLAlchemy driver scheme options. If you are unsure about this setting, you can use the default value. $$ $$section ### Username $(id="username") - -Username to connect to Presto. This user should have privileges to read all the metadata in Postgres. - +Username to connect to Presto. This user should be able to perform `SHOW CATALOGS`, `SHOW TABLES`, and `SHOW COLUMNS FROM` and have `SELECT` permission on the `INFORMATION_SCHEMA`. $$ $$section ### Password $(id="password") - Password to connect to Presto. - $$ $$section ### Host Port $(id="hostPort") - -Host and port of the Presto service. - +Host and port of the Presto service. This should be specified as a string in the format 'hostname:port'. +**Example**: `localhost:8080`, `host.docker.internal:8080` $$ $$section ### Database Schema $(id="databaseSchema") - -databaseSchema of the data source. This is optional parameter, if you would like to restrict the metadata reading to a single databaseSchema. When left blank, OpenMetadata Ingestion attempts to scan all the databaseSchema. - +This is an optional parameter. When set, the value will be used to restrict the metadata reading to a single database (corresponding to the value passed in this field). When left blank, OpenMetadata will scan all the databases. $$ $$section ### Catalog $(id="catalog") - -Presto catalog - +Presto catalog name. $$ $$section ### Connection Options $(id="connectionOptions") - Additional connection options to build the URL that can be sent to service during the connection.
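As a rough sanity check for the Presto metadata requirements described above (a hedged sketch, not part of the official docs; `my_catalog`, `my_schema`, and `my_table` are placeholder names), the user configured in the connection should be able to run statements along these lines from a Presto client:

```sql
-- Hypothetical checks: each statement should succeed for the ingestion user.
SHOW CATALOGS;
SHOW TABLES FROM my_catalog.my_schema;
SHOW COLUMNS FROM my_catalog.my_schema.my_table;
SELECT table_name FROM my_catalog.information_schema.tables LIMIT 1;
```

If any of these fail, review the access rules of the relevant Presto connector before running the ingestion workflow.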
- $$ $$section ### Connection Arguments $(id="connectionArguments") - Additional connection arguments such as security or protocol configs that can be sent to service during connection. - -$$ - -$$section -### Supports Database $(id="supportsDatabase") - -The source service supports the database concept in its hierarchy - -$$ +$$ \ No newline at end of file diff --git a/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Database/Redshift.md b/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Database/Redshift.md index 3516f9d7be8..f6e82782e31 100644 --- a/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Database/Redshift.md +++ b/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Database/Redshift.md @@ -1,80 +1,68 @@ # Redshift +In this section, we provide guides and references to use the Redshift connector. You can view the full documentation for Redshift [here](https://docs.open-metadata.org/connectors/database/redshift). -In this section, we provide guides and references to use the Redshift connector. - -# Requirements +## Requirements Redshift user must grant `SELECT` privilege on `SVV_TABLE_INFO` to fetch the metadata of tables and views. ```sql +-- Create a new user +-- More details https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_USER.html CREATE USER test_user with PASSWORD 'password'; + +-- Grant SELECT on table GRANT SELECT ON TABLE svv_table_info to test_user; ``` -If you plan on running the profiler and quality tests you need to make sure your user has `SELECT` privilege on the tables you wish to run those workflows against. For more information visit [here](https://docs.aws.amazon.com/redshift/latest/dg/c_visibility-of-data.html). +### Profiler & Data Quality +Executing the profiler workflow or data quality tests will require the user to have `SELECT` permission on the tables/schemas where the profiler/tests will be executed. More information on the profiler workflow setup can be found [here](https://docs.open-metadata.org/connectors/ingestion/workflows/profiler) and data quality tests [here](https://docs.open-metadata.org/connectors/ingestion/workflows/data-quality). -You can find further information on the Redshift connector in the [docs](https://docs.open-metadata.org/connectors/database/redshift). +### Usage & Lineage +For the usage and lineage workflow, the user will need `SELECT` privilege on the `STL_QUERY` table. You can find more information on the usage workflow [here](https://docs.open-metadata.org/connectors/ingestion/workflows/usage) and the lineage workflow [here](https://docs.open-metadata.org/connectors/ingestion/workflows/lineage). ## Connection Details - $$section ### Scheme $(id="scheme") - -SQLAlchemy driver scheme options +SQLAlchemy driver scheme options. If you are unsure about this setting, you can use the default value. $$ $$section ### Username $(id="username") - -Username to connect to Redshift. This user should have privileges to read all the metadata in Redshift. +Username to connect to Redshift. This user should have access to `SVV_TABLE_INFO` to extract metadata. Other workflows may require different permissions; refer to the section above for more information. $$ $$section ### Password $(id="password") - Password to connect to Redshift. $$ $$section ### Host Port $(id="hostPort") - Host and port of the Redshift service. $$ $$section ### Database $(id="database") -Initial Redshift database to connect to. If you want to ingest all databases, set `ingestAllDatabases` to true.
+Initial Redshift database to connect to. If you want to ingest all databases, set `ingestAllDatabases` to true. $$ $$section ### Ingest All Databases $(id="ingestAllDatabases") - If ticked, the workflow will be able to ingest all database in the cluster. If not ticked, the workflow will only ingest tables from the database set above. $$ $$section -### Ssl Mode $(id="sslMode") - +### SSL Mode $(id="sslMode") SSL Mode to connect to redshift database. E.g, prefer, verify-ca etc. $$ $$section ### Connection Options $(id="connectionOptions") - Additional connection options to build the URL that can be sent to service during the connection. - $$ $$section ### Connection Arguments $(id="connectionArguments") - Additional connection arguments such as security or protocol configs that can be sent to service during connection. - -$$ - -$$section -### Supports Database $(id="supportsDatabase") - -The source service supports the database concept in its hierarchy - -$$ +$$ \ No newline at end of file diff --git a/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Database/SingleStore.md b/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Database/SingleStore.md index 226f739a9cd..4088877e03f 100644 --- a/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Database/SingleStore.md +++ b/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Database/SingleStore.md @@ -1,65 +1,68 @@ # SingleStore +In this section, we provide guides and references to use the SingleStore connector. You can view the full documentation for SingleStore [here](https://docs.open-metadata.org/connectors/database/singlestore). -In this section, we provide guides and references to use the SingleStore connector. +## Requirements +To extract metadata, the user used in the connection needs to have access to the `INFORMATION_SCHEMA`. By default, a user can see only the rows in the `INFORMATION_SCHEMA` that correspond to objects for which the user has the proper access privileges. -# Requirements - -You can find further information on the Kafka connector in the [docs](https://docs.open-metadata.org/connectors/database/singlestore). +```SQL +-- Create user. +-- More details https://docs.singlestore.com/managed-service/en/reference/sql-reference/security-management-commands/create-user.html +CREATE USER <username>[@<hostName>] IDENTIFIED BY '<password>'; +-- Grant select on a database +GRANT SELECT ON world.* TO '<username>'; + +-- Grant select on a specific object +GRANT SELECT ON world.hello TO '<username>'; +``` + +### Profiler & Data Quality
Executing the profiler workflow or data quality tests will require the user to have `SELECT` permission on the tables/schemas where the profiler/tests will be executed. More information on the profiler workflow setup can be found [here](https://docs.open-metadata.org/connectors/ingestion/workflows/profiler) and data quality tests [here](https://docs.open-metadata.org/connectors/ingestion/workflows/data-quality). ## Connection Details - $$section ### Scheme $(id="scheme") - -SQLAlchemy driver scheme options. - +SQLAlchemy driver scheme options. If you are unsure about this setting, you can use the default value. $$ $$section ### Username $(id="username") - -Username to connect to SingleStore. This user should have privileges to read all the metadata in MySQL. - +Username to connect to SingleStore.
This user should have access to the `INFORMATION_SCHEMA` to extract metadata. Other workflows may require different permissions; refer to the section above for more information. $$ $$section ### Password $(id="password") - Password to connect to SingleStore. - $$ $$section ### Host Port $(id="hostPort") - -Host and port of the SingleStore service. - +Host and port of the SingleStore service. This should be specified as a string in the format 'hostname:port'. +**Example**: `localhost:3306`, `host.docker.internal:3306` $$ $$section ### Database Name $(id="databaseName") - -Optional name to give to the database in OpenMetadata. If left blank, we will use default as the database name. - +In OpenMetadata, the Database Service hierarchy works as follows: +``` +Database Service > Database > Schema > Table +``` +In the case of SingleStore, we won't have a Database as such. If you'd like to see your data in a database named something other than `default`, you can specify the name in this field. $$ $$section ### Database Schema $(id="databaseSchema") - -databaseSchema of the data source. This is optional parameter, if you would like to restrict the metadata reading to a single databaseSchema. When left blank, OpenMetadata Ingestion attempts to scan all the databaseSchema. - +This is an optional parameter. When set, the value will be used to restrict the metadata reading to a single database (corresponding to the value passed in this field). When left blank, OpenMetadata will scan all the databases. $$ $$section ### Connection Options $(id="connectionOptions") - Additional connection options to build the URL that can be sent to service during the connection. - $$ $$section ### Connection Arguments $(id="connectionArguments") - Additional connection arguments such as security or protocol configs that can be sent to service during connection. - $$ diff --git a/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Database/Trino.md b/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Database/Trino.md index c9839438775..413ec8837a4 100644 --- a/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Database/Trino.md +++ b/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Database/Trino.md @@ -1,86 +1,67 @@ # Trino +In this section, we provide guides and references to use the Trino connector. You can view the full documentation for Trino [here](https://docs.open-metadata.org/connectors/database/trino). -In this section, we provide guides and references to use the Trino connector. +## Requirements +To extract metadata, the user needs to have `SELECT` permission on the following tables: +- `information_schema.schemata` +- `information_schema.columns` +- `information_schema.tables` +- `information_schema.views` +- `system.metadata.table_comments` -# Requirements - -You can find further information on the Kafka connector in the [docs](https://docs.open-metadata.org/connectors/database/trino). +Access to resources will be based on the user's permission to access specific data sources. More information regarding access and security can be found in the Trino documentation [here](https://trino.io/docs/current/security.html). + +### Profiler & Data Quality +Executing the profiler workflow or data quality tests will require the user to have `SELECT` permission on the tables/schemas where the profiler/tests will be executed.
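To double-check the Trino requirements listed above (a hedged sketch only; `my_catalog`, `my_schema`, and `my_table` are placeholder names, not part of the official docs), the user configured in the connection should be able to read the listed metadata tables and, for the profiler and data quality workflows, the data itself:

```sql
-- Hypothetical checks for metadata extraction.
SELECT table_name FROM my_catalog.information_schema.tables LIMIT 1;
SELECT * FROM system.metadata.table_comments WHERE catalog_name = 'my_catalog' LIMIT 1;

-- Hypothetical check for the profiler / data quality workflows.
SELECT * FROM my_catalog.my_schema.my_table LIMIT 1;
```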
More information on the profiler workflow setup can be found [here](https://docs.open-metadata.org/connectors/ingestion/workflows/profiler) and data quality tests [here](https://docs.open-metadata.org/connectors/ingestion/workflows/data-quality). ## Connection Details - $$section ### Scheme $(id="scheme") - -SQLAlchemy driver scheme options. - +SQLAlchemy driver scheme options. If you are unsure about this setting, you can use the default value. $$ $$section ### Username $(id="username") - -Username to connect to Trino. This user should have privileges to read all the metadata in Trino. - +Username to connect to Trino. This user should have `SELECT` permission on the `SYSTEM.METADATA` and `INFORMATION_SCHEMA` schemas; see the section above for more details. $$ $$section ### Password $(id="password") - Password to connect to Trino. - $$ $$section ### Host Port $(id="hostPort") - -Host and port of the Trino service. - +Host and port of the Trino service. This should be specified as a string in the format 'hostname:port'. +**Example**: `localhost:8080`, `host.docker.internal:8080` $$ $$section ### Catalog $(id="catalog") - -Catalog of the data source. - +Catalog of the data source. $$ $$section ### Database Schema $(id="databaseSchema") - -databaseSchema of the data source. This is optional parameter, if you would like to restrict the metadata reading to a single databaseSchema. When left blank, OpenMetadata Ingestion attempts to scan all the databaseSchema. - +This is an optional parameter. When set, the value will be used to restrict the metadata reading to a single database (corresponding to the value passed in this field). When left blank, OpenMetadata will scan all the databases. $$ $$section ### Proxies $(id="proxies") - Proxies for the connection to Trino data source - $$ $$section ### Params $(id="params") - URL parameters for connection to the Trino data source - $$ $$section ### Connection Options $(id="connectionOptions") - Additional connection options to build the URL that can be sent to service during the connection. - $$ $$section ### Connection Arguments $(id="connectionArguments") - Additional connection arguments such as security or protocol configs that can be sent to service during connection. - -$$ - -$$section -### Supports Database $(id="supportsDatabase") - -The source service supports the database concept in its hierarchy - $$ \ No newline at end of file diff --git a/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Database/workflows/profiler.md b/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Database/workflows/profiler.md new file mode 100644 index 00000000000..f3fbf5b6c6d --- /dev/null +++ b/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Database/workflows/profiler.md @@ -0,0 +1,29 @@ +# Profiler +This workflow allows you to profile your table assets and gain insight into their structure (e.g., metrics computed include `max`, `min`, and `mean`; the full list can be found [here](https://docs.open-metadata.org/connectors/ingestion/workflows/profiler/metrics)). We recommend checking the [best practices](https://docs.open-metadata.org/connectors/ingestion/workflows/profiler#profiler-best-practices) before creating a profiler workflow. + +## Properties +### Database Filter Pattern $(id="databaseFilterPattern") +Regex to fetch only the databases that match the pattern. + +### Schema Filter Pattern $(id="schemaFilterPattern") +Regex to fetch only the schemas that match the pattern.
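To make the metric list from the introduction above more concrete, the column metrics the profiler computes are conceptually similar to plain SQL aggregates such as the following (an illustrative sketch only; `my_schema.my_table` and `price` are hypothetical names, not part of the actual implementation):

```sql
-- Illustrative only: roughly the kind of per-column aggregates the profiler computes.
SELECT
  MIN(price) AS min_value,
  MAX(price) AS max_value,
  AVG(price) AS mean_value,
  COUNT(*)   AS row_count
FROM my_schema.my_table;
```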
+ +### Table Filter Pattern $(id="tableFilterPattern") +Regex to exclude tables that match the pattern. + +### Process PII Sensitive $(id="processPiiSensitive") +Optional configuration to automatically tag columns that might contain sensitive information. If `generateSampleData` is enabled, OpenMetadata will leverage machine learning to infer which columns may contain PII-sensitive data. If disabled, OpenMetadata will infer this from the column name. + +### Profile Sample $(id="profileSample") +Percentage of data or number of rows to use when sampling tables. If left as is, the profiler will run against the entire table. + +### Profile Sample Type $(id="profileSampleType") +Profile sample type can be set to either: +* percentage: this will use a percentage to sample the table (e.g., if the table has 100 rows and we set the sample percentage to 50%, the profiler will use 50 random rows to compute the metrics) +* row count: this will use a number of rows to sample the table (e.g., if the table has 100 rows and we set the row count to 10, the profiler will use 10 random rows to compute the metrics) + +### Thread Count $(id="threadCount") +Number of threads that will be used when computing the profiler metrics. A number set too high can have a negative effect on performance. We recommend using the default value unless you have a good understanding of multithreading. + +### Timeout in Seconds $(id="timeoutSeconds") +This sets how long a profiling job against a table can run before its execution is interrupted and the profiler moves on to the next table. Note that the profiler will wait for the hanging query to terminate before killing the execution. If there is a risk of your profiling job hanging, it is important to also set a query/connection timeout on your database engine. The default value for the profiler timeout is 12 hours. \ No newline at end of file diff --git a/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Mlmodel/Mlflow.md b/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Mlmodel/Mlflow.md index 2641bbcef3a..72850a740a9 100644 --- a/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Mlmodel/Mlflow.md +++ b/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Mlmodel/Mlflow.md @@ -1,23 +1,19 @@ -# Mlflow - -In this section, we provide guides and references to use the Mlflow connector. - -# Requirements - -You can find further information on the Kafka connector in the [docs](https://docs.open-metadata.org/connectors/mlmodel/mlflow). +# MLflow +In this section, we provide guides and references to use the MLflow connector. You can view the full documentation for MLflow [here](https://docs.open-metadata.org/connectors/ml-model/mlflow). +## Requirements +To extract metadata, OpenMetadata needs two elements: +- **Tracking URI**: Address of the local or remote tracking server. More information is available in the MLflow documentation [here](https://www.mlflow.org/docs/latest/tracking.html#where-runs-are-recorded). +- **Registry URI**: Address of the local or remote model registry server. ## Connection Details - $$section ### Tracking Uri $(id="trackingUri") - -Mlflow Experiment tracking URI. E.g., http://localhost:5000 - +Mlflow Experiment tracking URI. +**Example**: http://localhost:5000 $$ $$section ### Registry Uri $(id="registryUri") - -Mlflow Model registry backend. E.g., mysql+pymysql://mlflow:password@localhost:3307/experiments - -$$ +Mlflow Model registry backend.
+**Example**: mysql+pymysql://mlflow:password@localhost:3307/experiments +$$ \ No newline at end of file diff --git a/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Pipeline/Nifi.md b/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Pipeline/Nifi.md index 7a889b6f44d..51e2d1aac9f 100644 --- a/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Pipeline/Nifi.md +++ b/openmetadata-ui/src/main/resources/ui/public/locales/en-US/Pipeline/Nifi.md @@ -1,30 +1,52 @@ # Nifi +In this section, we provide guides and references to use the Nifi connector. You can view the full documentation for Nifi [here](https://docs.open-metadata.org/connectors/pipeline/nifi). -In this section, we provide guides and references to use the Nifi connector. - -# Requirements - -You can find further information on the Kafka connector in the [docs](https://docs.open-metadata.org/connectors/pipeline/nifi). +## Requirements +OpenMetadata supports two types of connection for the Nifi connector: +- **basic authentication**: use username/password to authenticate to Nifi. +- **client certificate authentication**: use CA, client certificate, and client key files to authenticate. ## Connection Details - $$section ### Host Port $(id="hostPort") - -Pipeline Service Management/UI URI. - +Pipeline Service Management/UI URI. This should be specified as a string in the format 'hostname:port'. +**Example**: `localhost:8443`, `host.docker.internal:8443` $$ $$section ### Nifi Config $(id="nifiConfig") - -We support username/password or client certificate authentication - +OpenMetadata supports basic authentication (username/password) or client certificate authentication. See the Requirements section for more details. $$ $$section -### Nifi Config $(id="nifiConfig") - -We support username/password or client certificate authentication - +### Username $(id="username") +Username to connect to Nifi. This user should be able to send requests to the Nifi API and access the `Resources` endpoint. $$ + +$$section +### Password $(id="password") +Password to connect to Nifi. +$$ + +$$section +### Verify SSL $(id="basicAuthentication.verifySSL") +Whether SSL verification should be performed when authenticating. +$$ + +$$section +### Certificate Authority Path $(id="certificateAuthorityPath") +Path to the certificate authority (CA) file. This is the certificate used to store and issue your digital certificate. This is an optional parameter. If omitted, SSL verification will be skipped; this can present severe security issues. +**Important**: This file should be accessible from where the ingestion workflow is running. For example, if you are using the OpenMetadata Ingestion Docker container, this file should be in that container. +$$ + +$$section +### Client Certificate Path $(id="clientCertificatePath") +Path to the client certificate file. +**Important**: This file should be accessible from where the ingestion workflow is running. For example, if you are using the OpenMetadata Ingestion Docker container, this file should be in that container. +$$ + +$$section +### Client Key Path $(id="clientkeyPath") +Path to the client key file. +**Important**: This file should be accessible from where the ingestion workflow is running. For example, if you are using the OpenMetadata Ingestion Docker container, this file should be in that container. +$$ \ No newline at end of file