Fix Airflow docs (#12009)

Pere Miquel Brull 2023-06-21 08:36:06 +02:00 committed by GitHub
parent 35cca0e178
commit 7f39cc105f


We support different approaches to extracting metadata from Airflow:
2. **Airflow Lineage Backend**: which can be configured in your Airflow instance. You can read more about the Lineage Backend [here](https://docs.open-metadata.org/connectors/pipeline/airflow/lineage-backend).
3. **Airflow Lineage Operator**: to send metadata directly from your Airflow DAGs. You can read more about the Lineage Operator [here](https://docs.open-metadata.org/connectors/pipeline/airflow/lineage-operator).

You can find further information on the Airflow connector in the [docs](https://docs.open-metadata.org/connectors/pipeline/airflow). From the OpenMetadata UI, you have access to the first strategy.
## Connection Details

$$section
### Host and Port $(id="hostPort")
Pipeline Service Management URI. This should be specified as a URI string in the format `scheme://hostname:port`. E.g., `http://localhost:8080`, `http://host.docker.internal:8080`.
$$
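The expected URI shape can be sanity-checked before saving the connection. A minimal Python sketch, using only the standard library (the `validate_host_port` helper is illustrative, not part of OpenMetadata):

```python
from urllib.parse import urlsplit

def validate_host_port(uri: str) -> str:
    """Check that a Pipeline Service URI looks like scheme://hostname:port."""
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"Unsupported scheme: {parts.scheme!r}")
    if not parts.hostname or parts.port is None:
        raise ValueError("URI must include both a hostname and a port")
    return f"{parts.scheme}://{parts.hostname}:{parts.port}"

# Both local and Docker-based setups follow the same pattern:
print(validate_host_port("http://localhost:8080"))             # http://localhost:8080
print(validate_host_port("http://host.docker.internal:8080"))  # http://host.docker.internal:8080
```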
$$section
### Number Of Status $(id="numberOfStatus")
Number of past task statuses to read each time the ingestion runs. By default, we will pick up and update the last 10 runs.
$$
$$section
### Metadata Database Connection $(id="connection")
Select your underlying database connection. We support the [official](https://airflow.apache.org/docs/apache-airflow/stable/howto/set-up-database.html) backends from Airflow.
Note that the **Backend Connection** is only used to extract metadata from a DAG.
$$ $$
---

## MySQL Connection

If your Airflow is backed by a MySQL database, then you will need to fill in these details:

### Username & Password
Credentials with permissions to connect to the database. Read-only permissions are required.
### Host and Port
Host and port of the MySQL service. This should be specified as a string in the format `hostname:port`. E.g., `localhost:3306`, `host.docker.internal:3306`.
### Database Schema
MySQL schema that contains the Airflow tables.
### SSL CA $(id="sslCA")
Provide the path to the SSL CA file. The file must be accessible locally to the ingestion process.
### SSL Certificate $(id="sslCert")
Provide the path to the SSL client certificate file (`ssl_cert`).
### SSL Key $(id="sslKey")
Provide the path to the SSL key file (`ssl_key`).
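Taken together, these fields typically map onto a SQLAlchemy-style connection URL plus SSL connect arguments. A rough sketch, where the helper names and sample values are illustrative rather than OpenMetadata APIs:

```python
def build_mysql_url(username: str, password: str, host_port: str,
                    database_schema: str) -> str:
    """Assemble a mysql+pymysql SQLAlchemy-style URL from the fields above."""
    return f"mysql+pymysql://{username}:{password}@{host_port}/{database_schema}"

def build_ssl_args(ssl_ca=None, ssl_cert=None, ssl_key=None) -> dict:
    """Collect the optional SSL file paths into driver connect arguments."""
    ssl = {key: path for key, path in
           {"ca": ssl_ca, "cert": ssl_cert, "key": ssl_key}.items() if path}
    return {"ssl": ssl} if ssl else {}

# Example with placeholder credentials:
url = build_mysql_url("airflow_user", "airflow_pass",
                      "host.docker.internal:3306", "airflow_db")
# → "mysql+pymysql://airflow_user:airflow_pass@host.docker.internal:3306/airflow_db"
```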
---
## Postgres Connection
If your Airflow is backed by a Postgres database, then you will need to fill in these details:
### Username & Password
Credentials with permissions to connect to the database. Read-only permissions are required.
### Host and Port
Host and port of the Postgres service. E.g., `localhost:5432` or `host.docker.internal:5432`.
### Database
Postgres database that contains the Airflow tables.
### SSL Mode $(id="sslMode")
SSL mode to use when connecting to the Postgres database, e.g., `prefer`, `verify-ca`, etc.

You can ignore the rest of the properties, since we won't ingest any database or policy tags.
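As with MySQL, these fields can be pictured as a connection URL; in libpq-style URLs the SSL mode travels as a query parameter. A minimal sketch with placeholder values (the helper name is ours, not an OpenMetadata API):

```python
from urllib.parse import urlencode

def build_postgres_url(username: str, password: str, host_port: str,
                       database: str, ssl_mode: str = None) -> str:
    """Assemble a postgresql+psycopg2 URL; sslmode rides as a query parameter."""
    url = f"postgresql+psycopg2://{username}:{password}@{host_port}/{database}"
    if ssl_mode:
        url += "?" + urlencode({"sslmode": ssl_mode})
    return url

print(build_postgres_url("airflow_user", "airflow_pass",
                         "localhost:5432", "airflow_db", ssl_mode="verify-ca"))
# postgresql+psycopg2://airflow_user:airflow_pass@localhost:5432/airflow_db?sslmode=verify-ca
```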
---
## MSSQL Connection

If your Airflow is backed by an MSSQL database, then you will need to fill in these details:
### Username & Password
Credentials with permissions to connect to the database. Read-only permissions are required.
### Host and Port
Host and port of the MSSQL service. E.g., `localhost:1433` or `host.docker.internal:1433`.
### Database
MSSQL database that contains the Airflow tables.
### URI String $(id="uriString")
Connection URI string to connect to MSSQL. It only works with the `pyodbc` scheme. E.g., `DRIVER={ODBC Driver 17 for SQL Server};SERVER=server_name;DATABASE=db_name;UID=user_name;PWD=password`.
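The URI string is a semicolon-separated list of `KEY=value` pairs, which a short sketch can split apart to check before use (the `parse_odbc_uri` helper is illustrative, not part of OpenMetadata or pyodbc):

```python
def parse_odbc_uri(uri_string: str) -> dict:
    """Split a pyodbc-style connection string into its key/value pairs."""
    pairs = (item.split("=", 1) for item in uri_string.split(";") if item)
    return {key.strip().upper(): value for key, value in pairs}

params = parse_odbc_uri(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=server_name;"
    "DATABASE=db_name;UID=user_name;PWD=password"
)
# params["DRIVER"] == "{ODBC Driver 17 for SQL Server}"
# params["SERVER"] == "server_name"
```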