---
title: Run Impala Connector using the CLI
slug: /connectors/database/impala/cli
---

# Run Impala using the metadata CLI
{% multiTablesWrapper %}

| Stage              | BETA                |
|--------------------|---------------------|
| Metadata           | ✅                   |
| Query Usage        | ❌                   |
| Data Profiler      | ✅                   |
| Data Quality       | ✅                   |
| Lineage            | Partially via Views |
| DBT                | ❌                   |
| Supported Versions | Impala >= 2.0       |

{% /multiTablesWrapper %}

| Lineage | Table-level | Column-level |
|:------:|:-----------:|:-------------:|
| Partially via Views | ✅ | ✅ |

In this section, we provide guides and references to use the Impala connector.

Configure and run Impala metadata and profiler workflows with the `metadata` CLI:
- [Requirements](#requirements)
- [Metadata Ingestion](#metadata-ingestion)
- [Data Profiler](#data-profiler)
- [dbt Integration](#dbt-integration)

## Requirements

{%inlineCallout icon="description" bold="OpenMetadata 0.12 or later" href="/deployment"%}
To deploy OpenMetadata, check the Deployment guides.
{%/inlineCallout%}

To run the ingestion via the UI, you'll need the OpenMetadata Ingestion Container, which ships with
custom Airflow plugins to handle the workflow deployment.

### Python Requirements

To run the Impala ingestion, you will need to install:

```bash
pip3 install "openmetadata-ingestion[impala]"
```

## Metadata Ingestion

All connectors are defined as JSON Schemas.
[Here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/services/connections/database/impalaConnection.json)
you can find the structure to create a connection to Impala.

In order to create and run a Metadata Ingestion workflow, we will follow
the steps to create a YAML configuration able to connect to the source,
process the Entities if needed, and reach the OpenMetadata server.

The workflow is modeled around the following
[JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/workflow.json).

### 1. Define the YAML Config

This is a sample config for Impala:

```yaml
source:
  type: impala
  serviceName: local_impala
  serviceConnection:
    config:
      type: Impala
      username: <username>
      password: <password>
      authOptions: <auth options>
      authMechanism: PLAIN # NOSASL, PLAIN, GSSAPI, LDAP, JWT
      hostPort: <impala connection host & port>
      # kerberosServiceName: KerberosServiceName
      # databaseSchema: Database Schema of the data source
      # databaseName: Optional name to give to the database in OpenMetadata.
      # useSSL: true / false
  sourceConfig:
    config:
      type: DatabaseMetadata
      markDeletedTables: true
      includeTables: true
      includeViews: true
      # includeTags: true
      # databaseFilterPattern:
      #   includes:
      #     - database1
      #     - database2
      #   excludes:
      #     - database3
      #     - database4
      # schemaFilterPattern:
      #   includes:
      #     - schema1
      #     - schema2
      #   excludes:
      #     - schema3
      #     - schema4
      # tableFilterPattern:
      #   includes:
      #     - table1
      #     - table2
      #   excludes:
      #     - table3
      #     - table4
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  # loggerLevel: DEBUG  # DEBUG, INFO, WARN or ERROR
  openMetadataServerConfig:
    hostPort: "<OpenMetadata host and port>"
    authProvider: "<OpenMetadata auth provider>"
```

#### Source Configuration - Service Connection

- **username**: Specify the User to connect to Impala. It should have enough privileges to read all the metadata.
- **password**: Password to connect to Impala.
- **hostPort**: Enter the fully qualified hostname and port number for your Impala deployment in the Host and Port field.
- **authMechanism**: The authentication method to use when connecting to the Impala server. Possible values are `NOSASL`, `PLAIN`, `GSSAPI`, `LDAP`, `JWT`. If you are using Kerberos authentication, set `authMechanism` to `GSSAPI`.
- **kerberosServiceName**: The Kerberos service name to use for authentication. Specify it only when using Kerberos authentication.
- **databaseSchema**: Schema of the data source. This is an optional parameter: set it if you would like to restrict the metadata reading to a single schema. When left blank, OpenMetadata Ingestion attempts to scan all the schemas.
- **databaseName**: In OpenMetadata, the Database Service hierarchy works as follows:
`Database Service > Database > Schema > Table`. In the case of Impala, there is no Database as such. If you'd like to see your data in a database named something other than `default`, you can specify the name in this field.
- **useSSL**: Enables SSL to establish a secure connection with Impala.
- **authOptions**: The auth options string for the Impala connection.
- **Connection Options (Optional)**: Enter the details for any additional connection options that can be sent to Impala during the connection. These details must be added as Key-Value pairs.
- **Connection Arguments (Optional)**: Enter the details for any additional connection arguments, such as security or protocol configs, that can be sent to Impala during the connection. These details must be added as Key-Value pairs.
  - If you are using Single Sign-On (SSO) for authentication, add the `authenticator` details in the Connection Arguments as a Key-Value pair as follows: `"authenticator" : "sso_login_url"`

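As an illustration, a `serviceConnection` for a Kerberized Impala cluster might look like the sketch below. Only fields from the sample above are used; the hostname, the port (`21050` is assumed here as Impala's usual port for HiveServer2-protocol clients), the `impala` service name and the `sales` schema are assumptions to replace with your own values. The sketch also assumes that the machine running `metadata ingest` already holds a valid Kerberos ticket (for example, obtained with `kinit`), and depending on your setup the connection may still need `username`.

```yaml
source:
  type: impala
  serviceName: local_impala
  serviceConnection:
    config:
      type: Impala
      # Hypothetical host; 21050 assumed to be the impalad port for HiveServer2-protocol clients
      hostPort: impala.example.com:21050
      # Kerberos authentication
      authMechanism: GSSAPI
      # Assumed to match the service part of the Impala principal (impala/<host>@<REALM>)
      kerberosServiceName: impala
      useSSL: true
      # Optional: only read metadata from this (hypothetical) schema
      databaseSchema: sales
```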

#### Source Configuration - Source Config

The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json):

- `markDeletedTables`: To flag tables as soft-deleted if they are no longer present in the source system.
- `includeTables`: true or false, to ingest table data. Default is true.
- `includeViews`: true or false, to ingest view definitions.
- `databaseFilterPattern`, `schemaFilterPattern`, `tableFilterPattern`: Note that they support regex as include or exclude. E.g.,

```yaml
tableFilterPattern:
  includes:
    - users
    - type_test
```
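Includes and excludes can also be combined across levels. The sketch below assumes, as the `databaseName` description above suggests, that Impala databases are ingested as schemas, so Impala's built-in `_impala_builtins` database is skipped through `schemaFilterPattern`; the `stg_` prefix and the `users` table are hypothetical. The `^` and `$` anchors make the intended match explicit regardless of how the ingestion anchors patterns, and quoting keeps YAML from interpreting characters such as `*` specially.

```yaml
sourceConfig:
  config:
    type: DatabaseMetadata
    schemaFilterPattern:
      excludes:
        # Impala's system database holding built-in functions
        - "^_impala_builtins$"
    tableFilterPattern:
      includes:
        # Hypothetical names: keep staging tables and the users table
        - "^stg_.*"
        - "^users$"
```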

#### Sink Configuration

To send the metadata to OpenMetadata, the sink must be specified as `type: metadata-rest`.

#### Workflow Configuration

The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation.

For a simple, local installation using our Docker containers, this looks like:

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: openmetadata
    securityConfig:
      jwtToken: '{bot_jwt_token}'
```

We support different security providers. You can find their definitions [here](https://github.com/open-metadata/OpenMetadata/tree/main/openmetadata-spec/src/main/resources/json/schema/security/client).
The ingestion configuration for each provider is shown below.

### OpenMetadata JWT Auth

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: openmetadata
    securityConfig:
      jwtToken: '{bot_jwt_token}'
```

### Auth0 SSO

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: auth0
    securityConfig:
      clientId: '{your_client_id}'
      secretKey: '{your_client_secret}'
      domain: '{your_domain}'
```

### Azure SSO

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: azure
    securityConfig:
      clientSecret: '{your_client_secret}'
      authority: '{your_authority_url}'
      clientId: '{your_client_id}'
      scopes:
        - your_scopes
```

### Custom OIDC SSO

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: custom-oidc
    securityConfig:
      clientId: '{your_client_id}'
      secretKey: '{your_client_secret}'
      domain: '{your_domain}'
```

### Google SSO

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: google
    securityConfig:
      secretKey: '{path-to-json-creds}'
```

### Okta SSO

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: okta
    securityConfig:
      clientId: "{CLIENT_ID - SPA APP}"
      orgURL: "{ISSUER_URL}/v1/token"
      privateKey: "{public/private keypair}"
      email: "{email}"
      scopes:
        - token
```

### Amazon Cognito SSO

The ingestion can be configured by [Enabling JWT Tokens](https://docs.open-metadata.org/deployment/security/enable-jwt-tokens).

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: auth0
    securityConfig:
      clientId: '{your_client_id}'
      secretKey: '{your_client_secret}'
      domain: '{your_domain}'
```

### OneLogin SSO

OneLogin uses Custom OIDC for the ingestion.

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: custom-oidc
    securityConfig:
      clientId: '{your_client_id}'
      secretKey: '{your_client_secret}'
      domain: '{your_domain}'
```

### KeyCloak SSO

KeyCloak uses Custom OIDC for the ingestion.

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: custom-oidc
    securityConfig:
      clientId: '{your_client_id}'
      secretKey: '{your_client_secret}'
      domain: '{your_domain}'
```


### 2. Run with the CLI

First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run:

```bash
metadata ingest -c <path-to-yaml>
```

Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration,
you will be able to extract metadata from different sources.

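For reference, a complete file for a local test setup might look like the sketch below, saved for example as `impala.yaml` and run with `metadata ingest -c impala.yaml`. It combines the Impala sample with the local Docker `workflowConfig` shown earlier; the Impala host, the credentials and the file name are assumptions, and `{bot_jwt_token}` must be replaced with your bot's JWT token.

```yaml
source:
  type: impala
  serviceName: local_impala
  serviceConnection:
    config:
      type: Impala
      # Hypothetical credentials and host
      username: openmetadata_user
      password: change_me
      authMechanism: PLAIN
      hostPort: localhost:21050
  sourceConfig:
    config:
      type: DatabaseMetadata
      markDeletedTables: true
      includeTables: true
      includeViews: true
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  loggerLevel: INFO
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: openmetadata
    securityConfig:
      jwtToken: '{bot_jwt_token}'
```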
## Data Profiler

The Data Profiler workflow uses the `orm-profiler` processor.
While the `serviceConnection` stays the same to reach the source system, the `sourceConfig` changes
from the previous configuration to `type: Profiler`.

### 1. Define the YAML Config

This is a sample config for the profiler:

```yaml
source:
  type: impala
  serviceName: local_impala
  serviceConnection:
    config:
      type: Impala
      username: <username>
      password: <password>
      authOptions: <auth options>
      authMechanism: PLAIN # NOSASL, PLAIN, GSSAPI, LDAP, JWT
      hostPort: <impala connection host & port>
      # kerberosServiceName: KerberosServiceName
      # databaseSchema: Database Schema of the data source
      # databaseName: Optional name to give to the database in OpenMetadata.
      # useSSL: true / false
  sourceConfig:
    config:
      type: Profiler
      # generateSampleData: true
      # profileSample: 85
      # threadCount: 5  # default
      # databaseFilterPattern:
      #   includes:
      #     - database1
      #     - database2
      #   excludes:
      #     - database3
      #     - database4
      # schemaFilterPattern:
      #   includes:
      #     - schema1
      #     - schema2
      #   excludes:
      #     - schema3
      #     - schema4
      # tableFilterPattern:
      #   includes:
      #     - table1
      #     - table2
      #   excludes:
      #     - table3
      #     - table4
processor:
  type: orm-profiler
  config: {}  # Remove braces if adding properties
    # tableConfig:
    #   - fullyQualifiedName: <table fqn>
    #     profileSample: <number between 0 and 99> # default will be 100 if omitted
    #     profileQuery: <query to use for sampling data for the profiler>
    #     columnConfig:
    #       excludeColumns:
    #         - <column name>
    #       includeColumns:
    #         - columnName: <column name>
    #           metrics:
    #             - MEAN
    #             - MEDIAN
    #             - ...
    #     partitionConfig:
    #       enablePartitioning: <set to true to use partitioning>
    #       partitionColumnName: <partition column name. Must be a timestamp or datetime/date field type>
    #       partitionInterval: <partition interval>
    #       partitionIntervalUnit: <YEAR, MONTH, DAY, HOUR>
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  # loggerLevel: DEBUG  # DEBUG, INFO, WARN or ERROR
  openMetadataServerConfig:
    hostPort: "<OpenMetadata host and port>"
    authProvider: "<OpenMetadata auth provider>"
```

#### Source Configuration

- You can find all the definitions and types for the `serviceConnection` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/services/connections/database/impalaConnection.json).
- The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json).

Note that the filter patterns support regex as includes or excludes. E.g.,

```yaml
tableFilterPattern:
  includes:
  - ".*users$"
```
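Putting the optional profiler `sourceConfig` settings together, a run restricted to a single schema could be configured as in this sketch. The values are illustrative: `profileSample: 50` assumes the value is read as a percentage of rows, matching the 0 to 99 range in the sample, `threadCount: 5` repeats the default noted there, and the `sales` schema and `tmp_` table prefix are hypothetical. Patterns such as `.*users$` need quotes, because YAML treats an unquoted value starting with `*` as an alias.

```yaml
sourceConfig:
  config:
    type: Profiler
    generateSampleData: true
    # Assumed to be the percentage of rows to profile
    profileSample: 50
    threadCount: 5
    schemaFilterPattern:
      includes:
        # Hypothetical schema name
        - "^sales$"
    tableFilterPattern:
      excludes:
        # Skip hypothetical temporary tables
        - "^tmp_.*"
```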

#### Processor

Choose the `orm-profiler`. Its config can also be updated to define table-level profiler settings from the YAML itself instead of the UI:

```yaml
processor:
  type: orm-profiler
  config:
    tableConfig:
      - fullyQualifiedName: <table fqn>
        profileSample: <number between 0 and 99>
        partitionConfig:
          enablePartitioning: <set to true to use partitioning>
          partitionColumnName: <partition column name. Must be a timestamp or datetime/date field type>
          partitionInterval: <partition interval>
          partitionIntervalUnit: <YEAR, MONTH, DAY, HOUR>
        profileQuery: <query to use for sampling data for the profiler>
        columnConfig:
          excludeColumns:
            - <column name>
          includeColumns:
            - columnName: <column name>
              metrics:
                - MEAN
                - MEDIAN
                - ...
```

`tableConfig` allows you to set up configuration at the table level.
All the properties are optional. `metrics` should be one of the metrics listed [here](https://docs.open-metadata.org/openmetadata/ingestion/workflows/profiler/metrics).

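For example, the sketch below profiles a single table with a custom sample, skips one column, computes only `MEAN` and `MEDIAN` for another, and enables partitioning. The fully qualified name follows the `Database Service > Database > Schema > Table` hierarchy described earlier, with `default` as the database name; the `sales.orders` table, its columns and the partition values are hypothetical, and the partition settings assume your version supports partitioned profiling for Impala.

```yaml
processor:
  type: orm-profiler
  config:
    tableConfig:
      # Hypothetical table: service local_impala, database default, schema sales, table orders
      - fullyQualifiedName: local_impala.default.sales.orders
        profileSample: 50
        columnConfig:
          excludeColumns:
            # Hypothetical column to skip
            - customer_email
          includeColumns:
            - columnName: amount
              metrics:
                - MEAN
                - MEDIAN
        partitionConfig:
          enablePartitioning: true
          # Must be a timestamp or datetime/date column
          partitionColumnName: order_date
          # Assumed to limit profiling to roughly the last 30 days
          partitionInterval: 30
          partitionIntervalUnit: DAY
```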
#### Workflow Configuration

The same as the metadata ingestion.

### 2. Run with the CLI

After saving the YAML config, we will run the command the same way we did for the metadata ingestion:

```bash
metadata profile -c <path-to-yaml>
```

Note how instead of running `ingest`, we are using the `profile` command to select the Profiler workflow.

## dbt Integration

You can learn more about how to ingest dbt models' definitions and their lineage [here](/connectors/ingestion/workflows/dbt).