---
title: Run Datalake Connector using the CLI
slug: /connectors/database/datalake/cli
---

# Run Datalake using the metadata CLI
<Table>

| Stage | Metadata |Query Usage | Data Profiler | Data Quality | Lineage | DBT | Supported Versions |
|:------:|:------:|:-----------:|:-------------:|:------------:|:-------:|:---:|:------------------:|
|  PROD  |   ✅   |      ❌      |       ✅       |       ✅      |    ❌    |  ❌  |  --  |

</Table>

<Table>

| Lineage | Table-level | Column-level |
|:------:|:-----------:|:-------------:|
| ❌ | ❌ | ❌ |

</Table>

In this section, we provide guides and references to use the Datalake connector.

Configure and schedule Datalake metadata and profiler workflows using the `metadata` CLI:
- [Requirements](#requirements)
- [Metadata Ingestion](#metadata-ingestion)
- [dbt Integration](#dbt-integration)

## Requirements

<InlineCallout color="violet-70" icon="description" bold="OpenMetadata 0.12 or later" href="/deployment">
To deploy OpenMetadata, check the <a href="/deployment">Deployment</a> guides.
</InlineCallout>

To run the Ingestion via the UI you'll need to use the OpenMetadata Ingestion Container, which comes shipped with
custom Airflow plugins to handle the workflow deployment.

<Note>

Datalake connector supports extracting metadata from file types `JSON`, `CSV`, `TSV` & `Parquet`.

</Note>
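To preview which objects under a bucket prefix are candidates for extraction, the note above can be sketched as a small filter. This is a minimal sketch that assumes file types are recognized by file extension; this page does not show the connector's own detection logic, and the helper names `supported_file_type` and `split_supported` are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional

# File types listed in the note above. Mapping them to extensions is an
# assumption made for illustration, not the connector's documented behavior.
SUPPORTED_SUFFIXES = {
    ".json": "JSON",
    ".csv": "CSV",
    ".tsv": "TSV",
    ".parquet": "Parquet",
}


def supported_file_type(object_key: str) -> Optional[str]:
    """Return the file type label for an object key, or None if unsupported."""
    suffix = PurePosixPath(object_key).suffix.lower()
    return SUPPORTED_SUFFIXES.get(suffix)


def split_supported(object_keys):
    """Partition keys into (supported, skipped) lists, preserving order."""
    supported, skipped = [], []
    for key in object_keys:
        (supported if supported_file_type(key) else skipped).append(key)
    return supported, skipped
```

For example, `split_supported(["a.json", "b.txt", "c/d.parquet"])` separates the two supported keys from `b.txt`.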

**S3 Permissions**

<p> To extract metadata, the AWS account must have enough access to fetch the required data. The <strong>Bucket Policy</strong> in AWS requires at least these permissions: </p>

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::<my bucket>",
                "arn:aws:s3:::<my bucket>/*"
            ]
        }
    ]
}
```
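If you manage several buckets, the policy above can be generated rather than edited by hand. The following is a minimal sketch using only the standard library; `build_read_only_policy` is a hypothetical helper and `my-datalake-bucket` is a placeholder name.

```python
import json


def build_read_only_policy(bucket_name: str) -> dict:
    """Build the minimal policy shown above for a single bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                # s3:ListBucket applies to the bucket ARN itself,
                # s3:GetObject applies to the objects under it ("/*").
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


# "my-datalake-bucket" is a placeholder bucket name.
print(json.dumps(build_read_only_policy("my-datalake-bucket"), indent=4))
```

Both resource entries are needed because listing is a bucket-level action while reading is an object-level action. If the statement is attached as a resource-based bucket policy rather than to an IAM user or role, AWS also expects a `Principal` element, which is omitted here to match the policy above.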

### Python Requirements

If you are running OpenMetadata 0.13 or later, install the Datalake ingestion package for your storage provider (S3, GCS, or Azure):

#### S3 installation

```bash
pip3 install "openmetadata-ingestion[datalake-s3]"
```

#### GCS installation

```bash
pip3 install "openmetadata-ingestion[datalake-gcs]"
```

#### Azure installation

```bash
pip3 install "openmetadata-ingestion[datalake-azure]"
```

#### If version <0.13

For versions earlier than 0.13, the requirements for S3 and GCS are installed together:

```bash
pip3 install "openmetadata-ingestion[datalake]"
```

## Metadata Ingestion
All connectors are defined as JSON Schemas. Here you can find the structure to create a connection to Datalake.

In order to create and run a Metadata Ingestion workflow, we will follow the steps to create a YAML configuration able to connect to the source, process the Entities if needed, and reach the OpenMetadata server.

The workflow is modeled around the following JSON Schema.

### 1. Define the YAML Config

#### Source Configuration - Service Connection using AWS S3

This is a sample config for Datalake using AWS S3:

```yaml

source:
  type: datalake
  serviceName: local_datalake
  serviceConnection:
    config:
      type: Datalake
      configSource:      
        securityConfig: 
          awsAccessKeyId: aws access key id
          awsSecretAccessKey: aws secret access key
          awsRegion: aws region
      bucketName: bucket name
      prefix: prefix
  sourceConfig:
    config:
      type: DatabaseMetadata
      tableFilterPattern:
        includes:
        - ''
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  # loggerLevel: DEBUG  # DEBUG, INFO, WARN or ERROR
  openMetadataServerConfig:
    hostPort: <OpenMetadata host and port>
    authProvider: <OpenMetadata auth provider>

```

The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json).

* **awsAccessKeyId**: Enter the access key ID for your S3 connection. The specified key ID should be authorized to read all the buckets and objects you want to include in the metadata ingestion workflow.
* **awsSecretAccessKey**: Enter the Secret Access Key (the secret paired with the access key ID above).
* **awsRegion**: Specify the region in which your S3 bucket is located. This setting is required even if you have configured a local AWS profile.
* **schemaFilterPattern** and **tableFilterPattern**: Both support regular expressions as `includes` or `excludes`; see the example under *Source Configuration - Source Config* below.


#### Source Configuration - Service Connection using GCS

This is a sample config for Datalake using GCS:

```yaml
source:
  type: datalake
  serviceName: local_datalake
  serviceConnection:
    config:
      type: Datalake
      configSource:
        securityConfig:
          gcsConfig:
            type: type of account
            projectId: project id
            privateKeyId: private key id
            privateKey: private key
            clientEmail: client email
            clientId: client id
            authUri: https://accounts.google.com/o/oauth2/auth
            tokenUri: https://oauth2.googleapis.com/token
            authProviderX509CertUrl: https://www.googleapis.com/oauth2/v1/certs
            clientX509CertUrl:  clientX509 Certificate Url
      bucketName: bucket name
      prefix: prefix
  sourceConfig:
    config:
      tableFilterPattern:
        includes:
          - ''
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  # loggerLevel: DEBUG  # DEBUG, INFO, WARN or ERROR
  openMetadataServerConfig:
    hostPort: <OpenMetadata host and port>
    authProvider: <OpenMetadata auth provider>
```

The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json).

* **type**: Credentials type, e.g. `service_account`.
* **projectId**
* **privateKey**
* **privateKeyId**
* **clientEmail**
* **clientId**
* **authUri**: [https://accounts.google.com/o/oauth2/auth](https://accounts.google.com/o/oauth2/auth) by default
* **tokenUri**: [https://oauth2.googleapis.com/token](https://oauth2.googleapis.com/token) by default
* **authProviderX509CertUrl**: [https://www.googleapis.com/oauth2/v1/certs](https://www.googleapis.com/oauth2/v1/certs) by default
* **clientX509CertUrl**
* **bucketName**: name of the bucket in GCS
* **prefix**: prefix in the GCS bucket
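
The fields listed above correspond to the fields of a Google service-account key file, which uses snake_case names (`project_id`, `private_key_id`, and so on). A minimal sketch of converting such a key file into the `gcsConfig` keys used in the sample could look like the following; `to_gcs_config` and `load_gcs_config` are hypothetical helpers, not part of the OpenMetadata package.

```python
import json

# Service-account key file field -> key used under gcsConfig in the sample above.
KEY_FILE_TO_GCS_CONFIG = {
    "type": "type",
    "project_id": "projectId",
    "private_key_id": "privateKeyId",
    "private_key": "privateKey",
    "client_email": "clientEmail",
    "client_id": "clientId",
    "auth_uri": "authUri",
    "token_uri": "tokenUri",
    "auth_provider_x509_cert_url": "authProviderX509CertUrl",
    "client_x509_cert_url": "clientX509CertUrl",
}


def to_gcs_config(key_file: dict) -> dict:
    """Rename key-file fields to the gcsConfig keys; extra fields are ignored."""
    missing = [name for name in KEY_FILE_TO_GCS_CONFIG if name not in key_file]
    if missing:
        raise KeyError(f"service-account key is missing fields: {missing}")
    return {KEY_FILE_TO_GCS_CONFIG[name]: key_file[name] for name in KEY_FILE_TO_GCS_CONFIG}


def load_gcs_config(path: str) -> dict:
    """Read a service-account key file from disk and convert it."""
    with open(path, encoding="utf-8") as handle:
        return to_gcs_config(json.load(handle))
```

The resulting dictionary can then be copied under `securityConfig.gcsConfig` in the YAML, keeping the private key out of version control.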


#### Source Configuration - Service Connection using Azure

This is a sample config for Datalake using Azure:

```yaml
# Datalake with Azure 

source:
  type: datalake
  serviceName: local_datalake
  serviceConnection:
    config:
      type: Datalake
      configSource:      
        securityConfig: 
          clientId: client-id
          clientSecret: client-secret
          tenantId: tenant-id
          accountName: account-name
      prefix: prefix
  sourceConfig:
    config:
      tableFilterPattern:
        includes:
        - ''
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  openMetadataServerConfig:
    hostPort: <OpenMetadata host and port>
    authProvider: <OpenMetadata auth provider>
```

The Azure `securityConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/security/credentials/azureCredentials.json).

- **Client ID**: Client ID of the data storage account
- **Client Secret**: Client secret of the account
- **Tenant ID**: Tenant ID under which the data storage account falls
- **Account Name**: Account name of the data storage

**schemaFilterPattern** and **tableFilterPattern**: Both support regular expressions as `includes` or `excludes`; see the example under *Source Configuration - Source Config* below.

#### Source Configuration - Source Config

The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json):

- `markDeletedTables`: Flags tables as soft-deleted if they are no longer present in the source system.
- `includeTables`: true or false, to ingest table data. Default is true.
- `includeViews`: true or false, to ingest view definitions.
- `databaseFilterPattern`, `schemaFilterPattern`, `tableFilterPattern`: Each supports regular expressions as `includes` or `excludes`. For example:

```yaml
tableFilterPattern:
  includes:
    - users
    - type_test
```
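
Because the filter entries are regular expressions rather than exact names, a pattern such as `users` may match more than the `users` table. The sketch below illustrates this under stated assumptions: patterns are applied with `re.match` (anchored at the start of the name) and case-insensitively. These semantics and the helper `is_included` are assumptions for illustration, not taken from the connector source.

```python
import re


def is_included(name, includes=None, excludes=None):
    """Return True if `name` passes the include/exclude regex lists.

    Assumed semantics: a name must match at least one include pattern
    (when any are given) and must not match any exclude pattern.
    """
    def matches(patterns):
        return any(re.match(pattern, name, re.IGNORECASE) for pattern in patterns)

    if includes and not matches(includes):
        return False
    if excludes and matches(excludes):
        return False
    return True


tables = ["users", "users_archive", "type_test", "orders"]

# Unanchored patterns also match names that merely start with the pattern.
print([t for t in tables if is_included(t, includes=["users", "type_test"])])
# -> ['users', 'users_archive', 'type_test']

# Anchoring with ^ and $ restricts the match to the exact names.
print([t for t in tables if is_included(t, includes=["^users$", "^type_test$"])])
# -> ['users', 'type_test']
```

Under the same assumptions, the empty pattern `''` used in the sample configs matches every name, so it acts as "include everything".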

#### Sink Configuration

To send the metadata to OpenMetadata, the sink needs to be specified as `type: metadata-rest`.

#### Workflow Configuration

The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation.

For a simple, local installation using our docker containers, this looks like:

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: openmetadata
    securityConfig:
      jwtToken: '{bot_jwt_token}'
```
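
To avoid pasting the bot token into a file by hand, the same structure can be assembled from the environment. This is a minimal sketch: `build_workflow_config` is a hypothetical helper and `OM_JWT_TOKEN` is a hypothetical environment variable name; only the keys mirror the YAML above.

```python
import json
import os


def build_workflow_config(host_port: str, jwt_token: str) -> dict:
    """Return a dict mirroring the workflowConfig block shown above."""
    if not jwt_token:
        raise ValueError("JWT token is empty; set it before building the config")
    return {
        "workflowConfig": {
            "openMetadataServerConfig": {
                "hostPort": host_port,
                "authProvider": "openmetadata",
                "securityConfig": {"jwtToken": jwt_token},
            }
        }
    }


# Falls back to the literal placeholder from the sample when the
# (hypothetical) OM_JWT_TOKEN variable is not set.
token = os.environ.get("OM_JWT_TOKEN", "{bot_jwt_token}")
print(json.dumps(build_workflow_config("http://localhost:8585/api", token), indent=2))
```

The printed structure has the same nesting as the YAML block, so it can be merged into the rest of the workflow definition by whatever tooling generates your config files.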

We support different security providers. You can find their definitions [here](https://github.com/open-metadata/OpenMetadata/tree/main/openmetadata-spec/src/main/resources/json/schema/security/client).
You can find the ingestion configuration for each provider below.

<Collapse title="Configure SSO in the Ingestion Workflows">

### OpenMetadata JWT Auth

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: openmetadata
    securityConfig:
      jwtToken: '{bot_jwt_token}'
```

### Auth0 SSO

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: auth0
    securityConfig:
      clientId: '{your_client_id}'
      secretKey: '{your_client_secret}'
      domain: '{your_domain}'
```

### Azure SSO

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: azure
    securityConfig:
      clientSecret: '{your_client_secret}'
      authority: '{your_authority_url}'
      clientId: '{your_client_id}'
      scopes:
        - your_scopes
```

### Custom OIDC SSO

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: custom-oidc
    securityConfig:
      clientId: '{your_client_id}'
      secretKey: '{your_client_secret}'
      domain: '{your_domain}'
```

### Google SSO

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: google
    securityConfig:
      secretKey: '{path-to-json-creds}'
```

### Okta SSO

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: okta
    securityConfig:
      clientId: "{CLIENT_ID - SPA APP}"
      orgURL: "{ISSUER_URL}/v1/token"
      privateKey: "{public/private keypair}"
      email: "{email}"
      scopes:
        - token
```

### Amazon Cognito SSO

The ingestion can be configured by [Enabling JWT Tokens](https://docs.open-metadata.org/deployment/security/enable-jwt-tokens)

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: auth0
    securityConfig:
      clientId: '{your_client_id}'
      secretKey: '{your_client_secret}'
      domain: '{your_domain}'
```

### OneLogin SSO

OneLogin uses Custom OIDC for the ingestion:

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: custom-oidc
    securityConfig:
      clientId: '{your_client_id}'
      secretKey: '{your_client_secret}'
      domain: '{your_domain}'
```

### KeyCloak SSO

KeyCloak uses Custom OIDC for the ingestion:

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: custom-oidc
    securityConfig:
      clientId: '{your_client_id}'
      secretKey: '{your_client_secret}'
      domain: '{your_domain}'
```

</Collapse>

### 2. Run with the CLI

First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run:

```bash
metadata ingest -c <path-to-yaml>
```

Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration,
you will be able to extract metadata from different sources.
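
When the ingestion is triggered from a script or scheduler, the command above can be wrapped with a couple of basic checks. The following is a minimal sketch using the standard library; `build_ingest_command` and `run_ingestion` are hypothetical helpers, and the only CLI invocation assumed is the `metadata ingest -c <path-to-yaml>` form shown above.

```python
import shutil
import subprocess
from pathlib import Path


def build_ingest_command(config_path: str) -> list:
    """Return the argv list for `metadata ingest -c <path-to-yaml>`."""
    return ["metadata", "ingest", "-c", config_path]


def run_ingestion(config_path: str) -> int:
    """Run the CLI and return the exit code it reports."""
    if not Path(config_path).is_file():
        raise FileNotFoundError(f"config file not found: {config_path}")
    if shutil.which("metadata") is None:
        raise RuntimeError(
            "`metadata` CLI not found on PATH; install openmetadata-ingestion first"
        )
    # check=False leaves the decision about non-zero exit codes to the caller.
    return subprocess.run(build_ingest_command(config_path), check=False).returncode
```

Calling `run_ingestion("datalake.yaml")` then behaves like the shell command, with an earlier and clearer error if the file or the CLI is missing.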

## dbt Integration

You can learn more about how to ingest dbt models' definitions and their lineage [here](/connectors/ingestion/workflows/dbt).