<p> To execute metadata extraction and usage workflow successfully the user or the service account should have enough access to fetch required data. Following table describes the minimum required permissions </p>
- **username**: (Optional) Specify the User to connect to BigQuery. It should have enough privileges to read all the metadata.
- **projectID**: (Optional) The BigQuery Project ID is required only if the credentials path is being used instead of values.
- **credentials**: We support two ways of authenticating to BigQuery inside **gcsConfig**
1. Passing the raw credential values provided by BigQuery. This requires us to provide the following information, all provided by BigQuery:
- **type**, e.g., `service_account`
- **projectId**
- **privateKey**
- **privateKeyId**
- **clientEmail**
- **clientId**
- **authUri**, https://accounts.google.com/o/oauth2/auth by defaul
- **tokenUri**, https://oauth2.googleapis.com/token by default
- **authProviderX509CertUrl**, https://www.googleapis.com/oauth2/v1/certs by default
- **clientX509CertUrl**
2. Passing a local file path that contains the credentials:
- **gcsCredentialsPath**
If you prefer to pass the credentials file, you can do so as follows:
```yaml
credentials:
gcsConfig:
gcsCredentialsPath: <pathtofile>
```
- **Enable Policy Tag Import (Optional)**: Mark as 'True' to enable importing policy tags from BigQuery to OpenMetadata.
- **Tag Category Name (Optional)**: If the Tag import is enabled, the name of the Tag Category will be created at OpenMetadata.
- **Database (Optional)**: The database of the data source is an optional parameter, if you would like to restrict the metadata reading to a single database. If left blank, OpenMetadata ingestion attempts to scan all the databases.
- **Connection Options (Optional)**: Enter the details for any additional connection options that can be sent to BigQuery during the connection. These details must be added as Key-Value pairs.
- **Connection Arguments (Optional)**: Enter the details for any additional connection arguments such as security or protocol configs that can be sent to BigQuery during the connection. These details must be added as Key-Value pairs.
- In case you are using Single-Sign-On (SSO) for authentication, add the `authenticator` details in the Connection Arguments as a Key-Value pair as follows: `"authenticator" : "sso_login_url"`
- In case you authenticate with SSO using an external browser popup, then add the `authenticator` details in the Connection Arguments as a Key-Value pair as follows: `"authenticator" : "externalbrowser"`
The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json):
-`markDeletedTables`: To flag tables as soft-deleted if they are not present anymore in the source system.
-`includeTables`: true or false, to ingest table data. Default is true.
-`includeViews`: true or false, to ingest views definitions.
-`databaseFilterPattern`, `schemaFilterPattern`, `tableFilternPattern`: Note that the they support regex as include or exclude. E.g.,
```yaml
tableFilterPattern:
includes:
- users
- type_test
```
#### Sink Configuration
To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`.
#### Workflow Configuration
The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation.
For a simple, local installation using our docker containers, this looks like:
```yaml
workflowConfig:
openMetadataServerConfig:
hostPort: http://localhost:8585/api
authProvider: no-auth
```
We support different security providers. You can find their definitions [here](https://github.com/open-metadata/OpenMetadata/tree/main/catalog-rest-service/src/main/resources/json/schema/security/client).
You can find the different implementation of the ingestion below.
<Collapsetitle="Configure SSO in the Ingestion Workflows">
### Auth0 SSO
```yaml
workflowConfig:
openMetadataServerConfig:
hostPort: 'http://localhost:8585/api'
authProvider: auth0
securityConfig:
clientId: '{your_client_id}'
secretKey: '{your_client_secret}'
domain: '{your_domain}'
```
### Azure SSO
```yaml
workflowConfig:
openMetadataServerConfig:
hostPort: 'http://localhost:8585/api'
authProvider: azure
securityConfig:
clientSecret: '{your_client_secret}'
authority: '{your_authority_url}'
clientId: '{your_client_id}'
scopes:
- your_scopes
```
### Custom OIDC SSO
```yaml
workflowConfig:
openMetadataServerConfig:
hostPort: 'http://localhost:8585/api'
authProvider: custom-oidc
securityConfig:
clientId: '{your_client_id}'
secretKey: '{your_client_secret}'
domain: '{your_domain}'
```
### Google SSO
```yaml
workflowConfig:
openMetadataServerConfig:
hostPort: 'http://localhost:8585/api'
authProvider: google
securityConfig:
secretKey: '{path-to-json-creds}'
```
### Okta SSO
```yaml
workflowConfig:
openMetadataServerConfig:
hostPort: http://localhost:8585/api
authProvider: okta
securityConfig:
clientId: "{CLIENT_ID - SPA APP}"
orgURL: "{ISSUER_URL}/v1/token"
privateKey: "{public/private keypair}"
email: "{email}"
scopes:
- token
```
### Amazon Cognito SSO
The ingestion can be configured by [Enabling JWT Tokens](https://docs.open-metadata.org/deployment/security/enable-jwt-tokens)
```yaml
workflowConfig:
openMetadataServerConfig:
hostPort: 'http://localhost:8585/api'
authProvider: auth0
securityConfig:
clientId: '{your_client_id}'
secretKey: '{your_client_secret}'
domain: '{your_domain}'
```
### OneLogin SSO
Which uses Custom OIDC for the ingestion
```yaml
workflowConfig:
openMetadataServerConfig:
hostPort: 'http://localhost:8585/api'
authProvider: custom-oidc
securityConfig:
clientId: '{your_client_id}'
secretKey: '{your_client_secret}'
domain: '{your_domain}'
```
### KeyCloak SSO
Which uses Custom OIDC for the ingestion
```yaml
workflowConfig:
openMetadataServerConfig:
hostPort: 'http://localhost:8585/api'
authProvider: custom-oidc
securityConfig:
clientId: '{your_client_id}'
secretKey: '{your_client_secret}'
domain: '{your_domain}'
```
</Collapse>
### 2. Run with the CLI
First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run:
```bash
metadata ingest -c <path-to-yaml>
```
Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration,
you will be able to extract metadata from different sources.
## Query Usage and Lineage Ingestion
To ingest the Query Usage and Lineage information, the `serviceConnection` configuration will remain the same.
However, the `sourceConfig` is now modeled after this JSON Schema.
You can find all the definitions and types for the `serviceConnection` [here](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/entity/services/connections/database/bigQueryConnection.json).
They are the same as metadata ingestion.
#### Source Configuration - Source Config
The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/metadataIngestion/databaseServiceQueryUsagePipeline.json).
-`queryLogDuration`: Configuration to tune how far we want to look back in query logs to process usage data.
-`resultLimit`: Configuration to set the limit for query logs
#### Processor, Stage and Bulk Sink
To specify where the staging files will be located.
Note that the location is a directory that will be cleaned at the end of the ingestion.
#### Workflow Configuration
The same as the metadata ingestion.
### 2. Run with the CLI
There is an extra requirement to run the Usage pipelines. You will need to install:
- You can find all the definitions and types for the `serviceConnection` [here](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/entity/services/connections/database/bigqueryConnection.json).
- The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json).
Note that the filter patterns support regex as includes or excludes. E.g.,
```yaml
tableFilterPattern:
includes:
- *users$
```
#### Processor
Choose the `orm-profiler`. Its config can also be updated to define tests from the YAML itself instead of the UI:
`tableConfig` allows you to set up some configuration at the table level.
All the properties are optional. `metrics` should be one of the metrics listed [here](https://docs.open-metadata.org/openmetadata/ingestion/workflows/profiler/metrics)
#### Workflow Configuration
The same as the metadata ingestion.
### 2. Run with the CLI
After saving the YAML config, we will run the command the same way we did for the metadata ingestion:
```bash
metadata profile -c <path-to-yaml>
```
Note how instead of running `ingest`, we are using the `profile` command to select the Profiler workflow.
## DBT Integration
You can learn more about how to ingest DBT models' definitions and their lineage [here](https://docs.open-metadata.org/openmetadata/ingestion/workflows/metadata/dbt).