- **hostPort**: Enter the fully qualified hostname and port number for your Databricks deployment in the Host and Port field.
- **token**: Generated Token to connect to Databricks.
- **httpPath**: Databricks compute resources URL.
- **Connection Options (Optional)**: Enter the details for any additional connection options that can be sent to Databricks during the connection. These details must be added as Key-Value pairs.
- **Connection Arguments (Optional)**: Enter the details for any additional connection arguments such as security or protocol configs that can be sent to Databricks during the connection. These details must be added as Key-Value pairs.
  - If you are using Single Sign-On (SSO) for authentication, add the `authenticator` details in the Connection Arguments as a Key-Value pair as follows: `"authenticator" : "sso_login_url"`
  - If you authenticate with SSO using an external browser popup, add the `authenticator` details in the Connection Arguments as a Key-Value pair as follows: `"authenticator" : "externalbrowser"` (see the sketch after this list).
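For reference, here is a rough sketch of how these fields map into the `serviceConnection` section of an ingestion YAML. Values are placeholders, and the exact fields are defined in the Databricks connection schema linked below:

```yaml
serviceConnection:
  config:
    type: Databricks
    hostPort: <host>:<port>
    token: <generated token>
    httpPath: <http path of your Databricks compute resource>
    connectionArguments:
      # Only needed when authenticating through SSO with an external browser popup
      authenticator: externalbrowser
```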
The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json).
We support different security providers. You can find their definitions [here](https://github.com/open-metadata/OpenMetadata/tree/main/openmetadata-spec/src/main/resources/json/schema/security/client).
      # This is a directory that will be DELETED after the usage runs
      stageFileLocation: <path to store the stage file>
      # resultLimit: 1000
      # If instead of getting the query logs from the database we want to pass a file with the queries
      # queryLogFilePath: path-to-file
processor:
  type: query-parser
  config: {}
stage:
  type: table-usage
  config:
    filename: /tmp/databricks_usage
bulkSink:
  type: metadata-usage
  config:
    filename: /tmp/databricks_usage
workflowConfig:
  # loggerLevel: DEBUG  # DEBUG, INFO, WARN or ERROR
  openMetadataServerConfig:
    hostPort: <OpenMetadata host and port>
    authProvider: <OpenMetadata auth provider>
```
#### Source Configuration - Service Connection
You can find all the definitions and types for the `serviceConnection` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/services/connections/database/databricksConnection.json).
They are the same as for the metadata ingestion.
#### Source Configuration - Source Config
The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceQueryUsagePipeline.json).
- `queryLogDuration`: Configuration to tune how far back we want to look in the query logs to process usage data.
- `resultLimit`: Configuration to set the limit for the query log results.
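Both options live under `sourceConfig.config` in the usage YAML above, for example (illustrative values):

```yaml
sourceConfig:
  config:
    # Look back 7 days of query history when processing usage
    queryLogDuration: 7
    # Cap the number of query log entries fetched
    resultLimit: 1000
```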
#### Processor, Stage and Bulk Sink
These settings specify where the staging files will be located.
Note that this location is a directory that will be cleaned up at the end of the ingestion.
#### Workflow Configuration
The same as for the metadata ingestion.
### 2. Run with the CLI
There is an extra requirement to run the Usage pipelines. You will need to install:
- You can find all the definitions and types for the `serviceConnection` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/services/connections/database/databricksConnection.json).
- The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json).
`tableConfig` allows you to set up configuration at the table level.
All the properties are optional. `metrics` should be one of the metrics listed [here](https://docs.open-metadata.org/openmetadata/ingestion/workflows/profiler/metrics).
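As a rough sketch of a table-level entry under `sourceConfig.config` (field names follow the profiler pipeline schema linked above; adjust to your version):

```yaml
sourceConfig:
  config:
    type: Profiler
    tableConfig:
      - fullyQualifiedName: <service>.<database>.<schema>.<table>
        # Profile only a sample of the table's rows
        profileSample: 85
        columnConfig:
          includeColumns:
            - columnName: <column name>
              metrics:
                - MEAN
                - MEDIAN
```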
#### Workflow Configuration
The same as for the metadata ingestion.
### 2. Run with the CLI
After saving the YAML config, we will run the command the same way we did for the metadata ingestion:
```bash
metadata profile -c <path-to-yaml>
```
Note how instead of running `ingest`, we are using the `profile` command to select the Profiler workflow.
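For comparison, the metadata workflow from the previous sections is triggered with the `ingest` command:

```bash
metadata ingest -c <path-to-yaml>
```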
## DBT Integration
You can learn more about how to ingest DBT models' definitions and their lineage [here](https://docs.open-metadata.org/openmetadata/ingestion/workflows/metadata/dbt).