115 lines
8.7 KiB
Markdown

# LookML
For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
## Setup
To install this plugin, run `pip install 'acryl-datahub[lookml]'`.
Note! This plugin uses a package that requires Python 3.7+!
## Capabilities
This plugin extracts the following:
- LookML views from model files in a project
- Name, upstream table names, metadata for dimensions, measures, and dimension groups attached as tags
- If API integration is enabled (recommended), resolves table and view names by calling the Looker API, otherwise supports offline resolution of these names.
**_NOTE_:** To get complete Looker metadata integration (including Looker dashboards and charts and lineage to the underlying Looker views, you must ALSO use the Looker source. Documentation for that is [here](./looker.md)
### Configuration Notes
See the [Looker authentication docs](https://docs.looker.com/reference/api-and-integration/api-auth#authentication_with_an_sdk) for the steps to create a client ID and secret.
You need to ensure that the API key is attached to a user that has Admin privileges. If that is not possible, read the configuration section to provide an offline specification of the `connection_to_platform_map` and the `project_name`.
## Quickstart recipe
Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.
For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).
```yml
source:
type: "lookml"
config:
# Coordinates
base_folder: /path/to/model/files
# Options
api:
# Coordinates for your looker instance
base_url: https://YOUR_INSTANCE.cloud.looker.com
# Credentials for your Looker connection (https://docs.looker.com/reference/api-and-integration/api-auth)
client_id: client_id_from_looker
client_secret: client_secret_from_looker
# Alternative to API section above if you want a purely file-based ingestion with no api calls to Looker
# project_name: PROJECT_NAME # See (https://docs.looker.com/data-modeling/getting-started/how-project-works) to understand what is your project name
# connection_to_platform_map:
# connection_name_1:
# platform: snowflake # bigquery, hive, etc
# default_db: DEFAULT_DATABASE. # the default database configured for this connection
# default_schema: DEFAULT_SCHEMA # the default schema configured for this connection
# connection_name_2:
# platform: bigquery # snowflake, hive, etc
# default_db: DEFAULT_DATABASE. # the default database configured for this connection
# default_schema: DEFAULT_SCHEMA # the default schema configured for this connection
github_info:
repo: org/repo-name
sink:
# sink configs
```
## Config details
Note that a `.` is used to denote nested fields in the YAML recipe.
| Field | Required | Default | Description |
| ---------------------------------------------- | -------- | ---------- | ----------------------------------------------------------------------- |
| `base_folder` | ✅ | | Where the `*.model.lkml` and `*.view.lkml` files are stored. |
| `api.base_url` | ❓ if using api | | Url to your Looker instance: https://company.looker.com:19999 or https://looker.company.com, or similar. |
| `api.client_id` | ❓ if using api | | Looker API3 client ID. |
| `api.client_secret` | ❓ if using api | | Looker API3 client secret. |
| `project_name` | ❓ if NOT using api | | The project name within with all the model files live. See (https://docs.looker.com/data-modeling/getting-started/how-project-works) to understand what the Looker project name should be. The simplest way to see your projects is to click on `Develop` followed by `Manage LookML Projects` in the Looker application. |
| `connection_to_platform_map.<connection_name>` | | | Mappings between connection names in the model files to platform, database and schema values |
| `connection_to_platform_map.<connection_name>.platform` | ❓ if NOT using api | | Mappings between connection name in the model files to platform name (e.g. snowflake, bigquery, etc) |
| `connection_to_platform_map.<connection_name>.default_db` | ❓ if NOT using api | | Mappings between connection name in the model files to default database configured for this platform on Looker |
| `connection_to_platform_map.<connection_name>.default_schema` | ❓ if NOT using api | | Mappings between connection name in the model files to default schema configured for this platform on Looker |
| `platform_name` | | `"looker"` | Platform to use in namespace when constructing URNs. |
| `model_pattern.allow` | | | List of regex patterns for models to include in ingestion. |
| `model_pattern.deny` | | | List of regex patterns for models to exclude from ingestion. |
| `model_pattern.ignoreCase` | | `True` | Whether to ignore case sensitivity during pattern matching. |
| `view_pattern.allow` | | | List of regex patterns for views to include in ingestion. |
| `view_pattern.deny` | | | List of regex patterns for views to exclude from ingestion. |
| `view_pattern.ignoreCase` | | `True` | Whether to ignore case sensitivity during pattern matching. |
| `view_naming_pattern` | | `{project}.view.{name}` | Pattern for providing dataset names to views. Allowed variables are `{project}`, `{model}`, `{name}` |
| `view_browse_pattern` | | `/{env}/{platform}/{project}/views/{name}` | Pattern for providing browse paths to views. Allowed variables are `{project}`, `{model}`, `{name}`, `{platform}` and `{env}` |
| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
| `parse_table_names_from_sql` | | `False` | See note below. |
| `tag_measures_and_dimensions` | | `True` | When enabled, attaches tags to measures, dimensions and dimension groups to make them more discoverable. When disabled, adds this information to the description of the column. |
| `github_info` | | Empty. | When provided, will annotate views with github urls. See config variables below. |
| `github_info.repo` | ✅ if providing `github_info` | | Your github repository in `org/repo` form. e.g. `linkedin/datahub` |
| `github_info.branch` | | `main` | The default branch in your repo that you want urls to point to. Typically `main` or `master` |
| `github_info.base_url` | | `https://github.com` | The base url for your github coordinates |
| `sql_parser` | | `datahub.utilities.sql_parser.DefaultSQLParser` | See note below. |
Note! The integration can use an SQL parser to try to parse the tables the views depends on. This parsing is disabled by default,
but can be enabled by setting `parse_table_names_from_sql: True`. The default parser is based on the [`sqllineage`](https://pypi.org/project/sqllineage/) package.
As this package doesn't officially support all the SQL dialects that Looker supports, the result might not be correct. You can, however, implement a
custom parser and take it into use by setting the `sql_parser` configuration value. A custom SQL parser must inherit from `datahub.utilities.sql_parser.SQLParser`
and must be made available to Datahub by ,for example, installing it. The configuration then needs to be set to `module_name.ClassName` of the parser.
## Compatibility
Coming soon!
## Questions
If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!