LookML
For context on getting started with ingestion, check out our metadata ingestion guide.
Setup
To install this plugin, run `pip install 'acryl-datahub[lookml]'`.
Note! This plugin uses a package that requires Python 3.7+!
Capabilities
This plugin extracts the following:
- LookML views from model files in a project
- Name, upstream table names, and metadata for dimensions, measures, and dimension groups, attached as tags
- If API integration is enabled (recommended), resolves table and view names by calling the Looker API; otherwise supports offline resolution of these names.
NOTE: To get complete Looker metadata integration (including Looker dashboards and charts, and lineage to the underlying Looker views), you must ALSO use the Looker source. Documentation for that is here.
Configuration Notes
See the Looker authentication docs for the steps to create a client ID and secret.
You need to ensure that the API key is attached to a user that has Admin privileges. If that is not possible, read the configuration section to provide an offline specification of the `connection_to_platform_map` and the `project_name`.
Quickstart recipe
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
```yml
source:
  type: "lookml"
  config:
    # Coordinates
    base_folder: /path/to/model/files

    # Options
    api:
      # Coordinates for your looker instance
      base_url: https://YOUR_INSTANCE.cloud.looker.com

      # Credentials for your Looker connection (https://docs.looker.com/reference/api-and-integration/api-auth)
      client_id: client_id_from_looker
      client_secret: client_secret_from_looker

    # Alternative to the api section above, if you want a purely file-based ingestion with no API calls to Looker
    # project_name: PROJECT_NAME # See https://docs.looker.com/data-modeling/getting-started/how-project-works to understand what your project name is
    # connection_to_platform_map:
    #   connection_name_1:
    #     platform: snowflake # bigquery, hive, etc
    #     default_db: DEFAULT_DATABASE # the default database configured for this connection
    #     default_schema: DEFAULT_SCHEMA # the default schema configured for this connection
    #   connection_name_2:
    #     platform: bigquery # snowflake, hive, etc
    #     default_db: DEFAULT_DATABASE # the default database configured for this connection
    #     default_schema: DEFAULT_SCHEMA # the default schema configured for this connection

    github_info:
      repo: org/repo-name

sink:
  # sink configs
```
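Once the recipe is saved to a file (the name `lookml_recipe.yml` here is just an example), you can run it with the DataHub CLI: `datahub ingest -c lookml_recipe.yml`.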
Config details
Note that a `.` is used to denote nested fields in the YAML recipe.
| Field | Required | Default | Description |
| --- | --- | --- | --- |
| `base_folder` | ✅ | | Where the `*.model.lkml` and `*.view.lkml` files are stored. |
| `api.base_url` | ❓ if using api | | URL to your Looker instance: `https://company.looker.com:19999` or `https://looker.company.com`, or similar. |
| `api.client_id` | ❓ if using api | | Looker API3 client ID. |
| `api.client_secret` | ❓ if using api | | Looker API3 client secret. |
| `project_name` | ❓ if NOT using api | | The project name within which all the model files live. See https://docs.looker.com/data-modeling/getting-started/how-project-works to understand what the Looker project name should be. The simplest way to see your projects is to click on Develop, followed by Manage LookML Projects, in the Looker application. |
| `connection_to_platform_map.<connection_name>` | | | Mappings from connection names in the model files to platform, database, and schema values. |
| `connection_to_platform_map.<connection_name>.platform` | ❓ if NOT using api | | Mapping from the connection name in the model files to the platform name (e.g. snowflake, bigquery, etc). |
| `connection_to_platform_map.<connection_name>.default_db` | ❓ if NOT using api | | Mapping from the connection name in the model files to the default database configured for this platform on Looker. |
| `connection_to_platform_map.<connection_name>.default_schema` | ❓ if NOT using api | | Mapping from the connection name in the model files to the default schema configured for this platform on Looker. |
| `platform_name` | | `"looker"` | Platform to use in namespace when constructing URNs. |
| `model_pattern.allow` | | | List of regex patterns for models to include in ingestion. |
| `model_pattern.deny` | | | List of regex patterns for models to exclude from ingestion. |
| `model_pattern.ignoreCase` | | `True` | Whether to ignore case sensitivity during pattern matching. |
| `view_pattern.allow` | | | List of regex patterns for views to include in ingestion. |
| `view_pattern.deny` | | | List of regex patterns for views to exclude from ingestion. |
| `view_pattern.ignoreCase` | | `True` | Whether to ignore case sensitivity during pattern matching. |
| `view_naming_pattern` | | `{project}.view.{name}` | Pattern for providing dataset names to views. Allowed variables are `{project}`, `{model}`, and `{name}`. |
| `view_browse_pattern` | | `/{env}/{platform}/{project}/views/{name}` | Pattern for providing browse paths to views. Allowed variables are `{project}`, `{model}`, `{name}`, `{platform}`, and `{env}`. |
| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
| `parse_table_names_from_sql` | | `False` | See note below. |
| `tag_measures_and_dimensions` | | `True` | When enabled, attaches tags to measures, dimensions, and dimension groups to make them more discoverable. When disabled, adds this information to the description of the column. |
| `github_info` | | Empty | When provided, will annotate views with GitHub URLs. See config variables below. |
| `github_info.repo` | ✅ if providing `github_info` | | Your GitHub repository in `org/repo` form, e.g. `linkedin/datahub`. |
| `github_info.branch` | | `main` | The default branch in your repo that you want URLs to point to. Typically `main` or `master`. |
| `github_info.base_url` | | `https://github.com` | The base URL for your GitHub coordinates. |
| `sql_parser` | | `datahub.utilities.sql_parser.DefaultSQLParser` | See note below. |
Note! The integration can use an SQL parser to try to parse the tables that the views depend on. This parsing is disabled by default, but can be enabled by setting `parse_table_names_from_sql: True`. The default parser is based on the `sqllineage` package. As this package doesn't officially support all the SQL dialects that Looker supports, the result might not be correct. You can, however, implement a custom parser and use it by setting the `sql_parser` configuration value. A custom SQL parser must inherit from `datahub.utilities.sql_parser.SQLParser` and must be made available to DataHub by, for example, installing it as a package. The `sql_parser` configuration value then needs to be set to the `module_name.ClassName` of the parser.
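To make the contract concrete, here is a minimal sketch of a custom parser. It assumes the `SQLParser` base class receives the SQL string in its constructor and that subclasses implement `get_tables()` and `get_columns()`; check the class definition in your installed version of DataHub before relying on this interface. The module and class names are hypothetical, and the body simply delegates to `sqllineage` where you would plug in your own dialect-aware logic.

```python
from typing import List

from sqllineage.runner import LineageRunner

from datahub.utilities.sql_parser import SQLParser


class CustomSQLParser(SQLParser):
    """Hypothetical custom parser: delegates to sqllineage, but any
    dialect-aware parsing logic could be substituted here."""

    def __init__(self, sql_query: str) -> None:
        # Assumes the base class constructor takes the raw SQL string.
        super().__init__(sql_query)
        self._runner = LineageRunner(sql_query)

    def get_tables(self) -> List[str]:
        # Return the names of the upstream tables referenced by the SQL.
        return [str(table) for table in self._runner.source_tables()]

    def get_columns(self) -> List[str]:
        # Column extraction is not needed for table-level lineage;
        # returning an empty list keeps this sketch simple.
        return []
```

If this class were installed in a module named `my_parser`, the recipe would set `sql_parser: my_parser.CustomSQLParser`.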
Compatibility
Coming soon!
Questions
If you've got any questions on configuring this source, feel free to ping us on our Slack!