LookML
For context on getting started with ingestion, check out our metadata ingestion guide.
Setup
To install this plugin, run `pip install 'acryl-datahub[lookml]'`.
Note! This plugin uses a package that requires Python 3.7+!
Capabilities
This plugin extracts the following:
- LookML views from model files in a project
- Name, upstream table names, and metadata for dimensions, measures, and dimension groups, attached as tags
- If API integration is enabled (recommended), resolves table and view names by calling the Looker API; otherwise supports offline resolution of these names.
NOTE: To get complete Looker metadata integration (including Looker dashboards and charts, and lineage to the underlying Looker views), you must ALSO use the Looker source. Documentation for that is here.
Configuration Notes
See the Looker authentication docs for the steps to create a client ID and secret.
You need to ensure that the API key is attached to a user that has Admin privileges. If that is not possible, read the configuration section to provide an offline specification of the `connection_to_platform_map` and the `project_name`.
Quickstart recipe
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
```yml
source:
  type: "lookml"
  config:
    # Coordinates
    base_folder: /path/to/model/files

    # Options
    api:
      # Coordinates for your looker instance
      base_url: https://YOUR_INSTANCE.cloud.looker.com

      # Credentials for your Looker connection (https://docs.looker.com/reference/api-and-integration/api-auth)
      client_id: client_id_from_looker
      client_secret: client_secret_from_looker

    # Alternative to the api section above, if you want a purely file-based ingestion with no API calls to Looker
    # project_name: PROJECT_NAME # See https://docs.looker.com/data-modeling/getting-started/how-project-works to understand what your project name is
    # connection_to_platform_map:
    #   connection_name_1:
    #     platform: snowflake # bigquery, hive, etc
    #     default_db: DEFAULT_DATABASE # the default database configured for this connection
    #     default_schema: DEFAULT_SCHEMA # the default schema configured for this connection
    #   connection_name_2:
    #     platform: bigquery # snowflake, hive, etc
    #     default_db: DEFAULT_DATABASE # the default database configured for this connection
    #     default_schema: DEFAULT_SCHEMA # the default schema configured for this connection

    github_info:
      repo: org/repo-name

sink:
  # sink configs
```
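Once the recipe is saved to a file (the name `lookml_recipe.yml` here is just an example), you can run it with the DataHub CLI: `datahub ingest -c lookml_recipe.yml`.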
Config details
Note that a `.` is used to denote nested fields in the YAML recipe.
| Field | Required | Default | Description |
| --- | --- | --- | --- |
| `base_folder` | ✅ | | Where the `*.model.lkml` and `*.view.lkml` files are stored. |
| `api.base_url` | ❓ if using api | | URL to your Looker instance: `https://company.looker.com:19999` or `https://looker.company.com`, or similar. |
| `api.client_id` | ❓ if using api | | Looker API3 client ID. |
| `api.client_secret` | ❓ if using api | | Looker API3 client secret. |
| `project_name` | ❓ if NOT using api | | The project name within which all the model files live. See https://docs.looker.com/data-modeling/getting-started/how-project-works to understand what the Looker project name should be. The simplest way to see your projects is to click on Develop, followed by Manage LookML Projects, in the Looker application. |
| `connection_to_platform_map.<connection_name>` | | | Mappings from connection names in the model files to platform, database, and schema values. |
| `connection_to_platform_map.<connection_name>.platform` | ❓ if NOT using api | | Mapping from the connection name in the model files to the platform name (e.g. snowflake, bigquery, etc). |
| `connection_to_platform_map.<connection_name>.default_db` | ❓ if NOT using api | | Mapping from the connection name in the model files to the default database configured for this platform on Looker. |
| `connection_to_platform_map.<connection_name>.default_schema` | ❓ if NOT using api | | Mapping from the connection name in the model files to the default schema configured for this platform on Looker. |
| `platform_name` | | `"looker"` | Platform to use in namespace when constructing URNs. |
| `model_pattern.allow` | | | List of regex patterns for models to include in ingestion. |
| `model_pattern.deny` | | | List of regex patterns for models to exclude from ingestion. |
| `model_pattern.ignoreCase` | | `True` | Whether to ignore case sensitivity during pattern matching. |
| `view_pattern.allow` | | | List of regex patterns for views to include in ingestion. |
| `view_pattern.deny` | | | List of regex patterns for views to exclude from ingestion. |
| `view_pattern.ignoreCase` | | `True` | Whether to ignore case sensitivity during pattern matching. |
| `view_naming_pattern` | | `{project}.view.{name}` | Pattern for providing dataset names to views. Allowed variables are `{project}`, `{model}`, and `{name}`. |
| `view_browse_pattern` | | `/{env}/{platform}/{project}/views/{name}` | Pattern for providing browse paths to views. Allowed variables are `{project}`, `{model}`, `{name}`, `{platform}`, and `{env}`. |
| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
| `parse_table_names_from_sql` | | `False` | See note below. |
| `tag_measures_and_dimensions` | | `True` | When enabled, attaches tags to measures, dimensions, and dimension groups to make them more discoverable. When disabled, adds this information to the description of the column. |
| `github_info` | | Empty | When provided, will annotate views with GitHub URLs. See config variables below. |
| `github_info.repo` | ✅ if providing `github_info` | | Your GitHub repository in `org/repo` form, e.g. `linkedin/datahub`. |
| `github_info.branch` | | `main` | The default branch in your repo that you want URLs to point to. Typically `main` or `master`. |
| `github_info.base_url` | | `https://github.com` | The base URL for your GitHub coordinates. |
| `sql_parser` | | `datahub.utilities.sql_parser.DefaultSQLParser` | See note below. |
Note! The integration can use an SQL parser to try to parse the tables that the views depend on. This parsing is disabled by default, but can be enabled by setting `parse_table_names_from_sql: True`. The default parser is based on the `sqllineage` package. As this package doesn't officially support all the SQL dialects that Looker supports, the result might not be correct. You can, however, implement a custom parser and use it by setting the `sql_parser` configuration value. A custom SQL parser must inherit from `datahub.utilities.sql_parser.SQLParser` and must be made available to DataHub by, for example, installing it as a package. The `sql_parser` configuration value then needs to be set to the `module_name.ClassName` of the parser.
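To make the contract concrete, here is a minimal sketch of a custom parser. It assumes the `SQLParser` base class receives the SQL string in its constructor and that subclasses implement `get_tables()` and `get_columns()`; check the class definition in your installed version of DataHub before relying on this interface. The module and class names are hypothetical, and the body simply delegates to `sqllineage` where you would plug in your own dialect-aware logic.

```python
from typing import List

from sqllineage.runner import LineageRunner

from datahub.utilities.sql_parser import SQLParser


class CustomSQLParser(SQLParser):
    """Hypothetical custom parser: delegates to sqllineage, but any
    dialect-aware parsing logic could be substituted here."""

    def __init__(self, sql_query: str) -> None:
        # Assumes the base class constructor takes the raw SQL string.
        super().__init__(sql_query)
        self._runner = LineageRunner(sql_query)

    def get_tables(self) -> List[str]:
        # Return the names of the upstream tables referenced by the SQL.
        return [str(table) for table in self._runner.source_tables()]

    def get_columns(self) -> List[str]:
        # Column extraction is not needed for table-level lineage;
        # returning an empty list keeps this sketch simple.
        return []
```

If this class were installed in a module named `my_parser`, the recipe would set `sql_parser: my_parser.CustomSQLParser`.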
Compatibility
Coming soon!
Questions
If you've got any questions on configuring this source, feel free to ping us on our Slack!