- **Uses SQL from Looker API**: The system queries the Looker API to generate fully resolved SQL statements for views, which are then parsed to extract column-level and table-level lineage. This provides more accurate lineage than regex-based parsing.
- **Works Only for Reachable Views**: The Looker Query API requires an explore name to generate SQL queries. Therefore, this method only works for views that are **reachable** from explores defined in your LookML model files. A view is considered "reachable" if it is referenced by at least one explore (either directly or through joins).
- **Fallback Behavior**: Views that are not reachable from any explore cannot use the API-based approach. If `emit_reachable_views_only` is `true` (the default), these views are skipped entirely; otherwise they automatically fall back to regex-based parsing.
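
As a rough sketch, a recipe that enables API-based view lineage might look like the following. The options `use_api_for_view_lineage` and `emit_reachable_views_only` are named in this section; the field names inside `api` (`base_url`, `client_id`, `client_secret`), the URL, and the environment variable names are assumptions for illustration and should be checked against the LookML source configuration reference.

```yml
source:
  type: lookml
  config:
    # Ask the Looker API for resolved SQL instead of relying on regex parsing
    use_api_for_view_lineage: true
    # Skip views that no explore references (the default)
    emit_reachable_views_only: true
    # Assumed field names and placeholder values
    api:
      base_url: https://your-instance.cloud.looker.com
      client_id: ${LOOKER_CLIENT_ID}
      client_secret: ${LOOKER_CLIENT_SECRET}
```
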
- If a constant's value is not resolved, or is resolved incorrectly, you can set it through the `lookml_constants` configuration in the ingestion recipe. A constant value in the recipe takes precedence over values resolved from the manifest.
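
For illustration, overriding a constant in the recipe could look like the sketch below. The constant name `my_schema` and its value are hypothetical, and the sketch assumes `lookml_constants` is a mapping from constant name to value.

```yml
source:
  type: lookml
  config:
    lookml_constants:
      # Hypothetical constant name and value
      my_schema: "analytics_prod"
```
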
Although liquid variables and LookML constants can be used anywhere in LookML code, their values are currently resolved only for LookML views by DataHub LookML ingestion. This behavior is sufficient since LookML ingestion processes only views and their upstream dependencies.
Looker projects can be organized across multiple Git repositories, with [remote includes that can refer to projects that are stored in a different repo](https://cloud.google.com/looker/docs/importing-projects#include_files_from_an_imported_project). If your Looker implementation uses a multi-project setup, you can configure the LookML source to pull in metadata from your remote projects as well.
If you are using local or remote dependencies, you will see include directives in your lookml files that look like this:
To ingest Looker repositories that include files defined in other projects, use the `project_dependencies` directive within the configuration section.
- Your primary project refers to a remote project called `my_remote_project`
- The remote project is homed in the GitHub repo `my_org/my_remote_project`
- You have provisioned a GitHub deploy key and stored the credential in the environment variable (or UI secret), `${MY_REMOTE_PROJECT_DEPLOY_KEY}`
In this case, you can add this section to your recipe to activate multi-project LookML ingestion.
```yml
source:
  type: lookml
  config:
    # ... other config variables
    project_dependencies:
      my_remote_project:
        repo: my_org/my_remote_project
        deploy_key: ${MY_REMOTE_PROJECT_DEPLOY_KEY}
```
Under the hood, DataHub will check out your remote repository using the provisioned deploy key, and use it to navigate includes that you have in the model files from your primary project.
If you have the remote project checked out locally, and do not need DataHub to clone the project for you, you can provide DataHub directly with the path to the project like the config snippet below:
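
A minimal sketch of such a config, assuming `project_dependencies` accepts a local filesystem path as the value for a project name. The path is a placeholder.

```yml
source:
  type: lookml
  config:
    # ... other config variables
    project_dependencies:
      # Placeholder path to a local checkout of the remote project
      my_remote_project: /path/to/local_git_clone_of_remote_project
```
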
This is not the same as ingesting the remote project as a primary Looker project because DataHub will not be processing the model files that might live in the remote project. If you want to additionally include the views accessible via the models in the remote project, create a second recipe where your remote project is the primary project.
For Looker views with a large number of fields (100+), DataHub automatically uses field splitting to ensure reliable lineage extraction. This feature splits large field sets into manageable chunks, processes them in parallel, and combines the results.
:::important
**API Configuration Required:** Field splitting requires Looker API credentials to be configured. You must:
1. Provide the `api` configuration section with your Looker credentials
2. Set `use_api_for_view_lineage: true` to enable API-based lineage extraction
Without API configuration, field splitting will not be available and the system will fall back to regex-based parsing, which may fail for large views.
**Reachable Views Only:** The `LookerQueryAPIBasedViewUpstream` implementation (used for field splitting) works by querying the Looker API to generate SQL statements for views. This approach only works for **reachable views** - views that are referenced by explores defined in your LookML model files. Views that are not reachable from any explore cannot be queried via the Looker API and will fall back to regex-based parsing. The `emit_reachable_views_only` configuration option controls whether only reachable views are processed.
:::
#### When Field Splitting is Used
Field splitting is automatically triggered when:
- `use_api_for_view_lineage: true` is set
- Looker API credentials are provided
- A view has more fields than the configured threshold (default: 100 fields)
You can adjust this threshold based on your needs:
```yml
source:
  type: lookml
  config:
    # Adjust the threshold for field splitting (default: 100)
    field_threshold_for_splitting: 100
```
**When to adjust the threshold:**
- **Lower the threshold** (e.g., 50) if you experience SQL parsing failures with views that have 50-100 fields
- **Raise the threshold** (e.g., 150) if your views consistently have 100+ fields and you want to minimize API calls
#### Partial Lineage Results
By default, DataHub will return partial lineage results even if some field chunks fail to parse. This ensures you get lineage information for working fields rather than complete failure.
```yml
source:
  type: lookml
  config:
    # Allow partial lineage when some chunks fail (default: true)
    allow_partial_lineage_results: true
```
**When to disable:**
- Set to `false` if you want strict validation and prefer complete failure over partial results
- Useful for debugging to identify problematic views that need attention
#### Individual Field Fallback
When a chunk of fields fails, DataHub can automatically attempt to process each field individually. This helps:
- Maximize lineage extraction by processing working fields
- Identify specific problematic fields that cause issues
- Provide detailed reporting on which fields fail
```yml
source:
  type: lookml
  config:
    # Enable individual field processing when chunks fail (default: true)
    enable_individual_field_fallback: true
```
**When to disable:**
- Set to `false` if you want faster processing and don't need to identify problematic fields
- Useful if you know all fields in a view are valid and want to skip the fallback overhead
#### Parallel Processing Performance
Field chunks are processed in parallel to improve performance. You can control the number of worker threads:
```yml
source:
  type: lookml
  config:
    # Number of parallel workers (default: 10, max: 100)
    max_workers_for_parallel_processing: 10
```
**Performance tuning:**
- **Increase workers** (e.g., 20-30) for faster processing if you have many large views and sufficient system resources
- **Decrease workers** (e.g., 5) if you're hitting API rate limits or have limited system resources
- **Set to 1** to process sequentially (useful for debugging)
**Important:** The maximum allowed value is 100 to prevent resource exhaustion. Values above 100 will be automatically capped with a warning.
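
Combining the debugging-oriented settings described above, a troubleshooting recipe might look like the sketch below. All option names come from this section; pairing them this way is only a suggestion, and how the options interact is not specified here, so treat it as a starting point.

```yml
source:
  type: lookml
  config:
    # Process chunks sequentially so log output is easier to follow
    max_workers_for_parallel_processing: 1
    # Prefer complete failure over partial lineage to surface problematic views
    allow_partial_lineage_results: false
    # Keep per-field processing on to pinpoint the fields that fail
    enable_individual_field_fallback: true
```
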
#### Complete Configuration Example
Here's a complete example configuration for handling large views:
```yml
source:
  type: lookml
  config:
    base_folder: /path/to/lookml
    # API configuration (REQUIRED for field splitting)
```
- The `api` section with credentials is **required** for field splitting to work
- `use_api_for_view_lineage: true` must be set to enable API-based lineage extraction
- Without API configuration, field splitting features are not available
- **Reachable Views Only**: Field splitting via `LookerQueryAPIBasedViewUpstream` only works for views that are reachable from explores. The Looker Query API requires an explore name to generate SQL, so views not referenced by any explore will use regex-based parsing instead
- The `emit_reachable_views_only` configuration (default: `true`) controls whether unreachable views are processed at all
**Check ingestion logs for:**
- Field splitting statistics: `View 'view_name' has X fields, exceeding threshold of Y. Splitting into multiple queries`
- Success rates: `Combined results for view 'view_name': X tables, Y column lineages, success rate: Z%`
- Problematic fields: Warnings about specific fields that fail processing
**Common issues:**
- **Field splitting not working**: Verify `use_api_for_view_lineage: true` and API credentials are configured
If you see messages like `my_file.view.lkml': "failed to load view file: Unable to find a matching expression for '<literal>' on line 5"` in the failure logs, it indicates a parsing error for the LookML file.
The first thing to check is that the Looker IDE can validate the file without issues. You can check this by clicking the "Validate LookML" button in the IDE while in development mode.
If that's not the issue, it might be because DataHub's parser, which is based on the [joshtemple/lkml](https://github.com/joshtemple/lkml) library, is slightly more strict than the official Looker parser.
Note that there's currently only one known discrepancy between the two parsers, and it's related to using [leading colons in blocks](https://github.com/joshtemple/lkml/issues/90).
To check if DataHub can parse your LookML file syntax, you can use the `lkml` CLI tool. If this raises an exception, DataHub will fail to parse the file.