Lineage Impact Analysis is a powerful workflow for understanding the complete set of upstream and downstream dependencies of a Dataset, Dashboard, Chart, and many other DataHub Entities.
This allows Data Practitioners to proactively identify the impact of breaking schema changes or failed data pipelines on downstream dependencies, rapidly discover which upstream dependencies may have caused unexpected data quality issues, and more.
Lineage Impact Analysis is available via the DataHub UI and GraphQL endpoints, supporting manual and automated workflows.
## Lineage Impact Analysis Setup, Prerequisites, and Permissions
Lineage Impact Analysis is enabled for any Entity that has associated Lineage relationships with other Entities and does not require any additional configuration.
Any DataHub user with “View Entity Page” permissions is able to view the full set of upstream or downstream Entities and export results to CSV from the DataHub UI.
## Using Lineage Impact Analysis
Follow these simple steps to understand the full dependency chain of your data entities.
1. On a given Entity Page, select the **Lineage** tab
3. Choose the **Degree of Dependencies** you are interested in. The default filter is “1 Degree of Dependency” to minimize processor-intensive queries.
6. View the filtered set of dependencies via CSV, with details about assigned ownership, domain, tags, terms, and quick links back to those entities within DataHub
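
For automated workflows, a similar dependency list can be requested from the GraphQL endpoint. The following is a minimal sketch, not a definitive client. It assumes a `searchAcrossLineage` query that takes a `SearchAcrossLineageInput` with `urn`, `direction`, `query`, `start`, `count`, and `orFilters` fields, and a `degree` filter with string values such as `"1"`, `"2"`, and `"3+"`. Verify all of these against the GraphQL schema of your DataHub version. The helper names `build_payload` and `fetch_impact`, the endpoint URL, and the token are hypothetical.

```python
import json
import urllib.request

# Assumed query shape; check field names against your DataHub GraphQL schema.
SEARCH_ACROSS_LINEAGE = """
query impactAnalysis($input: SearchAcrossLineageInput!) {
  searchAcrossLineage(input: $input) {
    total
    searchResults {
      degree
      entity { urn type }
    }
  }
}
"""


def build_payload(urn, direction="DOWNSTREAM", degrees=("1",), start=0, count=100):
    """Build the JSON body for an impact analysis request (hypothetical helper)."""
    if direction not in ("UPSTREAM", "DOWNSTREAM"):
        raise ValueError("direction must be 'UPSTREAM' or 'DOWNSTREAM'")
    return {
        "query": SEARCH_ACROSS_LINEAGE,
        "variables": {
            "input": {
                "urn": urn,
                "direction": direction,
                "query": "*",
                "start": start,
                "count": count,
                # Assumed filter: restrict results to the chosen degrees of dependency.
                "orFilters": [
                    {"and": [{"field": "degree", "values": list(degrees)}]}
                ],
            }
        },
    }


def fetch_impact(endpoint, token, payload):
    """POST the payload to a GraphQL endpoint and return the decoded JSON response."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Defaulting `degrees` to `("1",)` mirrors the UI's default of one degree of dependency, which keeps the query inexpensive unless a wider search is requested explicitly.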
Impact Analysis is a powerful feature that can place significant demands on the system. To maintain performance on large result sets, we've implemented the "Lightning Cache", an alternate processing path that returns results more quickly. By default, the cache is used for simple queries whose result set contains more than 300 assets. You can change this threshold by setting the environment variable `CACHE_SEARCH_LINEAGE_LIGHTNING_THRESHOLD` on your GMS pod.
However, the Lightning Cache has a limitation: it may include assets that are soft-deleted or no longer exist in the DataHub database. This occurs because lineage references may contain "ghost entities" (URNs without associated data).
Note that when you download Impact Analysis results, our system properly filters out these soft-deleted and non-existent assets. As a result, you might notice differences between what appears in the UI and what appears in your downloaded results.
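
To see which assets account for such a difference, one option is to compare the URNs shown in the UI against those in the downloaded file. The sketch below assumes that the export contains a column named `urn` and that you have already collected the UI-side URNs yourself. Both the column name and the helper `urns_missing_from_export` are hypothetical.

```python
import csv
import io


def urns_missing_from_export(ui_urns, export_csv_text, urn_column="urn"):
    """Return URNs seen in the UI that are absent from the CSV export.

    These are candidates for soft-deleted or non-existent ("ghost") entities.
    The column name is an assumption; adjust it to match your export.
    """
    reader = csv.DictReader(io.StringIO(export_csv_text))
    exported = {row[urn_column] for row in reader if row.get(urn_column)}
    return sorted(set(ui_urns) - exported)
```

An empty result means every asset shown in the UI also appears in the download.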