feat(ingest): unbundle airflow plugin emitter dependencies (#7493)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
cburroughs 2023-03-07 12:07:42 -05:00 committed by GitHub
parent de719663ff
commit cc0772f8d8
4 changed files with 12 additions and 1 deletion


@@ -6,6 +6,9 @@ This file documents any backwards-incompatible changes in DataHub and assists pe
 ### Breaking Changes
 - #7016 Add `add_database_name_to_urn` flag to Oracle source, which ensures that Dataset urns have the DB name as a prefix to prevent collisions (e.g. {database}.{schema}.{table}). ONLY breaking if you set this flag to true; otherwise behavior remains the same.
+- The Airflow plugin no longer includes the DataHub Kafka emitter by default. Use `pip install acryl-datahub-airflow-plugin[datahub-kafka]` for Kafka support.
+- The Airflow lineage backend no longer includes the DataHub Kafka emitter by default. Use `pip install acryl-datahub[airflow,datahub-kafka]` for Kafka support.
 ### Potential Downtime
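For reference, the two new install paths side by side; the extras are best quoted, since shells like zsh treat square brackets as glob characters (a usage sketch, not part of the diff itself):

```shell
# Airflow plugin with the Kafka emitter re-enabled:
pip install 'acryl-datahub-airflow-plugin[datahub-kafka]'

# Airflow lineage backend with the Kafka emitter re-enabled:
pip install 'acryl-datahub[airflow,datahub-kafka]'
```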


@@ -26,6 +26,12 @@ If you're using Airflow 1.x, use the Airflow lineage plugin with acryl-datahub-a
 pip install acryl-datahub-airflow-plugin
 ```
+:::note
+The [DataHub Rest](../../metadata-ingestion/sink_docs/datahub.md#datahub-rest) emitter is included in the plugin package by default. To use [DataHub Kafka](../../metadata-ingestion/sink_docs/datahub.md#datahub-kafka), install with `pip install acryl-datahub-airflow-plugin[datahub-kafka]`.
+:::
 2. Disable lazy plugin loading in your airflow.cfg.
 On MWAA you should add this config to your [Apache Airflow configuration options](https://docs.aws.amazon.com/mwaa/latest/userguide/configuring-env-variables.html#configuring-2.0-airflow-override).
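A sketch of what step 2 amounts to: `lazy_load_plugins` is a standard `[core]` option in airflow.cfg, and the environment-variable form follows Airflow's `AIRFLOW__SECTION__KEY` convention (the MWAA-specific option name is covered by the linked AWS guide):

```shell
# In airflow.cfg (shown here as comments):
#   [core]
#   lazy_load_plugins = False

# Equivalent environment-variable form:
export AIRFLOW__CORE__LAZY_LOAD_PLUGINS=False
```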
@@ -89,6 +95,8 @@ If you are looking to run Airflow and DataHub using docker locally, follow the g
 ```shell
 pip install acryl-datahub[airflow]
+# If you need the Kafka-based emitter/hook:
+pip install acryl-datahub[airflow,datahub-kafka]
 ```
 2. You must configure an Airflow hook for DataHub. We support both a DataHub REST hook and a Kafka-based hook, but you only need one.
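As an illustration of the hook setup, one plausible shape using Airflow's connections CLI; the `datahub_rest`/`datahub_kafka` connection types ship with the plugin, while the connection IDs and hosts below are placeholder assumptions:

```shell
# REST hook (placeholder GMS endpoint):
airflow connections add 'datahub_rest_default' \
    --conn-type 'datahub_rest' \
    --conn-host 'http://localhost:8080'

# Kafka hook (requires the datahub-kafka extra; placeholder broker):
airflow connections add 'datahub_kafka_default' \
    --conn-type 'datahub_kafka' \
    --conn-host 'broker:9092'
```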


@@ -125,5 +125,6 @@ setuptools.setup(
     install_requires=list(base_requirements),
     extras_require={
         "dev": list(dev_requirements),
+        "datahub-kafka": f"acryl-datahub[datahub-kafka] == {package_metadata['__version__']}",
     },
 )
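The new extra simply forwards to `acryl-datahub[datahub-kafka]`, pinned to the plugin's own version so the emitter and plugin stay in lockstep. A quick sanity check after installing (the emitter's import path is my assumption, not part of this diff):

```shell
pip install 'acryl-datahub-airflow-plugin[datahub-kafka]'
# Verify the Kafka emitter is importable (module path assumed):
python -c 'from datahub.emitter.kafka_emitter import DatahubKafkaEmitter'
```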


@@ -251,7 +251,6 @@ plugins: Dict[str, Set[str]] = {
     "airflow": {
         "apache-airflow >= 2.0.2",
         *rest_common,
-        *kafka_common,
     },
     "circuit-breaker": {
         "gql>=3.3.0",