From 1bea570b8db00d3f8aa78c26b86a4522bea37fbd Mon Sep 17 00:00:00 2001
From: Harshal Sheth
Date: Fri, 20 Sep 2024 13:22:15 -0700
Subject: [PATCH] docs(ingest): add docs on pydantic compatibility (#11423)

---
 docs/cli.md                      |  5 ++---
 metadata-ingestion/developing.md | 16 ++++++++++++++--
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/docs/cli.md b/docs/cli.md
index 1f1e6dfa26..c109d02e0a 100644
--- a/docs/cli.md
+++ b/docs/cli.md
@@ -34,9 +34,9 @@ datahub init # authenticate your datahub CLI with your datahub instance
 ```

-If you run into an error, try checking the [_common setup issues_](../metadata-ingestion/developing.md#Common-setup-issues).
+If you run into an error, try checking the [_common setup issues_](../metadata-ingestion/developing.md#common-setup-issues).

-Other installation options such as installation from source and running the cli inside a container are available further below in the guide [here](#alternate-installation-options)
+Other installation options such as installation from source and running the cli inside a container are available further below in the guide [here](#alternate-installation-options).

 ## Starter Commands
@@ -672,7 +672,6 @@ Old Entities Migrated = {'urn:li:dataset:(urn:li:dataPlatform:hive,logging_event
 ### Using docker

 [![Docker Hub](https://img.shields.io/docker/pulls/acryldata/datahub-ingestion?style=plastic)](https://hub.docker.com/r/acryldata/datahub-ingestion)
-[![datahub-ingestion docker](https://github.com/acryldata/datahub/workflows/datahub-ingestion%20docker/badge.svg)](https://github.com/acryldata/datahub/actions/workflows/docker-ingestion.yml)

 If you don't want to install locally, you can alternatively run metadata ingestion within a Docker container. We have prebuilt images available on [Docker hub](https://hub.docker.com/r/acryldata/datahub-ingestion). All plugins will be installed and enabled automatically.
diff --git a/metadata-ingestion/developing.md b/metadata-ingestion/developing.md
index 9293fc7a36..19a18c5275 100644
--- a/metadata-ingestion/developing.md
+++ b/metadata-ingestion/developing.md
@@ -55,7 +55,6 @@ logger.debug("this is the sample debug line")
 #3. click on the `log` option
 ```

-
 > **P.S. if you are not able to see the log lines, then restart the `airflow scheduler` and rerun the DAG**

 ### (Optional) Set up your Python environment for developing on Dagster Plugin
@@ -70,6 +69,7 @@ datahub version # should print "DataHub CLI version: unavailable (installed in
 ```

 ### (Optional) Set up your Python environment for developing on Prefect Plugin
+
 From the repository root:

 ```shell
@@ -127,6 +127,18 @@ This sometimes happens if there's a version mismatch between the Kafka's C libra

 </details>

+<details>
+  <summary>Conflict: acryl-datahub requires pydantic 1.10</summary>
+
+The base `acryl-datahub` package supports both Pydantic 1.x and 2.x. However, some of our ingestion sources require Pydantic 1.x because of transitive dependencies.
+
+If you're primarily using `acryl-datahub` for the SDKs, you can install `acryl-datahub` and some extras, like `acryl-datahub[sql-parser]`, without running into Pydantic version conflicts.
+
+We recommend not installing full ingestion sources into your main environment (e.g. avoid having a dependency on `acryl-datahub[snowflake]` or other ingestion sources).
+Instead, use UI-based ingestion or isolate your ingestion pipelines in [virtual environments](https://docs.python.org/3/library/venv.html). Most orchestrators have first-class support for virtual environments - here's an [example for Airflow](./schedule_docs/airflow.md).
+
+</details>
+
 ### Using Plugins in Development

 The syntax for installing plugins is slightly different in development. For example:
@@ -286,4 +298,4 @@ tox -- --update-golden-files

 # Update golden files for a specific environment.
 tox -e py310-airflow26 -- --update-golden-files
-```
\ No newline at end of file
+```
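
To make the Pydantic guidance added in this patch more concrete, here is a minimal shell sketch of the recommended isolation pattern. It is not part of the patch itself: the `~/.venvs/datahub-snowflake` path and `snowflake_recipe.yml` file are hypothetical placeholders, and it assumes the `acryl-datahub[sql-parser]` and `acryl-datahub[snowflake]` extras mentioned in the docs above.

```shell
# Main environment: SDK plus SQL parser only, which works with either Pydantic 1.x or 2.x.
python3 -m pip install --upgrade 'acryl-datahub[sql-parser]'

# Isolate the full Snowflake ingestion source (which may pin Pydantic 1.x) in its own venv.
python3 -m venv ~/.venvs/datahub-snowflake
source ~/.venvs/datahub-snowflake/bin/activate
python3 -m pip install --upgrade 'acryl-datahub[snowflake]'

# Run the ingestion recipe from inside the isolated environment.
# snowflake_recipe.yml is a placeholder for your own recipe file.
datahub ingest -c snowflake_recipe.yml

deactivate
```

This mirrors the advice in the added section: the version clash only tends to appear when a full ingestion source is installed alongside Pydantic 2.x-based code, so keeping those sources in per-pipeline virtual environments (or using UI-based ingestion, as the patch suggests) sidesteps it.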