docs(datahub-cli): include docs information for deploying recipes to remote executor using datahub ingest deploy... (#14625)

Jonny Dixon 2025-09-02 13:55:13 +01:00 committed by GitHub
parent 742409fddc
commit c03a77db86

@@ -193,18 +193,77 @@ This command will automatically create a new recipe if it doesn't exist, or upda
Note that this is a complete update, and will remove any options that were previously set.
For example, not specifying a schedule in the CLI update command removes the schedule from the recipe being updated.
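Because each deploy replaces the stored settings wholesale, one way to avoid dropping a schedule by accident is to restate every option on each update. The following is a minimal sketch, assuming a hypothetical helper named `full_deploy_cmd`. The helper name and its argument order are illustrative and not part of the DataHub CLI; it only prints the command so it can be reviewed before running.

```shell
# Hypothetical helper (not part of the DataHub CLI): print a deploy command
# that restates name, schedule, time zone and recipe file on every update.
full_deploy_cmd() {
  name=$1
  schedule=$2
  time_zone=$3
  recipe=$4
  printf 'datahub ingest deploy --name "%s" --schedule "%s" --time-zone "%s" -c %s\n' \
    "$name" "$schedule" "$time_zone" "$recipe"
}

# Example: review the printed command, then run it yourself.
# full_deploy_cmd "Snowflake Integration" "0 5 * * *" "Europe/London" recipe.yaml
```

Printing rather than executing keeps the sketch safe to try; the printed line uses only flags listed in the command options.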
#### Command Options
```console
Usage: datahub ingest deploy [OPTIONS]

Options:
  -n, --name TEXT     Recipe Name
  -c, --config FILE   Config file in .toml or .yaml format. [required]
  --urn TEXT          Urn of recipe to update. If not specified here or in the recipe's pipeline_name,
                      this will create a new ingestion source.
  --executor-id TEXT  Executor id to route execution requests to. Do not use this unless you have
                      configured a custom executor.
  --cli-version TEXT  Provide a custom CLI version to use for ingestion. By default will use server
                      default.
  --schedule TEXT     Cron definition for schedule. If none is provided, ingestion recipe will not be
                      scheduled
  --time-zone TEXT    Timezone for the schedule in 'America/New_York' format. Uses UTC by default.
  --debug BOOLEAN     Should we debug.
  --extra-pip TEXT    Extra pip packages. e.g. ["memray"]
```
#### Examples
**Schedule a recipe with default executor:**
To schedule a recipe called "Snowflake Integration" to run at 5am every day, London time, with the recipe configured in a local `recipe.yaml` file:
```shell
datahub ingest deploy --name "Snowflake Integration" --schedule "0 5 * * *" --time-zone "Europe/London" -c recipe.yaml
```
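Cron field order is easy to get wrong: `0 5 * * *` fires once a day at 05:00, whereas `5 * * * *` fires at minute 5 of every hour. The sketch below labels each field so a schedule can be checked before it is deployed. It assumes the standard five-field cron layout used in these examples; `describe_cron` is a hypothetical helper, not a DataHub command.

```shell
# Hypothetical helper: label the five fields of a cron expression.
describe_cron() {
  # Split on whitespace without glob-expanding the "*" fields.
  read -r minute hour day_of_month month day_of_week extra <<EOF
$1
EOF
  # Reject expressions with fewer or more than five fields.
  if [ -z "$day_of_week" ] || [ -n "$extra" ]; then
    echo "expected exactly 5 cron fields: $1" >&2
    return 1
  fi
  printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
    "$minute" "$hour" "$day_of_month" "$month" "$day_of_week"
}

# describe_cron "0 5 * * *"  prints: minute=0 hour=5 day-of-month=* month=* day-of-week=*
# describe_cron "5 * * * *"  prints: minute=5 hour=* day-of-month=* month=* day-of-week=*
```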
**Deploy to a specific remote executor:**
```shell
datahub ingest deploy --name "Remote Snowflake Integration" --executor-id "remote-executor-pool-1" --schedule "0 5 * * *" -c recipe.yaml
```
**Update an existing recipe:**
```shell
datahub ingest deploy --urn "urn:li:dataHubIngestionSource:deploy-12345678" --schedule "0 6 * * *" -c updated_recipe.yaml
```
By default, the ingestion recipe's identifier is generated by hashing the name.
You can override the urn generation by passing the `--urn` flag to the CLI.
#### Remote Executors
The `--executor-id` option allows you to route ingestion execution to specific executors:
- **Default executor** (`"default"`): Uses the managed executor provided by DataHub Cloud or your configured default executor
- **Remote executors**: Route to custom remote executors you've deployed in your environment
:::note
Use executor IDs other than "default" only if you have configured custom remote executors.
:::
**Examples with remote executors:**
```shell
# Deploy to default executor (can be configured as remote)
datahub ingest deploy --name "My Integration" -c recipe.yaml
# Deploy to specific remote executor pool
datahub ingest deploy --name "Private Network Integration" --executor-id "private-network-pool" -c recipe.yaml
# Deploy to region-specific executor
datahub ingest deploy --name "EU Region Integration" --executor-id "eu-west-executor" -c recipe.yaml
```
For more information on setting up remote executors, see the [Remote Executor Setup Guide](managed-datahub/operator-guide/setting-up-remote-ingestion-executor.md).
#### Using deployment section
As an alternative to configuring settings from the CLI, all of these settings can also be set in the `deployment` field of the recipe.
@@ -212,8 +271,10 @@ As an alternative to configuring settings from the CLI, all of these settings ca
# deployment_recipe.yml
deployment:
  name: "Snowflake Integration"
  schedule: "0 5 * * *"
  time_zone: "Europe/London"
  executor_id: "remote-executor-pool-1" # Optional: specify remote executor
  cli_version: "0.15.0.1" # Optional: specify CLI version
source: ...
```
@@ -222,11 +283,30 @@ source: ...
datahub ingest deploy -c deployment_recipe.yml
```
This is particularly useful when you want all recipes to be stored in version control. CLI options will override corresponding values in the deployment section.
#### Deployment Configuration Options
All deployment options that can be specified via CLI flags can also be configured in the `deployment` section:
| Field | CLI Option | Description | Default |
| ------------- | --------------- | ------------------------------- | ------------------ |
| `name` | `--name` | Recipe name displayed in the UI | Required |
| `schedule` | `--schedule` | Cron expression for scheduling | None (manual only) |
| `time_zone` | `--time-zone` | Timezone for scheduled runs | `"UTC"` |
| `executor_id` | `--executor-id` | Target executor for ingestion | `"default"` |
| `cli_version` | `--cli-version` | CLI version for ingestion | Server default |
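The precedence described here (a CLI flag wins over the matching `deployment` field, which wins over the default in the table) can be modelled in a few lines. This is a sketch of the rule as documented, not DataHub's implementation; `effective_value` is a hypothetical helper, and an empty string stands for "not provided".

```shell
# Hypothetical helper: pick the first non-empty value among the
# CLI flag, the deployment field, and the documented default.
effective_value() {
  cli_value=$1
  recipe_value=$2
  default_value=$3
  if [ -n "$cli_value" ]; then
    printf '%s\n' "$cli_value"
  elif [ -n "$recipe_value" ]; then
    printf '%s\n' "$recipe_value"
  else
    printf '%s\n' "$default_value"
  fi
}

# effective_value "" "Europe/London" "UTC"   prints: Europe/London
# effective_value "" "" "UTC"                prints: UTC
```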
#### Batch Deployment
Deploy multiple recipes from version control:
```shell
# Deploy every yml recipe in a directory
ls recipe_directory/*.yml | xargs -n 1 -I {} datahub ingest deploy -c {}
# Deploy with consistent executor across all recipes
ls recipe_directory/*.yml | xargs -n 1 -I {} datahub ingest deploy --executor-id "production-executor" -c {}
```
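A plain loop is an alternative to the `ls | xargs` pipeline that copes with spaces in file names and stops at the first failed deploy. The sketch below assumes a hypothetical function `deploy_all` and a hypothetical `DEPLOY_CMD` variable, introduced here only so the loop can be dry-run with `echo`; it defaults to `datahub`.

```shell
# Hypothetical helper: deploy every .yml recipe in a directory, one at a time.
# Extra arguments (for example --executor-id) are passed through to each deploy.
deploy_all() {
  recipe_dir=$1
  shift
  for recipe in "$recipe_dir"/*.yml; do
    # Skip the unexpanded pattern when the directory has no .yml files.
    [ -e "$recipe" ] || continue
    # DEPLOY_CMD is an assumption of this sketch; it defaults to the datahub CLI.
    "${DEPLOY_CMD:-datahub}" ingest deploy "$@" -c "$recipe" || {
      echo "deploy failed for $recipe" >&2
      return 1
    }
  done
}

# Dry run:  DEPLOY_CMD=echo deploy_all recipe_directory --executor-id "production-executor"
# Real run: deploy_all recipe_directory --executor-id "production-executor"
```

Stopping on the first failure is a design choice; dropping the `return 1` would let the remaining recipes deploy anyway.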
### init