docs(datahub-cli): include docs information for deploying recipes to remote executor using datahub ingest deploy... (#14625)

Jonny Dixon 2025-09-02 13:55:13 +01:00 committed by GitHub
parent 742409fddc
commit c03a77db86


@@ -193,18 +193,77 @@ This command will automatically create a new recipe if it doesn't exist, or upda
Note that this is a complete update: it removes any options that were previously set.
For example, not specifying a schedule in the CLI update command removes the schedule from the recipe being updated.
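Because each deploy replaces every previously set option, one way to avoid accidentally clearing a schedule is to keep the full option set for a recipe in a single shell function. This is a hypothetical sketch: `deploy_snowflake` is an illustrative name, and `DATAHUB` is an assumed variable that lets you substitute `echo` to preview the command instead of running the real `datahub` CLI.

```shell
# Hypothetical wrapper: the complete option set for one recipe lives here, so
# every re-deploy sends the schedule and time zone again instead of clearing them.
# DATAHUB is an assumed override (e.g. DATAHUB=echo to preview); defaults to datahub.
deploy_snowflake() {
  "${DATAHUB:-datahub}" ingest deploy \
    --name "Snowflake Integration" \
    --schedule "0 5 * * *" \
    --time-zone "Europe/London" \
    -c recipe.yaml
}
```

Running `DATAHUB=echo deploy_snowflake` prints the arguments it would pass; plain `deploy_snowflake` performs the deploy.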
#### Command Options
```console
Usage: datahub ingest deploy [OPTIONS]

Options:
  -n, --name TEXT         Recipe Name
  -c, --config FILE       Config file in .toml or .yaml format. [required]
  --urn TEXT              Urn of recipe to update. If not specified here or in the recipe's pipeline_name,
                          this will create a new ingestion source.
  --executor-id TEXT      Executor id to route execution requests to. Do not use this unless you have
                          configured a custom executor.
  --cli-version TEXT      Provide a custom CLI version to use for ingestion. By default will use server
                          default.
  --schedule TEXT         Cron definition for schedule. If none is provided, ingestion recipe will not be
                          scheduled
  --time-zone TEXT        Timezone for the schedule in 'America/New_York' format. Uses UTC by default.
  --debug BOOLEAN         Should we debug.
  --extra-pip TEXT        Extra pip packages. e.g. ["memray"]
```
#### Examples
**Schedule a recipe to run at 5am every day, London time, on the default executor:**
```shell
datahub ingest deploy --name "Snowflake Integration" --schedule "0 5 * * *" --time-zone "Europe/London" -c recipe.yaml
```
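The examples use five-field cron expressions: `"0 5 * * *"` means minute 0 of hour 5 on every day, evaluated in the `--time-zone` you pass, whereas `"5 * * * *"` would run at five minutes past every hour. As a quick sanity check before deploying, the hypothetical `describe_cron` helper below (plain shell, not part of the DataHub CLI) labels each field and rejects expressions that do not have exactly five.

```shell
# Hypothetical helper (not part of the DataHub CLI): label the five fields of a
# cron expression such as the value passed to --schedule.
describe_cron() {
  set -f                  # disable pathname expansion so "*" stays literal
  set -- $1               # split the expression on whitespace into $1..$N
  set +f
  if [ "$#" -ne 5 ]; then
    echo "expected 5 cron fields, got $#" >&2
    return 1
  fi
  echo "minute=$1 hour=$2 day-of-month=$3 month=$4 day-of-week=$5"
}
```

For example, `describe_cron "0 5 * * *"` prints `minute=0 hour=5 day-of-month=* month=* day-of-week=*`.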
**Deploy to a specific remote executor:**
```shell
datahub ingest deploy --name "Remote Snowflake Integration" --executor-id "remote-executor-pool-1" --schedule "0 5 * * *" -c recipe.yaml
```
**Update an existing recipe:**
```shell
datahub ingest deploy --urn "urn:li:dataHubIngestionSource:deploy-12345678" --schedule "0 6 * * *" -c updated_recipe.yaml
```
By default, the ingestion recipe's identifier is generated by hashing the name.
You can override the urn generation by passing the `--urn` flag to the CLI.
#### Remote Executors
The `--executor-id` option allows you to route ingestion execution to specific executors:
- **Default executor** (`"default"`): Uses the managed executor provided by DataHub Cloud or your configured default executor
- **Remote executors**: Route to custom remote executors you've deployed in your environment
:::note
Use executor IDs other than "default" only if you have configured custom remote executors.
:::
**Examples with remote executors:**
```shell
# Deploy to default executor (can be configured as remote)
datahub ingest deploy --name "My Integration" -c recipe.yaml
# Deploy to specific remote executor pool
datahub ingest deploy --name "Private Network Integration" --executor-id "private-network-pool" -c recipe.yaml
# Deploy to region-specific executor
datahub ingest deploy --name "EU Region Integration" --executor-id "eu-west-executor" -c recipe.yaml
```
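If you deploy the same recipes to several environments, a small lookup can keep executor IDs consistent. This is a sketch under assumptions: `executor_for` and the environment names (`prod`, `private`, `eu`) are hypothetical, the executor IDs reuse the example IDs from this page, and any ID other than `"default"` must match a remote executor you have actually set up.

```shell
# Hypothetical mapping from an environment name to an executor ID.
# The IDs are examples only; replace them with executors you have deployed.
executor_for() {
  case "$1" in
    prod)    echo "production-executor" ;;
    private) echo "private-network-pool" ;;
    eu)      echo "eu-west-executor" ;;
    *)       echo "default" ;;   # anything else falls back to the default executor
  esac
}
```

Usage: `datahub ingest deploy --name "EU Region Integration" --executor-id "$(executor_for eu)" -c recipe.yaml`.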
For more information on setting up remote executors, see the [Remote Executor Setup Guide](managed-datahub/operator-guide/setting-up-remote-ingestion-executor.md).
#### Using the `deployment` section
As an alternative to configuring settings from the CLI, all of these settings can also be set in the `deployment` field of the recipe.
@@ -212,8 +271,10 @@ As an alternative to configuring settings from the CLI, all of these settings ca
```yml
# deployment_recipe.yml
deployment:
  name: "Snowflake Integration"
  schedule: "0 5 * * *"
  time_zone: "Europe/London"
  executor_id: "remote-executor-pool-1" # Optional: specify remote executor
  cli_version: "0.15.0.1" # Optional: specify CLI version
source: ...
```
@@ -222,11 +283,30 @@ source: ...
```shell
datahub ingest deploy -c deployment_recipe.yml
```
This is particularly useful when you want all recipes to be stored in version control.
Options passed on the command line override the corresponding values in the `deployment` section.
#### Deployment Configuration Options
All deployment options that can be specified via CLI flags can also be configured in the `deployment` section:
| Field | CLI Option | Description | Default |
| ------------- | --------------- | ------------------------------- | ------------------ |
| `name` | `--name` | Recipe name displayed in the UI | Required |
| `schedule` | `--schedule` | Cron expression for scheduling | None (manual only) |
| `time_zone` | `--time-zone` | Timezone for scheduled runs | `"UTC"` |
| `executor_id` | `--executor-id` | Target executor for ingestion | `"default"` |
| `cli_version` | `--cli-version` | CLI version for ingestion | Server default |
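The override rule can be summarized per setting: a CLI flag takes precedence over the `deployment` section, which in turn takes precedence over the default listed in the table above. The `resolve_setting` function below is an illustrative model of that rule written for this page, with a hypothetical name; it is not how the CLI is implemented.

```shell
# Illustrative model of the precedence rule (not the CLI's implementation):
# use the CLI flag if given, else the deployment section value, else the default.
resolve_setting() {
  local cli_value="$1" recipe_value="$2" default_value="$3"
  if [ -n "$cli_value" ]; then
    printf '%s\n' "$cli_value"
  elif [ -n "$recipe_value" ]; then
    printf '%s\n' "$recipe_value"
  else
    printf '%s\n' "$default_value"
  fi
}
```

For example, `resolve_setting "" "Europe/London" "UTC"` prints `Europe/London`, while passing `"America/New_York"` as the first argument prints `America/New_York`.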
#### Batch Deployment
Deploy multiple recipes from version control:
```shell
# Deploy every yml recipe in a directory
for recipe in recipe_directory/*.yml; do datahub ingest deploy -c "$recipe"; done
# Deploy with a consistent executor across all recipes
for recipe in recipe_directory/*.yml; do datahub ingest deploy --executor-id "production-executor" -c "$recipe"; done
```
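When deploying a whole directory, it can be worth previewing the commands first. The hypothetical `preview_deploys` function below only prints the `datahub ingest deploy` command for each `.yml` file in a directory, optionally with an executor ID, and prints nothing when the directory contains no matching files.

```shell
# Hypothetical dry-run helper: print (do not run) one deploy command per recipe.
# Usage: preview_deploys <directory> [executor-id]
preview_deploys() {
  local dir="$1" executor="$2" recipe
  for recipe in "$dir"/*.yml; do
    [ -e "$recipe" ] || continue   # no matches: the glob stays unexpanded, so skip it
    if [ -n "$executor" ]; then
      echo datahub ingest deploy --executor-id "$executor" -c "$recipe"
    else
      echo datahub ingest deploy -c "$recipe"
    fi
  done
}
```

Once the output looks right, run the same loop without `echo` to deploy.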
### init