mirror of
https://github.com/datahub-project/datahub.git
synced 2025-09-09 09:11:01 +00:00
docs(datahub-cli): include docs information for deploying recipes to remote executor using datahub ingest deploy...
(#14625)
parent
742409fddc
commit
c03a77db86
92
docs/cli.md
@ -193,18 +193,77 @@ This command will automatically create a new recipe if it doesn't exist, or upda
Note that this is a complete update, and will remove any options that were previously set.
For example, not specifying a schedule in the CLI update command will remove the schedule from the recipe being updated.

#### Command Options

```console
Usage: datahub ingest deploy [OPTIONS]

Options:
  -n, --name TEXT     Recipe Name
  -c, --config FILE   Config file in .toml or .yaml format.  [required]
  --urn TEXT          Urn of recipe to update. If not specified here or in the recipe's pipeline_name,
                      this will create a new ingestion source.
  --executor-id TEXT  Executor id to route execution requests to. Do not use this unless you have
                      configured a custom executor.
  --cli-version TEXT  Provide a custom CLI version to use for ingestion. By default will use server
                      default.
  --schedule TEXT     Cron definition for schedule. If none is provided, ingestion recipe will not be
                      scheduled
  --time-zone TEXT    Timezone for the schedule in 'America/New_York' format. Uses UTC by default.
  --debug BOOLEAN     Should we debug.
  --extra-pip TEXT    Extra pip packages. e.g. ["memray"]
```

#### Examples

**Schedule a recipe with default executor:**

To schedule a recipe called "Snowflake Integration" to run at 5am every day, London time, with the recipe configured in a local `recipe.yaml` file:

```shell
datahub ingest deploy --name "Snowflake Integration" --schedule "0 5 * * *" --time-zone "Europe/London" -c recipe.yaml
```
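
Assuming the usual five-field cron layout (minute, hour, day of month, month, day of week), a 5am daily schedule is written `0 5 * * *`; `5 * * * *` would instead run at five past every hour. The sketch below labels the fields so a schedule can be checked before deploying. `explain_cron` is a hypothetical helper introduced here for illustration, not part of the DataHub CLI.

```shell
# explain_cron is a hypothetical helper, not part of the datahub CLI.
# It labels the five fields of a cron expression so a schedule can be
# sanity-checked before it is passed to --schedule.
# The body runs in a subshell ( ... ) so `set -f` does not leak to the caller.
explain_cron() (
  set -f        # disable globbing so "*" is not expanded to file names
  set -- $1     # intentional word splitting: one argument per field
  if [ "$#" -ne 5 ]; then
    echo "expected 5 fields, got $#" >&2
    exit 1      # exits only the subshell; becomes the function's status
  fi
  echo "minute=$1 hour=$2 day-of-month=$3 month=$4 day-of-week=$5"
)
```

For instance, `explain_cron "0 5 * * *"` prints `minute=0 hour=5 day-of-month=* month=* day-of-week=*`, which reads as 05:00 every day in the configured time zone.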

**Deploy to a specific remote executor:**

```shell
datahub ingest deploy --name "Remote Snowflake Integration" --executor-id "remote-executor-pool-1" --schedule "0 5 * * *" -c recipe.yaml
```

**Update an existing recipe:**

```shell
datahub ingest deploy --urn "urn:li:dataHubIngestionSource:deploy-12345678" --schedule "0 6 * * *" -c updated_recipe.yaml
```

By default, the ingestion recipe's identifier is generated by hashing the name.
You can override the urn generation by passing the `--urn` flag to the CLI.

#### Remote Executors

The `--executor-id` option allows you to route ingestion execution to specific executors:

- **Default executor** (`"default"`): Uses the managed executor provided by DataHub Cloud or your configured default executor
- **Remote executors**: Route to custom remote executors you've deployed in your environment

:::note
Use executor IDs other than "default" only if you have configured custom remote executors.
:::

**Examples with remote executors:**

```shell
# Deploy to default executor (can be configured as remote)
datahub ingest deploy --name "My Integration" -c recipe.yaml

# Deploy to specific remote executor pool
datahub ingest deploy --name "Private Network Integration" --executor-id "private-network-pool" -c recipe.yaml

# Deploy to region-specific executor
datahub ingest deploy --name "EU Region Integration" --executor-id "eu-west-executor" -c recipe.yaml
```

For more information on setting up remote executors, see the [Remote Executor Setup Guide](managed-datahub/operator-guide/setting-up-remote-ingestion-executor.md).

#### Using deployment section

As an alternative to configuring settings from the CLI, all of these settings can also be set in the `deployment` field of the recipe.
@ -212,8 +271,10 @@ As an alternative to configuring settings from the CLI, all of these settings ca
# deployment_recipe.yml
deployment:
  name: "Snowflake Integration"
  schedule: "0 5 * * *"
  time_zone: "Europe/London"
  executor_id: "remote-executor-pool-1" # Optional: specify remote executor
  cli_version: "0.15.0.1" # Optional: specify CLI version

source: ...
```
@ -222,11 +283,30 @@ source: ...
datahub ingest deploy -c deployment_recipe.yml
```

This is particularly useful when you want all recipes to be stored in version control.
CLI options will override corresponding values in the deployment section.
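
As an illustration of the override rule, a recipe whose `deployment` section already sets `schedule` can still be deployed with a different schedule given on the command line. The sketch below wraps that call in a hypothetical `deploy_override` function; the `DATAHUB` variable is an assumption introduced here (it defaults to the real `datahub` CLI) so the command can be swapped for `echo` in a dry run.

```shell
# Hypothetical wrapper: deploy a recipe, overriding its schedule from the CLI.
# DATAHUB is an assumption of this sketch; set DATAHUB=echo to print the
# command instead of running it.
deploy_override() {
  recipe="$1"     # path to a recipe file that contains a deployment section
  schedule="$2"   # cron expression that should win over deployment.schedule
  "${DATAHUB:-datahub}" ingest deploy -c "$recipe" --schedule "$schedule"
}
```

Calling `deploy_override deployment_recipe.yml "0 6 * * *"` passes `--schedule` explicitly, so per the rule above the 6am value is expected to take precedence over the `schedule` stored in the recipe.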

#### Deployment Configuration Options

All deployment options that can be specified via CLI flags can also be configured in the `deployment` section:

| Field         | CLI Option      | Description                     | Default            |
| ------------- | --------------- | ------------------------------- | ------------------ |
| `name`        | `--name`        | Recipe name displayed in the UI | Required           |
| `schedule`    | `--schedule`    | Cron expression for scheduling  | None (manual only) |
| `time_zone`   | `--time-zone`   | Timezone for scheduled runs     | `"UTC"`            |
| `executor_id` | `--executor-id` | Target executor for ingestion   | `"default"`        |
| `cli_version` | `--cli-version` | CLI version for ingestion       | Server default     |

#### Batch Deployment

Deploy multiple recipes from version control:

```shell
# Deploy every yml recipe in a directory
# (-I {} already runs one command per input line, so -n 1 is not needed)
ls recipe_directory/*.yml | xargs -I {} datahub ingest deploy -c {}

# Deploy with consistent executor across all recipes
ls recipe_directory/*.yml | xargs -I {} datahub ingest deploy --executor-id "production-executor" -c {}
```
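
The `ls | xargs` pipeline is compact, but it prints no per-recipe failure message, and `xargs` gives quotes and backslashes in file names special meaning. A minimal sketch of a more defensive loop follows. `deploy_all` and the `DATAHUB` variable are hypothetical names introduced here for illustration; `DATAHUB` defaults to the real `datahub` CLI and can be set to `echo` for a dry run.

```shell
# Hypothetical helper: deploy every .yml recipe in a directory, one at a time.
# DATAHUB is an assumption of this sketch; it defaults to the real CLI.
deploy_all() {
  dir="$1"
  failed=0
  for recipe in "$dir"/*.yml; do
    # If nothing matches, the glob stays literal; skip that non-existent path.
    [ -e "$recipe" ] || continue
    if ! "${DATAHUB:-datahub}" ingest deploy -c "$recipe"; then
      echo "deploy failed: $recipe" >&2
      failed=1
    fi
  done
  # Non-zero if any single deployment failed.
  return "$failed"
}
```

Running `DATAHUB=echo deploy_all recipe_directory` prints the commands that would be executed, which is a cheap way to review a batch before deploying it.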
### init