diff --git a/docs/cli.md b/docs/cli.md
index 6666cf918e..513d13e909 100644
--- a/docs/cli.md
+++ b/docs/cli.md
@@ -193,18 +193,77 @@ This command will automatically create a new recipe if it doesn't exist, or upda
 Note that this is a complete update, and will remove any options that were previously set.
 I.e: Not specifying a schedule in the cli update command will remove the schedule from the recipe to be updated.
 
-**Basic example**
+#### Command Options
 
-To schedule a recipe called "Snowflake Integration", to run at 5am every day, London time with the recipe configured in a local `recipe.yaml` file:
+```console
+Usage: datahub ingest deploy [OPTIONS]
+
+Options:
+  -n, --name TEXT     Recipe Name
+  -c, --config FILE   Config file in .toml or .yaml format.  [required]
+  --urn TEXT          Urn of recipe to update. If not specified here or in the recipe's pipeline_name,
+                      this will create a new ingestion source.
+  --executor-id TEXT  Executor id to route execution requests to. Do not use this unless you have
+                      configured a custom executor.
+  --cli-version TEXT  Provide a custom CLI version to use for ingestion. By default will use server
+                      default.
+  --schedule TEXT     Cron definition for schedule. If none is provided, ingestion recipe will not be
+                      scheduled
+  --time-zone TEXT    Timezone for the schedule in 'America/New_York' format. Uses UTC by default.
+  --debug BOOLEAN     Should we debug.
+  --extra-pip TEXT    Extra pip packages. e.g. ["memray"]
+```
+
+#### Examples
+
+**Schedule a recipe on the default executor:**
 
 ```shell
-datahub ingest deploy --name "Snowflake Integration" --schedule "5 * * * *" --time-zone "Europe/London" -c recipe.yaml
+datahub ingest deploy --name "Snowflake Integration" --schedule "0 5 * * *" --time-zone "Europe/London" -c recipe.yaml
+```
+
+**Deploy to a specific remote executor:**
+
+```shell
+datahub ingest deploy --name "Remote Snowflake Integration" --executor-id "remote-executor-pool-1" --schedule "0 5 * * *" -c recipe.yaml
+```
+
+**Update an existing recipe:**
+
+```shell
+datahub ingest deploy --urn "urn:li:dataHubIngestionSource:deploy-12345678" --schedule "0 6 * * *" -c updated_recipe.yaml
 ```
 
 By default, the ingestion recipe's identifier is generated by hashing the name.
 You can override the urn generation by passing the `--urn` flag to the CLI.
 
-**Using `deployment` to avoid CLI args**
+#### Remote Executors
+
+The `--executor-id` option routes ingestion runs to a specific executor:
+
+- **Default executor** (`"default"`): Uses the managed executor provided by DataHub Cloud or your configured default executor
+- **Remote executors**: Route to custom remote executors you've deployed in your environment
+
+:::note
+Use executor IDs other than "default" only if you have configured custom remote executors.
+:::
+
+**Examples with remote executors:**
+
+```shell
+# Deploy to the default executor (which may itself be a remote executor)
+datahub ingest deploy --name "My Integration" -c recipe.yaml
+
+# Deploy to a specific remote executor pool
+datahub ingest deploy --name "Private Network Integration" --executor-id "private-network-pool" -c recipe.yaml
+
+# Deploy to a region-specific executor
+datahub ingest deploy --name "EU Region Integration" --executor-id "eu-west-executor" -c recipe.yaml
+```
+
+For more information on setting up remote executors, see the [Remote Executor Setup Guide](managed-datahub/operator-guide/setting-up-remote-ingestion-executor.md).
+
+#### Using the `deployment` section
 
 As an alternative to configuring settings from the CLI, all of these settings can also be set in the `deployment` field of the recipe.
 
@@ -212,8 +271,10 @@ As an alternative to configuring settings from the CLI, all of these settings ca
 # deployment_recipe.yml
 deployment:
   name: "Snowflake Integration"
-  schedule: "5 * * * *"
+  schedule: "0 5 * * *"
   time_zone: "Europe/London"
+  executor_id: "remote-executor-pool-1" # Optional: specify remote executor
+  cli_version: "0.15.0.1" # Optional: specify CLI version
 source: ...
 ```
 
@@ -222,11 +283,30 @@ source: ...
 datahub ingest deploy -c deployment_recipe.yml
 ```
 
-This is particularly useful when you want all recipes to be stored in version control.
+CLI options override the corresponding values in the `deployment` section.
+
+#### Deployment Configuration Options
+
+All deployment options that can be specified via CLI flags can also be configured in the `deployment` section:
+
+| Field         | CLI Option      | Description                     | Default            |
+| ------------- | --------------- | ------------------------------- | ------------------ |
+| `name`        | `--name`        | Recipe name displayed in the UI | Required           |
+| `schedule`    | `--schedule`    | Cron expression for scheduling  | None (manual only) |
+| `time_zone`   | `--time-zone`   | Timezone for scheduled runs     | `"UTC"`            |
+| `executor_id` | `--executor-id` | Target executor for ingestion   | `"default"`        |
+| `cli_version` | `--cli-version` | CLI version for ingestion       | Server default     |
+
+#### Batch Deployment
+
+Deploy multiple recipes from version control:
 
 ```shell
 # Deploy every yml recipe in a directory
 ls recipe_directory/*.yml | xargs -n 1 -I {} datahub ingest deploy -c {}
+
+# Deploy all recipes to the same executor
+ls recipe_directory/*.yml | xargs -n 1 -I {} datahub ingest deploy --executor-id "production-executor" -c {}
 ```
 
 ### init
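The batch-deployment commands in the patch above pipe `ls` output into `xargs`. As a minimal sketch of an alternative, assuming bash, the same deployment can loop over the glob directly, which avoids parsing `ls` output and makes the executor optional. `deploy_recipes` is a hypothetical helper name, not part of the DataHub CLI; it only uses the `-c` and `--executor-id` flags documented above.

```shell
#!/usr/bin/env bash
# Sketch: deploy every *.yml recipe in a directory with `datahub ingest deploy`.
# deploy_recipes is a hypothetical helper, not a DataHub CLI command.
# Usage: deploy_recipes <recipe_dir> [executor_id]
deploy_recipes() {
  local recipe_dir="$1"
  local executor_id="${2:-}"
  local -a extra_args=()
  local recipe found=0

  # Only pass --executor-id when one is given, so each recipe's own
  # deployment.executor_id (or the "default" executor) still applies.
  if [ -n "$executor_id" ]; then
    extra_args=(--executor-id "$executor_id")
  fi

  for recipe in "$recipe_dir"/*.yml; do
    # With no matches the glob stays literal; skip it rather than deploy a bogus path.
    [ -e "$recipe" ] || continue
    found=1
    echo "Deploying $recipe" >&2
    # Stop at the first failed deployment.
    datahub ingest deploy "${extra_args[@]}" -c "$recipe" || return 1
  done

  if [ "$found" -eq 0 ]; then
    echo "No .yml recipes found in $recipe_dir" >&2
    return 1
  fi
}
```

For example, `deploy_recipes recipe_directory production-executor` mirrors the second `xargs` command, and `deploy_recipes recipe_directory` mirrors the first. Because the sketch returns at the first failure, drop `|| return 1` if you would rather attempt every recipe.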