mirror of https://github.com/datahub-project/datahub.git, synced 2025-12-27 18:07:57 +00:00
docs(ingestion): glue - clarify that table regex patterns should be fully-qualified (#4747)
parent a7d76e43b5
commit 4458e6261c
@@ -75,33 +75,33 @@ plus `s3:GetObject` for the job script locations.

Note that a `.` is used to denote nested fields in the YAML recipe.

| Field | Required | Default | Description |
|---------------------------------|----------|--------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `aws_region` | ✅ | | AWS region code. |
| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
| `aws_access_key_id` | | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
| `aws_secret_access_key` | | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
| `aws_session_token` | | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
| `aws_role` | | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
| `aws_profile` | | | Named AWS profile to use, if not set the default will be used |
| `extract_transforms` | | `True` | Whether to extract Glue transform jobs. |
| `database_pattern.allow` | | | List of regex patterns for databases to include in ingestion. |
| `database_pattern.deny` | | | List of regex patterns for databases to exclude from ingestion. |
| `database_pattern.ignoreCase` | | `True` | Whether to ignore case sensitivity during pattern matching. |
| `table_pattern.allow` | | | List of regex patterns for tables to include in ingestion. |
| `table_pattern.deny` | | | List of regex patterns for tables to exclude from ingestion. |
| `table_pattern.ignoreCase` | | `True` | Whether to ignore case sensitivity during pattern matching. |
| `platform` | | `glue` | Override for platform name. Allowed values - `glue`, `athena` |
| `platform_instance` | | None | The Platform instance to use while constructing URNs. |
| `underlying_platform` | | `glue` | @deprecated(Use `platform`) Override for platform name. Allowed values - `glue`, `athena` |
| `ignore_unsupported_connectors` | | `True` | Whether to ignore unsupported connectors. If disabled, an error will be raised. |
| `emit_s3_lineage` | | `True` | Whether to emit S3-to-Glue lineage. |
| `glue_s3_lineage_direction` | | `upstream` | If `upstream`, S3 is upstream to Glue. If `downstream` S3 is downstream to Glue. |
| `extract_owners` | | `True` | When enabled, extracts ownership from Glue directly and overwrites existing owners. When disabled, ownership is left empty for datasets. |

| Field | Required | Default | Description |
|---------------------------------|----------|--------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|
| `aws_region` | ✅ | | AWS region code. |
| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
| `aws_access_key_id` | | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
| `aws_secret_access_key` | | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
| `aws_session_token` | | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
| `aws_role` | | Autodetected | See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
| `aws_profile` | | | Named AWS profile to use, if not set the default will be used |
| `extract_transforms` | | `True` | Whether to extract Glue transform jobs. |
| `database_pattern.allow` | | | List of regex patterns for databases to include in ingestion. |
| `database_pattern.deny` | | | List of regex patterns for databases to exclude from ingestion. |
| `database_pattern.ignoreCase` | | `True` | Whether to ignore case sensitivity during pattern matching. |
| `table_pattern.allow` | | | List of regex patterns for fully-qualified table names (in the format `database_name.table_name`) to include in ingestion. |
| `table_pattern.deny` | | | List of regex patterns for fully-qualified table names (in the format `database_name.table_name`) to exclude from ingestion. |
| `table_pattern.ignoreCase` | | `True` | Whether to ignore case sensitivity during pattern matching. |
| `platform` | | `glue` | Override for platform name. Allowed values - `glue`, `athena` |
| `platform_instance` | | None | The Platform instance to use while constructing URNs. |
| `underlying_platform` | | `glue` | @deprecated(Use `platform`) Override for platform name. Allowed values - `glue`, `athena` |
| `ignore_unsupported_connectors` | | `True` | Whether to ignore unsupported connectors. If disabled, an error will be raised. |
| `emit_s3_lineage` | | `True` | Whether to emit S3-to-Glue lineage. |
| `glue_s3_lineage_direction` | | `upstream` | If `upstream`, S3 is upstream to Glue. If `downstream` S3 is downstream to Glue. |
| `extract_owners` | | `True` | When enabled, extracts ownership from Glue directly and overwrites existing owners. When disabled, ownership is left empty for datasets. |
| `domain.domain_key.allow` | | | List of regex patterns for tables to set domain_key domain key (domain_key can be any string like `sales`. There can be multiple domain key specified. |
| `domain.domain_key.deny` | | | List of regex patterns for tables to not assign domain_key. There can be multiple domain key specified. |
| `domain.domain_key.ignoreCase` | | `True` | Whether to ignore case sensitivity during pattern matching.There can be multiple domain key specified. |
| `catalog_id` | | None | The aws account id where the target glue catalog lives. If None, datahub will ingest glue catalog in aws caller's account. |

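To illustrate why the `table_pattern` entries need to be written against the fully-qualified name, here is a minimal Python sketch. It assumes the patterns are applied with prefix-anchored matching in the style of Python's `re.match`. The helper `is_allowed` and the names `sales_db` and `orders` are hypothetical and are not taken from the DataHub code base.

```python
import re


def is_allowed(name, allow=(".*",), deny=(), ignore_case=True):
    """Hypothetical allow/deny check: a deny match wins, otherwise any allow match passes."""
    flags = re.IGNORECASE if ignore_case else 0
    if any(re.match(pattern, name, flags) for pattern in deny):
        return False
    return any(re.match(pattern, name, flags) for pattern in allow)


# Assumption: each Glue table is tested as database_name.table_name.
qualified = "sales_db.orders"

# A bare table name does not match, because matching starts at the database name.
bare_pattern_matches = is_allowed(qualified, allow=[r"orders"])

# A fully-qualified pattern (with the dot escaped) matches.
qualified_pattern_matches = is_allowed(qualified, allow=[r"sales_db\.orders"])

# A wildcard on the database part also matches.
wildcard_pattern_matches = is_allowed(qualified, allow=[r".*\.orders"])

# With ignore_case=True, an upper-case deny pattern still excludes the table.
allowed_despite_deny = is_allowed(qualified, allow=[r".*"], deny=[r"SALES_DB\..*"])
```

Under these assumptions, a recipe that lists only `orders` under `table_pattern.allow` would ingest nothing from `sales_db`, whereas `sales_db\.orders` or `.*\.orders` would select the table.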
### Cross-account ingestion