Mirror of https://github.com/open-metadata/OpenMetadata.git, synced 2025-08-23 08:28:10 +00:00

[Docs] - Docs storage for manifest & Domo (#13290)

* Domo docs
* Storage Manifest

This commit is contained in:
parent f05e874c7a
commit c53fe684fd
@ -0,0 +1,112 @@
## OpenMetadata Manifest

Our manifest file is defined as a [JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/storage/containerMetadataConfig.json),
and can look like this:

{% codePreview %}

{% codeInfoContainer %}

{% codeInfo srNumber=1 %}

**Entries**: We need to add a list of `entries`. Each inner JSON structure will be ingested as a child container of the
top-level one. In this case, we will be ingesting 4 children.

{% /codeInfo %}

{% codeInfo srNumber=2 %}

**Simple Container**: The simplest container we can have is structured data without partitions. Note that we still
need to bring information about:

- **dataPath**: Where we can find the data. This should be a path relative to the top-level container.
- **structureFormat**: The format of the data we are going to find. This information will be used to read the data.

After ingesting this container, we will bring in the schema of the data in the `dataPath`.

{% /codeInfo %}

{% codeInfo srNumber=3 %}

**Partitioned Container**: We can ingest partitioned data without bringing in any further details.

By setting the `isPartitioned` field to `true`, we'll flag the container as `Partitioned`. We will still read the
source files' schemas, but won't add any other information.

{% /codeInfo %}

{% codeInfo srNumber=4 %}

**Single-Partition Container**: We can bring partition information by specifying the `partitionColumns`. Their definition
is based on the [JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/data/table.json#L232)
definition for table columns. The minimum required information is the `name` and `dataType`.

When passing `partitionColumns`, these values will be added to the schema, on top of the information inferred from the files.

{% /codeInfo %}

{% codeInfo srNumber=5 %}

**Multiple-Partition Container**: We can add multiple columns as partitions.

Note how in the example we even bring our custom `displayName` for the column and `dataTypeDisplay` for its type.

Again, this information will be added on top of the schema inferred from the data files.

{% /codeInfo %}

{% /codeInfoContainer %}

{% codeBlock fileName="openmetadata.json" %}

```json {% srNumber=1 %}
{
    "entries": [
```
```json {% srNumber=2 %}
        {
            "dataPath": "transactions",
            "structureFormat": "csv"
        },
```
```json {% srNumber=3 %}
        {
            "dataPath": "cities",
            "structureFormat": "parquet",
            "isPartitioned": true
        },
```
```json {% srNumber=4 %}
        {
            "dataPath": "cities_multiple_simple",
            "structureFormat": "parquet",
            "isPartitioned": true,
            "partitionColumns": [
                {
                    "name": "State",
                    "dataType": "STRING"
                }
            ]
        },
```
```json {% srNumber=5 %}
        {
            "dataPath": "cities_multiple",
            "structureFormat": "parquet",
            "isPartitioned": true,
            "partitionColumns": [
                {
                    "name": "Year",
                    "displayName": "Year (Partition)",
                    "dataType": "DATE",
                    "dataTypeDisplay": "date (year)"
                },
                {
                    "name": "State",
                    "dataType": "STRING"
                }
            ]
        }
    ]
}
```

{% /codeBlock %}

{% /codePreview %}
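To make the required fields concrete, here is a minimal structural check of a manifest in Python. This helper is an illustrative sketch, not part of the OpenMetadata SDK; the authoritative validation is the JSON Schema linked above.

```python
import json

# Minimal structural check for an openmetadata.json manifest.
# Illustrative sketch only: the real validation is driven by the
# containerMetadataConfig JSON Schema, not by this helper.

def validate_manifest(manifest: dict) -> list[str]:
    """Return a list of problems found in the manifest; empty means OK."""
    problems = []
    entries = manifest.get("entries")
    if not isinstance(entries, list):
        return ["manifest must contain a list under 'entries'"]
    for i, entry in enumerate(entries):
        # Every child container needs a data path and a format to read it.
        for required in ("dataPath", "structureFormat"):
            if required not in entry:
                problems.append(f"entry {i}: missing '{required}'")
        # Partition columns need at least a name and a dataType each.
        for col in entry.get("partitionColumns", []):
            if "name" not in col or "dataType" not in col:
                problems.append(
                    f"entry {i}: partition column missing 'name' or 'dataType'"
                )
    return problems

manifest = json.loads("""
{
    "entries": [
        {"dataPath": "transactions", "structureFormat": "csv"},
        {"dataPath": "cities", "structureFormat": "parquet", "isPartitioned": true}
    ]
}
""")
print(validate_manifest(manifest))  # -> []
```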
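The notes above state that declared `partitionColumns` are layered on top of the schema inferred from the files. A rough sketch of that merge, assuming simple column dicts shaped like the table column definition; the merge logic itself is an illustration, not OpenMetadata's actual implementation:

```python
# Illustrative sketch: layer declared partitionColumns on top of a schema
# inferred from the data files. Column dicts follow the table column shape
# (name, dataType, optional displayName/dataTypeDisplay). The merge policy
# shown here is an assumption for illustration purposes.

def merge_schema(inferred: list[dict], partition_columns: list[dict]) -> list[dict]:
    """Declared partition columns extend (or override by name) the inferred columns."""
    by_name = {col["name"]: dict(col) for col in inferred}
    for col in partition_columns:
        # A declared partition column is appended, or wins over the inferred one.
        by_name.setdefault(col["name"], {}).update(col)
    return list(by_name.values())

inferred = [
    {"name": "city", "dataType": "STRING"},
    {"name": "population", "dataType": "INT"},
]
partitions = [
    {"name": "Year", "displayName": "Year (Partition)", "dataType": "DATE"},
    {"name": "State", "dataType": "STRING"},
]
merged = merge_schema(inferred, partitions)
print([c["name"] for c in merged])  # -> ['city', 'population', 'Year', 'State']
```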
@ -40,115 +40,5 @@ We are flattening this structure to simplify the navigation.

 {% /note %}

-## OpenMetadata Manifest
-
-Our manifest file is defined as a [JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/storage/containerMetadataConfig.json),
-and can look like this:
-
-[... the rest of the inline manifest walkthrough and `openmetadata.json` example, elided ...]
+{% partial file="/v1.1/connectors/storage/manifest.md" /%}
@ -82,6 +82,16 @@ The policy would look like:
 }
 ```

+### OpenMetadata Manifest
+
+Unlike other connectors, where metadata extraction happens automatically, here we can only extract high-level
+metadata from buckets. To understand their internal structure, users need to provide an `openmetadata.json`
+file at the bucket root.
+
+You can learn more about this [here](/connectors/storage). Keep reading for an example of the shape of the manifest file.
+
+{% partial file="/v1.1/connectors/storage/manifest.md" /%}
+
 ## Metadata Ingestion

 {% stepsContainer %}
@ -92,6 +92,16 @@ To run the Athena ingestion, you will need to install:
 pip3 install "openmetadata-ingestion[athena]"
 ```

+### OpenMetadata Manifest
+
+Unlike other connectors, where metadata extraction happens automatically, here we can only extract high-level
+metadata from buckets. To understand their internal structure, users need to provide an `openmetadata.json`
+file at the bucket root.
+
+You can learn more about this [here](/connectors/storage). Keep reading for an example of the shape of the manifest file.
+
+{% partial file="/v1.1/connectors/storage/manifest.md" /%}
+
 ## Metadata Ingestion

 All connectors are defined as JSON Schemas.
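The added sections ask users to place an `openmetadata.json` at the bucket root. As a minimal sketch (the entry values and the upload command are illustrative assumptions, not taken from the docs), assembling and writing that file could look like:

```python
import json
import tempfile
from pathlib import Path

# Sketch: assemble a minimal openmetadata.json, ready to be uploaded to the
# bucket root. The entries below are illustrative placeholders.
manifest = {
    "entries": [
        {"dataPath": "transactions", "structureFormat": "csv"},
        {
            "dataPath": "cities",
            "structureFormat": "parquet",
            "isPartitioned": True,
        },
    ]
}

out_dir = Path(tempfile.mkdtemp())
out_file = out_dir / "openmetadata.json"  # the file must sit at the bucket root
out_file.write_text(json.dumps(manifest, indent=4))

# From here you would upload the file to the bucket root, for example with the
# AWS CLI: aws s3 cp openmetadata.json s3://<your-bucket>/openmetadata.json
```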
@ -48,7 +48,7 @@ For questions related to scopes, click [here](https://developer.domo.com/portal/
 - **Secret Token**: Secret Token to Connect DOMO Dashboard.
 - **Access Token**: Access to Connect to DOMO Dashboard.
 - **API Host**: API Host to Connect to DOMO Dashboard instance.
-- **SandBox Domain**: Connect to SandBox Domain.
+- **Instance Domain**: URL to connect to your Domo instance UI. For example `https://<your>.domo.com`.

 {% /extraContent %}
@ -90,7 +90,7 @@ This is a sample config for Domo-Dashboard:

 {% codeInfo srNumber=5 %}

-**SandBox Domain**: Connect to SandBox Domain.
+**Instance Domain**: URL to connect to your Domo instance UI. For example `https://<your>.domo.com`.

 {% /codeInfo %}
@ -145,7 +145,7 @@ source:
 apiHost: api.domo.com
 ```
 ```yaml {% srNumber=5 %}
-sandboxDomain: https://<api_domo>.domo.com
+instanceDomain: https://<your>.domo.com
 ```
 ```yaml {% srNumber=6 %}
 sourceConfig:
@ -63,7 +63,7 @@ For questions related to scopes, click [here](https://developer.domo.com/portal/
 - **Secret Token**: Secret Token to Connect DOMO Database.
 - **Access Token**: Access to Connect to DOMO Database.
 - **API Host**: API Host to Connect to DOMO Database instance.
-- **SandBox Domain**: Connect to SandBox Domain.
+- **Instance Domain**: URL to connect to your Domo instance UI. For example `https://<your>.domo.com`.

 {% /extraContent %}
@ -108,7 +108,7 @@ This is a sample config for DomoDatabase:

 {% codeInfo srNumber=5 %}

-**SandBox Domain**: Connect to SandBox Domain.
+**Instance Domain**: URL to connect to your Domo instance UI. For example `https://<your>.domo.com`.

 {% /codeInfo %}
@ -186,7 +186,7 @@ source:
 apiHost: api.domo.com
 ```
 ```yaml {% srNumber=5 %}
-sandboxDomain: https://<api_domo>.domo.com
+instanceDomain: https://<your>.domo.com
 ```
 ```yaml {% srNumber=6 %}
 # database: database
@ -40,7 +40,7 @@ For questions related to scopes, click [here](https://developer.domo.com/portal/
 - **Secret Token**: Secret Token to Connect to DOMO Pipeline.
 - **Access Token**: Access to Connect to DOMO Pipeline.
 - **API Host**: API Host to Connect to DOMO Pipeline.
-- **SandBox Domain**: Connect to SandBox Domain.
+- **Instance Domain**: URL to connect to your Domo instance UI. For example `https://<your>.domo.com`.

 {% /extraContent %}
@ -85,7 +85,7 @@ This is a sample config for Domo-Pipeline:

 {% codeInfo srNumber=5 %}

-**SandBox Domain**: Connect to SandBox Domain.
+**Instance Domain**: URL to connect to your Domo instance UI. For example `https://<your>.domo.com`.

 {% /codeInfo %}
@ -143,7 +143,7 @@ source:
 apiHost: api.domo.com
 ```
 ```yaml {% srNumber=5 %}
-sandboxDomain: https://<api_domo>.domo.com
+instanceDomain: https://<your>.domo.com
 ```
 ```yaml {% srNumber=6 %}
 sourceConfig:
@ -40,115 +40,4 @@ We are flattening this structure to simplify the navigation.

 {% /note %}

-## OpenMetadata Manifest
-
-Our manifest file is defined as a [JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/storage/containerMetadataConfig.json),
-and can look like this:
-
-[... the rest of the inline manifest walkthrough and `openmetadata.json` example, elided ...]
+{% partial file="/v1.2/connectors/storage/manifest.md" /%}
@ -82,6 +82,16 @@ The policy would look like:
 }
 ```

+### OpenMetadata Manifest
+
+Unlike other connectors, where metadata extraction happens automatically, here we can only extract high-level
+metadata from buckets. To understand their internal structure, users need to provide an `openmetadata.json`
+file at the bucket root.
+
+You can learn more about this [here](/connectors/storage). Keep reading for an example of the shape of the manifest file.
+
+{% partial file="/v1.2/connectors/storage/manifest.md" /%}
+
 ## Metadata Ingestion

 {% stepsContainer %}
@ -92,6 +92,16 @@ To run the Athena ingestion, you will need to install:
 pip3 install "openmetadata-ingestion[athena]"
 ```

+### OpenMetadata Manifest
+
+Unlike other connectors, where metadata extraction happens automatically, here we can only extract high-level
+metadata from buckets. To understand their internal structure, users need to provide an `openmetadata.json`
+file at the bucket root.
+
+You can learn more about this [here](/connectors/storage). Keep reading for an example of the shape of the manifest file.
+
+{% partial file="/v1.2/connectors/storage/manifest.md" /%}
+
 ## Metadata Ingestion

 All connectors are defined as JSON Schemas.