[Docs] - Docs storage for manifest & Domo (#13290)

* Domo docs
* Storage Manifest
2025-10-12 17:26:43 +00:00 · 2023-09-21 13:06:56 +02:00 · 2023-09-21 13:06:56 +02:00 · c53fe684fd
commit c53fe684fd
parent f05e874c7a
14 changed files with 275 additions and 232 deletions
--- a/openmetadata-docs/content/partials/v1.1/connectors/storage/manifest.md
+++ b/openmetadata-docs/content/partials/v1.1/connectors/storage/manifest.md
@ -0,0 +1,112 @@
+## OpenMetadata Manifest
+
+Our manifest file is defined as a [JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/storage/containerMetadataConfig.json),
+and can look like this:
+
+{% codePreview %}
+
+{% codeInfoContainer %}
+
+{% codeInfo srNumber=1 %}
+
+**Entries**: We need to add a list of `entries`. Each inner JSON structure will be ingested as a child container of the top-level
+one. In this case, we will be ingesting 4 children.
+
+{% /codeInfo %}
+
+{% codeInfo srNumber=2 %}
+
+**Simple Container**: The simplest container we can have would be structured, but without partitions. Note that we still
+need to bring information about:
+
+- **dataPath**: Where we can find the data. This should be a path relative to the top-level container.
+- **structureFormat**: What is the format of the data we are going to find. This information will be used to read the data.
+
+After ingesting this container, we will bring in the schema of the data in the `dataPath`.
+
+{% /codeInfo %}
+
+{% codeInfo srNumber=3 %}
+
+**Partitioned Container**: We can ingest partitioned data without bringing in any further details.
+
+By setting the `isPartitioned` field to `true`, we'll flag the container as `Partitioned`. We will read the
+source files' schemas, but won't add any other information.
+
+{% /codeInfo %}
+
+{% codeInfo srNumber=4 %}
+
+**Single-Partition Container**: We can bring partition information by specifying the `partitionColumns`. Their definition
+is based on the [JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/data/table.json#L232)
+definition for table columns. The minimum required information is the `name` and `dataType`.
+
+When passing `partitionColumns`, these values will be added to the schema, on top of the inferred information from the files.
+
+{% /codeInfo %}
+
+{% codeInfo srNumber=5 %}
+
+**Multiple-Partition Container**: We can add multiple columns as partitions.
+
+Note how in the example we even provide a custom `displayName` for the column and a `dataTypeDisplay` for its type.
+
+Again, this information will be added on top of the inferred schema from the data files.
+
+{% /codeInfo %}
+
+{% /codeInfoContainer %}
+
+{% codeBlock fileName="openmetadata.json" %}
+
+```json {% srNumber=1 %}
+{
+    "entries": [
+```
+```json {% srNumber=2 %}
+        {
+            "dataPath": "transactions",
+            "structureFormat": "csv"
+        },
+```
+```json {% srNumber=3 %}
+        {
+            "dataPath": "cities",
+            "structureFormat": "parquet",
+            "isPartitioned": true
+        },
+```
+```json {% srNumber=4 %}
+        {
+            "dataPath": "cities_multiple_simple",
+            "structureFormat": "parquet",
+            "isPartitioned": true,
+            "partitionColumns": [
+                {
+                    "name": "State",
+                    "dataType": "STRING"
+                }
+            ]
+        },
+```
+```json {% srNumber=5 %}
+        {
+            "dataPath": "cities_multiple",
+            "structureFormat": "parquet",
+            "isPartitioned": true,
+            "partitionColumns": [
+                {
+                    "name": "Year",
+                    "displayName": "Year (Partition)",
+                    "dataType": "DATE",
+                    "dataTypeDisplay": "date (year)"
+                },
+                {
+                    "name": "State",
+                    "dataType": "STRING"
+                }
+            ]
+        }
+    ]
+}
+```
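
As a rough illustration of the rules described in this partial, the following Python sketch checks a manifest before it is placed in a bucket. The helper name `check_manifest` and its messages are hypothetical and not part of OpenMetadata. It only encodes what the text states: each entry needs `dataPath` and `structureFormat`, and each partition column needs at least `name` and `dataType`. Validating against the linked JSON Schema would be the stricter option.

```python
import json

# Hypothetical helper, not part of OpenMetadata: it only mirrors the
# minimum fields described in the manifest documentation above.
REQUIRED_ENTRY_KEYS = ("dataPath", "structureFormat")
REQUIRED_COLUMN_KEYS = ("name", "dataType")


def check_manifest(manifest: dict) -> list:
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    entries = manifest.get("entries")
    if not isinstance(entries, list):
        return ["manifest must contain a list under 'entries'"]
    for i, entry in enumerate(entries):
        # Every child container needs a path and a file format.
        for key in REQUIRED_ENTRY_KEYS:
            if key not in entry:
                problems.append(f"entries[{i}] is missing '{key}'")
        # Partition columns are optional, but each one needs name and dataType.
        for j, column in enumerate(entry.get("partitionColumns", [])):
            for key in REQUIRED_COLUMN_KEYS:
                if key not in column:
                    problems.append(
                        f"entries[{i}].partitionColumns[{j}] is missing '{key}'"
                    )
    return problems


manifest = json.loads("""
{
    "entries": [
        {"dataPath": "transactions", "structureFormat": "csv"},
        {
            "dataPath": "cities_multiple_simple",
            "structureFormat": "parquet",
            "isPartitioned": true,
            "partitionColumns": [{"name": "State", "dataType": "STRING"}]
        }
    ]
}
""")
print(check_manifest(manifest))
```

Running a check like this locally can catch a missing field before the ingestion workflow reads `openmetadata.json`.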
--- a/openmetadata-docs/content/partials/v1.2/connectors/storage/manifest.md
+++ b/openmetadata-docs/content/partials/v1.2/connectors/storage/manifest.md
@ -0,0 +1,112 @@
+## OpenMetadata Manifest
+
+Our manifest file is defined as a [JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/storage/containerMetadataConfig.json),
+and can look like this:
+
+{% codePreview %}
+
+{% codeInfoContainer %}
+
+{% codeInfo srNumber=1 %}
+
+**Entries**: We need to add a list of `entries`. Each inner JSON structure will be ingested as a child container of the top-level
+one. In this case, we will be ingesting 4 children.
+
+{% /codeInfo %}
+
+{% codeInfo srNumber=2 %}
+
+**Simple Container**: The simplest container we can have would be structured, but without partitions. Note that we still
+need to bring information about:
+
+- **dataPath**: Where we can find the data. This should be a path relative to the top-level container.
+- **structureFormat**: What is the format of the data we are going to find. This information will be used to read the data.
+
+After ingesting this container, we will bring in the schema of the data in the `dataPath`.
+
+{% /codeInfo %}
+
+{% codeInfo srNumber=3 %}
+
+**Partitioned Container**: We can ingest partitioned data without bringing in any further details.
+
+By setting the `isPartitioned` field to `true`, we'll flag the container as `Partitioned`. We will read the
+source files' schemas, but won't add any other information.
+
+{% /codeInfo %}
+
+{% codeInfo srNumber=4 %}
+
+**Single-Partition Container**: We can bring partition information by specifying the `partitionColumns`. Their definition
+is based on the [JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/data/table.json#L232)
+definition for table columns. The minimum required information is the `name` and `dataType`.
+
+When passing `partitionColumns`, these values will be added to the schema, on top of the inferred information from the files.
+
+{% /codeInfo %}
+
+{% codeInfo srNumber=5 %}
+
+**Multiple-Partition Container**: We can add multiple columns as partitions.
+
+Note how in the example we even provide a custom `displayName` for the column and a `dataTypeDisplay` for its type.
+
+Again, this information will be added on top of the inferred schema from the data files.
+
+{% /codeInfo %}
+
+{% /codeInfoContainer %}
+
+{% codeBlock fileName="openmetadata.json" %}
+
+```json {% srNumber=1 %}
+{
+    "entries": [
+```
+```json {% srNumber=2 %}
+        {
+            "dataPath": "transactions",
+            "structureFormat": "csv"
+        },
+```
+```json {% srNumber=3 %}
+        {
+            "dataPath": "cities",
+            "structureFormat": "parquet",
+            "isPartitioned": true
+        },
+```
+```json {% srNumber=4 %}
+        {
+            "dataPath": "cities_multiple_simple",
+            "structureFormat": "parquet",
+            "isPartitioned": true,
+            "partitionColumns": [
+                {
+                    "name": "State",
+                    "dataType": "STRING"
+                }
+            ]
+        },
+```
+```json {% srNumber=5 %}
+        {
+            "dataPath": "cities_multiple",
+            "structureFormat": "parquet",
+            "isPartitioned": true,
+            "partitionColumns": [
+                {
+                    "name": "Year",
+                    "displayName": "Year (Partition)",
+                    "dataType": "DATE",
+                    "dataTypeDisplay": "date (year)"
+                },
+                {
+                    "name": "State",
+                    "dataType": "STRING"
+                }
+            ]
+        }
+    ]
+}
+```
--- a/openmetadata-docs/content/v1.1.x/connectors/storage/index.md
+++ b/openmetadata-docs/content/v1.1.x/connectors/storage/index.md
@ -40,115 +40,5 @@ We are flattening this structure to simplify the navigation.

 {% /note %}

-## OpenMetadata Manifest

-Our manifest file is defined as a [JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/storage/containerMetadataConfig.json),
-and can look like this:
-
-{% codePreview %}
-
-{% codeInfoContainer %}
-
-{% codeInfo srNumber=1 %}
-
-**Entries**: We need to add a list of `entries`. Each inner JSON structure will be ingested as a child container of the top-level
-one. In this case, we will be ingesting 4 children.
-
-{% /codeInfo %}
-
-{% codeInfo srNumber=2 %}
-
-**Simple Container**: The simplest container we can have would be structured, but without partitions. Note that we still
-need to bring information about:
-
-- **dataPath**: Where we can find the data. This should be a path relative to the top-level container.
-- **structureFormat**: What is the format of the data we are going to find. This information will be used to read the data.
-
-After ingesting this container, we will bring in the schema of the data in the `dataPath`.
-
-{% /codeInfo %}
-
-{% codeInfo srNumber=3 %}
-
-**Partitioned Container**: We can ingest partitioned data without bringing in any further details.
-
-By informing the `isPartitioned` field as `true`, we'll flag the container as `Partitioned`. We will be reading the
-source files schemas', but won't add any other information.
-
-{% /codeInfo %}
-
-{% codeInfo srNumber=4 %}
-
-**Single-Partition Container**: We can bring partition information by specifying the `partitionColumns`. Their definition
-is based on the [JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/data/table.json#L232)
-definition for table columns. The minimum required information is the `name` and `dataType`.
-
-When passing `partitionColumns`, these values will be added to the schema, on top of the inferred information from the files.
-
-{% /codeInfo %}
-
-{% codeInfo srNumber=5 %}
-
-**Multiple-Partition Container**: We can add multiple columns as partitions.
-
-Note how in the example we even bring our custom `displayName` for the column `dataTypeDisplay` for its type.
-
-Again, this information will be added on top of the inferred schema from the data files.
-
-{% /codeInfo %}
-
-{% /codeInfoContainer %}
-
-{% codeBlock fileName="openmetadata.json" %}
-
-```json {% srNumber=1 %}
-{
-    "entries": [
-```
-```json {% srNumber=2 %}
-        {
-            "dataPath": "transactions",
-            "structureFormat": "csv"
-        },
-```
-```json {% srNumber=3 %}
-        {
-            "dataPath": "cities",
-            "structureFormat": "parquet",
-            "isPartitioned": true
-        },
-```
-```json {% srNumber=4 %}
-        {
-            "dataPath": "cities_multiple_simple",
-            "structureFormat": "parquet",
-            "isPartitioned": true,
-            "partitionColumns": [
-                {
-                    "name": "State",
-                    "dataType": "STRING"
-                }
-            ]
-        },
-```
-```json {% srNumber=5 %}
-        {
-            "dataPath": "cities_multiple",
-            "structureFormat": "parquet",
-            "isPartitioned": true,
-            "partitionColumns": [
-                {
-                    "name": "Year",
-                    "displayName": "Year (Partition)",
-                    "dataType": "DATE",
-                    "dataTypeDisplay": "date (year)"
-                },
-                {
-                    "name": "State",
-                    "dataType": "STRING"
-                }
-            ]
-        }
-    ]
-}
-```
+{% partial file="/v1.1/connectors/storage/manifest.md" /%}
--- a/openmetadata-docs/content/v1.1.x/connectors/storage/s3/index.md
+++ b/openmetadata-docs/content/v1.1.x/connectors/storage/s3/index.md
@ -82,6 +82,16 @@ The policy would look like:
 }
 ```

+### OpenMetadata Manifest
+
+In any other connector, extracting metadata happens automatically. In this case, we will be able to extract high-level
+metadata from buckets, but in order to understand their internal structure we need users to provide an `openmetadata.json`
+file at the bucket root.
+
+You can learn more about this [here](/connectors/storage). Keep reading for an example of the shape of the manifest file.
+
+{% partial file="/v1.1/connectors/storage/manifest.md" /%}
+
 ## Metadata Ingestion

 {% stepsContainer %}
--- a/openmetadata-docs/content/v1.1.x/connectors/storage/s3/yaml.md
+++ b/openmetadata-docs/content/v1.1.x/connectors/storage/s3/yaml.md
@ -92,6 +92,16 @@ To run the Athena ingestion, you will need to install:
 pip3 install "openmetadata-ingestion[athena]"
 ```

+### OpenMetadata Manifest
+
+In any other connector, extracting metadata happens automatically. In this case, we will be able to extract high-level
+metadata from buckets, but in order to understand their internal structure we need users to provide an `openmetadata.json`
+file at the bucket root.
+
+You can learn more about this [here](/connectors/storage). Keep reading for an example of the shape of the manifest file.
+
+{% partial file="/v1.1/connectors/storage/manifest.md" /%}
+
 ## Metadata Ingestion

 All connectors are defined as JSON Schemas.
--- a/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/dashboard/domo-dashboard/index.md
+++ b/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/dashboard/domo-dashboard/index.md
@ -48,7 +48,7 @@ For questions related to scopes, click [here](https://developer.domo.com/portal/
 - **Secret Token**: Secret Token to Connect DOMO Dashboard.
 - **Access Token**: Access to Connect to DOMO Dashboard.
 - **API Host**:  API Host to Connect to DOMO Dashboard instance.
-- **SandBox Domain**: Connect to SandBox Domain.
+- **Instance Domain**: URL to connect to your Domo instance UI. For example `https://<your>.domo.com`.

 {% /extraContent %}

--- a/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/dashboard/domo-dashboard/yaml.md
+++ b/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/dashboard/domo-dashboard/yaml.md
@ -90,7 +90,7 @@ This is a sample config for Domo-Dashboard:

 {% codeInfo srNumber=5 %}

-**SandBox Domain**: Connect to SandBox Domain.
+**Instance Domain**: URL to connect to your Domo instance UI. For example `https://<your>.domo.com`.

 {% /codeInfo %}

@ -145,7 +145,7 @@ source:
      apiHost: api.domo.com
 ```
 ```yaml {% srNumber=5 %}
-      sandboxDomain: https://<api_domo>.domo.com
+      instanceDomain: https://<your>.domo.com
 ```
 ```yaml {% srNumber=6 %}
  sourceConfig:
--- a/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/database/domo-database/index.md
+++ b/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/database/domo-database/index.md
@ -63,7 +63,7 @@ For questions related to scopes, click [here](https://developer.domo.com/portal/
 - **Secret Token**: Secret Token to Connect DOMO Database.
 - **Access Token**: Access to Connect to DOMO Database.
 - **Api Host**: API Host to Connect to DOMO Database instance.
-- **SandBox Domain**: Connect to SandBox Domain.
+- **Instance Domain**: URL to connect to your Domo instance UI. For example `https://<your>.domo.com`.

 {% /extraContent %}

--- a/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/database/domo-database/yaml.md
+++ b/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/database/domo-database/yaml.md
@ -108,7 +108,7 @@ This is a sample config for DomoDatabase:

 {% codeInfo srNumber=5 %}

-**SandBox Domain**: Connect to SandBox Domain.
+**Instance Domain**: URL to connect to your Domo instance UI. For example `https://<your>.domo.com`.

 {% /codeInfo %}

@ -186,7 +186,7 @@ source:
      apiHost: api.domo.com
 ```
 ```yaml {% srNumber=5 %}
-      sandboxDomain: https://<api_domo>.domo.com
+      instanceDomain: https://<your>.domo.com
 ```
 ```yaml {% srNumber=6 %}
      # database: database
--- a/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/pipeline/domo-pipeline/index.md
+++ b/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/pipeline/domo-pipeline/index.md
@ -40,7 +40,7 @@ For questions related to scopes, click [here](https://developer.domo.com/portal/
 - **Secret Token**: Secret Token to Connect to DOMO Pipeline.
 - **Access Token**: Access to Connect to DOMO Pipeline.
 - **API Host**: API Host to Connect to DOMO Pipeline.
-- **SandBox Domain**: Connect to SandBox Domain.
+- **Instance Domain**: URL to connect to your Domo instance UI. For example `https://<your>.domo.com`.

 {% /extraContent %}

--- a/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/pipeline/domo-pipeline/yaml.md
+++ b/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/pipeline/domo-pipeline/yaml.md
@ -85,7 +85,7 @@ This is a sample config for Domo-Pipeline:

 {% codeInfo srNumber=5 %}

-**SandBox Domain**: Connect to SandBox Domain.
+**Instance Domain**: URL to connect to your Domo instance UI. For example `https://<your>.domo.com`.


 {% /codeInfo %}
@ -143,7 +143,7 @@ source:
      apiHost: api.domo.com
 ```
 ```yaml {% srNumber=5 %}
-      sandboxDomain: https://<api_domo>.domo.com
+      instanceDomain: https://<your>.domo.com
 ```
 ```yaml {% srNumber=6 %}
  sourceConfig:
--- a/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/storage/index.md
+++ b/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/storage/index.md
@ -40,115 +40,4 @@ We are flattening this structure to simplify the navigation.

 {% /note %}

-## OpenMetadata Manifest
-
-Our manifest file is defined as a [JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/storage/containerMetadataConfig.json),
-and can look like this:
-
-{% codePreview %}
-
-{% codeInfoContainer %}
-
-{% codeInfo srNumber=1 %}
-
-**Entries**: We need to add a list of `entries`. Each inner JSON structure will be ingested as a child container of the top-level
-one. In this case, we will be ingesting 4 children.
-
-{% /codeInfo %}
-
-{% codeInfo srNumber=2 %}
-
-**Simple Container**: The simplest container we can have would be structured, but without partitions. Note that we still
-need to bring information about:
-
-- **dataPath**: Where we can find the data. This should be a path relative to the top-level container.
-- **structureFormat**: What is the format of the data we are going to find. This information will be used to read the data.
-
-After ingesting this container, we will bring in the schema of the data in the `dataPath`.
-
-{% /codeInfo %}
-
-{% codeInfo srNumber=3 %}
-
-**Partitioned Container**: We can ingest partitioned data without bringing in any further details.
-
-By informing the `isPartitioned` field as `true`, we'll flag the container as `Partitioned`. We will be reading the
-source files schemas', but won't add any other information.
-
-{% /codeInfo %}
-
-{% codeInfo srNumber=4 %}
-
-**Single-Partition Container**: We can bring partition information by specifying the `partitionColumns`. Their definition
-is based on the [JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/data/table.json#L232)
-definition for table columns. The minimum required information is the `name` and `dataType`.
-
-When passing `partitionColumns`, these values will be added to the schema, on top of the inferred information from the files.
-
-{% /codeInfo %}
-
-{% codeInfo srNumber=5 %}
-
-**Multiple-Partition Container**: We can add multiple columns as partitions.
-
-Note how in the example we even bring our custom `displayName` for the column `dataTypeDisplay` for its type.
-
-Again, this information will be added on top of the inferred schema from the data files.
-
-{% /codeInfo %}
-
-{% /codeInfoContainer %}
-
-{% codeBlock fileName="openmetadata.json" %}
-
-```json {% srNumber=1 %}
-{
-    "entries": [
-```
-```json {% srNumber=2 %}
-        {
-            "dataPath": "transactions",
-            "structureFormat": "csv"
-        },
-```
-```json {% srNumber=3 %}
-        {
-            "dataPath": "cities",
-            "structureFormat": "parquet",
-            "isPartitioned": true
-        },
-```
-```json {% srNumber=4 %}
-        {
-            "dataPath": "cities_multiple_simple",
-            "structureFormat": "parquet",
-            "isPartitioned": true,
-            "partitionColumns": [
-                {
-                    "name": "State",
-                    "dataType": "STRING"
-                }
-            ]
-        },
-```
-```json {% srNumber=5 %}
-        {
-            "dataPath": "cities_multiple",
-            "structureFormat": "parquet",
-            "isPartitioned": true,
-            "partitionColumns": [
-                {
-                    "name": "Year",
-                    "displayName": "Year (Partition)",
-                    "dataType": "DATE",
-                    "dataTypeDisplay": "date (year)"
-                },
-                {
-                    "name": "State",
-                    "dataType": "STRING"
-                }
-            ]
-        }
-    ]
-}
-```
+{% partial file="/v1.2/connectors/storage/manifest.md" /%}
--- a/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/storage/s3/index.md
+++ b/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/storage/s3/index.md
@ -82,6 +82,16 @@ The policy would look like:
 }
 ```

+### OpenMetadata Manifest
+
+In any other connector, extracting metadata happens automatically. In this case, we will be able to extract high-level
+metadata from buckets, but in order to understand their internal structure we need users to provide an `openmetadata.json`
+file at the bucket root.
+
+You can learn more about this [here](/connectors/storage). Keep reading for an example of the shape of the manifest file.
+
+{% partial file="/v1.2/connectors/storage/manifest.md" /%}
+
 ## Metadata Ingestion

 {% stepsContainer %}
--- a/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/storage/s3/yaml.md
+++ b/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/storage/s3/yaml.md
@ -92,6 +92,16 @@ To run the Athena ingestion, you will need to install:
 pip3 install "openmetadata-ingestion[athena]"
 ```

+### OpenMetadata Manifest
+
+In any other connector, extracting metadata happens automatically. In this case, we will be able to extract high-level
+metadata from buckets, but in order to understand their internal structure we need users to provide an `openmetadata.json`
+file at the bucket root.
+
+You can learn more about this [here](/connectors/storage). Keep reading for an example of the shape of the manifest file.
+
+{% partial file="/v1.2/connectors/storage/manifest.md" /%}
+
 ## Metadata Ingestion

 All connectors are defined as JSON Schemas.
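
The Domo hunks in this commit rename the `sandboxDomain` connection option to `instanceDomain`. For anyone carrying older configs forward, a small Python sketch of that rename, applied to an already-parsed connection config dictionary, might look like the following. The function name `rename_sandbox_domain` is hypothetical, and the flat dictionary shape is an assumption made for illustration; it is not an OpenMetadata migration utility.

```python
def rename_sandbox_domain(connection_config: dict) -> dict:
    """Return a copy with 'sandboxDomain' renamed to 'instanceDomain'."""
    updated = dict(connection_config)  # shallow copy, the input is left untouched
    if "sandboxDomain" in updated:
        old_value = updated.pop("sandboxDomain")
        # If the new key is already set explicitly, keep that value.
        updated.setdefault("instanceDomain", old_value)
    return updated


# Hypothetical pre-rename config fragment, using the keys shown in the diff.
old_config = {
    "apiHost": "api.domo.com",
    "sandboxDomain": "https://<your>.domo.com",
}
print(rename_sandbox_domain(old_config))
```

Returning a copy rather than mutating the input keeps the original config available for comparison.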