diff --git a/openmetadata-docs/content/partials/v1.1/connectors/storage/manifest.md b/openmetadata-docs/content/partials/v1.1/connectors/storage/manifest.md new file mode 100644 index 00000000000..9c09731af80 --- /dev/null +++ b/openmetadata-docs/content/partials/v1.1/connectors/storage/manifest.md @@ -0,0 +1,112 @@ +## OpenMetadata Manifest + +Our manifest file is defined as a [JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/storage/containerMetadataConfig.json), +and can look like this: + +{% codePreview %} + +{% codeInfoContainer %} + +{% codeInfo srNumber=1 %} + +**Entries**: We need to add a list of `entries`. Each inner JSON structure will be ingested as a child container of the top-level +one. In this case, we will be ingesting 4 children. + +{% /codeInfo %} + +{% codeInfo srNumber=2 %} + +**Simple Container**: The simplest container we can have would be structured, but without partitions. Note that we still +need to bring information about: + +- **dataPath**: Where we can find the data. This should be a path relative to the top-level container. +- **structureFormat**: The format of the data we are going to find. This information will be used to read the data. + +After ingesting this container, we will bring in the schema of the data in the `dataPath`. + +{% /codeInfo %} + +{% codeInfo srNumber=3 %} + +**Partitioned Container**: We can ingest partitioned data without bringing in any further details. + +By setting the `isPartitioned` field to `true`, we'll flag the container as `Partitioned`. We will be reading the +source files' schemas, but won't add any other information. + +{% /codeInfo %} + +{% codeInfo srNumber=4 %} + +**Single-Partition Container**: We can bring partition information by specifying the `partitionColumns`. 
Their definition +is based on the [JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/data/table.json#L232) +definition for table columns. The minimum required information is the `name` and `dataType`. + +When passing `partitionColumns`, these values will be added to the schema, on top of the inferred information from the files. + +{% /codeInfo %} + +{% codeInfo srNumber=5 %} + +**Multiple-Partition Container**: We can add multiple columns as partitions. + +Note how in the example we even set a custom `displayName` for the column and a `dataTypeDisplay` for its type. + +Again, this information will be added on top of the inferred schema from the data files. + +{% /codeInfo %} + +{% /codeInfoContainer %} + +{% codeBlock fileName="openmetadata.json" %} + +```json {% srNumber=1 %} +{ + "entries": [ +``` +```json {% srNumber=2 %} + { + "dataPath": "transactions", + "structureFormat": "csv" + }, +``` +```json {% srNumber=3 %} + { + "dataPath": "cities", + "structureFormat": "parquet", + "isPartitioned": true + }, +``` +```json {% srNumber=4 %} + { + "dataPath": "cities_multiple_simple", + "structureFormat": "parquet", + "isPartitioned": true, + "partitionColumns": [ + { + "name": "State", + "dataType": "STRING" + } + ] + }, +``` +```json {% srNumber=5 %} + { + "dataPath": "cities_multiple", + "structureFormat": "parquet", + "isPartitioned": true, + "partitionColumns": [ + { + "name": "Year", + "displayName": "Year (Partition)", + "dataType": "DATE", + "dataTypeDisplay": "date (year)" + }, + { + "name": "State", + "dataType": "STRING" + } + ] + } + ] +} +``` diff --git a/openmetadata-docs/content/partials/v1.2/connectors/storage/manifest.md b/openmetadata-docs/content/partials/v1.2/connectors/storage/manifest.md new file mode 100644 index 00000000000..9c09731af80 --- /dev/null +++ b/openmetadata-docs/content/partials/v1.2/connectors/storage/manifest.md @@ -0,0 +1,112 @@ +## OpenMetadata Manifest + 
+Our manifest file is defined as a [JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/storage/containerMetadataConfig.json), +and can look like this: + +{% codePreview %} + +{% codeInfoContainer %} + +{% codeInfo srNumber=1 %} + +**Entries**: We need to add a list of `entries`. Each inner JSON structure will be ingested as a child container of the top-level +one. In this case, we will be ingesting 4 children. + +{% /codeInfo %} + +{% codeInfo srNumber=2 %} + +**Simple Container**: The simplest container we can have would be structured, but without partitions. Note that we still +need to bring information about: + +- **dataPath**: Where we can find the data. This should be a path relative to the top-level container. +- **structureFormat**: The format of the data we are going to find. This information will be used to read the data. + +After ingesting this container, we will bring in the schema of the data in the `dataPath`. + +{% /codeInfo %} + +{% codeInfo srNumber=3 %} + +**Partitioned Container**: We can ingest partitioned data without bringing in any further details. + +By setting the `isPartitioned` field to `true`, we'll flag the container as `Partitioned`. We will be reading the +source files' schemas, but won't add any other information. + +{% /codeInfo %} + +{% codeInfo srNumber=4 %} + +**Single-Partition Container**: We can bring partition information by specifying the `partitionColumns`. Their definition +is based on the [JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/data/table.json#L232) +definition for table columns. The minimum required information is the `name` and `dataType`. + +When passing `partitionColumns`, these values will be added to the schema, on top of the inferred information from the files. 
+ +{% /codeInfo %} + +{% codeInfo srNumber=5 %} + +**Multiple-Partition Container**: We can add multiple columns as partitions. + +Note how in the example we even set a custom `displayName` for the column and a `dataTypeDisplay` for its type. + +Again, this information will be added on top of the inferred schema from the data files. + +{% /codeInfo %} + +{% /codeInfoContainer %} + +{% codeBlock fileName="openmetadata.json" %} + +```json {% srNumber=1 %} +{ + "entries": [ +``` +```json {% srNumber=2 %} + { + "dataPath": "transactions", + "structureFormat": "csv" + }, +``` +```json {% srNumber=3 %} + { + "dataPath": "cities", + "structureFormat": "parquet", + "isPartitioned": true + }, +``` +```json {% srNumber=4 %} + { + "dataPath": "cities_multiple_simple", + "structureFormat": "parquet", + "isPartitioned": true, + "partitionColumns": [ + { + "name": "State", + "dataType": "STRING" + } + ] + }, +``` +```json {% srNumber=5 %} + { + "dataPath": "cities_multiple", + "structureFormat": "parquet", + "isPartitioned": true, + "partitionColumns": [ + { + "name": "Year", + "displayName": "Year (Partition)", + "dataType": "DATE", + "dataTypeDisplay": "date (year)" + }, + { + "name": "State", + "dataType": "STRING" + } + ] + } + ] +} +``` diff --git a/openmetadata-docs/content/v1.1.x/connectors/storage/index.md b/openmetadata-docs/content/v1.1.x/connectors/storage/index.md index 712af24a715..2e5514d2a47 100644 --- a/openmetadata-docs/content/v1.1.x/connectors/storage/index.md +++ b/openmetadata-docs/content/v1.1.x/connectors/storage/index.md @@ -40,115 +40,5 @@ We are flattening this structure to simplify the navigation. 
{% /note %} -## OpenMetadata Manifest -Our manifest file is defined as a [JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/storage/containerMetadataConfig.json), -and can look like this: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=1 %} - -**Entries**: We need to add a list of `entries`. Each inner JSON structure will be ingested as a child container of the top-level -one. In this case, we will be ingesting 4 children. - -{% /codeInfo %} - -{% codeInfo srNumber=2 %} - -**Simple Container**: The simplest container we can have would be structured, but without partitions. Note that we still -need to bring information about: - -- **dataPath**: Where we can find the data. This should be a path relative to the top-level container. -- **structureFormat**: What is the format of the data we are going to find. This information will be used to read the data. - -After ingesting this container, we will bring in the schema of the data in the `dataPath`. - -{% /codeInfo %} - -{% codeInfo srNumber=3 %} - -**Partitioned Container**: We can ingest partitioned data without bringing in any further details. - -By informing the `isPartitioned` field as `true`, we'll flag the container as `Partitioned`. We will be reading the -source files schemas', but won't add any other information. - -{% /codeInfo %} - -{% codeInfo srNumber=4 %} - -**Single-Partition Container**: We can bring partition information by specifying the `partitionColumns`. Their definition -is based on the [JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/data/table.json#L232) -definition for table columns. The minimum required information is the `name` and `dataType`. - -When passing `partitionColumns`, these values will be added to the schema, on top of the inferred information from the files. 
- -{% /codeInfo %} - -{% codeInfo srNumber=5 %} - -**Multiple-Partition Container**: We can add multiple columns as partitions. - -Note how in the example we even bring our custom `displayName` for the column `dataTypeDisplay` for its type. - -Again, this information will be added on top of the inferred schema from the data files. - -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="openmetadata.json" %} - -```json {% srNumber=1 %} -{ - "entries": [ -``` -```json {% srNumber=2 %} - { - "dataPath": "transactions", - "structureFormat": "csv" - }, -``` -```json {% srNumber=3 %} - { - "dataPath": "cities", - "structureFormat": "parquet", - "isPartitioned": true - }, -``` -```json {% srNumber=4 %} - { - "dataPath": "cities_multiple_simple", - "structureFormat": "parquet", - "isPartitioned": true, - "partitionColumns": [ - { - "name": "State", - "dataType": "STRING" - } - ] - }, -``` -```json {% srNumber=5 %} - { - "dataPath": "cities_multiple", - "structureFormat": "parquet", - "isPartitioned": true, - "partitionColumns": [ - { - "name": "Year", - "displayName": "Year (Partition)", - "dataType": "DATE", - "dataTypeDisplay": "date (year)" - }, - { - "name": "State", - "dataType": "STRING" - } - ] - } - ] -} -``` +{% partial file="/v1.1/connectors/storage/manifest.md" /%} diff --git a/openmetadata-docs/content/v1.1.x/connectors/storage/s3/index.md b/openmetadata-docs/content/v1.1.x/connectors/storage/s3/index.md index 72f304770ce..e985ff06c82 100644 --- a/openmetadata-docs/content/v1.1.x/connectors/storage/s3/index.md +++ b/openmetadata-docs/content/v1.1.x/connectors/storage/s3/index.md @@ -82,6 +82,16 @@ The policy would look like: } ``` +### OpenMetadata Manifest + +In any other connector, extracting metadata happens automatically. In this case, we will be able to extract high-level +metadata from buckets, but in order to understand their internal structure we need users to provide an `openmetadata.json` +file at the bucket root. 
+ +You can learn more about this [here](/connectors/storage). Keep reading for an example of the shape of the manifest file. + +{% partial file="/v1.1/connectors/storage/manifest.md" /%} + ## Metadata Ingestion {% stepsContainer %} diff --git a/openmetadata-docs/content/v1.1.x/connectors/storage/s3/yaml.md b/openmetadata-docs/content/v1.1.x/connectors/storage/s3/yaml.md index f7d0c1461b4..c2a019b3538 100644 --- a/openmetadata-docs/content/v1.1.x/connectors/storage/s3/yaml.md +++ b/openmetadata-docs/content/v1.1.x/connectors/storage/s3/yaml.md @@ -92,6 +92,16 @@ To run the Athena ingestion, you will need to install: pip3 install "openmetadata-ingestion[athena]" ``` +### OpenMetadata Manifest + +In any other connector, extracting metadata happens automatically. In this case, we will be able to extract high-level +metadata from buckets, but in order to understand their internal structure we need users to provide an `openmetadata.json` +file at the bucket root. + +You can learn more about this [here](/connectors/storage). Keep reading for an example of the shape of the manifest file. + +{% partial file="/v1.1/connectors/storage/manifest.md" /%} + ## Metadata Ingestion All connectors are defined as JSON Schemas. diff --git a/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/dashboard/domo-dashboard/index.md b/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/dashboard/domo-dashboard/index.md index 9c1c7f3ba9e..afad883f1d8 100644 --- a/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/dashboard/domo-dashboard/index.md +++ b/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/dashboard/domo-dashboard/index.md @@ -48,7 +48,7 @@ For questions related to scopes, click [here](https://developer.domo.com/portal/ - **Secret Token**: Secret Token to Connect DOMO Dashboard. - **Access Token**: Access to Connect to DOMO Dashboard. - **API Host**: API Host to Connect to DOMO Dashboard instance. -- **SandBox Domain**: Connect to SandBox Domain. 
+- **Instance Domain**: URL to connect to your Domo instance UI. For example `https://.domo.com`. {% /extraContent %} diff --git a/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/dashboard/domo-dashboard/yaml.md b/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/dashboard/domo-dashboard/yaml.md index 1b953bf86a6..beeeae789bf 100644 --- a/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/dashboard/domo-dashboard/yaml.md +++ b/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/dashboard/domo-dashboard/yaml.md @@ -90,7 +90,7 @@ This is a sample config for Domo-Dashboard: {% codeInfo srNumber=5 %} -**SandBox Domain**: Connect to SandBox Domain. +**Instance Domain**: URL to connect to your Domo instance UI. For example `https://.domo.com`. {% /codeInfo %} @@ -145,7 +145,7 @@ source: apiHost: api.domo.com ``` ```yaml {% srNumber=5 %} - sandboxDomain: https://.domo.com + instanceDomain: https://.domo.com ``` ```yaml {% srNumber=6 %} sourceConfig: diff --git a/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/database/domo-database/index.md b/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/database/domo-database/index.md index bc42ec82875..e6a22e3aeb2 100644 --- a/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/database/domo-database/index.md +++ b/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/database/domo-database/index.md @@ -63,7 +63,7 @@ For questions related to scopes, click [here](https://developer.domo.com/portal/ - **Secret Token**: Secret Token to Connect DOMO Database. - **Access Token**: Access to Connect to DOMO Database. - **Api Host**: API Host to Connect to DOMO Database instance. -- **SandBox Domain**: Connect to SandBox Domain. +- **Instance Domain**: URL to connect to your Domo instance UI. For example `https://.domo.com`. 
{% /extraContent %} diff --git a/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/database/domo-database/yaml.md b/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/database/domo-database/yaml.md index d415ca74800..731d683f6c2 100644 --- a/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/database/domo-database/yaml.md +++ b/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/database/domo-database/yaml.md @@ -108,7 +108,7 @@ This is a sample config for DomoDatabase: {% codeInfo srNumber=5 %} -**SandBox Domain**: Connect to SandBox Domain. +**Instance Domain**: URL to connect to your Domo instance UI. For example `https://.domo.com`. {% /codeInfo %} @@ -186,7 +186,7 @@ source: apiHost: api.domo.com ``` ```yaml {% srNumber=5 %} - sandboxDomain: https://.domo.com + instanceDomain: https://.domo.com ``` ```yaml {% srNumber=6 %} # database: database diff --git a/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/pipeline/domo-pipeline/index.md b/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/pipeline/domo-pipeline/index.md index 353577382a0..17e4049a3b0 100644 --- a/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/pipeline/domo-pipeline/index.md +++ b/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/pipeline/domo-pipeline/index.md @@ -40,7 +40,7 @@ For questions related to scopes, click [here](https://developer.domo.com/portal/ - **Secret Token**: Secret Token to Connect to DOMO Pipeline. - **Access Token**: Access to Connect to DOMO Pipeline. - **API Host**: API Host to Connect to DOMO Pipeline. -- **SandBox Domain**: Connect to SandBox Domain. +- **Instance Domain**: URL to connect to your Domo instance UI. For example `https://.domo.com`. 
{% /extraContent %} diff --git a/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/pipeline/domo-pipeline/yaml.md b/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/pipeline/domo-pipeline/yaml.md index 61b2354589a..ff043bb19f9 100644 --- a/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/pipeline/domo-pipeline/yaml.md +++ b/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/pipeline/domo-pipeline/yaml.md @@ -85,7 +85,7 @@ This is a sample config for Domo-Pipeline: {% codeInfo srNumber=5 %} -**SandBox Domain**: Connect to SandBox Domain. +**Instance Domain**: URL to connect to your Domo instance UI. For example `https://.domo.com`. {% /codeInfo %} @@ -143,7 +143,7 @@ source: apiHost: api.domo.com ``` ```yaml {% srNumber=5 %} - sandboxDomain: https://.domo.com + instanceDomain: https://.domo.com ``` ```yaml {% srNumber=6 %} sourceConfig: diff --git a/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/storage/index.md b/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/storage/index.md index 712af24a715..89b71870398 100644 --- a/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/storage/index.md +++ b/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/storage/index.md @@ -40,115 +40,4 @@ We are flattening this structure to simplify the navigation. {% /note %} -## OpenMetadata Manifest - -Our manifest file is defined as a [JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/storage/containerMetadataConfig.json), -and can look like this: - -{% codePreview %} - -{% codeInfoContainer %} - -{% codeInfo srNumber=1 %} - -**Entries**: We need to add a list of `entries`. Each inner JSON structure will be ingested as a child container of the top-level -one. In this case, we will be ingesting 4 children. - -{% /codeInfo %} - -{% codeInfo srNumber=2 %} - -**Simple Container**: The simplest container we can have would be structured, but without partitions. 
Note that we still -need to bring information about: - -- **dataPath**: Where we can find the data. This should be a path relative to the top-level container. -- **structureFormat**: What is the format of the data we are going to find. This information will be used to read the data. - -After ingesting this container, we will bring in the schema of the data in the `dataPath`. - -{% /codeInfo %} - -{% codeInfo srNumber=3 %} - -**Partitioned Container**: We can ingest partitioned data without bringing in any further details. - -By informing the `isPartitioned` field as `true`, we'll flag the container as `Partitioned`. We will be reading the -source files schemas', but won't add any other information. - -{% /codeInfo %} - -{% codeInfo srNumber=4 %} - -**Single-Partition Container**: We can bring partition information by specifying the `partitionColumns`. Their definition -is based on the [JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/data/table.json#L232) -definition for table columns. The minimum required information is the `name` and `dataType`. - -When passing `partitionColumns`, these values will be added to the schema, on top of the inferred information from the files. - -{% /codeInfo %} - -{% codeInfo srNumber=5 %} - -**Multiple-Partition Container**: We can add multiple columns as partitions. - -Note how in the example we even bring our custom `displayName` for the column `dataTypeDisplay` for its type. - -Again, this information will be added on top of the inferred schema from the data files. 
- -{% /codeInfo %} - -{% /codeInfoContainer %} - -{% codeBlock fileName="openmetadata.json" %} - -```json {% srNumber=1 %} -{ - "entries": [ -``` -```json {% srNumber=2 %} - { - "dataPath": "transactions", - "structureFormat": "csv" - }, -``` -```json {% srNumber=3 %} - { - "dataPath": "cities", - "structureFormat": "parquet", - "isPartitioned": true - }, -``` -```json {% srNumber=4 %} - { - "dataPath": "cities_multiple_simple", - "structureFormat": "parquet", - "isPartitioned": true, - "partitionColumns": [ - { - "name": "State", - "dataType": "STRING" - } - ] - }, -``` -```json {% srNumber=5 %} - { - "dataPath": "cities_multiple", - "structureFormat": "parquet", - "isPartitioned": true, - "partitionColumns": [ - { - "name": "Year", - "displayName": "Year (Partition)", - "dataType": "DATE", - "dataTypeDisplay": "date (year)" - }, - { - "name": "State", - "dataType": "STRING" - } - ] - } - ] -} -``` +{% partial file="/v1.2/connectors/storage/manifest.md" /%} diff --git a/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/storage/s3/index.md b/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/storage/s3/index.md index 33f32d8aed9..78243cd3019 100644 --- a/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/storage/s3/index.md +++ b/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/storage/s3/index.md @@ -82,6 +82,16 @@ The policy would look like: } ``` +### OpenMetadata Manifest + +In any other connector, extracting metadata happens automatically. In this case, we will be able to extract high-level +metadata from buckets, but in order to understand their internal structure we need users to provide an `openmetadata.json` +file at the bucket root. + +You can learn more about this [here](/connectors/storage). Keep reading for an example of the shape of the manifest file. 
+ +{% partial file="/v1.2/connectors/storage/manifest.md" /%} + ## Metadata Ingestion {% stepsContainer %} diff --git a/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/storage/s3/yaml.md b/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/storage/s3/yaml.md index c930a235b24..3c49d9a75e4 100644 --- a/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/storage/s3/yaml.md +++ b/openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/storage/s3/yaml.md @@ -92,6 +92,16 @@ To run the Athena ingestion, you will need to install: pip3 install "openmetadata-ingestion[athena]" ``` +### OpenMetadata Manifest + +In any other connector, extracting metadata happens automatically. In this case, we will be able to extract high-level +metadata from buckets, but in order to understand their internal structure we need users to provide an `openmetadata.json` +file at the bucket root. + +You can learn more about this [here](/connectors/storage). Keep reading for an example of the shape of the manifest file. + +{% partial file="/v1.2/connectors/storage/manifest.md" /%} + ## Metadata Ingestion All connectors are defined as JSON Schemas.