mirror of https://github.com/open-metadata/OpenMetadata.git (synced 2025-08-21 23:48:47 +00:00)

MINOR: Add / Fix GCS and ADLS - docs, bugs (#15502)

Add GCS and ADLS docs

This commit is contained in: parent 189e0b82d0, commit 1c2fbdd9f4
@ -164,7 +164,8 @@ We support two ways of authenticating to GCS:

- **Client ID**: Client ID of the data storage account
- **Client Secret**: Client Secret of the account
- **Tenant ID**: Tenant ID under which the data storage account falls
- **Account Name**: Account Name of the Data Storage
- **Key Vault Name**: Azure Key Vault serves as a centralized secrets manager, securely storing and managing sensitive information, such as connection strings and cryptographic keys.

- **Required Roles**
@ -239,7 +239,8 @@ source:

- **Client ID**: Client ID of the data storage account
- **Client Secret**: Client Secret of the account
- **Tenant ID**: Tenant ID under which the data storage account falls
- **Account Name**: Account Name of the Data Storage
- **Key Vault Name**: Azure Key Vault serves as a centralized secrets manager, securely storing and managing sensitive information, such as connection strings and cryptographic keys.

{% /codeInfo %}
@ -183,7 +183,9 @@ Find more information about [Source Identity](https://docs.aws.amazon.com/STS/la

- **Tenant ID**: Tenant ID under which the data storage account falls
- **Account Name**: Account Name of the Data Storage
- **Key Vault Name**: Azure Key Vault serves as a centralized secrets manager, securely storing and managing sensitive information, such as connection strings and cryptographic keys.

{% /extraContent %}
@ -268,7 +268,9 @@ source:

* **clientId**: Client ID of the data storage account
* **clientSecret**: Client Secret of the account
* **tenantId**: Tenant ID under which the data storage account falls
* **accountName**: Account Name of the Data Storage
* **vaultName**: Azure Key Vault serves as a centralized secrets manager, securely storing and managing sensitive information, such as connection strings and cryptographic keys.

{% /codeInfo %}
@ -407,7 +409,8 @@ source:

* **clientId**: Client ID of the data storage account
* **clientSecret**: Client Secret of the account
* **tenantId**: Tenant ID under which the data storage account falls
* **accountName**: Account Name of the Data Storage
* **vaultName**: Azure Key Vault serves as a centralized secrets manager, securely storing and managing sensitive information, such as connection strings and cryptographic keys.

{% /codeInfo %}
@ -0,0 +1,182 @@

---
title: ADLS
slug: /connectors/storage/adls
---

{% connectorDetailsHeader
name="ADLS"
stage="PROD"
platform="Collate"
availableFeatures=["Metadata"]
unavailableFeatures=[]
/ %}

This page contains the setup guide and reference information for the ADLS connector.

Configure and schedule ADLS metadata workflows from the OpenMetadata UI:

- [Requirements](#requirements)
- [Metadata Ingestion](#metadata-ingestion)

{% partial file="/v1.3/connectors/ingestion-modes-tiles.md" variables={yamlPath: "/connectors/storage/adls/yaml"} /%}

## Requirements

We need the following permissions in Azure Data Lake Storage:

### ADLS Permissions

To extract metadata from Azure ADLS (Storage Account - StorageV2), you will need an **App Registration** with the following permissions on the Storage Account:

- Storage Blob Data Contributor
- Storage Queue Data Contributor

### OpenMetadata Manifest

In most connectors, metadata extraction happens automatically. Here, we can extract high-level metadata from buckets, but to understand their internal structure, users must provide an `openmetadata.json` file at the bucket root.

You can learn more about this [here](/connectors/storage). Keep reading for an example of the shape of the manifest file.

{% partial file="/v1.3/connectors/storage/manifest.md" /%}
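As a quick orientation (the manifest partial above is the authoritative reference), a bucket-root `openmetadata.json` could look roughly like this; the `dataPath` and `structureFormat` values below are illustrative only:

```json
{
  "entries": [
    {
      "dataPath": "transactions",
      "structureFormat": "csv",
      "isPartitioned": false
    }
  ]
}
```

Each entry tells the connector where to find tabular data inside the container and how to parse it.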
## Metadata Ingestion

{% stepsContainer %}

{% step srNumber=1 %}

{% stepDescription title="1. Visit the Services Page" %}

The first step is ingesting the metadata from your sources. Under
Settings, you will find a Services page where you can connect an external
source system to OpenMetadata. Once a service is created, it can be used
to configure metadata, usage, and profiler workflows.

To visit the Services page, select Services from the Settings menu.

{% /stepDescription %}

{% stepVisualInfo %}

{% image
src="/images/v1.3/connectors/visit-services.png"
alt="Visit Services Page"
caption="Find the Services option on the left panel of the Settings page" /%}

{% /stepVisualInfo %}

{% /step %}
{% step srNumber=2 %}

{% stepDescription title="2. Create a New Service" %}

Click on the 'Add New Service' button to start the Service creation.

{% /stepDescription %}

{% stepVisualInfo %}

{% image
src="/images/v1.3/connectors/create-service.png"
alt="Create a new service"
caption="Add a new Service from the Storage Services page" /%}

{% /stepVisualInfo %}

{% /step %}

{% step srNumber=3 %}

{% stepDescription title="3. Select the Service Type" %}

Select ADLS as the service type and click Next.

{% /stepDescription %}

{% stepVisualInfo %}

{% image
src="/images/v1.3/connectors/adls/select-service.png"
alt="Select Service"
caption="Select your service from the list" /%}

{% /stepVisualInfo %}

{% /step %}
{% step srNumber=4 %}

{% stepDescription title="4. Name and Describe your Service" %}

Provide a name and description for your service.

#### Service Name

OpenMetadata uniquely identifies services by their Service Name. Provide
a name that distinguishes your deployment from other services, including
the other Storage services that you might be ingesting metadata from.

{% /stepDescription %}

{% stepVisualInfo %}

{% image
src="/images/v1.3/connectors/adls/add-new-service.png"
alt="Add New Service"
caption="Provide a Name and description for your Service" /%}

{% /stepVisualInfo %}

{% /step %}
{% step srNumber=5 %}

{% stepDescription title="5. Configure the Service Connection" %}

In this step, we will configure the connection settings required for
this connector. Please follow the instructions below to ensure that
you've configured the connector to read from your ADLS service as
desired.

{% /stepDescription %}

{% stepVisualInfo %}

{% image
src="/images/v1.3/connectors/adls/service-connection.png"
alt="Configure service connection"
caption="Configure the service connection by filling the form" /%}

{% /stepVisualInfo %}

{% /step %}
{% extraContent parentTagName="stepsContainer" %}

#### Connection Details

**Client ID**: This unique identifier is assigned to your Azure Service Principal App, serving as a key for authentication and authorization.

**Client Secret**: This confidential password is associated with the Service Principal, safeguarding access to Azure resources and ensuring secure communication.

**Tenant ID**: Identifying your Azure Subscription, the Tenant ID links your resources to a specific organization or account within the Azure Active Directory.

**Storage Account Name**: This is the user-defined name for your Azure Storage Account, providing a globally unique namespace for your data.

**Key Vault Name**: Azure Key Vault serves as a centralized secrets manager, securely storing and managing sensitive information, such as connection strings and cryptographic keys.

{% /extraContent %}
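To sanity-check the values entered in the form before running ingestion, the App Registration credentials can be exercised directly against the storage account. The sketch below only derives the ADLS Gen2 (`dfs`) endpoint from the account name; the commented part assumes the `azure-identity` and `azure-storage-file-datalake` packages and uses illustrative variable names:

```python
def dfs_endpoint(account_name: str) -> str:
    # ADLS Gen2 exposes its Data Lake API under the dfs.core.windows.net
    # suffix, as opposed to blob.core.windows.net for the plain Blob API.
    return f"https://{account_name}.dfs.core.windows.net"

# Rough smoke test of the same credentials entered in the form
# (requires azure-identity and azure-storage-file-datalake):
#
#   from azure.identity import ClientSecretCredential
#   from azure.storage.filedatalake import DataLakeServiceClient
#
#   cred = ClientSecretCredential(tenant_id, client_id, client_secret)
#   client = DataLakeServiceClient(account_url=dfs_endpoint(account_name),
#                                  credential=cred)
#   print([fs.name for fs in client.list_file_systems()])
```

If listing file systems succeeds, the Client ID, Client Secret, Tenant ID, and Account Name combination is valid.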
{% partial file="/v1.3/connectors/test-connection.md" /%}

{% partial file="/v1.3/connectors/storage/configure-ingestion.md" /%}

{% partial file="/v1.3/connectors/ingestion-schedule-and-deploy.md" /%}

{% /stepsContainer %}

{% partial file="/v1.3/connectors/troubleshooting.md" /%}
openmetadata-docs/content/v1.3.x/connectors/storage/adls/yaml.md (new file, 201 lines)
@ -0,0 +1,201 @@
---
title: Run the Azure Connector Externally
slug: /connectors/storage/azure/yaml
---

{% connectorDetailsHeader
name="Azure"
stage="PROD"
platform="Collate"
availableFeatures=["Metadata"]
unavailableFeatures=[]
/ %}

This page contains the setup guide and reference information for the Azure connector.

Configure and schedule Azure metadata workflows from the CLI:

- [Requirements](#requirements)
- [Metadata Ingestion](#metadata-ingestion)

{% partial file="/v1.3/connectors/external-ingestion-deployment.md" /%}

## Requirements

{%inlineCallout icon="description" bold="OpenMetadata 1.0 or later" href="/deployment"%}
To deploy OpenMetadata, check the Deployment guides.
{%/inlineCallout%}

To run the metadata ingestion, we need the following permissions in ADLS:

### ADLS Permissions

To extract metadata from Azure ADLS (Storage Account - StorageV2), you will need an **App Registration** with the following permissions on the Storage Account:

- Storage Blob Data Contributor
- Storage Queue Data Contributor

### OpenMetadata Manifest

In most connectors, metadata extraction happens automatically. Here, we can extract high-level metadata from buckets, but to understand their internal structure, users must provide an `openmetadata.json` file at the bucket root.

You can learn more about this [here](/connectors/storage). Keep reading for an example of the shape of the manifest file.

{% partial file="/v1.3/connectors/storage/manifest.md" /%}
## Metadata Ingestion

All connectors are defined as JSON Schemas.
[Here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/services/connections/storage/adlsConnection.json)
you can find the structure to create a connection to ADLS.

In order to create and run a Metadata Ingestion workflow, we will follow
the steps to create a YAML configuration able to connect to the source,
process the Entities if needed, and reach the OpenMetadata server.

The workflow is modeled around the following
[JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/workflow.json)

### 1. Define the YAML Config

This is a sample config for ADLS:

{% codePreview %}

{% codeInfoContainer %}

#### Source Configuration - Service Connection

{% codeInfo srNumber=1 %}

- **Client ID**: This is the unique identifier for your application registered in Azure AD. It's used in conjunction with the Client Secret to authenticate your application.

{% /codeInfo %}
{% codeInfo srNumber=2 %}

- **Client Secret**: A key that your application uses, along with the Client ID, to access Azure resources.

1. Log into [Microsoft Azure](https://ms.portal.azure.com/#allservices).
2. Search for `App registrations` and select the `App registrations link`.
3. Select the `Azure AD` app you're using for this connection.
4. Under `Manage`, select `Certificates & secrets`.
5. Under `Client secrets`, select `New client secret`.
6. In the `Add a client secret` pop-up window, provide a description for your application secret. Choose when the application should expire, and select `Add`.
7. From the `Client secrets` section, copy the string in the `Value` column of the newly created application secret.

{% /codeInfo %}

{% codeInfo srNumber=3 %}

- **Tenant ID**: The unique identifier of the Azure AD instance under which your account and application are registered.

To get the tenant ID, follow these steps:

1. Log into [Microsoft Azure](https://ms.portal.azure.com/#allservices).
2. Search for `App registrations` and select the `App registrations link`.
3. Select the `Azure AD` app you're using for this connection.
4. From the `Overview` section, copy the `Directory (tenant) ID`.

{% /codeInfo %}
{% codeInfo srNumber=4 %}

- **Account Name**: The name of your ADLS account.

Here are the step-by-step instructions for finding the account name for an Azure Data Lake Storage account:

1. Sign in to the Azure portal and navigate to the `Storage accounts` page.
2. Find the Data Lake Storage account you want to access and click on its name.
3. In the account overview page, locate the `Account name` field. This is the unique identifier for the Data Lake Storage account.
4. You can use this account name to access and manage the resources associated with the account, such as creating and managing containers and directories.

{% /codeInfo %}

{% codeInfo srNumber=5 %}

- **Key Vault**: Azure Key Vault serves as a centralized secrets manager, securely storing and managing sensitive information, such as connection strings and cryptographic keys.

{% /codeInfo %}
{% partial file="/v1.3/connectors/yaml/storage/source-config-def.md" /%}

{% partial file="/v1.3/connectors/yaml/ingestion-sink-def.md" /%}

{% partial file="/v1.3/connectors/yaml/workflow-config-def.md" /%}

#### Advanced Configuration

{% codeInfo srNumber=6 %}

**Connection Options (Optional)**: Enter the details for any additional connection options that can be sent to ADLS during the connection. These details must be added as Key-Value pairs.

{% /codeInfo %}

{% codeInfo srNumber=7 %}

**Connection Arguments (Optional)**: Enter the details for any additional connection arguments such as security or protocol configs that can be sent to ADLS during the connection. These details must be added as Key-Value pairs.

{% /codeInfo %}

{% /codeInfoContainer %}
{% codeBlock fileName="filename.yaml" %}

```yaml
source:
  type: ADLS
  serviceName: local_adls
  serviceConnection:
    config:
      type: ADLS
      credentials:
```
```yaml {% srNumber=1 %}
        clientId: client-id
```
```yaml {% srNumber=2 %}
        clientSecret: client-secret
```
```yaml {% srNumber=3 %}
        tenantId: tenant-id
```
```yaml {% srNumber=4 %}
        accountName: account-name
```
```yaml {% srNumber=5 %}
        vaultName: vault-name
```
```yaml {% srNumber=6 %}
      # connectionOptions:
      #   key: value
```
```yaml {% srNumber=7 %}
      # connectionArguments:
      #   key: value
```

{% partial file="/v1.3/connectors/yaml/storage/source-config.md" /%}

{% partial file="/v1.3/connectors/yaml/ingestion-sink.md" /%}

{% partial file="/v1.3/connectors/yaml/workflow-config.md" /%}

{% /codeBlock %}

{% /codePreview %}
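Before deploying the workflow, it can help to check that the credential block is complete. Below is a small illustrative helper, not part of OpenMetadata; the required-field set is an assumption based on the fields documented above, with `vaultName` treated as optional:

```python
# Fields the ADLS connection documented above always expects; vaultName is
# assumed optional here.
REQUIRED_FIELDS = {"clientId", "clientSecret", "tenantId", "accountName"}

def missing_credential_fields(credentials: dict) -> set:
    """Return the required fields that are absent or empty in the config."""
    present = {key for key, value in credentials.items() if value}
    return REQUIRED_FIELDS - present
```

For example, `missing_credential_fields({"clientId": "x", "tenantId": "y"})` flags `clientSecret` and `accountName` before the workflow fails at runtime.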
{% partial file="/v1.3/connectors/yaml/ingestion-cli.md" /%}

## Related

{% tilesContainer %}

{% tile
icon="mediation"
title="Configure Ingestion Externally"
description="Deploy, configure, and manage the ingestion workflows externally."
link="/deployment/ingestion"
/ %}

{% /tilesContainer %}
openmetadata-docs/content/v1.3.x/connectors/storage/gcs/index.md (new file, 175 lines)
@ -0,0 +1,175 @@
---
title: GCS
slug: /connectors/storage/gcs
---

{% connectorDetailsHeader
name="GCS"
stage="PROD"
platform="Collate"
availableFeatures=["Metadata"]
unavailableFeatures=[]
/ %}

This page contains the setup guide and reference information for the GCS connector.

Configure and schedule GCS metadata workflows from the OpenMetadata UI:

- [Requirements](#requirements)
- [Metadata Ingestion](#metadata-ingestion)

{% partial file="/v1.3/connectors/ingestion-modes-tiles.md" variables={yamlPath: "/connectors/storage/gcs/yaml"} /%}

## Requirements

We need the following permissions in GCP:

### GCS Permissions

For all the buckets that we want to ingest, we need to provide the following:

- `storage.buckets.get`
- `storage.buckets.list`
- `storage.objects.get`
- `storage.objects.list`

### OpenMetadata Manifest

In most connectors, metadata extraction happens automatically. Here, we can extract high-level metadata from buckets, but to understand their internal structure, users must provide an `openmetadata.json` file at the bucket root.

You can learn more about this [here](/connectors/storage). Keep reading for an example of the shape of the manifest file.

{% partial file="/v1.3/connectors/storage/manifest.md" /%}
## Metadata Ingestion

{% stepsContainer %}

{% step srNumber=1 %}

{% stepDescription title="1. Visit the Services Page" %}

The first step is ingesting the metadata from your sources. Under
Settings, you will find a Services page where you can connect an external
source system to OpenMetadata. Once a service is created, it can be used
to configure metadata, usage, and profiler workflows.

To visit the Services page, select Services from the Settings menu.

{% /stepDescription %}

{% stepVisualInfo %}

{% image
src="/images/v1.3/connectors/visit-services.png"
alt="Visit Services Page"
caption="Find the Services option on the left panel of the Settings page" /%}

{% /stepVisualInfo %}

{% /step %}
{% step srNumber=2 %}

{% stepDescription title="2. Create a New Service" %}

Click on the 'Add New Service' button to start the Service creation.

{% /stepDescription %}

{% stepVisualInfo %}

{% image
src="/images/v1.3/connectors/create-service.png"
alt="Create a new service"
caption="Add a new Service from the Storage Services page" /%}

{% /stepVisualInfo %}

{% /step %}

{% step srNumber=3 %}

{% stepDescription title="3. Select the Service Type" %}

Select GCS as the service type and click Next.

{% /stepDescription %}

{% stepVisualInfo %}

{% image
src="/images/v1.3/connectors/gcs/select-service.png"
alt="Select Service"
caption="Select your service from the list" /%}

{% /stepVisualInfo %}

{% /step %}
{% step srNumber=4 %}

{% stepDescription title="4. Name and Describe your Service" %}

Provide a name and description for your service.

#### Service Name

OpenMetadata uniquely identifies services by their Service Name. Provide
a name that distinguishes your deployment from other services, including
the other Storage services that you might be ingesting metadata from.

{% /stepDescription %}

{% stepVisualInfo %}

{% image
src="/images/v1.3/connectors/gcs/add-new-service.png"
alt="Add New Service"
caption="Provide a Name and description for your Service" /%}

{% /stepVisualInfo %}

{% /step %}
{% step srNumber=5 %}

{% stepDescription title="5. Configure the Service Connection" %}

In this step, we will configure the connection settings required for
this connector. Please follow the instructions below to ensure that
you've configured the connector to read from your GCS service as
desired.

{% /stepDescription %}

{% stepVisualInfo %}

{% image
src="/images/v1.3/connectors/gcs/service-connection.png"
alt="Configure service connection"
caption="Configure the service connection by filling the form" /%}

{% /stepVisualInfo %}

{% /step %}
{% extraContent parentTagName="stepsContainer" %}

#### Connection Details

{% /extraContent %}

{% partial file="/v1.3/connectors/test-connection.md" /%}

{% partial file="/v1.3/connectors/storage/configure-ingestion.md" /%}

{% partial file="/v1.3/connectors/ingestion-schedule-and-deploy.md" /%}

{% /stepsContainer %}

{% partial file="/v1.3/connectors/troubleshooting.md" /%}
openmetadata-docs/content/v1.3.x/connectors/storage/gcs/yaml.md (new file, 189 lines)
@ -0,0 +1,189 @@
---
title: Run the GCS Connector Externally
slug: /connectors/storage/gcs/yaml
---

{% connectorDetailsHeader
name="GCS"
stage="PROD"
platform="Collate"
availableFeatures=["Metadata"]
unavailableFeatures=[]
/ %}

This page contains the setup guide and reference information for the GCS connector.

Configure and schedule GCS metadata workflows from the CLI:

- [Requirements](#requirements)
- [Metadata Ingestion](#metadata-ingestion)

{% partial file="/v1.3/connectors/external-ingestion-deployment.md" /%}

## Requirements

{%inlineCallout icon="description" bold="OpenMetadata 1.0 or later" href="/deployment"%}
To deploy OpenMetadata, check the Deployment guides.
{%/inlineCallout%}

We need the following permissions in GCP:

### GCS Permissions

For all the buckets that we want to ingest, we need to provide the following:

- `storage.buckets.get`
- `storage.buckets.list`
- `storage.objects.get`
- `storage.objects.list`

### OpenMetadata Manifest

In most connectors, metadata extraction happens automatically. Here, we can extract high-level metadata from buckets, but to understand their internal structure, users must provide an `openmetadata.json` file at the bucket root.

You can learn more about this [here](/connectors/storage). Keep reading for an example of the shape of the manifest file.

{% partial file="/v1.3/connectors/storage/manifest.md" /%}
## Metadata Ingestion

All connectors are defined as JSON Schemas.
[Here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/services/connections/storage/GCSConnection.json)
you can find the structure to create a connection to GCS.

In order to create and run a Metadata Ingestion workflow, we will follow
the steps to create a YAML configuration able to connect to the source,
process the Entities if needed, and reach the OpenMetadata server.

The workflow is modeled around the following
[JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/workflow.json)

### 1. Define the YAML Config

This is a sample config for GCS:

{% codePreview %}

{% codeInfoContainer %}

#### Source Configuration - Service Connection

{% codeInfo srNumber=1 %}

**gcpConfig:**
**1.** Passing the raw credential values provided by GCP. This requires us to provide the following information, all provided by GCP:

- **type**: Credentials Type is the type of the account, for a service account the value of this field is `service_account`. To fetch this key, look for the value associated with the `type` key in the service account key file.
- **projectId**: A project ID is a unique string used to differentiate your project from all others in Google Cloud. To fetch this key, look for the value associated with the `project_id` key in the service account key file. You can also pass multiple project IDs to ingest metadata from different GCP projects into one service.
- **privateKeyId**: This is a unique identifier for the private key associated with the service account. To fetch this key, look for the value associated with the `private_key_id` key in the service account file.
- **privateKey**: This is the private key associated with the service account that is used to authenticate and authorize access to GCP. To fetch this key, look for the value associated with the `private_key` key in the service account file.
- **clientEmail**: This is the email address associated with the service account. To fetch this key, look for the value associated with the `client_email` key in the service account key file.
- **clientId**: This is a unique identifier for the service account. To fetch this key, look for the value associated with the `client_id` key in the service account key file.
- **authUri**: This is the URI for the authorization server. To fetch this key, look for the value associated with the `auth_uri` key in the service account key file. The default value for Auth URI is https://accounts.google.com/o/oauth2/auth.
- **tokenUri**: The Google Cloud Token URI is a specific endpoint used to obtain an OAuth 2.0 access token from the Google Cloud IAM service. This token allows you to authenticate and access various Google Cloud resources and APIs that require authorization. To fetch this key, look for the value associated with the `token_uri` key in the service account credentials file. The default value for Token URI is https://oauth2.googleapis.com/token.
- **authProviderX509CertUrl**: This is the URL of the certificate that verifies the authenticity of the authorization server. To fetch this key, look for the value associated with the `auth_provider_x509_cert_url` key in the service account key file. The default value for Auth Provider X509Cert URL is https://www.googleapis.com/oauth2/v1/certs.
- **clientX509CertUrl**: This is the URL of the certificate that verifies the authenticity of the service account. To fetch this key, look for the value associated with the `client_x509_cert_url` key in the service account key file.

**2.** Passing a local file path that contains the credentials:

- **gcpCredentialsPath**

- If you prefer to pass the credentials file, you can do so as follows:
```yaml
source:
  type: gcs
  serviceName: local_gcs
  serviceConnection:
    config:
      type: GCS
      credentials:
        gcpConfig: <path to file>
```

- If you want to use [ADC authentication](https://cloud.google.com/docs/authentication#adc) for GCP, you can just leave
the GCP credentials empty. This is why they are not marked as required.
```yaml
|
||||
...
|
||||
source:
|
||||
type: gcs
|
||||
serviceName: local_gcs
|
||||
serviceConnection:
|
||||
config:
|
||||
type: GCS
|
||||
credentials:
|
||||
gcpConfig: {}
|
||||
...
|
||||
```
|
||||
|
||||
{% /codeInfo %}
|
||||
|
||||
{% partial file="/v1.3/connectors/yaml/database/source-config-def.md" /%}
|
||||
|
||||
{% partial file="/v1.3/connectors/yaml/ingestion-sink-def.md" /%}
|
||||
|
||||
{% partial file="/v1.3/connectors/yaml/workflow-config-def.md" /%}
|
||||
|
||||
#### Advanced Configuration
|
||||
|
||||
{% codeInfo srNumber=2 %}
|
||||
|
||||
**Connection Options (Optional)**: Enter the details for any additional connection options that can be sent to Athena during the connection. These details must be added as Key-Value pairs.
|
||||
|
||||
{% /codeInfo %}
|
||||
|
||||
{% codeInfo srNumber=3 %}
|
||||
|
||||
**Connection Arguments (Optional)**: Enter the details for any additional connection arguments such as security or protocol configs that can be sent to Athena during the connection. These details must be added as Key-Value pairs.
|
||||
|
||||
{% /codeInfo %}
|
||||
|
||||
{% /codeInfoContainer %}
{% codeBlock fileName="filename.yaml" %}

```yaml
source:
  type: gcs
  serviceName: "<service name>"
  serviceConnection:
    config:
      type: GCS
```
```yaml {% srNumber=1 %}
      credentials:
        gcpConfig:
          type: My Type
          projectId: project ID # ["project-id-1", "project-id-2"]
          privateKeyId: us-east-2
          privateKey: |
            -----BEGIN PRIVATE KEY-----
            Super secret key
            -----END PRIVATE KEY-----
          clientEmail: client@mail.com
          clientId: 1234
          # authUri: https://accounts.google.com/o/oauth2/auth (default)
          # tokenUri: https://oauth2.googleapis.com/token (default)
          # authProviderX509CertUrl: https://www.googleapis.com/oauth2/v1/certs (default)
          clientX509CertUrl: https://cert.url
          # taxonomyLocation: us
          # taxonomyProjectID: ["project-id-1", "project-id-2"]
          # usageLocation: us
```
```yaml {% srNumber=2 %}
      # connectionOptions:
      #   key: value
```
```yaml {% srNumber=3 %}
      # connectionArguments:
      #   key: value
```

{% partial file="/v1.3/connectors/yaml/database/source-config.md" /%}

{% partial file="/v1.3/connectors/yaml/ingestion-sink.md" /%}

{% partial file="/v1.3/connectors/yaml/workflow-config.md" /%}

{% /codeBlock %}

{% /codePreview %}
@ -10,6 +10,8 @@ This is the supported list of connectors for Storage Services:

{% connectorsListContainer %}

{% connectorInfoCard name="S3" stage="PROD" href="/connectors/storage/s3" platform="OpenMetadata" / %}
{% connectorInfoCard name="Azure" stage="PROD" href="/connectors/storage/azure" platform="Collate" / %}
{% connectorInfoCard name="GCS" stage="PROD" href="/connectors/storage/gcs" platform="Collate" / %}

{% /connectorsListContainer %}
BIN openmetadata-docs/images/v1.3/connectors/adls/select-service.png (new file, 104 KiB, binary not shown)
BIN openmetadata-docs/images/v1.3/connectors/gcs/add-new-service.png (new file, 109 KiB, binary not shown)
BIN openmetadata-docs/images/v1.3/connectors/gcs/select-service.png (new file, 106 KiB, binary not shown)
@ -0,0 +1,70 @@
# ADLS

In this section, we provide guides and references to use the ADLS connector.

By default, the ADLS connector will ingest only top-level containers (buckets). If you want to extract information from within them and their data models, you can follow the [docs](https://docs.open-metadata.org/connectors/storage).

## Requirements

We need the following permissions in Azure:

### ADLS Permissions

To extract metadata from Azure ADLS (Storage Account - StorageV2), you will need an **App Registration** with the following permissions on the Storage Account:

- Storage Blob Data Contributor
- Storage Queue Data Contributor

You can find further information on the ADLS connector in the [docs](https://docs.open-metadata.org/connectors/storage/adls).

## Connection Details

$$section
### Client ID $(id="clientId")

This is the unique identifier of the App Registration used for this connection. To fetch it, open the app in the Azure portal and copy the `Application (client) ID` from its `Overview` section.
$$

$$section
### Client Secret $(id="clientSecret")

To get the client secret, follow these steps:

1. Log into [Microsoft Azure](https://ms.portal.azure.com/#allservices).
2. Search for `App registrations` and select the `App registrations` link.
3. Select the `Azure AD` app you're using for this connection.
4. Under `Manage`, select `Certificates & secrets`.
5. Under `Client secrets`, select `New client secret`.
6. In the `Add a client secret` pop-up window, provide a description for your application secret. Choose when the application should expire, and select `Add`.
7. From the `Client secrets` section, copy the string in the `Value` column of the newly created application secret.
$$

$$section
### Tenant ID $(id="tenantId")

To get the tenant ID, follow these steps:

1. Log into [Microsoft Azure](https://ms.portal.azure.com/#allservices).
2. Search for `App registrations` and select the `App registrations` link.
3. Select the `Azure AD` app you're using for this connection.
4. From the `Overview` section, copy the `Directory (tenant) ID`.
$$

$$section
### Account Name $(id="accountName")

Here are the step-by-step instructions for finding the account name of an Azure Data Lake Storage account:

1. Sign in to the Azure portal and navigate to the `Storage accounts` page.
2. Find the Data Lake Storage account you want to access and click on its name.
3. On the account overview page, locate the `Account name` field. This is the unique identifier for the Data Lake Storage account.
4. You can use this account name to access and manage the resources associated with the account, such as creating and managing containers and directories.
$$
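To sanity-check the account name before running an ingestion, the small sketch below (the `build_blob_endpoint` helper is our own illustration, not part of OpenMetadata or the Azure SDK) shows how the account name maps onto the Blob service endpoint that Azure clients connect to:

```python
def build_blob_endpoint(account_name: str) -> str:
    """Build the Blob service endpoint URL for an Azure Storage account.

    The account name entered in the connection form is the '<account>'
    part of 'https://<account>.blob.core.windows.net'.
    """
    # Azure storage account names are 3-24 lowercase alphanumeric characters.
    if not account_name.isalnum() or not account_name.islower():
        raise ValueError(f"invalid storage account name: {account_name!r}")
    if not 3 <= len(account_name) <= 24:
        raise ValueError(f"invalid storage account name length: {len(account_name)}")
    return f"https://{account_name}.blob.core.windows.net"


# With the azure-identity and azure-storage-blob packages installed, the
# values from this form would typically be combined along these lines:
#   credential = ClientSecretCredential(tenant_id, client_id, client_secret)
#   client = BlobServiceClient(
#       account_url=build_blob_endpoint("mystorageacct"), credential=credential)
```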
$$section
### Key Vault Name $(id="vaultName")

Azure Key Vault serves as a centralized secrets manager, securely storing and managing sensitive information such as connection strings and cryptographic keys.
$$
@ -0,0 +1,111 @@
# GCS

In this section, we provide guides and references to use the GCS connector.

By default, the GCS connector will ingest only top-level containers (buckets). If you want to extract information from within them and their data models, you can follow the [docs](https://docs.open-metadata.org/connectors/storage).

## Requirements

We need the following permissions in GCS:

### GCS Permissions

For all the buckets that we want to ingest, we need to provide the following:

- `storage.buckets.get`
- `storage.buckets.list`
- `storage.objects.get`
- `storage.objects.list`
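One way to grant exactly these permissions is a custom IAM role. The definition below is a sketch only (the `title` and `description` are placeholders of our choosing); a role in this format can be created with `gcloud iam roles create <role-name> --project=<project-id> --file=role.yaml` and then bound to the service account used for ingestion:

```yaml
# role.yaml -- hypothetical custom role covering only the permissions listed above
title: OpenMetadata GCS Ingestion
description: Read-only access for OpenMetadata storage ingestion
stage: GA
includedPermissions:
  - storage.buckets.get
  - storage.buckets.list
  - storage.objects.get
  - storage.objects.list
```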
## Connection Details

$$section
### GCP Credentials Configuration $(id="gcpConfig")

You can authenticate with your GCS instance using either `GCP Credentials Path`, where you specify the file path of the service account key, or you can pass the values directly by choosing the `GCP Credentials Values` from the service account key file.

You can check [this](https://cloud.google.com/iam/docs/keys-create-delete#iam-service-account-keys-create-console) documentation on how to create the service account keys and download them.

If you want to use [ADC authentication](https://cloud.google.com/docs/authentication#adc) for GCS, you can simply leave the GCP credentials empty.

$$

$$section
### Credentials Type $(id="type")

Credentials Type is the type of the account; for a service account the value of this field is `service_account`. To fetch this key, look for the value associated with the `type` key in the service account key file.
$$

$$section
### Project ID $(id="projectId")

A project ID is a unique string used to differentiate your project from all others in Google Cloud. To fetch this key, look for the value associated with the `project_id` key in the service account key file.
$$

$$section
### Private Key ID $(id="privateKeyId")

This is a unique identifier for the private key associated with the service account. To fetch this key, look for the value associated with the `private_key_id` key in the service account key file.
$$

$$section
### Private Key $(id="privateKey")

This is the private key associated with the service account, used to authenticate and authorize access to GCP. To fetch this key, look for the value associated with the `private_key` key in the service account key file.

Make sure you are passing the key in the correct format. If your private key looks like this:

```
-----BEGIN ENCRYPTED PRIVATE KEY-----
MII..
MBQ...
CgU..
8Lt..
...
h+4=
-----END ENCRYPTED PRIVATE KEY-----
```

you will have to replace new lines with `\n`, and the final private key that you pass should look like this:

```
-----BEGIN ENCRYPTED PRIVATE KEY-----\nMII..\nMBQ...\nCgU..\n8Lt..\n...\nh+4=\n-----END ENCRYPTED PRIVATE KEY-----\n
```
$$
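This conversion is mechanical and easy to get wrong by hand. A minimal helper (our own sketch, not part of OpenMetadata) that turns the multi-line PEM into the single-line escaped form:

```python
def escape_private_key(pem: str) -> str:
    """Replace real newlines in a PEM block with literal '\\n' sequences,
    keeping a trailing '\\n' after the END line as shown above."""
    return pem.strip().replace("\n", "\\n") + "\\n"


pem = (
    "-----BEGIN ENCRYPTED PRIVATE KEY-----\n"
    "MII..\n"
    "-----END ENCRYPTED PRIVATE KEY-----"
)
print(escape_private_key(pem))
# -----BEGIN ENCRYPTED PRIVATE KEY-----\nMII..\n-----END ENCRYPTED PRIVATE KEY-----\n
```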
$$section
### Client Email $(id="clientEmail")

This is the email address associated with the service account. To fetch this key, look for the value associated with the `client_email` key in the service account key file.
$$

$$section
### Client ID $(id="clientId")

This is a unique identifier for the service account. To fetch this key, look for the value associated with the `client_id` key in the service account key file.
$$

$$section
### Auth URI $(id="authUri")

This is the URI for the authorization server. To fetch this key, look for the value associated with the `auth_uri` key in the service account key file.
$$

$$section
### Token URI $(id="tokenUri")

The Google Cloud Token URI is a specific endpoint used to obtain an OAuth 2.0 access token from the Google Cloud IAM service. This token allows you to authenticate and access various Google Cloud resources and APIs that require authorization.

To fetch this key, look for the value associated with the `token_uri` key in the service account credentials file.
$$

$$section
### Auth Provider X509Cert URL $(id="authProviderX509CertUrl")

This is the URL of the certificate that verifies the authenticity of the authorization server. To fetch this key, look for the value associated with the `auth_provider_x509_cert_url` key in the service account key file.
$$

$$section
### Client X509Cert URL $(id="clientX509CertUrl")

This is the URL of the certificate that verifies the authenticity of the service account. To fetch this key, look for the value associated with the `client_x509_cert_url` key in the service account key file.
$$
@ -108,6 +108,7 @@ class ServiceUtilClassBase {
    StorageServiceType.Adls,
    DatabaseServiceType.QueryLog,
    DatabaseServiceType.Dbt,
    StorageServiceType.Gcs,
  ];

  protected updateUnsupportedServices(types: string[]) {