MINOR: Add / Fix GCS and ADLS - docs, bugs (#15502)

Add GCS and ADLS docs
Ayush Shah 2024-03-12 21:13:24 +05:30 committed by GitHub
parent 189e0b82d0
commit 1c2fbdd9f4
18 changed files with 943 additions and 5 deletions

View File

@ -164,7 +164,8 @@ We support two ways of authenticating to GCS:
- **Client ID** : Client ID of the data storage account
- **Client Secret** : Client Secret of the account
- **Tenant ID** : Tenant ID under which the data storage account falls
- **Account Name** : Account Name of the data Storage
- **Account Name** : Account Name of the Data Storage
- **Key Vault Name**: Azure Key Vault serves as a centralized secrets manager, securely storing and managing sensitive information, such as connection strings and cryptographic keys.
- **Required Roles**

View File

@ -239,7 +239,8 @@ source:
- **Client ID** : Client ID of the data storage account
- **Client Secret** : Client Secret of the account
- **Tenant ID** : Tenant ID under which the data storage account falls
- **Account Name** : Account Name of the data Storage
- **Account Name** : Account Name of the Data Storage
- **Key Vault Name**: Azure Key Vault serves as a centralized secrets manager, securely storing and managing sensitive information, such as connection strings and cryptographic keys.
{% /codeInfo %}

View File

@ -183,7 +183,9 @@ Find more information about [Source Identity](https://docs.aws.amazon.com/STS/la
- **Tenant ID** : Tenant ID under which the data storage account falls
- **Account Name** : Account Name of the data Storage
- **Account Name** : Account Name of the Data Storage
- **Key Vault Name**: Azure Key Vault serves as a centralized secrets manager, securely storing and managing sensitive information, such as connection strings and cryptographic keys.
{% /extraContent %}

View File

@ -268,7 +268,9 @@ source:
* **clientId** : Client ID of the data storage account
* **clientSecret** : Client Secret of the account
* **tenantId** : Tenant ID under which the data storage account falls
* **accountName** : Account Name of the data Storage
* **accountName** : Account Name of the Data Storage
* **vaultName**: Azure Key Vault serves as a centralized secrets manager, securely storing and managing sensitive information, such as connection strings and cryptographic keys.
{% /codeInfo %}
@ -407,7 +409,8 @@ source:
* **clientId** : Client ID of the data storage account
* **clientSecret** : Client Secret of the account
* **tenantId** : Tenant ID under which the data storage account falls
* **accountName** : Account Name of the data Storage
* **accountName** : Account Name of the Data Storage
* **vaultName**: Azure Key Vault serves as a centralized secrets manager, securely storing and managing sensitive information, such as connection strings and cryptographic keys.
{% /codeInfo %}

View File

@ -0,0 +1,182 @@
---
title: ADLS
slug: /connectors/storage/adls
---
{% connectorDetailsHeader
name="ADLS"
stage="PROD"
platform="Collate"
availableFeatures=["Metadata"]
unavailableFeatures=[]
/ %}
This page contains the setup guide and reference information for the ADLS connector.
Configure and schedule ADLS metadata workflows from the OpenMetadata UI:
- [Requirements](#requirements)
- [Metadata Ingestion](#metadata-ingestion)
{% partial file="/v1.3/connectors/ingestion-modes-tiles.md" variables={yamlPath: "/connectors/storage/adls/yaml"} /%}
## Requirements
We need the following permissions in Azure Data Lake Storage:
### ADLS Permissions
To extract metadata from Azure ADLS (Storage Account - StorageV2), you will need an **App Registration** with the following permissions on the Storage Account:
- Storage Blob Data Contributor
- Storage Queue Data Contributor
### OpenMetadata Manifest
Unlike other connectors, where metadata extraction happens automatically, here we can extract high-level
metadata from buckets, but to understand their internal structure we need users to provide an `openmetadata.json`
file at the bucket root.
You can learn more about this [here](/connectors/storage). Keep reading for an example of the shape of the manifest file.
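As a quick reference, the sketch below shows what a minimal manifest can look like; the `dataPath` and `structureFormat` values are illustrative, and the manifest reference included next covers the full set of options.

```json
{
  "entries": [
    {
      "dataPath": "transactions",
      "structureFormat": "csv",
      "isPartitioned": false
    }
  ]
}
```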
{% partial file="/v1.3/connectors/storage/manifest.md" /%}
## Metadata Ingestion
{% stepsContainer %}
{% step srNumber=1 %}
{% stepDescription title="1. Visit the Services Page" %}
The first step is to ingest the metadata from your sources. Under
Settings, you will find a Services page; a Service connects an external
source system to OpenMetadata. Once a service is created, it can be used
to configure metadata, usage, and profiler workflows.
To visit the Services page, select Services from the Settings menu.
{% /stepDescription %}
{% stepVisualInfo %}
{% image
src="/images/v1.3/connectors/visit-services.png"
alt="Visit Services Page"
caption="Find Dashboard option on left panel of the settings page" /%}
{% /stepVisualInfo %}
{% /step %}
{% step srNumber=2 %}
{% stepDescription title="2. Create a New Service" %}
Click on the 'Add New Service' button to start the Service creation.
{% /stepDescription %}
{% stepVisualInfo %}
{% image
src="/images/v1.3/connectors/create-service.png"
alt="Create a new service"
caption="Add a new Service from the Storage Services page" /%}
{% /stepVisualInfo %}
{% /step %}
{% step srNumber=3 %}
{% stepDescription title="3. Select the Service Type" %}
Select ADLS as the service type and click Next.
{% /stepDescription %}
{% stepVisualInfo %}
{% image
src="/images/v1.3/connectors/adls/select-service.png"
alt="Select Service"
caption="Select your service from the list" /%}
{% /stepVisualInfo %}
{% /step %}
{% step srNumber=4 %}
{% stepDescription title="4. Name and Describe your Service" %}
Provide a name and description for your service.
#### Service Name
OpenMetadata uniquely identifies services by their Service Name. Provide
a name that distinguishes your deployment from other services, including
the other Storage services that you might be ingesting metadata
from.
{% /stepDescription %}
{% stepVisualInfo %}
{% image
src="/images/v1.3/connectors/adls/add-new-service.png"
alt="Add New Service"
caption="Provide a Name and description for your Service" /%}
{% /stepVisualInfo %}
{% /step %}
{% step srNumber=5 %}
{% stepDescription title="5. Configure the Service Connection" %}
In this step, we will configure the connection settings required for
this connector. Please follow the instructions below to ensure that
you've configured the connector to read from your ADLS service as
desired.
{% /stepDescription %}
{% stepVisualInfo %}
{% image
src="/images/v1.3/connectors/adls/service-connection.png"
alt="Configure service connection"
caption="Configure the service connection by filling the form" /%}
{% /stepVisualInfo %}
{% /step %}
{% extraContent parentTagName="stepsContainer" %}
#### Connection Details
**Client ID**: This unique identifier is assigned to your Azure Service Principal App, serving as a key for authentication and authorization.
**Client Secret**: This confidential password is associated with the Service Principal, safeguarding access to Azure resources and ensuring secure communication.
**Tenant ID**: Identifying your Azure Subscription, the Tenant ID links your resources to a specific organization or account within the Azure Active Directory.
**Storage Account Name**: This is the user-defined name for your Azure Storage Account, providing a globally unique namespace for your data.
**Key Vault Name**: Azure Key Vault serves as a centralized secrets manager, securely storing and managing sensitive information, such as connection strings and cryptographic keys.
{% /extraContent %}
{% partial file="/v1.3/connectors/test-connection.md" /%}
{% partial file="/v1.3/connectors/storage/configure-ingestion.md" /%}
{% partial file="/v1.3/connectors/ingestion-schedule-and-deploy.md" /%}
{% /stepsContainer %}
{% partial file="/v1.3/connectors/troubleshooting.md" /%}

View File

@ -0,0 +1,201 @@
---
title: Run the Azure Connector Externally
slug: /connectors/storage/azure/yaml
---
{% connectorDetailsHeader
name="Azure"
stage="PROD"
platform="Collate"
availableFeatures=["Metadata"]
unavailableFeatures=[]
/ %}
This page contains the setup guide and reference information for the Azure connector.
Configure and schedule Azure metadata workflows from the CLI:
- [Requirements](#requirements)
- [Metadata Ingestion](#metadata-ingestion)
{% partial file="/v1.3/connectors/external-ingestion-deployment.md" /%}
## Requirements
{%inlineCallout icon="description" bold="OpenMetadata 1.0 or later" href="/deployment"%}
To deploy OpenMetadata, check the Deployment guides.
{%/inlineCallout%}
To run the metadata ingestion, we need the following permissions in ADLS:
### ADLS Permissions
To extract metadata from Azure ADLS (Storage Account - StorageV2), you will need an **App Registration** with the following permissions on the Storage Account:
- Storage Blob Data Contributor
- Storage Queue Data Contributor
### OpenMetadata Manifest
Unlike other connectors, where metadata extraction happens automatically, here we can extract high-level
metadata from buckets, but to understand their internal structure we need users to provide an `openmetadata.json`
file at the bucket root.
You can learn more about this [here](/connectors/storage). Keep reading for an example of the shape of the manifest file.
{% partial file="/v1.3/connectors/storage/manifest.md" /%}
## Metadata Ingestion
All connectors are defined as JSON Schemas.
[Here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/services/connections/storage/adlsConnection.json)
you can find the structure to create a connection to ADLS.
In order to create and run a Metadata Ingestion workflow, we will follow
the steps to create a YAML configuration able to connect to the source,
process the Entities if needed, and reach the OpenMetadata server.
The workflow is modeled around the following
[JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/workflow.json)
### 1. Define the YAML Config
This is a sample config for ADLS:
{% codePreview %}
{% codeInfoContainer %}
#### Source Configuration - Service Connection
{% codeInfo srNumber=1 %}
- **Client ID**: This is the unique identifier for your application registered in Azure AD. It's used in conjunction with the Client Secret to authenticate your application.
{% /codeInfo %}
{% codeInfo srNumber=2 %}
- **Client Secret**: A key that your application uses, along with the Client ID, to access Azure resources.
1. Log into [Microsoft Azure](https://ms.portal.azure.com/#allservices).
2. Search for `App registrations` and select the `App registrations` link.
3. Select the `Azure AD` app you're using for this connection.
4. Under `Manage`, select `Certificates & secrets`.
5. Under `Client secrets`, select `New client secret`.
6. In the `Add a client secret` pop-up window, provide a description for your application secret. Choose when the application should expire, and select `Add`.
7. From the `Client secrets` section, copy the string in the `Value` column of the newly created application secret.
{% /codeInfo %}
{% codeInfo srNumber=3 %}
- **Tenant ID**: The unique identifier of the Azure AD instance under which your account and application are registered.
To get the tenant ID, follow these steps:
1. Log into [Microsoft Azure](https://ms.portal.azure.com/#allservices).
2. Search for `App registrations` and select the `App registrations` link.
3. Select the `Azure AD` app you're using for this connection.
4. From the `Overview` section, copy the `Directory (tenant) ID`.
{% /codeInfo %}
{% codeInfo srNumber=4 %}
- **Account Name**: The name of your ADLS account.
Here are the step-by-step instructions for finding the account name for an Azure Data Lake Storage account:
1. Sign in to the Azure portal and navigate to the `Storage accounts` page.
2. Find the Data Lake Storage account you want to access and click on its name.
3. In the account overview page, locate the `Account name` field. This is the unique identifier for the Data Lake Storage account.
4. You can use this account name to access and manage the resources associated with the account, such as creating and managing containers and directories.
{% /codeInfo %}
{% codeInfo srNumber=5 %}
- **Key Vault**: Azure Key Vault serves as a centralized secrets manager, securely storing and managing sensitive information, such as connection strings and cryptographic keys.
{% /codeInfo %}
{% partial file="/v1.3/connectors/yaml/storage/source-config-def.md" /%}
{% partial file="/v1.3/connectors/yaml/ingestion-sink-def.md" /%}
{% partial file="/v1.3/connectors/yaml/workflow-config-def.md" /%}
#### Advanced Configuration
{% codeInfo srNumber=6 %}
**Connection Options (Optional)**: Enter the details for any additional connection options that can be sent to ADLS during the connection. These details must be added as Key-Value pairs.
{% /codeInfo %}
{% codeInfo srNumber=7 %}
**Connection Arguments (Optional)**: Enter the details for any additional connection arguments such as security or protocol configs that can be sent to ADLS during the connection. These details must be added as Key-Value pairs.
{% /codeInfo %}
{% /codeInfoContainer %}
{% codeBlock fileName="filename.yaml" %}
```yaml
source:
  type: ADLS
  serviceName: local_adls
  serviceConnection:
    config:
      type: ADLS
      credentials:
```
```yaml {% srNumber=1 %}
        clientId: client-id
```
```yaml {% srNumber=2 %}
        clientSecret: client-secret
```
```yaml {% srNumber=3 %}
        tenantId: tenant-id
```
```yaml {% srNumber=4 %}
        accountName: account-name
```
```yaml {% srNumber=5 %}
        vaultName: vault-name
```
```yaml {% srNumber=6 %}
      # connectionOptions:
      #   key: value
```
```yaml {% srNumber=7 %}
      # connectionArguments:
      #   key: value
```
{% partial file="/v1.3/connectors/yaml/storage/source-config.md" /%}
{% partial file="/v1.3/connectors/yaml/ingestion-sink.md" /%}
{% partial file="/v1.3/connectors/yaml/workflow-config.md" /%}
{% /codeBlock %}
{% /codePreview %}
{% partial file="/v1.3/connectors/yaml/ingestion-cli.md" /%}
## Related
{% tilesContainer %}
{% tile
icon="mediation"
title="Configure Ingestion Externally"
description="Deploy, configure, and manage the ingestion workflows externally."
link="/deployment/ingestion"
/ %}
{% /tilesContainer %}

View File

@ -0,0 +1,175 @@
---
title: GCS
slug: /connectors/storage/gcs
---
{% connectorDetailsHeader
name="GCS"
stage="PROD"
platform="Collate"
availableFeatures=["Metadata"]
unavailableFeatures=[]
/ %}
This page contains the setup guide and reference information for the GCS connector.
Configure and schedule GCS metadata workflows from the OpenMetadata UI:
- [Requirements](#requirements)
- [Metadata Ingestion](#metadata-ingestion)
{% partial file="/v1.3/connectors/ingestion-modes-tiles.md" variables={yamlPath: "/connectors/storage/gcs/yaml"} /%}
## Requirements
We need the following permissions in GCP:
### GCS Permissions
For all the buckets that we want to ingest, we need to provide the following:
- `storage.buckets.get`
- `storage.buckets.list`
- `storage.objects.get`
- `storage.objects.list`
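If you prefer to bundle these permissions into a custom IAM role instead of granting a broader predefined one, a minimal role definition could look like the sketch below; the role ID, project, and file name are illustrative assumptions, not part of this guide.

```yaml
# role.yaml — a sketch of a GCP custom role granting only the four
# permissions listed above. It could be created with:
#   gcloud iam roles create openMetadataGcsIngestion --project=my-project --file=role.yaml
title: OpenMetadata GCS Ingestion
description: Read-only access for OpenMetadata bucket metadata ingestion
stage: GA
includedPermissions:
  - storage.buckets.get
  - storage.buckets.list
  - storage.objects.get
  - storage.objects.list
```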
### OpenMetadata Manifest
Unlike other connectors, where metadata extraction happens automatically, here we can extract high-level
metadata from buckets, but to understand their internal structure we need users to provide an `openmetadata.json`
file at the bucket root.
You can learn more about this [here](/connectors/storage). Keep reading for an example of the shape of the manifest file.
{% partial file="/v1.3/connectors/storage/manifest.md" /%}
## Metadata Ingestion
{% stepsContainer %}
{% step srNumber=1 %}
{% stepDescription title="1. Visit the Services Page" %}
The first step is to ingest the metadata from your sources. Under
Settings, you will find a Services page; a Service connects an external
source system to OpenMetadata. Once a service is created, it can be used
to configure metadata, usage, and profiler workflows.
To visit the Services page, select Services from the Settings menu.
{% /stepDescription %}
{% stepVisualInfo %}
{% image
src="/images/v1.3/connectors/visit-services.png"
alt="Visit Services Page"
caption="Find Dashboard option on left panel of the settings page" /%}
{% /stepVisualInfo %}
{% /step %}
{% step srNumber=2 %}
{% stepDescription title="2. Create a New Service" %}
Click on the 'Add New Service' button to start the Service creation.
{% /stepDescription %}
{% stepVisualInfo %}
{% image
src="/images/v1.3/connectors/create-service.png"
alt="Create a new service"
caption="Add a new Service from the Storage Services page" /%}
{% /stepVisualInfo %}
{% /step %}
{% step srNumber=3 %}
{% stepDescription title="3. Select the Service Type" %}
Select GCS as the service type and click Next.
{% /stepDescription %}
{% stepVisualInfo %}
{% image
src="/images/v1.3/connectors/GCS/select-service.png"
alt="Select Service"
caption="Select your service from the list" /%}
{% /stepVisualInfo %}
{% /step %}
{% step srNumber=4 %}
{% stepDescription title="4. Name and Describe your Service" %}
Provide a name and description for your service.
#### Service Name
OpenMetadata uniquely identifies services by their Service Name. Provide
a name that distinguishes your deployment from other services, including
the other Storage services that you might be ingesting metadata
from.
{% /stepDescription %}
{% stepVisualInfo %}
{% image
src="/images/v1.3/connectors/GCS/add-new-service.png"
alt="Add New Service"
caption="Provide a Name and description for your Service" /%}
{% /stepVisualInfo %}
{% /step %}
{% step srNumber=5 %}
{% stepDescription title="5. Configure the Service Connection" %}
In this step, we will configure the connection settings required for
this connector. Please follow the instructions below to ensure that
you've configured the connector to read from your GCS service as
desired.
{% /stepDescription %}
{% stepVisualInfo %}
{% image
src="/images/v1.3/connectors/GCS/service-connection.png"
alt="Configure service connection"
caption="Configure the service connection by filling the form" /%}
{% /stepVisualInfo %}
{% /step %}
{% extraContent parentTagName="stepsContainer" %}
#### Connection Details
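**GCP Credentials**: You can authenticate with your GCS instance using either `GCP Credentials Path`, where you specify the file path of the service account key, or `GCP Credentials Values`, where you pass the values from the service account key file directly. If you want to use [ADC authentication](https://cloud.google.com/docs/authentication#adc) for GCS, you can simply leave the GCP credentials empty.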
{% /extraContent %}
{% partial file="/v1.3/connectors/test-connection.md" /%}
{% partial file="/v1.3/connectors/storage/configure-ingestion.md" /%}
{% partial file="/v1.3/connectors/ingestion-schedule-and-deploy.md" /%}
{% /stepsContainer %}
{% partial file="/v1.3/connectors/troubleshooting.md" /%}

View File

@ -0,0 +1,189 @@
---
title: Run the GCS Connector Externally
slug: /connectors/storage/gcs/yaml
---
{% connectorDetailsHeader
name="GCS"
stage="PROD"
platform="Collate"
availableFeatures=["Metadata"]
unavailableFeatures=[]
/ %}
This page contains the setup guide and reference information for the GCS connector.
Configure and schedule GCS metadata workflows from the CLI:
- [Requirements](#requirements)
- [Metadata Ingestion](#metadata-ingestion)
{% partial file="/v1.3/connectors/external-ingestion-deployment.md" /%}
## Requirements
{%inlineCallout icon="description" bold="OpenMetadata 1.0 or later" href="/deployment"%}
To deploy OpenMetadata, check the Deployment guides.
{%/inlineCallout%}
We need the following permissions in GCP:
### GCS Permissions
For all the buckets that we want to ingest, we need to provide the following:
- `storage.buckets.get`
- `storage.buckets.list`
- `storage.objects.get`
- `storage.objects.list`
### OpenMetadata Manifest
Unlike other connectors, where metadata extraction happens automatically, here we can extract high-level
metadata from buckets, but to understand their internal structure we need users to provide an `openmetadata.json`
file at the bucket root.
You can learn more about this [here](/connectors/storage). Keep reading for an example of the shape of the manifest file.
{% partial file="/v1.3/connectors/storage/manifest.md" /%}
## Metadata Ingestion
All connectors are defined as JSON Schemas.
[Here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/services/connections/storage/gcsConnection.json)
you can find the structure to create a connection to GCS.
In order to create and run a Metadata Ingestion workflow, we will follow
the steps to create a YAML configuration able to connect to the source,
process the Entities if needed, and reach the OpenMetadata server.
The workflow is modeled around the following
[JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/workflow.json)
### 1. Define the YAML Config
This is a sample config for GCS:
{% codePreview %}
{% codeInfoContainer %}
#### Source Configuration - Service Connection
{% codeInfo srNumber=1 %}
**gcpConfig:**
**1.** Passing the raw credential values provided by GCP. This requires the following information, all of which can be found in the service account key file:
- **type**: Credentials Type is the type of the account, for a service account the value of this field is `service_account`. To fetch this key, look for the value associated with the `type` key in the service account key file.
- **projectId**: A project ID is a unique string used to differentiate your project from all others in Google Cloud. To fetch this key, look for the value associated with the `project_id` key in the service account key file. You can also pass multiple project IDs to ingest metadata from different GCP projects into one service.
- **privateKeyId**: This is a unique identifier for the private key associated with the service account. To fetch this key, look for the value associated with the `private_key_id` key in the service account file.
- **privateKey**: This is the private key associated with the service account that is used to authenticate and authorize access to GCP. To fetch this key, look for the value associated with the `private_key` key in the service account file.
- **clientEmail**: This is the email address associated with the service account. To fetch this key, look for the value associated with the `client_email` key in the service account key file.
- **clientId**: This is a unique identifier for the service account. To fetch this key, look for the value associated with the `client_id` key in the service account key file.
- **authUri**: This is the URI for the authorization server. To fetch this key, look for the value associated with the `auth_uri` key in the service account key file. The default value for the Auth URI is https://accounts.google.com/o/oauth2/auth.
- **tokenUri**: The Google Cloud Token URI is a specific endpoint used to obtain an OAuth 2.0 access token from the Google Cloud IAM service. This token allows you to authenticate and access various Google Cloud resources and APIs that require authorization. To fetch this key, look for the value associated with the `token_uri` key in the service account credentials file. The default value for the Token URI is https://oauth2.googleapis.com/token.
- **authProviderX509CertUrl**: This is the URL of the certificate that verifies the authenticity of the authorization server. To fetch this key, look for the value associated with the `auth_provider_x509_cert_url` key in the service account key file. The default value for the Auth Provider X509Cert URL is https://www.googleapis.com/oauth2/v1/certs.
- **clientX509CertUrl**: This is the URL of the certificate that verifies the authenticity of the service account. To fetch this key, look for the value associated with the `client_x509_cert_url` key in the service account key file.
**2.** Passing a local file path that contains the credentials:
- **gcpCredentialsPath**
- If you prefer to pass the credentials file, you can do so as follows:
```yaml
source:
  type: gcs
  serviceName: local_gcs
  serviceConnection:
    config:
      type: GCS
      credentials:
        gcpConfig: <path to file>
```
- If you want to use [ADC authentication](https://cloud.google.com/docs/authentication#adc) for GCP you can just leave
the GCP credentials empty. This is why they are not marked as required.
```yaml
...
source:
  type: gcs
  serviceName: local_gcs
  serviceConnection:
    config:
      type: GCS
      credentials:
        gcpConfig: {}
...
```
{% /codeInfo %}
{% partial file="/v1.3/connectors/yaml/database/source-config-def.md" /%}
{% partial file="/v1.3/connectors/yaml/ingestion-sink-def.md" /%}
{% partial file="/v1.3/connectors/yaml/workflow-config-def.md" /%}
#### Advanced Configuration
{% codeInfo srNumber=2 %}
**Connection Options (Optional)**: Enter the details for any additional connection options that can be sent to GCS during the connection. These details must be added as Key-Value pairs.
{% /codeInfo %}
{% codeInfo srNumber=3 %}
**Connection Arguments (Optional)**: Enter the details for any additional connection arguments such as security or protocol configs that can be sent to GCS during the connection. These details must be added as Key-Value pairs.
{% /codeInfo %}
{% /codeInfoContainer %}
{% codeBlock fileName="filename.yaml" %}
```yaml
source:
  type: gcs
  serviceName: "<service name>"
  serviceConnection:
    config:
      type: GCS
```
```yaml {% srNumber=1 %}
      credentials:
        gcpConfig:
          type: service_account
          projectId: project-id # ["project-id-1", "project-id-2"]
          privateKeyId: private-key-id
          privateKey: |
            -----BEGIN PRIVATE KEY-----
            Super secret key
            -----END PRIVATE KEY-----
          clientEmail: client@mail.com
          clientId: 1234
          # authUri: https://accounts.google.com/o/oauth2/auth (default)
          # tokenUri: https://oauth2.googleapis.com/token (default)
          # authProviderX509CertUrl: https://www.googleapis.com/oauth2/v1/certs (default)
          clientX509CertUrl: https://cert.url
          # taxonomyLocation: us
          # taxonomyProjectID: ["project-id-1", "project-id-2"]
          # usageLocation: us
```
```yaml {% srNumber=2 %}
      # connectionOptions:
      #   key: value
```
```yaml {% srNumber=3 %}
      # connectionArguments:
      #   key: value
```
{% partial file="/v1.3/connectors/yaml/database/source-config.md" /%}
{% partial file="/v1.3/connectors/yaml/ingestion-sink.md" /%}
{% partial file="/v1.3/connectors/yaml/workflow-config.md" /%}
{% /codeBlock %}
{% /codePreview %}

View File

@ -10,6 +10,8 @@ This is the supported list of connectors for Storage Services:
{% connectorsListContainer %}
{% connectorInfoCard name="S3" stage="PROD" href="/connectors/storage/s3" platform="OpenMetadata" / %}
{% connectorInfoCard name="Azure" stage="PROD" href="/connectors/storage/azure" platform="Collate" / %}
{% connectorInfoCard name="GCS" stage="PROD" href="/connectors/storage/gcs" platform="Collate" / %}
{% /connectorsListContainer %}

6 binary image files added (not shown): 110 KiB, 104 KiB, 322 KiB, 109 KiB, 106 KiB, 153 KiB.

View File

@ -0,0 +1,70 @@
# ADLS
In this section, we provide guides and references to use the ADLS connector.
By default, the ADLS connector will ingest only top-level containers (buckets). If you want to extract information from within them and their data models, you can follow the [docs](https://docs.open-metadata.org/connectors/storage).
## Requirements
We need the following permissions in Azure:
### ADLS Permissions
To extract metadata from Azure ADLS (Storage Account - StorageV2), you will need an **App Registration** with the following permissions on the Storage Account:
- Storage Blob Data Contributor
- Storage Queue Data Contributor
You can find further information on the ADLS connector in the [docs](https://docs.open-metadata.org/connectors/storage/adls).
## Connection Details
$$section
### Client ID $(id="clientId")
This is a unique identifier for the service account. To fetch this key, look for the value associated with the `client_id` key in the service account key file.
$$
$$section
### Client Secret $(id="clientSecret")
To get the client secret, follow these steps:
1. Log into [Microsoft Azure](https://ms.portal.azure.com/#allservices).
2. Search for `App registrations` and select the `App registrations` link.
3. Select the `Azure AD` app you're using for this connection.
4. Under `Manage`, select `Certificates & secrets`.
5. Under `Client secrets`, select `New client secret`.
6. In the `Add a client secret` pop-up window, provide a description for your application secret. Choose when the application should expire, and select `Add`.
7. From the `Client secrets` section, copy the string in the `Value` column of the newly created application secret.
$$
$$section
### Tenant ID $(id="tenantId")
To get the tenant ID, follow these steps:
1. Log into [Microsoft Azure](https://ms.portal.azure.com/#allservices).
2. Search for `App registrations` and select the `App registrations` link.
3. Select the `Azure AD` app you're using for this connection.
4. From the `Overview` section, copy the `Directory (tenant) ID`.
$$
$$section
### Account Name $(id="accountName")
Here are the step-by-step instructions for finding the account name for an Azure Data Lake Storage account:
1. Sign in to the Azure portal and navigate to the `Storage accounts` page.
2. Find the Data Lake Storage account you want to access and click on its name.
3. In the account overview page, locate the `Account name` field. This is the unique identifier for the Data Lake Storage account.
4. You can use this account name to access and manage the resources associated with the account, such as creating and managing containers and directories.
$$
$$section
### Key Vault Name $(id="vaultName")
Azure Key Vault serves as a centralized secrets manager, securely storing and managing sensitive information, such as connection strings and cryptographic keys.
$$

View File

@ -0,0 +1,111 @@
# GCS
In this section, we provide guides and references to use the GCS connector.
By default, the GCS connector will ingest only top-level containers (buckets). If you want to extract information from within them and their data models, you can follow the [docs](https://docs.open-metadata.org/connectors/storage).
## Requirements
We need the following permissions in GCP:
### GCS Permissions
For all the buckets that we want to ingest, we need to provide the following:
- `storage.buckets.get`
- `storage.buckets.list`
- `storage.objects.get`
- `storage.objects.list`
## Connection Details
$$section
### GCP Credentials Configuration $(id="gcpConfig")
You can authenticate with your GCS instance using either `GCP Credentials Path`, where you can specify the file path of the service account key, or you can pass the values directly by choosing the `GCP Credentials Values` from the service account key file.
You can check [this](https://cloud.google.com/iam/docs/keys-create-delete#iam-service-account-keys-create-console) documentation on how to create service account keys and download them.
If you want to use [ADC authentication](https://cloud.google.com/docs/authentication#adc) for GCS you can just leave the GCP credentials empty.
$$
$$section
### Credentials Type $(id="type")
Credentials Type is the type of the account, for a service account the value of this field is `service_account`. To fetch this key, look for the value associated with the `type` key in the service account key file.
$$
$$section
### Project ID $(id="projectId")
A project ID is a unique string used to differentiate your project from all others in Google Cloud. To fetch this key, look for the value associated with the `project_id` key in the service account key file.
$$
$$section
### Private Key ID $(id="privateKeyId")
This is a unique identifier for the private key associated with the service account. To fetch this key, look for the value associated with the `private_key_id` key in the service account file.
$$
$$section
### Private Key $(id="privateKey")
This is the private key associated with the service account that is used to authenticate and authorize access to GCP. To fetch this key, look for the value associated with the `private_key` key in the service account file.
Make sure you are passing the key in the correct format. If your private key looks like this:
```
-----BEGIN ENCRYPTED PRIVATE KEY-----
MII..
MBQ...
CgU..
8Lt..
...
h+4=
-----END ENCRYPTED PRIVATE KEY-----
```
You will have to replace the newlines with `\n`, and the final private key that you pass should look like this:
```
-----BEGIN ENCRYPTED PRIVATE KEY-----\nMII..\nMBQ...\nCgU..\n8Lt..\n...\nh+4=\n-----END ENCRYPTED PRIVATE KEY-----\n
```
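If you'd rather not do the replacement by hand, a small script can produce the escaped value. This is just a convenience sketch, and the `key.pem` file name is an assumption:
```python
# Convenience sketch: read the downloaded service account private key and
# print it as a single line with literal \n escapes, ready to paste into
# the privateKey field. Assumes the key was saved as key.pem.
with open("key.pem") as key_file:
    print(key_file.read().replace("\n", "\\n"))
```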
$$
$$section
### Client Email $(id="clientEmail")
This is the email address associated with the service account. To fetch this key, look for the value associated with the `client_email` key in the service account key file.
$$
$$section
### Client ID $(id="clientId")
This is a unique identifier for the service account. To fetch this key, look for the value associated with the `client_id` key in the service account key file.
$$
$$section
### Auth URI $(id="authUri")
This is the URI for the authorization server. To fetch this key, look for the value associated with the `auth_uri` key in the service account key file.
$$
$$section
### Token URI $(id="tokenUri")
The Google Cloud Token URI is a specific endpoint used to obtain an OAuth 2.0 access token from the Google Cloud IAM service. This token allows you to authenticate and access various Google Cloud resources and APIs that require authorization.
To fetch this key, look for the value associated with the `token_uri` key in the service account credentials file.
$$
$$section
### Auth Provider X509Cert URL $(id="authProviderX509CertUrl")
This is the URL of the certificate that verifies the authenticity of the authorization server. To fetch this key, look for the value associated with the `auth_provider_x509_cert_url` key in the service account key file.
$$
$$section
### Client X509Cert URL $(id="clientX509CertUrl")
This is the URL of the certificate that verifies the authenticity of the service account. To fetch this key, look for the value associated with the `client_x509_cert_url` key in the service account key file.
$$

View File

@ -108,6 +108,7 @@ class ServiceUtilClassBase {
StorageServiceType.Adls,
DatabaseServiceType.QueryLog,
DatabaseServiceType.Dbt,
StorageServiceType.Gcs,
];
protected updateUnsupportedServices(types: string[]) {