title | slug |
---|---|
GCS Datalake | /connectors/database/gcs-datalake |
{% connectorDetailsHeader name="GCS Datalake" stage="PROD" platform="OpenMetadata" availableFeatures=["Metadata", "Data Profiler", "Data Quality"] unavailableFeatures=["Query Usage", "Lineage", "Column-level Lineage", "Owners", "dbt", "Tags", "Stored Procedures"] / %}
This section provides guides and references for using the GCS Datalake connector.
Configure and schedule GCS Datalake metadata and profiler workflows from the OpenMetadata UI:
{% partial file="/v1.6/connectors/ingestion-modes-tiles.md" variables={yamlPath: "/connectors/database/gcs-datalake/yaml"} /%}
Requirements
{% note %}
The GCS Datalake connector supports extracting metadata from the following file types: JSON, CSV, TSV, and Parquet.
{% /note %}
Metadata Ingestion
{% partial file="/v1.6/connectors/metadata-ingestion-ui.md" variables={ connector: "Datalake", selectServicePath: "/images/v1.6/connectors/datalake/select-service.png", addNewServicePath: "/images/v1.6/connectors/datalake/add-new-service.png", serviceConnectionPath: "/images/v1.6/connectors/datalake/service-connection.png", } /%}
{% stepsContainer %} {% extraContent parentTagName="stepsContainer" %}
Connection Details for GCS
- Bucket Name: The name of the bucket, a unique identifier used to organize and store data objects. It plays a role similar to a folder name, but for object storage rather than file storage.
- Prefix: The first part of the data path inside the bucket, which identifies the source or origin of the data. It is used to organize and categorize data within the datalake and helps users locate the data they need.
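To illustrate how a bucket's contents can be narrowed down by prefix and file type, here is a minimal Python sketch. It is not the connector's implementation: `select_objects`, `SUPPORTED_EXTENSIONS`, and the sample object keys are hypothetical names introduced for illustration, and the sketch assumes that the prefix is matched against the start of the object key and that the supported file types map to the extensions shown.

```python
from typing import Iterable, List

# Assumption: the supported file types (JSON, CSV, TSV, Parquet)
# correspond to these file extensions.
SUPPORTED_EXTENSIONS = (".json", ".csv", ".tsv", ".parquet")


def select_objects(keys: Iterable[str], prefix: str = "") -> List[str]:
    """Return the keys that start with `prefix` and have a supported extension."""
    return [
        key
        for key in keys
        if key.startswith(prefix) and key.lower().endswith(SUPPORTED_EXTENSIONS)
    ]


# Hypothetical object keys inside a single bucket.
sample_keys = [
    "sales/2024/orders.csv",
    "sales/2024/readme.txt",
    "sales/2023/orders.parquet",
    "hr/employees.json",
]

selected = select_objects(sample_keys, prefix="sales/")
print(selected)
```

With an empty prefix, every object with a supported extension in the bucket is kept; a prefix such as `sales/` limits the result to objects under that path.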
GCS Credentials
We support two ways of authenticating to GCS:
- Passing the raw credential values from a Google Cloud service account key. This requires the following information, all of which is found in that key:
  - Credentials type, e.g. `service_account`.
  - Project ID
  - Private Key ID
  - Private Key
  - Client Email
  - Client ID
  - Auth URI, https://accounts.google.com/o/oauth2/auth by default
  - Token URI, https://oauth2.googleapis.com/token by default
  - Authentication Provider X509 Certificate URL, https://www.googleapis.com/oauth2/v1/certs by default
  - Client X509 Certificate URL
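The raw credential values correspond to the fields of a service account key JSON file. The following Python sketch shows one way to collect them, apply the default URIs listed above, and check that nothing is missing before entering the values. It is an illustration only: `build_credentials`, `DEFAULT_URIS`, `REQUIRED_FIELDS`, and all sample values are hypothetical, the dictionary keys follow the service account key file rather than the connector's own form field names, and the sketch assumes every listed value is required.

```python
from typing import Dict

# Default URIs as listed in the credentials section.
DEFAULT_URIS = {
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://oauth2.googleapis.com/token",
    "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
}

# Values without a default (assumed here to be mandatory).
REQUIRED_FIELDS = (
    "type",
    "project_id",
    "private_key_id",
    "private_key",
    "client_email",
    "client_id",
    "client_x509_cert_url",
)


def build_credentials(values: Dict[str, str]) -> Dict[str, str]:
    """Merge user-supplied values over the defaults and validate required fields."""
    merged = {**DEFAULT_URIS, **values}
    missing = [name for name in REQUIRED_FIELDS if not merged.get(name)]
    if missing:
        raise ValueError(f"Missing credential fields: {', '.join(missing)}")
    return merged


# Placeholder values only; real values come from your own service account key.
credentials = build_credentials(
    {
        "type": "service_account",
        "project_id": "example-project",
        "private_key_id": "example-key-id",
        "private_key": "example-private-key",
        "client_email": "example@example-project.iam.gserviceaccount.com",
        "client_id": "000000000000",
        "client_x509_cert_url": "https://example.invalid/cert",
    }
)
print(sorted(credentials))
```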
{% partial file="/v1.6/connectors/database/advanced-configuration.md" /%}
{% /extraContent %}
{% partial file="/v1.6/connectors/test-connection.md" /%}
{% partial file="/v1.6/connectors/database/configure-ingestion.md" /%}
{% partial file="/v1.6/connectors/ingestion-schedule-and-deploy.md" /%}
{% /stepsContainer %}
{% partial file="/v1.6/connectors/troubleshooting.md" /%}
{% partial file="/v1.6/connectors/database/related.md" /%}