Prajwal214 30a091b466
Docs: Updating datalake & dbt Cloud docs (#17983)
Co-authored-by: Prajwal Pandit <prajwalpandit@Prajwals-MacBook-Air.local>
2024-09-25 10:49:44 +05:30


---
title: GCS Datalake
slug: /connectors/database/gcs-datalake
---

{% connectorDetailsHeader name="GCS Datalake" stage="PROD" platform="OpenMetadata" availableFeatures=["Metadata", "Data Profiler", "Data Quality"] unavailableFeatures=["Query Usage", "Lineage", "Column-level Lineage", "Owners", "dbt", "Tags", "Stored Procedures"] / %}

This section provides guides and references for using the GCS Datalake connector.

Configure and schedule GCS Datalake metadata and profiler workflows from the OpenMetadata UI:

{% partial file="/v1.6/connectors/ingestion-modes-tiles.md" variables={yamlPath: "/connectors/database/gcs-datalake/yaml"} /%}

Requirements

{% note %} The GCS Datalake connector supports extracting metadata from JSON, CSV, TSV, and Parquet files. {% /note %}
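As a rough illustration of the file types named in the note above, the following Python sketch checks object names against those four extensions. It is not the connector's own logic: the helper name `is_supported` and the assumption that matching uses only the final, case-insensitive file extension are hypothetical.

```python
from pathlib import PurePosixPath

# Assumption: only the final extension is inspected, case-insensitively.
# The connector itself may apply different or additional rules.
SUPPORTED_EXTENSIONS = {".json", ".csv", ".tsv", ".parquet"}


def is_supported(object_name: str) -> bool:
    """Return True if the object name ends with one of the listed extensions."""
    return PurePosixPath(object_name).suffix.lower() in SUPPORTED_EXTENSIONS


# Hypothetical object names, used only for illustration.
names = ["sales/2024/orders.csv", "logs/app.log", "events/clicks.PARQUET"]
supported = [name for name in names if is_supported(name)]
```

A check like this can help estimate, before running ingestion, which objects in a bucket are candidates for metadata extraction.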

Metadata Ingestion

{% partial file="/v1.6/connectors/metadata-ingestion-ui.md" variables={ connector: "Datalake", selectServicePath: "/images/v1.6/connectors/datalake/select-service.png", addNewServicePath: "/images/v1.6/connectors/datalake/add-new-service.png", serviceConnectionPath: "/images/v1.6/connectors/datalake/service-connection.png", } /%}

{% stepsContainer %} {% extraContent parentTagName="stepsContainer" %}

Connection Details for GCS

  • Bucket Name: The name of the bucket to read from. A bucket name is a unique identifier for the container that organizes and stores data objects. It plays a role similar to a folder name, but in object storage rather than file storage.

  • Prefix: The leading part of the data path that identifies the source or origin of the data. Prefixes organize and categorize data within the data lake and help users locate the data they need.
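The way a bucket name and prefix together narrow down a set of objects can be sketched in a few lines of Python. This is a minimal sketch, not the connector's implementation: the `GcsSource` class and the sample bucket and object names are hypothetical, and it assumes the prefix is compared literally against the start of each object name, which is how prefix filtering works when listing objects in GCS.

```python
from dataclasses import dataclass


@dataclass
class GcsSource:
    """Hypothetical container for the two connection details described above."""

    bucket_name: str
    prefix: str = ""  # an empty prefix matches every object in the bucket

    def matches(self, bucket: str, object_name: str) -> bool:
        # Assumption: literal, case-sensitive match on the start of the name.
        return bucket == self.bucket_name and object_name.startswith(self.prefix)


# Hypothetical bucket contents, used only for illustration.
objects = [
    ("analytics-lake", "sales/2024/orders.csv"),
    ("analytics-lake", "marketing/leads.json"),
    ("other-bucket", "sales/2024/orders.csv"),
]

source = GcsSource(bucket_name="analytics-lake", prefix="sales/")
selected = [name for bucket, name in objects if source.matches(bucket, name)]
```

Here only `sales/2024/orders.csv` in `analytics-lake` is selected. The object with the same name in `other-bucket` and the object under `marketing/` are both skipped.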

GCS Credentials

We support two ways of authenticating to GCS:

  1. Passing the raw credential values from a Google Cloud service account key. This requires the following information, all of which is found in the service account key:
    1. Credentials type, e.g. service_account.
    2. Project ID
    3. Private Key ID
    4. Private Key
    5. Client Email
    6. Client ID
    7. Auth URI, https://accounts.google.com/o/oauth2/auth by default
    8. Token URI, https://oauth2.googleapis.com/token by default
    9. Authentication Provider X509 Certificate URL, https://www.googleapis.com/oauth2/v1/certs by default
    10. Client X509 Certificate URL
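The ten values listed above correspond to the fields of a Google Cloud service account key file. The sketch below assembles them into a dictionary, fills in the three documented defaults, and reports any missing value. The function name `build_credentials` and the placeholder values are hypothetical. The key names follow Google's service account key JSON and are not necessarily the field names used in the OpenMetadata form.

```python
# Defaults taken from the list above.
DEFAULTS = {
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://oauth2.googleapis.com/token",
    "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
}

# Fields with no default; each must be supplied explicitly.
REQUIRED_FIELDS = (
    "type",
    "project_id",
    "private_key_id",
    "private_key",
    "client_email",
    "client_id",
    "client_x509_cert_url",
)


def build_credentials(values: dict) -> dict:
    """Merge supplied values over the defaults and fail fast on missing fields."""
    missing = [field for field in REQUIRED_FIELDS if not values.get(field)]
    if missing:
        raise ValueError(f"Missing credential fields: {', '.join(missing)}")
    return {**DEFAULTS, **values}


# Placeholder values only; real values come from the service account key.
example = build_credentials(
    {
        "type": "service_account",
        "project_id": "my-project",
        "private_key_id": "key-id-placeholder",
        "private_key": "private-key-placeholder",
        "client_email": "ingestion@my-project.iam.gserviceaccount.com",
        "client_id": "client-id-placeholder",
        "client_x509_cert_url": "https://example.invalid/cert",
    }
)
```

Validating the values up front like this can surface an incomplete key before the connection test runs.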

{% partial file="/v1.6/connectors/database/advanced-configuration.md" /%}

{% /extraContent %}

{% partial file="/v1.6/connectors/test-connection.md" /%}

{% partial file="/v1.6/connectors/database/configure-ingestion.md" /%}

{% partial file="/v1.6/connectors/ingestion-schedule-and-deploy.md" /%}

{% /stepsContainer %}

{% partial file="/v1.6/connectors/troubleshooting.md" /%}

{% partial file="/v1.6/connectors/database/related.md" /%}