diff --git a/docs-website/sidebars.js b/docs-website/sidebars.js
index f55c545bd7..f77e6924ff 100644
--- a/docs-website/sidebars.js
+++ b/docs-website/sidebars.js
@@ -118,6 +118,7 @@ module.exports = {
       "datahub-kubernetes/README",
       "docker/datahub-upgrade/README",
       "docs/deploy/aws",
+      "docs/deploy/gcp",
       // Purposely not including the following:
       // - "docker/datahub-frontend/README",
       // - "docker/datahub-gms-graphql-service/README",
diff --git a/docs/deploy/aws.md b/docs/deploy/aws.md
index 2c34699623..26f8849b2a 100644
--- a/docs/deploy/aws.md
+++ b/docs/deploy/aws.md
@@ -193,7 +193,7 @@ Provision a MySQL database in AWS RDS that shares the VPC with the kubernetes cl
 the VPC of the kubernetes cluster. Once the database is provisioned, you should be able to see the following page. Take
 a note of the endpoint marked by the red box.
 
-![AWS RDS](../imgs/aws-rds.png)
+![AWS RDS](../imgs/aws/aws-rds.png)
 
 First, add the DB password to kubernetes by running the following.
@@ -226,7 +226,7 @@ Provision an elasticsearch domain running elasticsearch version 7.9 or above tha
 cluster or has VPC peering set up between the VPC of the kubernetes cluster. Once the domain is provisioned, you should
 be able to see the following page. Take a note of the endpoint marked by the red box.
 
-![AWS Elasticsearch Service](../imgs/aws-elasticsearch.png)
+![AWS Elasticsearch Service](../imgs/aws/aws-elasticsearch.png)
 
 Update the elasticsearch settings under global in the quickstart-values.yaml as follows.
@@ -255,7 +255,7 @@ Provision an MSK cluster that shares the VPC with the kubernetes cluster or has
 the kubernetes cluster. Once the domain is provisioned, click on the “View client information” button in the “Cluster
 Summary” section. You should see a page like below. Take a note of the endpoints marked by the red boxes.
 
-![AWS MSK](../imgs/aws-msk.png)
+![AWS MSK](../imgs/aws/aws-msk.png)
 
 Update the kafka settings under global in the quickstart-values.yaml as follows.
diff --git a/docs/deploy/gcp.md b/docs/deploy/gcp.md
new file mode 100644
index 0000000000..f1d8cd862f
--- /dev/null
+++ b/docs/deploy/gcp.md
@@ -0,0 +1,102 @@
+---
+title: "Deploying to GCP"
+---
+
+# GCP setup guide
+
+The following is a set of instructions to quickstart DataHub on Google Kubernetes Engine (GKE) on GCP. Note that this
+guide assumes you do not have a kubernetes cluster set up. If you are deploying DataHub to an existing cluster, please
+skip the corresponding sections.
+
+## Prerequisites
+
+This guide requires the following tools:
+
+- [kubectl](https://kubernetes.io/docs/tasks/tools/) to manage kubernetes resources
+- [helm](https://helm.sh/docs/intro/install/) to deploy the resources based on helm charts. Note that we only support
+  Helm 3.
+- [gcloud](https://cloud.google.com/sdk/docs/install) to manage GCP resources
+
+Follow this [guide](https://cloud.google.com/kubernetes-engine/docs/how-to/creating-a-zonal-cluster#before_you_begin)
+to correctly set up the Google Cloud SDK.
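+If you are setting up gcloud for the first time, the setup boils down to something like the following (a sketch;
+`<<your-project-id>>` and `<<zone>>` are placeholders for the GCP project and zone you will deploy into):
+
+```
+# Interactive setup: authenticates, then lets you pick a default project and zone
+gcloud init
+
+# Or configure the pieces individually
+gcloud auth login
+gcloud config set project <<your-project-id>>
+gcloud config set compute/zone <<zone>>
+```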
+ +``` +gcloud container clusters create <> \ + --zone <> \ + -m e2-standard-2 +``` + +The command will provision a GKE cluster powered by 3 e2-standard-2 (2 CPU, 8GB RAM) nodes. + +If you are planning to run the storage layer (MySQL, Elasticsearch, Kafka) as pods in the cluster, you need at least 3 +nodes with the above specs. If you decide to use managed storage services, you can reduce the number of nodes or use +m3.medium nodes to save cost. Refer to +this [guide](https://cloud.google.com/kubernetes-engine/docs/how-to/creating-a-regional-cluster) for creating a regional +cluster for better robustness. + +Run `kubectl get nodes` to confirm that the cluster has been setup correctly. You should get results like below + +``` +NAME STATUS ROLES AGE VERSION +gke-datahub-default-pool-e5be7c4f-8s97 Ready 34h v1.19.10-gke.1600 +gke-datahub-default-pool-e5be7c4f-d68l Ready 34h v1.19.10-gke.1600 +gke-datahub-default-pool-e5be7c4f-rksj Ready 34h v1.19.10-gke.1600 +``` + +## Setup DataHub using Helm + +Once the kubernetes cluster has been set up, you can deploy DataHub and it’s prerequisites using helm. Please follow the +steps in this [guide](../../datahub-kubernetes/README.md) + +## Expose endpoints using GKE ingress controller + +Now that all the pods are up and running, you need to expose the datahub-frontend end point by setting +up [ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/). Easiest way to set up ingress is to use +the GKE page on [GCP website](https://console.cloud.google.com/kubernetes/discovery). + +Once all deploy is successful, you should see a page like below in the "Services & Ingress" tab on the left. + +![Services and Ingress](../imgs/gcp/services_ingress.png) + +Tick the checkbox for datahub-datahub-frontend and click "CREATE INGRESS" button. You should land on the following page. + +![Ingress1](../imgs/gcp/ingress1.png) + +Type in an arbitrary name for the ingress and click on the second step "Host and path rules". You should land on the +following page. + +![Ingress2](../imgs/gcp/ingress2.png) + +Select "datahub-datahub-frontend" in the dropdown menu for backends, and then click on "ADD HOST AND PATH RULE" button. +In the second row that got created, add in the host name of choice (here gcp.datahubproject.io) and select +"datahub-datahub-frontend" in the backends dropdown. + +This step adds the rule allowing requests from the host name of choice to get routed to datahub-frontend service. Click +on step 3 "Frontend configuration". You should land on the following page. + +![Ingress3](../imgs/gcp/ingress3.png) + +Choose HTTPS in the dropdown menu for protocol. To enable SSL, you need to add a certificate. If you do not have one, +you can click "CREATE A NEW CERTIFICATE" and input the host name of choice. GCP will create a certificate for you. + +Now press "CREATE" button on the left to create ingress! After around 5 minutes, you should see the following. + +![Ingress Ready](../imgs/gcp/ingress_ready.png) + +In your domain provider, add an A record for the host name set above using the IP address on the ingress page (noted +with the red box). Once DNS updates, you should be able to access DataHub through the host name!! + +Note, ignore the warning icon next to ingress. It takes about ten minutes for ingress to check that the backend service +is ready and show a check mark as follows. However, ingress is fully functional once you see the above page. 
+Note that you can ignore the warning icon next to the ingress. It takes about ten minutes for the ingress to check that
+the backend service is ready and show a check mark as follows. However, the ingress is fully functional once you see
+the above page.
+
+![Ingress Final](../imgs/gcp/ingress_final.png)
diff --git a/docs/how/configure-oidc-react.md b/docs/how/configure-oidc-react.md
index 7398df72e6..0dafca9623 100644
--- a/docs/how/configure-oidc-react.md
+++ b/docs/how/configure-oidc-react.md
@@ -71,6 +71,38 @@ the authenticated profile as the DataHub CorpUser identity.
 
 > By default, the login callback endpoint exposed by DataHub will be located at `${AUTH_OIDC_BASE_URL}/callback/oidc`.
 > This must **exactly** match the login redirect URL you've registered with your identity provider in step 1.
 
+In kubernetes, you can add the above environment variables in the values.yaml as follows. Note that env var values
+must be strings, so the boolean is quoted.
+
+```
+datahub-frontend:
+  ...
+  extraEnvs:
+    - name: AUTH_OIDC_ENABLED
+      value: "true"
+    - name: AUTH_OIDC_CLIENT_ID
+      value: your-client-id
+    - name: AUTH_OIDC_CLIENT_SECRET
+      value: your-client-secret
+    - name: AUTH_OIDC_DISCOVERY_URI
+      value: your-provider-discovery-url
+    - name: AUTH_OIDC_BASE_URL
+      value: your-datahub-url
+```
+
+You can also package the OIDC client secret into a k8s secret by running
+
+```kubectl create secret generic datahub-oidc-secret --from-literal=secret=<<client secret>>```
+
+Then set the secret env as follows.
+
+```
+    - name: AUTH_OIDC_CLIENT_SECRET
+      valueFrom:
+        secretKeyRef:
+          name: datahub-oidc-secret
+          key: secret
+```
+
 #### Advanced
 
 You can optionally customize the flow further using advanced configurations. These allow
diff --git a/docs/imgs/aws-elasticsearch.png b/docs/imgs/aws/aws-elasticsearch.png
similarity index 100%
rename from docs/imgs/aws-elasticsearch.png
rename to docs/imgs/aws/aws-elasticsearch.png
diff --git a/docs/imgs/aws-msk.png b/docs/imgs/aws/aws-msk.png
similarity index 100%
rename from docs/imgs/aws-msk.png
rename to docs/imgs/aws/aws-msk.png
diff --git a/docs/imgs/aws-rds.png b/docs/imgs/aws/aws-rds.png
similarity index 100%
rename from docs/imgs/aws-rds.png
rename to docs/imgs/aws/aws-rds.png
diff --git a/docs/imgs/gcp/ingress1.png b/docs/imgs/gcp/ingress1.png
new file mode 100644
index 0000000000..4cb49834af
Binary files /dev/null and b/docs/imgs/gcp/ingress1.png differ
diff --git a/docs/imgs/gcp/ingress2.png b/docs/imgs/gcp/ingress2.png
new file mode 100644
index 0000000000..cdf2446b0e
Binary files /dev/null and b/docs/imgs/gcp/ingress2.png differ
diff --git a/docs/imgs/gcp/ingress3.png b/docs/imgs/gcp/ingress3.png
new file mode 100644
index 0000000000..cc3745ad97
Binary files /dev/null and b/docs/imgs/gcp/ingress3.png differ
diff --git a/docs/imgs/gcp/ingress_final.png b/docs/imgs/gcp/ingress_final.png
new file mode 100644
index 0000000000..a30ca744c4
Binary files /dev/null and b/docs/imgs/gcp/ingress_final.png differ
diff --git a/docs/imgs/gcp/ingress_ready.png b/docs/imgs/gcp/ingress_ready.png
new file mode 100644
index 0000000000..d14016e420
Binary files /dev/null and b/docs/imgs/gcp/ingress_ready.png differ
diff --git a/docs/imgs/gcp/services_ingress.png b/docs/imgs/gcp/services_ingress.png
new file mode 100644
index 0000000000..1d9ff2b313
Binary files /dev/null and b/docs/imgs/gcp/services_ingress.png differ
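A note on applying the configure-oidc-react.md changes above: once the OIDC env variables are added to values.yaml,
they are rolled out with a standard helm upgrade. A minimal sketch, assuming the release is named `datahub` and
`<<path-to-datahub-chart>>` / `<<your-values.yaml>>` stand in for your chart location and values file:

```
# Redeploy with the new extraEnvs; the changed pod spec restarts the frontend.
# Release name, chart path, and values file are placeholders; match them to
# your own deployment.
helm upgrade datahub <<path-to-datahub-chart>> --values <<your-values.yaml>>
```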