feat(k8s): add GCP deploy recipe (#2768)
@@ -118,6 +118,7 @@ module.exports = {
      "datahub-kubernetes/README",
      "docker/datahub-upgrade/README",
      "docs/deploy/aws",
+     "docs/deploy/gcp",
      // Purposely not including the following:
      // - "docker/datahub-frontend/README",
      // - "docker/datahub-gms-graphql-service/README",

@@ -193,7 +193,7 @@ Provision a MySQL database in AWS RDS that shares the VPC with the kubernetes cl
the VPC of the kubernetes cluster. Once the database is provisioned, you should be able to see the following page. Take
a note of the endpoint marked by the red box.

-
+

First, add the DB password to kubernetes by running the following.

@@ -226,7 +226,7 @@ Provision an elasticsearch domain running elasticsearch version 7.9 or above tha
cluster or has VPC peering set up between the VPC of the kubernetes cluster. Once the domain is provisioned, you should
be able to see the following page. Take a note of the endpoint marked by the red box.

-
+

Update the elasticsearch settings under global in the quickstart-values.yaml as follows.

@@ -255,7 +255,7 @@ Provision an MSK cluster that shares the VPC with the kubernetes cluster or has
the kubernetes cluster. Once the cluster is provisioned, click the "View client information" button in the "Cluster
Summary" section. You should see a page like below. Take a note of the endpoints marked by the red boxes.

-
+

Update the kafka settings under global in the quickstart-values.yaml as follows.

docs/deploy/gcp.md (new file, +102 lines)
@@ -0,0 +1,102 @@
---
title: "Deploying to GCP"
---

# GCP setup guide

The following is a set of instructions to quickstart DataHub on Google Kubernetes Engine (GKE) in GCP. Note that this
guide assumes you do not have a kubernetes cluster set up. If you are deploying DataHub to an existing cluster, please
skip the corresponding sections.

## Prerequisites

This guide requires the following tools:

- [kubectl](https://kubernetes.io/docs/tasks/tools/) to manage kubernetes resources
- [helm](https://helm.sh/docs/intro/install/) to deploy the resources based on helm charts. Note, we only support Helm 3.
- [gcloud](https://cloud.google.com/sdk/docs/install) to manage GCP resources

Follow this [guide](https://cloud.google.com/kubernetes-engine/docs/how-to/creating-a-zonal-cluster#before_you_begin) to
correctly set up the Google Cloud SDK.

After setting up, run `gcloud services enable container.googleapis.com` to make sure the GKE service is enabled.

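For reference, a minimal SDK setup might look like the following (the project ID is a placeholder for your own):

```
gcloud auth login
gcloud config set project <<project-id>>
gcloud services enable container.googleapis.com
```
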
## Start up a kubernetes cluster on GKE

Let's follow this [guide](https://cloud.google.com/kubernetes-engine/docs/how-to/creating-a-zonal-cluster) to create a
new cluster using gcloud. Run the following command with cluster-name set to the cluster name of your choice, and zone
set to the GCP zone you are operating in.

```
gcloud container clusters create <<cluster-name>> \
    --zone <<zone>> \
    -m e2-standard-2
```

The command will provision a GKE cluster powered by 3 e2-standard-2 (2 CPU, 8GB RAM) nodes.

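If kubectl is not already configured to talk to the new cluster, you can fetch credentials for it (same placeholders as
above):

```
gcloud container clusters get-credentials <<cluster-name>> --zone <<zone>>
```
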
If you are planning to run the storage layer (MySQL, Elasticsearch, Kafka) as pods in the cluster, you need at least 3
nodes with the above specs. If you decide to use managed storage services, you can reduce the number of nodes or use
smaller e2-medium nodes to save cost. Refer to
this [guide](https://cloud.google.com/kubernetes-engine/docs/how-to/creating-a-regional-cluster) for creating a regional
cluster for better robustness; a sketch of the regional variant follows this paragraph.

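For example, a regional cluster with one node per zone (three total in a typical three-zone region) might be created
like this; the region value is a placeholder:

```
gcloud container clusters create <<cluster-name>> \
    --region <<region>> \
    --num-nodes 1 \
    -m e2-standard-2
```
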
Run `kubectl get nodes` to confirm that the cluster has been set up correctly. You should get results like below.

```
NAME                                     STATUS   ROLES    AGE   VERSION
gke-datahub-default-pool-e5be7c4f-8s97   Ready    <none>   34h   v1.19.10-gke.1600
gke-datahub-default-pool-e5be7c4f-d68l   Ready    <none>   34h   v1.19.10-gke.1600
gke-datahub-default-pool-e5be7c4f-rksj   Ready    <none>   34h   v1.19.10-gke.1600
```

## Set up DataHub using Helm

Once the kubernetes cluster has been set up, you can deploy DataHub and its prerequisites using helm. Please follow the
steps in this [guide](../../datahub-kubernetes/README.md); a rough sketch of the flow is shown below.

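The sketch below is illustrative only — the repository layout, chart path, and release name are assumptions, so treat
the linked README as the authoritative reference:

```
git clone https://github.com/linkedin/datahub.git
cd datahub/datahub-kubernetes
# Install the chart using the quickstart values referenced throughout this guide.
helm install datahub datahub/ --values datahub/quickstart-values.yaml
```
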
## Expose endpoints using GKE ingress controller

Now that all the pods are up and running, you need to expose the datahub-frontend endpoint by setting
up [ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/). The easiest way to set up ingress is to
use the GKE page on the [GCP website](https://console.cloud.google.com/kubernetes/discovery), as walked through below.

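If you prefer a declarative setup over the console flow, a minimal equivalent manifest might look like the following.
The host name is an example, and the service name and port are assumed from the helm chart defaults — adjust both to
your deployment:

```
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: datahub-frontend-ingress
spec:
  rules:
    - host: gcp.datahubproject.io # replace with your host name
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: datahub-datahub-frontend
                port:
                  number: 9002
EOF
```
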
Once the deployment is successful, you should see a page like below in the "Services & Ingress" tab on the left.



Tick the checkbox for datahub-datahub-frontend and click the "CREATE INGRESS" button. You should land on the following
page.



Type in an arbitrary name for the ingress and click on the second step "Host and path rules". You should land on the
following page.



Select "datahub-datahub-frontend" in the dropdown menu for backends, and then click on "ADD HOST AND PATH RULE" button.
|
||||
In the second row that got created, add in the host name of choice (here gcp.datahubproject.io) and select
|
||||
"datahub-datahub-frontend" in the backends dropdown.
|
||||
|
||||
This step adds the rule allowing requests from the host name of choice to get routed to datahub-frontend service. Click
|
||||
on step 3 "Frontend configuration". You should land on the following page.
|
||||
|
||||

|
||||
|
||||
Choose HTTPS in the dropdown menu for protocol. To enable SSL, you need to add a certificate. If you do not have one,
|
||||
you can click "CREATE A NEW CERTIFICATE" and input the host name of choice. GCP will create a certificate for you.
|
||||
|
||||
Now press "CREATE" button on the left to create ingress! After around 5 minutes, you should see the following.
|
||||
|
||||

|
||||
|
||||
In your domain provider, add an A record for the host name set above using the IP address on the ingress page (noted
with the red box). Once DNS propagates, you should be able to access DataHub through the host name!

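You can also read the assigned IP address from the `ADDRESS` column on the command line:

```
kubectl get ingress
```
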
Note, you can ignore the warning icon next to the ingress. It takes about ten minutes for ingress to check that the
backend service is ready and show a check mark as follows. However, the ingress is fully functional once you see the
above page.



@@ -71,6 +71,38 @@ the authenticated profile as the DataHub CorpUser identity.

> By default, the login callback endpoint exposed by DataHub will be located at `${AUTH_OIDC_BASE_URL}/callback/oidc`. This must **exactly** match the login redirect URL you've registered with your identity provider in step 1.

In kubernetes, you can add the above env variables to the values.yaml as follows. Note that Kubernetes env var values
must be strings, so boolean flags like `AUTH_OIDC_ENABLED` are quoted.

```
datahub-frontend:
  ...
  extraEnvs:
    - name: AUTH_OIDC_ENABLED
      value: "true"
    - name: AUTH_OIDC_CLIENT_ID
      value: your-client-id
    - name: AUTH_OIDC_CLIENT_SECRET
      value: your-client-secret
    - name: AUTH_OIDC_DISCOVERY_URI
      value: your-provider-discovery-url
    - name: AUTH_OIDC_BASE_URL
      value: your-datahub-url
```

You can also package OIDC client secrets into a k8s secret by running

```
kubectl create secret generic datahub-oidc-secret --from-literal=secret=<<OIDC SECRET>>
```

Then set the secret env as follows.

```
- name: AUTH_OIDC_CLIENT_SECRET
  valueFrom:
    secretKeyRef:
      name: datahub-oidc-secret
      key: secret
```

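To confirm the secret was created, you can inspect it from the command line:

```
kubectl get secret datahub-oidc-secret
```
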
#### Advanced

You can optionally customize the flow further using advanced configurations. These allow

(3 existing AWS images moved under docs/imgs/aws/; sizes unchanged: 149 KiB, 172 KiB, 341 KiB)
BIN docs/imgs/gcp/ingress1.png (new file, 196 KiB)
BIN docs/imgs/gcp/ingress2.png (new file, 259 KiB)
BIN docs/imgs/gcp/ingress3.png (new file, 238 KiB)
BIN docs/imgs/gcp/ingress_final.png (new file, 57 KiB)
BIN docs/imgs/gcp/ingress_ready.png (new file, 191 KiB)
BIN docs/imgs/gcp/services_ingress.png (new file, 332 KiB)