feat(k8s): add GCP deploy recipe (#2768)

Dexter Lee 2021-06-25 06:39:46 -07:00 committed by GitHub
parent 88a7f52cbc
commit 1bd22b43e2
13 changed files with 138 additions and 3 deletions


@@ -118,6 +118,7 @@ module.exports = {
"datahub-kubernetes/README",
"docker/datahub-upgrade/README",
"docs/deploy/aws",
+"docs/deploy/gcp",
// Purposely not including the following:
// - "docker/datahub-frontend/README",
// - "docker/datahub-gms-graphql-service/README",


@@ -193,7 +193,7 @@ Provision a MySQL database in AWS RDS that shares the VPC with the kubernetes cl
the VPC of the kubernetes cluster. Once the database is provisioned, you should be able to see the following page. Take
a note of the endpoint marked by the red box.
-![AWS RDS](../imgs/aws-rds.png)
+![AWS RDS](../imgs/aws/aws-rds.png)
First, add the DB password to kubernetes by running the following.
@@ -226,7 +226,7 @@ Provision an elasticsearch domain running elasticsearch version 7.9 or above tha
cluster or has VPC peering set up between the VPC of the kubernetes cluster. Once the domain is provisioned, you should
be able to see the following page. Take a note of the endpoint marked by the red box.
-![AWS Elasticsearch Service](../imgs/aws-elasticsearch.png)
+![AWS Elasticsearch Service](../imgs/aws/aws-elasticsearch.png)
Update the elasticsearch settings under global in the quickstart-values.yaml as follows.
@@ -255,7 +255,7 @@ Provision an MSK cluster that shares the VPC with the kubernetes cluster or has
the kubernetes cluster. Once the domain is provisioned, click on the “View client information” button in the Cluster
Summary” section. You should see a page like below. Take a note of the endpoints marked by the red boxes.
-![AWS MSK](../imgs/aws-msk.png)
+![AWS MSK](../imgs/aws/aws-msk.png)
Update the kafka settings under global in the quickstart-values.yaml as follows.

docs/deploy/gcp.md (new file, 102 lines)

@@ -0,0 +1,102 @@
---
title: "Deploying to GCP"
---
# GCP setup guide
The following is a set of instructions to quickstart DataHub on Google Kubernetes Engine (GKE) on GCP. Note that the
guide assumes you do not have a kubernetes cluster set up. If you are deploying DataHub to an existing cluster, please
skip the corresponding sections.
## Prerequisites
This guide requires the following tools:
- [kubectl](https://kubernetes.io/docs/tasks/tools/) to manage kubernetes resources
- [helm](https://helm.sh/docs/intro/install/) to deploy the resources based on helm charts. Note that we only support
Helm 3.
- [gcloud](https://cloud.google.com/sdk/docs/install) to manage GCP resources
Follow
this [guide](https://cloud.google.com/kubernetes-engine/docs/how-to/creating-a-zonal-cluster#before_you_begin) to
correctly set up the Google Cloud SDK.
After setting up, run `gcloud services enable container.googleapis.com` to make sure the GKE service is enabled.
## Start up a kubernetes cluster on GKE
Follow this [guide](https://cloud.google.com/kubernetes-engine/docs/how-to/creating-a-zonal-cluster) to create a
new cluster using gcloud. Run the following command with cluster-name set to the cluster name of your choice, and zone
set to the GCP zone you are operating in.
```
gcloud container clusters create <<cluster-name>> \
--zone <<zone>> \
--machine-type e2-standard-2
```
The command will provision a GKE cluster with 3 e2-standard-2 (2 vCPU, 8GB RAM) nodes.
If you are planning to run the storage layer (MySQL, Elasticsearch, Kafka) as pods in the cluster, you need at least 3
nodes with the above specs. If you decide to use managed storage services, you can reduce the number of nodes or use
smaller machine types such as e2-medium to save cost. Refer to
this [guide](https://cloud.google.com/kubernetes-engine/docs/how-to/creating-a-regional-cluster) for creating a regional
cluster for better robustness.
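Before using kubectl against the new cluster, fetch its credentials so kubectl points at it (the cluster name and zone should match the create command above):
```
gcloud container clusters get-credentials <<cluster-name>> --zone <<zone>>
```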
Run `kubectl get nodes` to confirm that the cluster has been set up correctly. You should see output like the following.
```
NAME STATUS ROLES AGE VERSION
gke-datahub-default-pool-e5be7c4f-8s97 Ready <none> 34h v1.19.10-gke.1600
gke-datahub-default-pool-e5be7c4f-d68l Ready <none> 34h v1.19.10-gke.1600
gke-datahub-default-pool-e5be7c4f-rksj Ready <none> 34h v1.19.10-gke.1600
```
## Setup DataHub using Helm
Once the kubernetes cluster has been set up, you can deploy DataHub and its prerequisites using helm. Please follow the
steps in this [guide](../../datahub-kubernetes/README.md).
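As a rough sketch of what that guide walks through (the chart locations below are assumptions based on the linked README; treat it as the authoritative source), the deployment boils down to two helm installs plus a sanity check:
```
# Install the prerequisite services (MySQL, Elasticsearch, Kafka, etc.)
helm install prerequisites datahub-kubernetes/prerequisites/

# Install DataHub itself
helm install datahub datahub-kubernetes/datahub/

# Confirm the pods come up
kubectl get pods
```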
## Expose endpoints using GKE ingress controller
Now that all the pods are up and running, you need to expose the datahub-frontend endpoint by setting
up [ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/). The easiest way to set up ingress is to
use the GKE page on the [GCP website](https://console.cloud.google.com/kubernetes/discovery).
Once the deployment is successful, you should see a page like below in the "Services & Ingress" tab on the left.
![Services and Ingress](../imgs/gcp/services_ingress.png)
Tick the checkbox for datahub-datahub-frontend and click the "CREATE INGRESS" button. You should land on the following page.
![Ingress1](../imgs/gcp/ingress1.png)
Type in an arbitrary name for the ingress and click on the second step, "Host and path rules". You should land on the
following page.
![Ingress2](../imgs/gcp/ingress2.png)
Select "datahub-datahub-frontend" in the dropdown menu for backends, and then click the "ADD HOST AND PATH RULE" button.
In the newly created second row, add the host name of your choice (here, gcp.datahubproject.io) and select
"datahub-datahub-frontend" in the backends dropdown.
This step adds a rule that routes requests for the chosen host name to the datahub-frontend service. Click
on step 3, "Frontend configuration". You should land on the following page.
![Ingress3](../imgs/gcp/ingress3.png)
Choose HTTPS in the dropdown menu for protocol. To enable SSL, you need to add a certificate. If you do not have one,
you can click "CREATE A NEW CERTIFICATE" and input the host name of your choice; GCP will create a managed certificate for you.
Now press the "CREATE" button to create the ingress. After around 5 minutes, you should see the following.
![Ingress Ready](../imgs/gcp/ingress_ready.png)
In your domain provider, add an A record for the host name set above using the IP address shown on the ingress page
(noted with the red box). Once DNS propagates, you should be able to access DataHub through the host name.
Note that you can ignore the warning icon next to the ingress. It takes about ten minutes for the ingress to verify that
the backend service is ready and show a check mark as follows; however, the ingress is fully functional once you see the above page.
![Ingress Final](../imgs/gcp/ingress_final.png)
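If you prefer to manage the ingress declaratively instead of through the console, the same configuration can be sketched as a manifest. The ingress name, host name, and certificate name below are placeholders to adapt; port 9002 is the datahub-frontend default:
```
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: datahub-frontend-ingress
  annotations:
    kubernetes.io/ingress.class: "gce"  # use the external GKE ingress controller
    # assumes a GKE ManagedCertificate resource named datahub-cert exists
    networking.gke.io/managed-certificates: "datahub-cert"
spec:
  rules:
    - host: gcp.datahubproject.io  # replace with your host name
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: datahub-datahub-frontend
                port:
                  number: 9002  # datahub-frontend default port
```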


@@ -71,6 +71,38 @@ the authenticated profile as the DataHub CorpUser identity.
> By default, the login callback endpoint exposed by DataHub will be located at `${AUTH_OIDC_BASE_URL}/callback/oidc`. This must **exactly** match the login redirect URL you've registered with your identity provider in step 1.
In Kubernetes, you can add the above environment variables to the values.yaml as follows.
```
datahub-frontend:
  ...
  extraEnvs:
    - name: AUTH_OIDC_ENABLED
      value: "true"
    - name: AUTH_OIDC_CLIENT_ID
      value: your-client-id
    - name: AUTH_OIDC_CLIENT_SECRET
      value: your-client-secret
    - name: AUTH_OIDC_DISCOVERY_URI
      value: your-provider-discovery-url
    - name: AUTH_OIDC_BASE_URL
      value: your-datahub-url
```
You can also package the OIDC client secret into a k8s secret by running
`kubectl create secret generic datahub-oidc-secret --from-literal=secret=<<OIDC SECRET>>`.
Then set the secret env as follows.
```
- name: AUTH_OIDC_CLIENT_SECRET
  valueFrom:
    secretKeyRef:
      name: datahub-oidc-secret
      key: secret
```
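For context, here is how the secret-backed variable sits alongside the plain ones under `extraEnvs` (a sketch reusing the names from the examples above):
```
datahub-frontend:
  extraEnvs:
    - name: AUTH_OIDC_ENABLED
      value: "true"
    - name: AUTH_OIDC_CLIENT_SECRET
      valueFrom:
        secretKeyRef:
          name: datahub-oidc-secret  # created with the kubectl command above
          key: secret
```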
#### Advanced
You can optionally customize the flow further using advanced configurations. These allow




docs/imgs/gcp/ingress1.png (new binary file, 196 KiB)

docs/imgs/gcp/ingress2.png (new binary file, 259 KiB)

docs/imgs/gcp/ingress3.png (new binary file, 238 KiB)

(three further new binary image files: 57 KiB, 191 KiB, 332 KiB)