2021-05-26 21:37:37 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								---
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								title: "Deploying to AWS"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								---
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								# AWS setup guide
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								The following is a set of instructions to quickstart DataHub on AWS Elastic Kubernetes Service (EKS). Note, the guide
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								assumes that you do not have a kubernetes cluster set up. If you are deploying DataHub to an existing cluster, please
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								skip the corresponding sections.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								## Prerequisites
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								This guide requires the following tools:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								-  [kubectl ](https://kubernetes.io/docs/tasks/tools/ ) to manage kubernetes resources
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								-  [helm ](https://helm.sh/docs/intro/install/ ) to deploy the resources based on helm charts. Note, we only support Helm
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    3.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								-  [eksctl ](https://eksctl.io/introduction/#installation ) to create and manage clusters on EKS
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								-  [AWS CLI ](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html ) to manage AWS resources
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								To use the above tools, you need to set up AWS credentials by following
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								this [guide ](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-profiles.html ).
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								## Start up a kubernetes cluster on AWS EKS
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
										 
									
								 
							
							
								Let’  s follow this [guide ](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html ) to create a new
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								cluster using eksctl. Run the following command with cluster-name set to the cluster name of choice, and region set to
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								the AWS region you are operating on.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								eksctl create cluster \
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    --name < < cluster-name > > \
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    --region < < region > > \
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    --with-oidc \
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    --nodes=3
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								The command will provision an EKS cluster powered by 3 EC2 m3.large nodes and provision a VPC based networking layer.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								If you are planning to run the storage layer (MySQL, Elasticsearch, Kafka) as pods in the cluster, you need at least 3
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								nodes. If you decide to use managed storage services, you can reduce the number of nodes or use m3.medium nodes to save
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								cost. Refer to this [guide ](https://eksctl.io/usage/creating-and-managing-clusters/ ) to further customize the cluster
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								before provisioning.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								Note, OIDC setup is required for following this guide when setting up the load balancer.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								Run `kubectl get nodes`  to confirm that the cluster has been setup correctly. You should get results like below
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								NAME                                          STATUS   ROLES    AGE   VERSION
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								ip-192-168-49-49.us-west-2.compute.internal   Ready    < none >    3h    v1.18.9-eks-d1db3c
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								ip-192-168-64-56.us-west-2.compute.internal   Ready    < none >    3h    v1.18.9-eks-d1db3c
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								ip-192-168-8-126.us-west-2.compute.internal   Ready    < none >    3h    v1.18.9-eks-d1db3c
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								## Setup DataHub using Helm
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
										 
									
								 
							
							
								Once the kubernetes cluster has been set up, you can deploy DataHub and it’  s prerequisites using helm. Please follow the
							 
						 
					
						
							
								
									
										
										
										
											2021-07-08 19:27:19 -07:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
								
									
								 
							
							
								steps in this [guide ](kubernetes.md )
							 
						 
					
						
							
								
									
										
										
										
											2021-05-26 21:37:37 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								## Expose endpoints using a load balancer
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								Now that all the pods are up and running, you need to expose the datahub-frontend end point by setting
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								up [ingress ](https://kubernetes.io/docs/concepts/services-networking/ingress/ ). To do this, you need to first set up an
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								ingress controller. There are
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								many [ingress controllers ](https://kubernetes.io/docs/concepts/services-networking/ingress-controllers/ )  to choose
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								from, but here, we will follow
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								this [guide ](https://docs.aws.amazon.com/eks/latest/userguide/aws-load-balancer-controller.html ) to set up the AWS
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								Application Load Balancer(ALB) Controller.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								First, if you did not use eksctl to setup the kubernetes cluster, make sure to go through the prerequisites listed
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								[here ](https://docs.aws.amazon.com/eks/latest/userguide/alb-ingress.html ).
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								Download the IAM policy document for allowing the controller to make calls to AWS APIs on your behalf.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								curl -o iam_policy.json https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.2.0/docs/install/iam_policy.json
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								Create an IAM policy based on the policy document by running the following.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								aws iam create-policy \
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    --policy-name AWSLoadBalancerControllerIAMPolicy \
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    --policy-document file://iam_policy.json
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								Use eksctl to create a service account that allows us to attach the above policy to kubernetes pods.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								eksctl create iamserviceaccount \
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								  --cluster=< < cluster-name > > \
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								  --namespace=kube-system \
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								  --name=aws-load-balancer-controller \
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								  --attach-policy-arn=arn:aws:iam::< < account-id > >:policy/AWSLoadBalancerControllerIAMPolicy \
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								  --override-existing-serviceaccounts \
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								  --approve      
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								Install the TargetGroupBinding custom resource definition by running the following.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								kubectl apply -k "github.com/aws/eks-charts/stable/aws-load-balancer-controller//crds?ref=master"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								Add the helm chart repository containing the latest version of the ALB controller.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								helm repo add eks https://aws.github.io/eks-charts
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								helm repo update
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								Install the controller into the kubernetes cluster by running the following.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								helm upgrade -i aws-load-balancer-controller eks/aws-load-balancer-controller \
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								  --set clusterName=< < cluster-name > > \
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								  --set serviceAccount.create=false \
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								  --set serviceAccount.name=aws-load-balancer-controller \
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								  -n kube-system
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								Verify the install completed by running `kubectl get deployment -n kube-system aws-load-balancer-controller` . It should
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								return a result like the following.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								aws-load-balancer-controller   2/2     2            2           142m
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-09-01 12:28:02 -07:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
								
									
								 
							
							
								Now that the controller has been set up, we can enable ingress by updating the values.yaml (or any other values.yaml
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								file used to deploy datahub). Change datahub-frontend values to the following.
							 
						 
					
						
							
								
									
										
										
										
											2021-05-26 21:37:37 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								datahub-frontend:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								  enabled: true
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								  image:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    repository: linkedin/datahub-frontend-react
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    tag: "latest"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								  ingress:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    enabled: true
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    annotations:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								      kubernetes.io/ingress.class: alb
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								      alb.ingress.kubernetes.io/scheme: internet-facing
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								      alb.ingress.kubernetes.io/target-type: instance
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								      alb.ingress.kubernetes.io/certificate-arn: < < certificate-arn > >
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								      alb.ingress.kubernetes.io/inbound-cidrs: 0.0.0.0/0
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								      alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS":443}]'
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								      alb.ingress.kubernetes.io/actions.ssl-redirect: '{"Type": "redirect", "RedirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}'
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    hosts:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								      -  host: < < host-name > >
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								        redirectPaths:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								          -  path: /*
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								            name: ssl-redirect
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								            port: use-annotation
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								        paths:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								          -  /*
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								You need to request a certificate in the AWS Certificate Manager by following this
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								[guide ](https://docs.aws.amazon.com/acm/latest/userguide/gs-acm-request-public.html ), and replace certificate-arn with
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								the ARN of the new certificate. You also need to replace host-name with the hostname of choice like
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								demo.datahubproject.io.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								After updating the yaml file, run the following to apply the updates.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
									
										
										
										
											2021-09-01 12:28:02 -07:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
								
									
								 
							
							
								helm upgrade --install datahub datahub/datahub --values values.yaml
							 
						 
					
						
							
								
									
										
										
										
											2021-05-26 21:37:37 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								Once the upgrade completes, run `kubectl get ingress`  to verify the ingress setup. You should see a result like the
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								following.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								NAME                       CLASS    HOSTS                         ADDRESS                                                                 PORTS   AGE
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								datahub-datahub-frontend   < none >    demo.datahubproject.io   k8s-default-datahubd-80b034d83e-904097062.us-west-2.elb.amazonaws.com   80      3h5m
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								Note down the elb address in the address column. Add the DNS CNAME record to the host domain pointing the host-name (
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								from above) to the elb address. DNS updates generally take a few minutes to an hour. Once that is done, you should be
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								able to access datahub-frontend through the host-name.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								## Use AWS managed services for the storage layer
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								Managing the storage services like MySQL, Elasticsearch, and Kafka as kubernetes pods requires a great deal of
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								maintenance workload. To reduce the workload, you can use managed services like AWS [RDS ](https://aws.amazon.com/rds ),
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								[Elasticsearch Service ](https://aws.amazon.com/elasticsearch-service/ ), and [Managed Kafka ](https://aws.amazon.com/msk/ )
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								as the storage layer for DataHub. Support for using AWS Neptune as graph DB is coming soon.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								### RDS
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								Provision a MySQL database in AWS RDS that shares the VPC with the kubernetes cluster or has VPC peering set up between
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								the VPC of the kubernetes cluster. Once the database is provisioned, you should be able to see the following page. Take
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								a note of the endpoint marked by the red box.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-06-25 06:39:46 -07:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-05-26 21:37:37 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								First, add the DB password to kubernetes by running the following.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								kubectl delete secret mysql-secrets
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								kubectl create secret generic mysql-secrets --from-literal=mysql-root-password=< < password > >
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-09-01 12:28:02 -07:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
								
									
								 
							
							
								Update the sql settings under global in the values.yaml as follows.
							 
						 
					
						
							
								
									
										
										
										
											2021-05-26 21:37:37 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								  sql:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    datasource:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								      host: "< < rds-endpoint > >:3306"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								      hostForMysqlClient: "< < rds-endpoint > >"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								      port: "3306"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								      url: "jdbc:mysql://< < rds-endpoint > >:3306/datahub?verifyServerCertificate=false& useSSL=true& useUnicode=yes& characterEncoding=UTF-8"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								      driver: "com.mysql.jdbc.Driver"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								      username: "root"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								      password:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								        secretRef: mysql-secrets
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								        secretKey: mysql-root-password
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-09-01 12:28:02 -07:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
								
									
								 
							
							
								Run `helm upgrade --install datahub datahub/datahub --values values.yaml`  to apply the changes.
							 
						 
					
						
							
								
									
										
										
										
											2021-05-26 21:37:37 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								### Elasticsearch Service
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								Provision an elasticsearch domain running elasticsearch version 7.9 or above that shares the VPC with the kubernetes
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								cluster or has VPC peering set up between the VPC of the kubernetes cluster. Once the domain is provisioned, you should
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								be able to see the following page. Take a note of the endpoint marked by the red box.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-06-25 06:39:46 -07:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-05-26 21:37:37 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-09-01 12:28:02 -07:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
								
									
								 
							
							
								Update the elasticsearch settings under global in the values.yaml as follows.
							 
						 
					
						
							
								
									
										
										
										
											2021-05-26 21:37:37 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								  elasticsearch:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    host: < < elasticsearch-endpoint > >
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    port: "443"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    useSSL: "true"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								You can also allow communication via HTTP (without SSL) by using the settings below.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								  elasticsearch:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    host: < < elasticsearch-endpoint > >
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    port: "80"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-10-11 20:59:35 -07:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
								
									
								 
							
							
								If you have fine-grained access control enabled with basic authentication, first run the following to create a k8s
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								secret with the password.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								kubectl delete secret elasticsearch-secrets
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								kubectl create secret generic elasticsearch-secrets --from-literal=elasticsearch-password=< < password > >
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								Then use the settings below.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								  elasticsearch:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    host: < < elasticsearch-endpoint > >
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    port: "443"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    useSSL: "true"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    auth:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								      username: < < username > >
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								      password:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								        secretRef: elasticsearch-secrets
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								        secretName: elasticsearch-password
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								Lastly, you **NEED**  to set the following env variable for **elasticsearchSetupJob** . AWS Elasticsearch/Opensearch
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								service uses OpenDistro version of Elasticsearch, which does not support the "datastream" functionality. As such, we use
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								a different way of creating time based indices.
							 
						 
					
						
							
								
									
										
										
										
											2021-07-28 13:55:15 -07:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								  elasticsearchSetupJob:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    enabled: true
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    image:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								      repository: linkedin/datahub-elasticsearch-setup
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								      tag: "***"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    extraEnvs:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								      -  name: USE_AWS_ELASTICSEARCH
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								        value: "true"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-09-01 12:28:02 -07:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
								
									
								 
							
							
								Run `helm upgrade --install datahub datahub/datahub --values values.yaml`  to apply the changes.
							 
						 
					
						
							
								
									
										
										
										
											2021-05-26 21:37:37 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-11-16 14:09:40 -08:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
								
									
								 
							
							
								**Note:**
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								If you have a custom setup of elastic search cluster and are deploying through docker, you can modify the configurations in datahub to point to the specific ES instance -
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								1.  If you are using `docker quickstart`  you can modify the hostname and port of the ES instance in docker compose quickstart files located [here ](../../docker/quickstart/ ).
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								   1.  Once you have modified the quickstart recipes you can run the quickstart command using a specific docker compose file. Sample command for that is - `datahub docker quickstart --quickstart-compose-file docker/quickstart/docker-compose-without-neo4j.quickstart.yml` 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								2.  If you are not using quickstart recipes, you can modify environment variable in GMS to point to the ES instance. The env files for datahub-gms are located [here ](../../docker/datahub-gms/env/ ).
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								Further, you can find a list of properties supported to work with a custom ES instance [here ](../../metadata-service/factories/src/main/java/com/linkedin/gms/factory/common/ElasticsearchSSLContextFactory.java ) and [here ](../../metadata-service/factories/src/main/java/com/linkedin/gms/factory/common/RestHighLevelClientFactory.java ).
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								A mapping between the property name used in the above two files and the name used in docker/env file can be found [here ](../../metadata-service/factories/src/main/resources/application.yml ).
							 
						 
					
						
							
								
									
										
										
										
											2021-05-26 21:37:37 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								### Managed Streaming for Apache Kafka (MSK)
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								Provision an MSK cluster that shares the VPC with the kubernetes cluster or has VPC peering set up between the VPC of
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
										 
									
								 
							
							
								the kubernetes cluster. Once the domain is provisioned, click on the “View client information” button in the ‘  Cluster
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								Summary” section. You should see a page like below. Take a note of the endpoints marked by the red boxes.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-06-25 06:39:46 -07:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-05-26 21:37:37 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-09-01 12:28:02 -07:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
								
									
								 
							
							
								Update the kafka settings under global in the values.yaml as follows.
							 
						 
					
						
							
								
									
										
										
										
											2021-05-26 21:37:37 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								kafka:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    bootstrap:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								      server: "< < bootstrap-server  endpoint > >"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    zookeeper:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								      server:  "< < zookeeper  endpoint > >"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    schemaregistry:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								      url: "http://prerequisites-cp-schema-registry:8081"
							 
						 
					
						
							
								
									
										
										
										
											2021-09-01 12:28:02 -07:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
								
									
								 
							
							
								    partitions: 3
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    replicationFactor: 3
							 
						 
					
						
							
								
									
										
										
										
											2021-05-26 21:37:37 -07:00 
										
									 
								 
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-09-01 12:28:02 -07:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
								
									
								 
							
							
								Note, the number of partitions and replicationFactor should match the number of bootstrap servers. This is by default 3
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								for AWS MSK.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								Run `helm upgrade --install datahub datahub/datahub --values values.yaml`  to apply the changes.
							 
						 
					
						
							
								
									
										
										
										
											2021-08-11 21:40:37 -07:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								### AWS Glue Schema Registry
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								You can use AWS Glue schema registry instead of the kafka schema registry. To do so, first provision an AWS Glue schema
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								registry in the "Schema Registry" tab in the AWS Glue console page.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								Once the registry is provisioned, you can change helm chart as follows.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								kafka:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    bootstrap:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								      ...
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    zookeeper:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								      ...
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    schemaregistry:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								      type: AWS_GLUE
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								      glue:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								        region: < < AWS  region  of  registry > >
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								        registry: < < name  of  registry > >
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								Note, it will use the name of the topic as the schema name in the registry.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								Before you update the pods, you need to give the k8s worker nodes the correct permissions to access the schema registry.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								The minimum permissions required looks like this
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								{
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    "Version": "2012-10-17",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    "Statement": [
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								        {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								            "Sid": "VisualEditor0",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								            "Effect": "Allow",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								            "Action": [
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								                "glue:GetRegistry",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								                "glue:ListRegistries",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								                "glue:CreateSchema",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								                "glue:UpdateSchema",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								                "glue:GetSchema",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								                "glue:ListSchemas",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								                "glue:RegisterSchemaVersion",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								                "glue:GetSchemaByDefinition",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								                "glue:GetSchemaVersion",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								                "glue:GetSchemaVersionsDiff",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								                "glue:ListSchemaVersions",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								                "glue:CheckSchemaVersionValidity",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								                "glue:PutSchemaVersionMetadata",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								                "glue:QuerySchemaVersionMetadata"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								            ],
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								            "Resource": [
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								                "arn:aws:glue:*:795586375822:schema/*",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								                "arn:aws:glue:us-west-2:795586375822:registry/demo-shared"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								            ]
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								        },
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								        {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								            "Sid": "VisualEditor1",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								            "Effect": "Allow",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								            "Action": [
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								                "glue:GetSchemaVersion"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								            ],
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								            "Resource": [
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								                "*"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								            ]
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								        }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								    ]
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								}
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								The latter part is required to have "*" as the resource because of an issue in the AWS Glue schema registry library.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								Refer to [this issue ](https://github.com/awslabs/aws-glue-schema-registry/issues/68 ) for any updates.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								Glue currently doesn't support AWS Signature V4. As such, we cannot use service accounts to give permissions to access
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								the schema registry. The workaround is to give the above permission to the EKS worker node's IAM role. Refer
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								to [this issue ](https://github.com/awslabs/aws-glue-schema-registry/issues/69 ) for any updates.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-09-01 12:28:02 -07:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
								
									
								 
							
							
								Run `helm upgrade --install datahub datahub/datahub --values values.yaml`  to apply the changes.
							 
						 
					
						
							
								
									
										
										
										
											2021-08-11 21:40:37 -07:00 
										
									 
								 
							 
							
								
									
										 
									 
								
							 
							
								 
							 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								Note, you will be seeing log "Schema Version Id is null. Trying to register the schema" on every request. This log is
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								misleading, so should be ignored. Schemas are cached, so it does not register a new version on every request (aka no
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								performance issues). This has been fixed by [this PR ](https://github.com/awslabs/aws-glue-schema-registry/pull/64 ) but
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							 
							
								
									
								 
							
							
								the code has not been released yet. We will update version once a new release is out.