mirror of
				https://github.com/open-metadata/OpenMetadata.git
				synced 2025-10-24 15:25:10 +00:00 
			
		
		
		
	
		
			
				
	
	
		
			421 lines
		
	
	
		
			11 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
			
		
		
	
	
			421 lines
		
	
	
		
			11 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
| ---
 | |
| title: Kubernetes GKE Deployment
 | |
| slug: /deployment/kubernetes/gke
 | |
| ---
 | |
| 
 | |
| # GKE on Google Cloud Platform Deployment
 | |
| 
 | |
| OpenMetadata supports the Installation and Running of Application on Google Kubernetes Engine through Helm Charts.
 | |
| However, there are some additional configurations which needs to be done as prerequisites for the same.
 | |
| 
 | |
| {%note%}
 | |
| 
 | |
| Google Kubernetes Engine (GKE) Auto Pilot Mode is not compatible with one of OpenMetadata Dependencies - ElasticSearch.
 | |
| The reason being that ElasticSearch Pods require Elevated permissions to run initContainers for changing configurations which is not allowed by GKE AutoPilot PodSecurityPolicy.
 | |
| 
 | |
| {%/note%}
 | |
| 
 | |
| {%note noteType="Warning"%}
 | |
| 
 | |
| All the code snippets in this section assume the `default` namespace for kubernetes.
 | |
| 
 | |
| {%/note%}
 | |
| 
 | |
| ## Prerequisites
 | |
| 
 | |
| ### Cloud Database with CloudSQL and ElasticCloud for GCP as Search Engine
 | |
| 
 | |
| It is recommended to use GCP [Cloud SQL](https://cloud.google.com/sql/) services for Database and [Elastic Cloud GCP](https://www.elastic.co/partners/google-cloud) for Search Engine for Production.
 | |
| 
 | |
| We support -
 | |
| - Cloud SQL (MySQL) engine version 8 or higher
 | |
| - Cloud SQL (postgreSQL) engine version 12 or higher
 | |
| - ElasticSearch Engine version 8.X (upto 8.10.X)
 | |
| 
 | |
| We recommend -
 | |
| - CloudSQL to be Multi Zone Available
 | |
| - Elastic Cloud Environment with multiple zones and minimum 2 nodes
 | |
| 
 | |
| Once you have the Database and Search Engine configured and available, update the helm values below for OpenMetadata kubernetes deployments to connect with Database and ElasticSearch.
 | |
| 
 | |
| ```yaml
 | |
| # openmetadata-values.prod.yaml
 | |
| ...
 | |
| openmetadata:
 | |
|   config:
 | |
|     elasticsearch:
 | |
|       host: <ELASTIC_CLOUD_SERVICE_ENDPOINT_WITHOUT_HTTPS>
 | |
|       searchType: elasticsearch
 | |
|       port: 443
 | |
|       scheme: https
 | |
|       connectionTimeoutSecs: 5
 | |
|       socketTimeoutSecs: 60
 | |
|       keepAliveTimeoutSecs: 600
 | |
|       batchSize: 10
 | |
|       auth:
 | |
|         enabled: true
 | |
|         username: <ELASTIC_CLOUD_USERNAME>
 | |
|         password:
 | |
|           secretRef: elasticsearch-secrets
 | |
|           secretKey: openmetadata-elasticsearch-password
 | |
|     database:
 | |
|       host: <GCP_CLOUD_SQL_ENDPOINT_IP>
 | |
|       port: 3306
 | |
|       driverClass: com.mysql.cj.jdbc.Driver
 | |
|       dbScheme: mysql 
 | |
|       dbUseSSL: true
 | |
|       databaseName: <GCP_CLOUD_SQL_DATABASE_NAME>
 | |
|       auth:
 | |
|         username: <GCP_CLOUD_SQL_DATABASE_USERNAME>
 | |
|         password:
 | |
|           secretRef: mysql-secrets
 | |
|           secretKey: openmetadata-mysql-password
 | |
|   ...
 | |
| ```
 | |
| 
 | |
| {%note noteType="Tip"%}
 | |
| 
 | |
| For Database as PostgreSQL, the use the below config for database values -
 | |
| 
 | |
| ```yaml
 | |
| # openmetadata-values.prod.yaml
 | |
| ...
 | |
| openmetadata:
 | |
|   config:
 | |
|     ...
 | |
|     database:
 | |
|       host: <GCP_CLOUD_SQL_ENDPOINT_IP>
 | |
|       port: 5432
 | |
|       driverClass: org.postgresql.Driver
 | |
|       dbScheme: postgresql 
 | |
|       dbUseSSL: true
 | |
|       databaseName: <GCP_CLOUD_SQL_DATABASE_NAME>
 | |
|       auth:
 | |
|         username: <GCP_CLOUD_SQL_DATABASE_USERNAME>
 | |
|         password:
 | |
|           secretRef: sql-secrets
 | |
|           secretKey: openmetadata-sql-password
 | |
| ```
 | |
| 
 | |
| {%/note%}
 | |
| 
 | |
| {%note noteType="Tip"%}
 | |
| 
 | |
| Make sure to create CloudSQL and ElasticSearch credentials as Kubernetes Secrets mentioned [here](/quick-start/local-kubernetes-deployment#2.-create-kubernetes-secrets-required-for-helm-charts).
 | |
| 
 | |
| Also, disable MySQL and ElasticSearch from OpenMetadata Dependencies Helm Charts as mentioned in the FAQs [here](#how-to-disable-mysql-and-elasticsearch-from-openmetadata-dependencies-helm-charts).
 | |
| 
 | |
| {%/note%}
 | |
| 
 | |
| ### Persistent Volumes with ReadWriteMany Access Modes
 | |
| 
 | |
| OpenMetadata helm chart depends on Airflow and Airflow expects a persistent disk that support ReadWriteMany (the volume can be mounted as read-write by many nodes).
 | |
| 
 | |
| The workaround is to create nfs-server disk on Google Kubernetes Engine and use that as the persistent claim and deploy OpenMetadata by implementing the following steps in order.
 | |
| 
 | |
| ## Create NFS Share
 | |
| 
 | |
| ### Provision GCP Persistent Disk for Google Kubernetes Engine
 | |
| 
 | |
| Run the below command to create a gcloud compute zonal disk. For more information on Google Cloud Disk Options, please visit [here](https://cloud.google.com/compute/docs/disks).
 | |
| 
 | |
| ```commandline
 | |
| gcloud compute disks create --size=100GB --zone=<zone_id> nfs-disk
 | |
| ```
 | |
| 
 | |
| ### Deploy NFS Server in GKE
 | |
| 
 | |
| ### Code Samples
 | |
| 
 | |
| ```yaml
 | |
| # nfs-server-deployment.yml
 | |
| apiVersion: apps/v1
 | |
| kind: Deployment
 | |
| metadata:
 | |
|   name: nfs-server
 | |
| spec:
 | |
|   replicas: 1
 | |
|   selector:
 | |
|     matchLabels:
 | |
|       role: nfs-server
 | |
|   template:
 | |
|     metadata:
 | |
|       labels:
 | |
|         role: nfs-server
 | |
|     spec:
 | |
|       initContainers:
 | |
|       - name: init-airflow-directories
 | |
|         image: busybox
 | |
|         command: ['sh', '-c', 'mkdir -p /exports/airflow-dags /exports/airflow-logs']
 | |
|         volumeMounts:
 | |
|           - mountPath: /exports
 | |
|             name: nfs-pvc
 | |
|       containers:
 | |
|       - name: nfs-server
 | |
|         image: gcr.io/google_containers/volume-nfs:0.8
 | |
|         ports:
 | |
|           - name: nfs
 | |
|             containerPort: 2049
 | |
|           - name: mountd
 | |
|             containerPort: 20048
 | |
|           - name: rpcbind
 | |
|             containerPort: 111
 | |
|         securityContext:
 | |
|           privileged: true
 | |
|         volumeMounts:
 | |
|           - mountPath: /exports
 | |
|             name: nfs-pvc
 | |
|       volumes:
 | |
|         - name: nfs-pvc
 | |
|           gcePersistentDisk:
 | |
|             pdName: nfs-disk
 | |
|             fsType: ext4
 | |
| ---
 | |
| # nfs-cluster-ip-service.yml
 | |
| apiVersion: v1
 | |
| kind: Service
 | |
| metadata:
 | |
|   name: nfs-server
 | |
| spec:
 | |
|   ports:
 | |
|     - name: nfs
 | |
|       port: 2049
 | |
|     - name: mountd
 | |
|       port: 20048
 | |
|     - name: rpcbind
 | |
|       port: 111
 | |
|   selector:
 | |
|     role: nfs-server
 | |
| ```
 | |
| Run the commands below and ensure the pods are running.
 | |
| 
 | |
| ```commandline
 | |
| kubectl create -f nfs-server-deployment.yml
 | |
| kubectl create -f nfs-cluster-ip-service.yml
 | |
| ```
 | |
| 
 | |
| We create a ClusterIP Service for pods to access NFS within the cluster at a fixed IP/DNS.
 | |
| 
 | |
| ### Provision NFS backed PV and PVC for Airflow DAGs and Airflow Logs
 | |
| 
 | |
| Update `<NFS_SERVER_CLUSTER_IP>` with the NFS Service Cluster IP Address for below code snippets.
 | |
| You can get the clusterIP using the following command
 | |
| 
 | |
| ```commandline
 | |
| kubectl get service nfs-server -o jsonpath='{.spec.clusterIP}'
 | |
| ```
 | |
| 
 | |
| ### Code Samples for PV and PVC for Airflow DAGs
 | |
| 
 | |
| ```yaml
 | |
| # dags_pv_pvc.yml
 | |
| apiVersion: v1
 | |
| kind: PersistentVolume
 | |
| metadata:
 | |
|   name: openmetadata-dependencies-dags-pv
 | |
| spec:
 | |
|   capacity:
 | |
|     storage: 10Gi
 | |
|   accessModes:
 | |
|     - ReadWriteMany
 | |
|   nfs:
 | |
|     server: <NFS_SERVER_CLUSTER_IP>
 | |
|     path: "/airflow-dags"
 | |
| 
 | |
| ---
 | |
| apiVersion: v1
 | |
| kind: PersistentVolumeClaim
 | |
| metadata:
 | |
|   labels:
 | |
|     app: airflow
 | |
|     release: openmetadata-dependencies
 | |
|   name: openmetadata-dependencies-dags
 | |
|   namespace: default
 | |
| spec:
 | |
|   accessModes:
 | |
|   - ReadWriteMany
 | |
|   resources:
 | |
|     requests:
 | |
|       storage: 10Gi
 | |
|   storageClassName: ""
 | |
| ```
 | |
| 
 | |
| Create Persistent Volumes and Persistent Volume claims with the below command.
 | |
| 
 | |
| ```commandline
 | |
| kubectl create -f dags_pv_pvc.yml
 | |
| ```
 | |
| 
 | |
| ### Code Samples for PV and PVC for Airflow Logs
 | |
| 
 | |
| ```yaml
 | |
| # logs_pv_pvc.yml
 | |
| apiVersion: v1
 | |
| kind: PersistentVolume
 | |
| metadata:
 | |
|   name: openmetadata-dependencies-logs-pv
 | |
| spec:
 | |
|   capacity:
 | |
|     storage: 10Gi
 | |
|   accessModes:
 | |
|     - ReadWriteMany
 | |
|   nfs:
 | |
|     server: <NFS_SERVER_CLUSTER_IP>
 | |
|     path: "/airflow-logs"
 | |
| 
 | |
| ---
 | |
| apiVersion: v1
 | |
| kind: PersistentVolumeClaim
 | |
| metadata:
 | |
|   labels:
 | |
|     app: airflow
 | |
|   name: openmetadata-dependencies-logs
 | |
|   namespace: default
 | |
| spec:
 | |
|   accessModes:
 | |
|   - ReadWriteMany
 | |
|   resources:
 | |
|     requests:
 | |
|       storage: 10Gi
 | |
|   storageClassName: ""
 | |
| ```
 | |
| 
 | |
| Create Persistent Volumes and Persistent Volume claims with the below command.
 | |
| 
 | |
| ```commandline
 | |
| kubectl create -f logs_pv_pvc.yml
 | |
| ```
 | |
| 
 | |
| ## Change owner and permission manually on disks
 | |
| 
 | |
| Since airflow pods run as non root users, they would not have write access on the nfs server volumes. In order to fix the permission here, spin up a pod with persistent volumes attached and run it once.
 | |
| 
 | |
| ```yaml
 | |
| # permissions_pod.yml
 | |
| apiVersion: v1
 | |
| kind: Pod
 | |
| metadata:
 | |
|   creationTimestamp: null
 | |
|   labels:
 | |
|     run: my-permission-pod
 | |
|   name: my-permission-pod
 | |
| spec:
 | |
|   containers:
 | |
|   - image: nginx
 | |
|     name: my-permission-pod
 | |
|     volumeMounts:
 | |
|     - name: airflow-dags
 | |
|       mountPath: /airflow-dags
 | |
|     - name: airflow-logs
 | |
|       mountPath: /airflow-logs
 | |
|   volumes:
 | |
|   - name: airflow-logs
 | |
|     persistentVolumeClaim:
 | |
|       claimName: openmetadata-dependencies-logs
 | |
|   - name: airflow-dags
 | |
|     persistentVolumeClaim:
 | |
|       claimName: openmetadata-dependencies-dags
 | |
|   dnsPolicy: ClusterFirst
 | |
|   restartPolicy: Always
 | |
| ```
 | |
| 
 | |
| {%note%}
 | |
| 
 | |
| Airflow runs the pods with linux user name as airflow and linux user id as 50000.
 | |
| 
 | |
| {%/note%}
 | |
| 
 | |
| Run the below command to create the pod and fix the permissions
 | |
| 
 | |
| ```commandline
 | |
| kubectl create -f permissions_pod.yml
 | |
| ```
 | |
| 
 | |
| Once the permissions pod is up and running, execute the below commands within the container.
 | |
| 
 | |
| ```commandline
 | |
| kubectl exec --tty my-permission-pod --container my-permission-pod -- chown -R 50000 /airflow-dags /airflow-logs
 | |
| # If needed
 | |
| kubectl exec --tty my-permission-pod --container my-permission-pod -- chmod -R a+rwx /airflow-dags
 | |
| ```
 | |
| 
 | |
| ## Create OpenMetadata dependencies Values
 | |
| 
 | |
| Override openmetadata dependencies airflow helm values to bind the nfs persistent volumes for DAGs and logs.
 | |
| 
 | |
| ```yaml
 | |
| # values-dependencies.yml
 | |
| airflow:
 | |
|   airflow:
 | |
|     extraVolumeMounts:
 | |
|       - mountPath: /airflow-logs
 | |
|         name: nfs-airflow-logs
 | |
|       - mountPath: /airflow-dags/dags
 | |
|         name: nfs-airflow-dags
 | |
|     extraVolumes:
 | |
|       - name: nfs-airflow-logs
 | |
|         persistentVolumeClaim:
 | |
|           claimName: openmetadata-dependencies-logs
 | |
|       - name: nfs-airflow-dags
 | |
|         persistentVolumeClaim:
 | |
|           claimName: openmetadata-dependencies-dags
 | |
|     config:
 | |
|       AIRFLOW__OPENMETADATA_AIRFLOW_APIS__DAG_GENERATED_CONFIGS: "/airflow-dags/dags"
 | |
|   dags:
 | |
|     path: /airflow-dags/dags
 | |
|     persistence:
 | |
|       enabled: false
 | |
|   logs:
 | |
|     path: /airflow-logs
 | |
|     persistence:
 | |
|       enabled: false
 | |
| ```
 | |
| {%note%}
 | |
| 
 | |
| For more information on airflow helm chart values, please refer to [airflow-helm](https://artifacthub.io/packages/helm/airflow-helm/airflow/8.8.0).
 | |
| 
 | |
| When deploying openmeteadata dependencies helm chart, use the below command -
 | |
| 
 | |
| ```commandline
 | |
| helm install openmetadata-dependencies open-metadata/openmetadata-dependencies --values values-dependencies.yaml
 | |
| ```
 | |
| 
 | |
| {%/note%}
 | |
| 
 | |
| The above command uses configurations defined [here](https://raw.githubusercontent.com/open-metadata/openmetadata-helm-charts/main/charts/deps/values.yaml). 
 | |
| You can modify any configuration and deploy by passing your own `values.yaml`
 | |
| 
 | |
| ```commandline
 | |
| helm install openmetadata-dependencies open-metadata/openmetadata-dependencies --values <path-to-values-file>
 | |
| ```
 | |
| 
 | |
| Once the openmetadata dependencies helm chart deployed, you can then run the below command to install the openmetadata helm chart - 
 | |
| 
 | |
| ```commandline
 | |
| helm install openmetadata open-metadata/openmetadata --values <path-to-values-file>
 | |
| ```
 | |
| 
 | |
| # Troubleshooting
 | |
| 
 | |
| ## Pods are stuck in Pending State due to Persistent Volume Creation Failure
 | |
| 
 | |
| If you came across `invalid access type while creating the pvc`, and the permission pod is stuck in "pending" state.
 | |
| 
 | |
| The above error might have occurred due to the pvc volumes not setup or pvc volumes are not mounted properly.
 | |
| 
 | |
| {% image
 | |
|   src="/images/v1.4/deployment/troubleshoot/dag-log.png"
 | |
|   alt="dag-log" /%}
 | |
| {% image
 | |
|   src="/images/v1.4/deployment/troubleshoot/permission-pod-events.png"
 | |
|   alt="permission-pod-events"
 | |
|   caption="Permission pod events" /%}
 | |
| 
 | |
| Please validate:
 | |
| - all the prerequisites mentioned in this [section](#prerequisites)
 | |
| - the configuration of `dags_pv_pvc.yml` file
 | |
| - `storageClassName` field in YAML file
 | |
| 
 | |
| # FAQs
 | |
| 
 | |
| {% partial file="/v1.4/deployment/faqs.md" /%} | 
