mirror of
				https://github.com/datahub-project/datahub.git
				synced 2025-10-31 18:59:23 +00:00 
			
		
		
		
	
		
			
	
	
		
			256 lines
		
	
	
		
			7.3 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
		
		
			
		
	
	
			256 lines
		
	
	
		
			7.3 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
|   | # External OAuth Authentication
 | ||
|  | 
 | ||
|  | DataHub supports authenticating API requests using JWT tokens from external identity providers like Okta, Azure AD, Google Identity, and others. This is perfect for service-to-service authentication where your applications need to call DataHub APIs. | ||
|  | 
 | ||
|  | ## Overview
 | ||
|  | 
 | ||
|  | When you configure OAuth authentication, DataHub will: | ||
|  | 
 | ||
|  | 1. Accept JWT tokens from your trusted identity provider | ||
|  | 2. Validate the token signature and claims | ||
|  | 3. Automatically create service accounts for authenticated users | ||
|  | 4. Grant API access based on DataHub's permission system | ||
|  | 
 | ||
|  | ## Configuration
 | ||
|  | 
 | ||
|  | Configure OAuth authentication by setting these environment variables in your DataHub deployment: | ||
|  | 
 | ||
|  | Set these environment variables for the `datahub-gms` service: | ||
|  | 
 | ||
|  | ```bash | ||
|  | # Enable OAuth authentication
 | ||
|  | EXTERNAL_OAUTH_ENABLED=true | ||
|  | 
 | ||
|  | # Required: Trusted JWT issuers (comma-separated)
 | ||
|  | EXTERNAL_OAUTH_TRUSTED_ISSUERS=https://auth.example.com,https://okta.company.com | ||
|  | 
 | ||
|  | # Required: Allowed JWT audiences (comma-separated)
 | ||
|  | EXTERNAL_OAUTH_ALLOWED_AUDIENCES=datahub-api,my-service-id | ||
|  | 
 | ||
|  | # Required: JWKS endpoint for signature verification
 | ||
|  | EXTERNAL_OAUTH_JWKS_URI=https://auth.example.com/.well-known/jwks.json | ||
|  | 
 | ||
|  | # Optional: JWT claim containing user ID (default: "sub")
 | ||
|  | EXTERNAL_OAUTH_USER_ID_CLAIM=sub | ||
|  | 
 | ||
|  | # Optional: Signing algorithm (default: "RS256")
 | ||
|  | EXTERNAL_OAUTH_ALGORITHM=RS256 | ||
|  | ``` | ||
|  | 
 | ||
|  | ### Docker Compose Example
 | ||
|  | 
 | ||
|  | ```yaml | ||
|  | version: "3.8" | ||
|  | services: | ||
|  |   datahub-gms: | ||
|  |     image: acryldata/datahub-gms:latest | ||
|  |     environment: | ||
|  |       # External OAuth Configuration | ||
|  |       - EXTERNAL_OAUTH_ENABLED=true | ||
|  |       - EXTERNAL_OAUTH_TRUSTED_ISSUERS=https://my-okta-domain.okta.com/oauth2/default | ||
|  |       - EXTERNAL_OAUTH_ALLOWED_AUDIENCES=0oa1234567890abcdef | ||
|  |       - EXTERNAL_OAUTH_JWKS_URI=https://my-okta-domain.okta.com/oauth2/default/v1/keys | ||
|  |       - EXTERNAL_OAUTH_USER_ID_CLAIM=sub | ||
|  |       - EXTERNAL_OAUTH_ALGORITHM=RS256 | ||
|  | 
 | ||
|  |       # Standard DataHub settings | ||
|  |       - DATAHUB_GMS_HOST=0.0.0.0 | ||
|  |       - DATAHUB_GMS_PORT=8080 | ||
|  |       # ... other configurations | ||
|  | ``` | ||
|  | 
 | ||
|  | ### Kubernetes Example
 | ||
|  | 
 | ||
|  | ```yaml | ||
|  | apiVersion: apps/v1 | ||
|  | kind: Deployment | ||
|  | metadata: | ||
|  |   name: datahub-gms | ||
|  | spec: | ||
|  |   template: | ||
|  |     spec: | ||
|  |       containers: | ||
|  |         - name: datahub-gms | ||
|  |           image: acryldata/datahub-gms:latest | ||
|  |           env: | ||
|  |             - name: EXTERNAL_OAUTH_ENABLED | ||
|  |               value: "true" | ||
|  |             - name: EXTERNAL_OAUTH_TRUSTED_ISSUERS | ||
|  |               value: "https://login.microsoftonline.com/tenant-id/v2.0" | ||
|  |             - name: EXTERNAL_OAUTH_ALLOWED_AUDIENCES | ||
|  |               value: "api://datahub-prod" | ||
|  |             - name: EXTERNAL_OAUTH_JWKS_URI | ||
|  |               value: "https://login.microsoftonline.com/tenant-id/discovery/v2.0/keys" | ||
|  |           # ... other environment variables | ||
|  | ``` | ||
|  | 
 | ||
|  | ### Multiple Providers
 | ||
|  | 
 | ||
|  | To support multiple OAuth providers, use comma-separated values: | ||
|  | 
 | ||
|  | ```bash | ||
|  | # Multiple issuers and audiences
 | ||
|  | EXTERNAL_OAUTH_TRUSTED_ISSUERS=https://okta.company.com,https://auth0.company.com | ||
|  | EXTERNAL_OAUTH_ALLOWED_AUDIENCES=datahub-prod,datahub-staging,service-account-id | ||
|  | 
 | ||
|  | # Single JWKS URI (if providers share keys) or discovery URI
 | ||
|  | EXTERNAL_OAUTH_JWKS_URI=https://okta.company.com/.well-known/jwks.json | ||
|  | 
 | ||
|  | # Or use discovery URI to auto-derive JWKS
 | ||
|  | EXTERNAL_OAUTH_DISCOVERY_URI=https://okta.company.com/.well-known/openid-configuration | ||
|  | ``` | ||
|  | 
 | ||
|  | ### Discovery URI vs JWKS URI
 | ||
|  | 
 | ||
|  | You can specify either: | ||
|  | 
 | ||
|  | - **JWKS URI**: Direct endpoint to signing keys (recommended for production) | ||
|  | - **Discovery URI**: OIDC discovery document URL (DataHub will auto-derive JWKS URI) | ||
|  | 
 | ||
|  | ```bash | ||
|  | # Option 1: Direct JWKS URI (faster, more reliable)
 | ||
|  | EXTERNAL_OAUTH_JWKS_URI=https://auth.example.com/.well-known/jwks.json | ||
|  | 
 | ||
|  | # Option 2: Discovery URI (convenient, auto-derives JWKS)
 | ||
|  | EXTERNAL_OAUTH_DISCOVERY_URI=https://auth.example.com/.well-known/openid-configuration | ||
|  | ``` | ||
|  | 
 | ||
|  | ## Provider Examples
 | ||
|  | 
 | ||
|  | ### Okta
 | ||
|  | 
 | ||
|  | ```bash | ||
|  | EXTERNAL_OAUTH_ENABLED=true | ||
|  | EXTERNAL_OAUTH_TRUSTED_ISSUERS=https://your-domain.okta.com/oauth2/default | ||
|  | EXTERNAL_OAUTH_ALLOWED_AUDIENCES=0oa1234567890abcdef | ||
|  | EXTERNAL_OAUTH_JWKS_URI=https://your-domain.okta.com/oauth2/default/v1/keys | ||
|  | ``` | ||
|  | 
 | ||
|  | ### Auth0
 | ||
|  | 
 | ||
|  | ```bash | ||
|  | EXTERNAL_OAUTH_ENABLED=true | ||
|  | EXTERNAL_OAUTH_TRUSTED_ISSUERS=https://your-domain.auth0.com/ | ||
|  | EXTERNAL_OAUTH_ALLOWED_AUDIENCES=https://your-api-identifier/ | ||
|  | EXTERNAL_OAUTH_JWKS_URI=https://your-domain.auth0.com/.well-known/jwks.json | ||
|  | ``` | ||
|  | 
 | ||
|  | ### Azure AD / Microsoft Entra
 | ||
|  | 
 | ||
|  | ```bash | ||
|  | EXTERNAL_OAUTH_ENABLED=true | ||
|  | EXTERNAL_OAUTH_TRUSTED_ISSUERS=https://login.microsoftonline.com/your-tenant-id/v2.0 | ||
|  | EXTERNAL_OAUTH_ALLOWED_AUDIENCES=api://your-app-id | ||
|  | EXTERNAL_OAUTH_JWKS_URI=https://login.microsoftonline.com/your-tenant-id/discovery/v2.0/keys | ||
|  | ``` | ||
|  | 
 | ||
|  | ### Google Cloud Identity
 | ||
|  | 
 | ||
|  | ```bash | ||
|  | EXTERNAL_OAUTH_ENABLED=true | ||
|  | EXTERNAL_OAUTH_TRUSTED_ISSUERS=https://accounts.google.com | ||
|  | EXTERNAL_OAUTH_ALLOWED_AUDIENCES=your-client-id.apps.googleusercontent.com | ||
|  | EXTERNAL_OAUTH_JWKS_URI=https://www.googleapis.com/oauth2/v3/certs | ||
|  | ``` | ||
|  | 
 | ||
|  | ### Keycloak
 | ||
|  | 
 | ||
|  | ```bash | ||
|  | EXTERNAL_OAUTH_ENABLED=true | ||
|  | EXTERNAL_OAUTH_TRUSTED_ISSUERS=https://keycloak.company.com/realms/datahub | ||
|  | EXTERNAL_OAUTH_ALLOWED_AUDIENCES=datahub-client | ||
|  | EXTERNAL_OAUTH_JWKS_URI=https://keycloak.company.com/realms/datahub/protocol/openid-connect/certs | ||
|  | ``` | ||
|  | 
 | ||
|  | ## Using OAuth Tokens
 | ||
|  | 
 | ||
|  | Once configured, include your JWT token in the Authorization header when making API requests: | ||
|  | 
 | ||
|  | ```bash | ||
|  | curl -H "Authorization: Bearer YOUR_JWT_TOKEN" \ | ||
|  |      -H "Content-Type: application/json" \ | ||
|  |      https://your-datahub.com/api/graphql \ | ||
|  |      -d '{"query": "{ corpUsers { total } }"}' | ||
|  | ``` | ||
|  | 
 | ||
|  | For Python applications: | ||
|  | 
 | ||
|  | ```python | ||
|  | import requests | ||
|  | 
 | ||
|  | headers = { | ||
|  |     'Authorization': f'Bearer {your_jwt_token}', | ||
|  |     'Content-Type': 'application/json' | ||
|  | } | ||
|  | 
 | ||
|  | response = requests.post( | ||
|  |     'https://your-datahub.com/api/graphql', | ||
|  |     headers=headers, | ||
|  |     json={'query': '{ corpUsers { total } }'} | ||
|  | ) | ||
|  | ``` | ||
|  | 
 | ||
|  | ## Best Practices
 | ||
|  | 
 | ||
|  | - Use HTTPS for all JWKS URIs and discovery endpoints | ||
|  | - Use specific audience values (not wildcards) for better security | ||
|  | - Use short-lived tokens (< 1 hour recommended) | ||
|  | - Separate environments with different audiences (prod/staging/dev) | ||
|  | - Enable debug logging during setup: `DATAHUB_GMS_LOG_LEVEL=DEBUG` | ||
|  | 
 | ||
|  | ## Troubleshooting
 | ||
|  | 
 | ||
|  | ### Common Issues
 | ||
|  | 
 | ||
|  | **"OAuth authenticator is not configured"** | ||
|  | 
 | ||
|  | - Make sure `EXTERNAL_OAUTH_ENABLED=true` is set | ||
|  | - Verify all required environment variables are configured | ||
|  | 
 | ||
|  | **"No configured OAuth provider matches token issuer"** | ||
|  | 
 | ||
|  | - Check that your JWT issuer exactly matches `EXTERNAL_OAUTH_TRUSTED_ISSUERS` | ||
|  | 
 | ||
|  | **"Invalid or missing audience claim"** | ||
|  | 
 | ||
|  | - Verify your JWT audience is listed in `EXTERNAL_OAUTH_ALLOWED_AUDIENCES` | ||
|  | 
 | ||
|  | **"Failed to load signing keys"** | ||
|  | 
 | ||
|  | - Test your JWKS URI directly: `curl https://your-provider/.well-known/jwks.json` | ||
|  | - Check network connectivity from DataHub to your OAuth provider | ||
|  | 
 | ||
|  | ### Debugging
 | ||
|  | 
 | ||
|  | Enable debug logging to see detailed OAuth messages: | ||
|  | 
 | ||
|  | ```bash | ||
|  | # Set environment variable
 | ||
|  | DATAHUB_GMS_LOG_LEVEL=DEBUG | ||
|  | 
 | ||
|  | # Check logs
 | ||
|  | docker logs datahub-gms | grep -i oauth | ||
|  | ``` | ||
|  | 
 | ||
|  | ### Testing Your Setup
 | ||
|  | 
 | ||
|  | Decode your JWT token to verify the claims: | ||
|  | 
 | ||
|  | ```bash | ||
|  | # Replace with your actual token
 | ||
|  | echo "YOUR_JWT_TOKEN" | cut -d. -f2 | base64 -d | jq | ||
|  | ``` | ||
|  | 
 | ||
|  | Make sure the `iss` (issuer) and `aud` (audience) claims match your configuration. | ||
|  | 
 | ||
|  | ## Advanced Options
 | ||
|  | 
 | ||
|  | You can customize which JWT claim contains the user ID: | ||
|  | 
 | ||
|  | ```bash | ||
|  | # Use email claim instead of default 'sub'
 | ||
|  | EXTERNAL_OAUTH_USER_ID_CLAIM=email | ||
|  | ``` | ||
|  | 
 | ||
|  | OAuth users are automatically created as service accounts with usernames like `__oauth_{issuer_domain}_{subject}`. |