External OAuth Authentication
DataHub can authenticate API requests using JWTs issued by external identity providers such as Okta, Azure AD, and Google Identity. This suits service-to-service authentication, where your applications call DataHub APIs without an interactive login.
Overview
When you configure OAuth authentication, DataHub will:
- Accept JWT tokens from your trusted identity provider
- Validate the token signature and claims
- Automatically create service accounts for authenticated users
- Grant API access based on DataHub's permission system
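The claim checks in the list above can be sketched in a few lines of Python. This is an illustration only, not DataHub's implementation: the function name `check_claims` and its parameters are hypothetical, it operates on claims that have already been decoded, and it deliberately omits signature verification, which requires the provider's signing keys.

```python
import time

def check_claims(claims, trusted_issuers, allowed_audiences, now=None):
    """Return a list of problems found in already-decoded JWT claims (empty if none)."""
    now = time.time() if now is None else now
    problems = []

    # The issuer must exactly match one of the trusted issuers.
    if claims.get("iss") not in trusted_issuers:
        problems.append("untrusted issuer")

    # "aud" may be a single string or a list of strings (RFC 7519).
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if not set(audiences) & set(allowed_audiences):
        problems.append("audience not allowed")

    # Reject tokens whose expiry time is missing or already in the past.
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        problems.append("token expired or missing exp")

    return problems
```

An empty result means the issuer, audience, and expiry all look acceptable; any returned strings name the checks that failed.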
Configuration
Configure OAuth authentication by setting these environment variables for the datahub-gms service in your DataHub deployment:
# Enable OAuth authentication
EXTERNAL_OAUTH_ENABLED=true
# Required: Trusted JWT issuers (comma-separated)
EXTERNAL_OAUTH_TRUSTED_ISSUERS=https://auth.example.com,https://okta.company.com
# Required: Allowed JWT audiences (comma-separated)
EXTERNAL_OAUTH_ALLOWED_AUDIENCES=datahub-api,my-service-id
# Required: JWKS endpoint for signature verification
EXTERNAL_OAUTH_JWKS_URI=https://auth.example.com/.well-known/jwks.json
# Optional: JWT claim containing user ID (default: "sub")
EXTERNAL_OAUTH_USER_ID_CLAIM=sub
# Optional: Signing algorithm (default: "RS256")
EXTERNAL_OAUTH_ALGORITHM=RS256
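To sanity-check a set of values before deploying, a small script can read the same variables and apply the documented defaults. The sketch below is a hypothetical helper, not part of DataHub. It assumes that whitespace around comma-separated items is ignored, that the feature is off when EXTERNAL_OAUTH_ENABLED is unset, and that either a JWKS URI or a discovery URI satisfies the key-source requirement.

```python
import os

def load_oauth_settings(env=None):
    """Read the EXTERNAL_OAUTH_* variables into a dict, applying documented defaults."""
    env = os.environ if env is None else env

    def split_csv(name):
        # Comma-separated values; surrounding whitespace and empty items are dropped.
        return [item.strip() for item in env.get(name, "").split(",") if item.strip()]

    settings = {
        # Assumption: treated as disabled unless explicitly set to "true".
        "enabled": env.get("EXTERNAL_OAUTH_ENABLED", "false").lower() == "true",
        "trusted_issuers": split_csv("EXTERNAL_OAUTH_TRUSTED_ISSUERS"),
        "allowed_audiences": split_csv("EXTERNAL_OAUTH_ALLOWED_AUDIENCES"),
        "jwks_uri": env.get("EXTERNAL_OAUTH_JWKS_URI", ""),
        "discovery_uri": env.get("EXTERNAL_OAUTH_DISCOVERY_URI", ""),
        "user_id_claim": env.get("EXTERNAL_OAUTH_USER_ID_CLAIM", "sub"),
        "algorithm": env.get("EXTERNAL_OAUTH_ALGORITHM", "RS256"),
    }

    # Collect required settings that are empty so they can be reported together.
    missing = [k for k in ("trusted_issuers", "allowed_audiences") if not settings[k]]
    if not (settings["jwks_uri"] or settings["discovery_uri"]):
        missing.append("jwks_uri or discovery_uri")
    if settings["enabled"] and missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    return settings
```

Calling `load_oauth_settings()` with no argument reads the current process environment; passing a dict makes it easy to test a candidate configuration.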
Docker Compose Example
version: "3.8"
services:
  datahub-gms:
    image: acryldata/datahub-gms:latest
    environment:
      # External OAuth Configuration
      - EXTERNAL_OAUTH_ENABLED=true
      - EXTERNAL_OAUTH_TRUSTED_ISSUERS=https://my-okta-domain.okta.com/oauth2/default
      - EXTERNAL_OAUTH_ALLOWED_AUDIENCES=0oa1234567890abcdef
      - EXTERNAL_OAUTH_JWKS_URI=https://my-okta-domain.okta.com/oauth2/default/v1/keys
      - EXTERNAL_OAUTH_USER_ID_CLAIM=sub
      - EXTERNAL_OAUTH_ALGORITHM=RS256
      # Standard DataHub settings
      - DATAHUB_GMS_HOST=0.0.0.0
      - DATAHUB_GMS_PORT=8080
      # ... other configurations
Kubernetes Example
apiVersion: apps/v1
kind: Deployment
metadata:
  name: datahub-gms
spec:
  template:
    spec:
      containers:
        - name: datahub-gms
          image: acryldata/datahub-gms:latest
          env:
            - name: EXTERNAL_OAUTH_ENABLED
              value: "true"
            - name: EXTERNAL_OAUTH_TRUSTED_ISSUERS
              value: "https://login.microsoftonline.com/tenant-id/v2.0"
            - name: EXTERNAL_OAUTH_ALLOWED_AUDIENCES
              value: "api://datahub-prod"
            - name: EXTERNAL_OAUTH_JWKS_URI
              value: "https://login.microsoftonline.com/tenant-id/discovery/v2.0/keys"
          # ... other environment variables
Multiple Providers
To support multiple OAuth providers, use comma-separated values:
# Multiple issuers and audiences
EXTERNAL_OAUTH_TRUSTED_ISSUERS=https://okta.company.com,https://auth0.company.com
EXTERNAL_OAUTH_ALLOWED_AUDIENCES=datahub-prod,datahub-staging,service-account-id
# Single JWKS URI (if providers share keys) or discovery URI
EXTERNAL_OAUTH_JWKS_URI=https://okta.company.com/.well-known/jwks.json
# Or use discovery URI to auto-derive JWKS
EXTERNAL_OAUTH_DISCOVERY_URI=https://okta.company.com/.well-known/openid-configuration
Discovery URI vs JWKS URI
You can specify either:
- JWKS URI: Direct endpoint to signing keys (recommended for production)
- Discovery URI: OIDC discovery document URL (DataHub will auto-derive JWKS URI)
# Option 1: Direct JWKS URI (faster, more reliable)
EXTERNAL_OAUTH_JWKS_URI=https://auth.example.com/.well-known/jwks.json
# Option 2: Discovery URI (convenient, auto-derives JWKS)
EXTERNAL_OAUTH_DISCOVERY_URI=https://auth.example.com/.well-known/openid-configuration
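For reference, deriving the JWKS URI from a discovery document amounts to reading its `jwks_uri` field, as defined by OpenID Connect Discovery. The sketch below shows that lookup using only the Python standard library. The function names are hypothetical, and this is not DataHub's own code; it can be handy for finding the value to put in EXTERNAL_OAUTH_JWKS_URI.

```python
import json
import urllib.request

def jwks_uri_from_discovery(document):
    """Return the "jwks_uri" field of a parsed OIDC discovery document."""
    jwks_uri = document.get("jwks_uri")
    if not jwks_uri:
        raise ValueError("discovery document has no jwks_uri field")
    return jwks_uri

def fetch_discovery(discovery_uri, timeout=10):
    """Download and parse the discovery document (requires network access)."""
    with urllib.request.urlopen(discovery_uri, timeout=timeout) as response:
        return json.load(response)
```

For example, `jwks_uri_from_discovery(fetch_discovery(uri))` returns the key endpoint that the provider advertises for the given discovery URI.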
Provider Examples
Okta
EXTERNAL_OAUTH_ENABLED=true
EXTERNAL_OAUTH_TRUSTED_ISSUERS=https://your-domain.okta.com/oauth2/default
EXTERNAL_OAUTH_ALLOWED_AUDIENCES=0oa1234567890abcdef
EXTERNAL_OAUTH_JWKS_URI=https://your-domain.okta.com/oauth2/default/v1/keys
Auth0
EXTERNAL_OAUTH_ENABLED=true
EXTERNAL_OAUTH_TRUSTED_ISSUERS=https://your-domain.auth0.com/
EXTERNAL_OAUTH_ALLOWED_AUDIENCES=https://your-api-identifier/
EXTERNAL_OAUTH_JWKS_URI=https://your-domain.auth0.com/.well-known/jwks.json
Azure AD / Microsoft Entra
EXTERNAL_OAUTH_ENABLED=true
EXTERNAL_OAUTH_TRUSTED_ISSUERS=https://login.microsoftonline.com/your-tenant-id/v2.0
EXTERNAL_OAUTH_ALLOWED_AUDIENCES=api://your-app-id
EXTERNAL_OAUTH_JWKS_URI=https://login.microsoftonline.com/your-tenant-id/discovery/v2.0/keys
Google Cloud Identity
EXTERNAL_OAUTH_ENABLED=true
EXTERNAL_OAUTH_TRUSTED_ISSUERS=https://accounts.google.com
EXTERNAL_OAUTH_ALLOWED_AUDIENCES=your-client-id.apps.googleusercontent.com
EXTERNAL_OAUTH_JWKS_URI=https://www.googleapis.com/oauth2/v3/certs
Keycloak
EXTERNAL_OAUTH_ENABLED=true
EXTERNAL_OAUTH_TRUSTED_ISSUERS=https://keycloak.company.com/realms/datahub
EXTERNAL_OAUTH_ALLOWED_AUDIENCES=datahub-client
EXTERNAL_OAUTH_JWKS_URI=https://keycloak.company.com/realms/datahub/protocol/openid-connect/certs
Using OAuth Tokens
Once configured, include your JWT token in the Authorization header when making API requests:
curl -H "Authorization: Bearer YOUR_JWT_TOKEN" \
     -H "Content-Type: application/json" \
     https://your-datahub.com/api/graphql \
     -d '{"query": "{ corpUsers { total } }"}'
For Python applications:
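import requests
# Placeholder: a JWT issued by your identity provider
your_jwt_token = "YOUR_JWT_TOKEN"
headers = {
    'Authorization': f'Bearer {your_jwt_token}',
    'Content-Type': 'application/json'
}
response = requests.post(
    'https://your-datahub.com/api/graphql',
    headers=headers,
    json={'query': '{ corpUsers { total } }'},
    timeout=30,
)
# Raise an exception on 401/403 and other HTTP error responses
response.raise_for_status()
print(response.json())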
Best Practices
- Use HTTPS for all JWKS URIs and discovery endpoints
- Use specific audience values (not wildcards) for better security
- Use short-lived tokens (< 1 hour recommended)
- Separate environments with different audiences (prod/staging/dev)
- Enable debug logging during setup: DATAHUB_GMS_LOG_LEVEL=DEBUG
Troubleshooting
Common Issues
"OAuth authenticator is not configured"
- Make sure EXTERNAL_OAUTH_ENABLED=true is set
- Verify all required environment variables are configured
"No configured OAuth provider matches token issuer"
- Check that your JWT issuer exactly matches EXTERNAL_OAUTH_TRUSTED_ISSUERS
"Invalid or missing audience claim"
- Verify your JWT audience is listed in EXTERNAL_OAUTH_ALLOWED_AUDIENCES
"Failed to load signing keys"
- Test your JWKS URI directly: curl https://your-provider/.well-known/jwks.json
- Check network connectivity from DataHub to your OAuth provider
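When the JWKS endpoint responds but verification still fails, it can help to list which keys the provider actually publishes and compare them with the `kid` in your token's header. The helper below is a minimal, hypothetical sketch that summarizes a JWKS document using the standard `keys`, `kid`, `kty`, and `alg` fields from RFC 7517.

```python
import json

def summarize_jwks(jwks_text):
    """Return a (kid, kty, alg) tuple for each key in a JWKS JSON document."""
    keys = json.loads(jwks_text).get("keys", [])
    # Fields absent from a key are reported as None rather than raising.
    return [(key.get("kid"), key.get("kty"), key.get("alg")) for key in keys]
```

Pass it the body returned by the curl command above. An empty list means the endpoint returned no keys at all.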
Debugging
Enable debug logging to see detailed OAuth messages:
# Set environment variable
DATAHUB_GMS_LOG_LEVEL=DEBUG
# Check logs
docker logs datahub-gms | grep -i oauth
Testing Your Setup
Decode your JWT token to verify the claims:
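# Replace with your actual token. JWT payloads are base64url-encoded without padding,
# so map the URL-safe characters back first; base64 may still warn about missing padding.
echo "YOUR_JWT_TOKEN" | cut -d. -f2 | tr '_-' '/+' | base64 -d | jq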
Make sure the iss (issuer) and aud (audience) claims match your configuration.
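If the shell tools are unavailable or stumble over the token's encoding, the same inspection can be done in Python. The sketch below restores the base64url padding before decoding. The function name is hypothetical, and it does not verify the signature, so use it only for inspecting claims during setup.

```python
import base64
import json

def decode_jwt_payload(token):
    """Decode the claims segment of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated segments")
    payload = parts[1]
    # JWT segments are base64url-encoded with padding stripped; restore it.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

The returned dictionary contains the `iss` and `aud` values to compare against EXTERNAL_OAUTH_TRUSTED_ISSUERS and EXTERNAL_OAUTH_ALLOWED_AUDIENCES.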
Advanced Options
You can customize which JWT claim contains the user ID:
# Use email claim instead of default 'sub'
EXTERNAL_OAUTH_USER_ID_CLAIM=email
OAuth users are automatically created as service accounts with usernames like __oauth_{issuer_domain}_{subject}.
