Elasticsearch & OpenSearch Multi-Client Shim
This guide explains how to use DataHub's multi-client search engine shim to support different versions of Elasticsearch and OpenSearch through a unified interface.
Overview
DataHub's search client shim provides seamless support for:
- Elasticsearch 7.17
- Elasticsearch 8.17+
- OpenSearch 2.x with full REST high-level client support
This enables smooth migrations between different search engine versions while maintaining backward compatibility with existing DataHub deployments.
Architecture
Core Components
The shim consists of several key components:
- SearchClientShim - Main abstraction interface
- SearchClientShimFactory - Factory for creating appropriate client implementations
- Implementation Classes - Concrete implementations for each search engine:
  - Es7CompatibilitySearchClientShim - ES 7.17
  - Es8SearchClientShim - ES 8.17+
  - OpenSearch2SearchClientShim - OpenSearch 2.x
Supported Configurations
| Source Engine | Target Engine | Shim Implementation | Status | 
|---|---|---|---|
| DataHub → ES 7.17 | ES 7.17 | Es7CompatibilitySearchClientShim | ✅ Complete | 
| DataHub → ES 8.17+ | ES 8.17+ | Es8SearchClientShim | ✅ Complete | 
| DataHub → OpenSearch 2.x | OpenSearch 2.x | OpenSearch2SearchClientShim | ✅ Complete | 
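The mapping in the table above can be pictured as a simple lookup from engine type to implementation. The following is a minimal Python sketch for illustration only; the real factory is Java code inside DataHub, and the `shim_for` helper and the `SHIM_IMPLEMENTATIONS` dictionary are hypothetical names, not part of DataHub's API.

```python
from enum import Enum


class SearchEngineType(Enum):
    """Engine types, named after the ELASTICSEARCH_SHIM_ENGINE_TYPE options."""

    ELASTICSEARCH_7 = "ELASTICSEARCH_7"
    ELASTICSEARCH_8 = "ELASTICSEARCH_8"
    OPENSEARCH_2 = "OPENSEARCH_2"


# Implementation class names copied from the Supported Configurations table.
SHIM_IMPLEMENTATIONS = {
    SearchEngineType.ELASTICSEARCH_7: "Es7CompatibilitySearchClientShim",
    SearchEngineType.ELASTICSEARCH_8: "Es8SearchClientShim",
    SearchEngineType.OPENSEARCH_2: "OpenSearch2SearchClientShim",
}


def shim_for(engine_type: str) -> str:
    """Return the shim class name for a concrete engine type string.

    Hypothetical helper: AUTO_DETECT is rejected here because it must be
    resolved to a concrete engine type before an implementation is chosen.
    """
    try:
        return SHIM_IMPLEMENTATIONS[SearchEngineType(engine_type)]
    except ValueError:
        raise ValueError(f"Unsupported engine type: {engine_type!r}") from None
```

For example, `shim_for("OPENSEARCH_2")` returns `"OpenSearch2SearchClientShim"`.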
Configuration
Environment Variables
Configure the shim using these environment variables:
# Enable the search client shim (required)
ELASTICSEARCH_SHIM_ENABLED=true
# Specify engine type (or use AUTO_DETECT)
ELASTICSEARCH_SHIM_ENGINE_TYPE=AUTO_DETECT
# Options: AUTO_DETECT, ELASTICSEARCH_7, ELASTICSEARCH_8, OPENSEARCH_2
# Enable auto-detection (recommended)
ELASTICSEARCH_SHIM_AUTO_DETECT=true
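Before deploying, it can help to sanity-check these variables. The sketch below is a hypothetical pre-flight check in Python, not how DataHub itself reads its configuration. The variable names and allowed engine types come from this guide; the defaults used when a variable is unset (`false` and `AUTO_DETECT`) are assumptions made for the example.

```python
import os
from dataclasses import dataclass

# Allowed values listed in this guide for ELASTICSEARCH_SHIM_ENGINE_TYPE.
ENGINE_TYPES = {"AUTO_DETECT", "ELASTICSEARCH_7", "ELASTICSEARCH_8", "OPENSEARCH_2"}


@dataclass(frozen=True)
class ShimSettings:
    enabled: bool
    engine_type: str
    auto_detect: bool


def _as_bool(value: str) -> bool:
    """Treat only the string 'true' (any case) as enabled."""
    return value.strip().lower() == "true"


def load_shim_settings(env=None) -> ShimSettings:
    """Read and validate the shim variables from a mapping (os.environ by default).

    The fallback values below are assumptions for this sketch, not
    documented DataHub defaults.
    """
    env = os.environ if env is None else env
    engine_type = env.get("ELASTICSEARCH_SHIM_ENGINE_TYPE", "AUTO_DETECT").strip().upper()
    if engine_type not in ENGINE_TYPES:
        raise ValueError(
            f"ELASTICSEARCH_SHIM_ENGINE_TYPE must be one of {sorted(ENGINE_TYPES)}, "
            f"got {engine_type!r}"
        )
    return ShimSettings(
        enabled=_as_bool(env.get("ELASTICSEARCH_SHIM_ENABLED", "false")),
        engine_type=engine_type,
        auto_detect=_as_bool(env.get("ELASTICSEARCH_SHIM_AUTO_DETECT", "false")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the check easy to run against a candidate configuration before it is applied.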
application.yaml Configuration
Alternatively, configure via application.yaml:
elasticsearch:
  host: localhost
  port: 9200
  username: ${ELASTICSEARCH_USERNAME:#{null}}
  password: ${ELASTICSEARCH_PASSWORD:#{null}}
  useSSL: false
  # Standard Elasticsearch configuration...
  # Multi-client shim configuration
  shim:
    enabled: true # Enable shim
    engineType: AUTO_DETECT # or specific type
    autoDetectEngine: true # Auto-detect cluster type
Migration Scenarios
Scenario 1: Elasticsearch 7.17 → Elasticsearch 8.x
This is the most common migration path.
Step 1: Enable the shim
ELASTICSEARCH_SHIM_ENABLED=true
ELASTICSEARCH_SHIM_ENGINE_TYPE=ELASTICSEARCH_8
Step 2: Verify connection
# Check logs for successful connection (Docker example)
docker logs datahub-gms | grep -i "shim\|search"
Scenario 2: Elasticsearch 7.17 → OpenSearch 2.x
Direct migration from Elasticsearch to OpenSearch 2.x.
Configuration:
ELASTICSEARCH_SHIM_ENABLED=true
ELASTICSEARCH_SHIM_ENGINE_TYPE=OPENSEARCH_2
ELASTICSEARCH_SHIM_AUTO_DETECT=true
Scenario 3: Auto-Detection (Recommended)
Let DataHub automatically detect your search engine type:
ELASTICSEARCH_SHIM_ENABLED=true
ELASTICSEARCH_SHIM_ENGINE_TYPE=AUTO_DETECT
ELASTICSEARCH_SHIM_AUTO_DETECT=true
The shim will:
- Connect to your search cluster
- Identify the engine type and version
- Select the appropriate client implementation
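The detection steps above can be approximated outside DataHub. The sketch below is an illustration in Python, not the shim's actual logic: it relies on the root endpoint (`GET /`) of Elasticsearch and OpenSearch returning a `version` object with a `number` field, and on OpenSearch additionally reporting `"distribution": "opensearch"`. The function names and the unauthenticated HTTP request are assumptions for the example.

```python
import json
import urllib.request


def fetch_root_info(base_url: str) -> dict:
    """Fetch the cluster's root document, e.g. from http://elasticsearch:9200.

    Assumes the cluster is reachable without authentication.
    """
    with urllib.request.urlopen(base_url.rstrip("/") + "/", timeout=5) as resp:
        return json.load(resp)


def classify_engine(root_info: dict) -> str:
    """Map a root document to one of the shim's engine type names."""
    version = root_info.get("version", {})
    major = version.get("number", "").split(".", 1)[0]
    if version.get("distribution") == "opensearch":
        if major == "2":
            return "OPENSEARCH_2"
    elif major == "7":
        return "ELASTICSEARCH_7"
    elif major == "8":
        return "ELASTICSEARCH_8"
    raise ValueError(f"Unsupported search engine version: {version!r}")
```

Running `classify_engine(fetch_root_info("http://elasticsearch:9200"))` against a cluster gives a quick, independent check of what auto-detection should report.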
Deployment Guide
Docker Compose
Update your docker-compose.yml:
services:
  datahub-gms:
    environment:
      - ELASTICSEARCH_SHIM_ENABLED=true
      - ELASTICSEARCH_SHIM_ENGINE_TYPE=AUTO_DETECT
      # ... other ES config
Kubernetes
Update your deployment manifests:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: datahub-gms
spec:
  template:
    spec:
      containers:
        - name: datahub-gms
          env:
            - name: ELASTICSEARCH_SHIM_ENABLED
              value: "true"
            - name: ELASTICSEARCH_SHIM_ENGINE_TYPE
              value: "AUTO_DETECT"
          # ... other configuration
Helm
Update your values.yaml:
global:
  elasticsearch:
    shim:
      enabled: true
      engineType: "AUTO_DETECT"
      autoDetectEngine: true
Validation and Testing
Verify Shim Configuration
- Check logs for shim initialization:
docker logs datahub-gms | grep -i "shim\|search"
Look for messages like:
INFO  Creating SearchClientShim for engine type: ELASTICSEARCH_7
INFO  Auto-detected search engine type: ELASTICSEARCH_7
- Test search functionality in DataHub UI:
  - Search for datasets
  - Browse data assets
  - Check that lineage is working
- Monitor performance during transition:
  - Watch for connection errors
  - Check response times
  - Monitor resource usage
Common Validation Steps
# 1. Check DataHub health endpoint
curl http://localhost:8080/health
# 2. Verify search index access
curl -u user:pass "http://elasticsearch:9200/_cat/indices?v"
# 3. Test search functionality
curl -X POST "http://localhost:8080/api/graphql" \
  -H "Content-Type: application/json" \
  -d '{"query": "{ search(input: {type: DATASET, query: \"*\"}) { total }}"}'
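The GraphQL check above depends on hand-escaped JSON, which is easy to get wrong. As a hedged alternative, the Python sketch below builds the same request body with `json.dumps` and posts it using only the standard library. The function names are hypothetical, and the example assumes the endpoint is reachable without an access token, as in the curl command.

```python
import json
import urllib.request


def build_search_payload(entity_type: str = "DATASET", query: str = "*") -> str:
    """Build the JSON body for the search smoke test shown in step 3."""
    # json.dumps(query) quotes and escapes the search string for embedding.
    graphql = (
        "{ search(input: {type: " + entity_type
        + ", query: " + json.dumps(query) + "}) { total }}"
    )
    return json.dumps({"query": graphql})


def run_search(base_url: str = "http://localhost:8080") -> dict:
    """POST the smoke-test query and return the decoded JSON response."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/graphql",
        data=build_search_payload().encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)
```

A response without an `errors` key and with a numeric `total` suggests that search requests are reaching the cluster through the shim.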
Troubleshooting
Common Issues
1. Connection Failures
ERROR: Unable to connect to search cluster
Solutions:
- Verify ELASTICSEARCH_HOST and ELASTICSEARCH_PORT
- Check network connectivity between DataHub and search cluster
- Ensure credentials are correct
- Verify SSL/TLS configuration (ES 8 containers enable SSL by default, so a deployment that previously connected without SSL may fail to connect)
2. Auto-Detection Failures
ERROR: Unable to detect search engine type
Solutions:
- Manually specify engine type: ELASTICSEARCH_SHIM_ENGINE_TYPE=ELASTICSEARCH_8
- Check cluster health: curl http://elasticsearch:9200/_cluster/health
- Verify authentication credentials
3. API Compatibility Issues
ERROR: Incompatible API version
Solutions:
- Check Elasticsearch version compatibility
- Review deprecation warnings in ES logs
4. Dependency Issues
ERROR: ClassNotFoundException for ES client
Solutions:
- Ensure correct client dependencies are included in classpath
- Check build.gradle for required dependencies
- Rebuild DataHub with appropriate client libraries
Debug Mode
Enable debug logging to troubleshoot issues:
# Add to environment
DATAHUB_LOG_LEVEL=DEBUG
ELASTICSEARCH_SHIM_DEBUG=true
Performance Monitoring
Monitor key metrics during migration:
# Connection pool metrics
curl "http://localhost:8080/actuator/metrics/elasticsearch.connections"
# Search operation metrics
curl "http://localhost:8080/actuator/metrics/elasticsearch.search"
# Error rates
curl "http://localhost:8080/actuator/metrics/elasticsearch.errors"
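Spring Boot Actuator metric endpoints return JSON with a `measurements` array of `statistic`/`value` pairs. Assuming the endpoints above follow that standard shape, a small Python helper like the hypothetical one below can pull out a single statistic for comparison before and after the migration. The metric names are the ones listed in this guide; whether they are exposed depends on your deployment.

```python
import json
import urllib.request
from typing import Optional


def fetch_metric(name: str, base_url: str = "http://localhost:8080") -> dict:
    """Fetch one Actuator metric document, e.g. 'elasticsearch.errors'."""
    url = f"{base_url.rstrip('/')}/actuator/metrics/{name}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def measurement(metric: dict, statistic: str) -> Optional[float]:
    """Return the value for a statistic such as COUNT, or None if absent."""
    for entry in metric.get("measurements", []):
        if entry.get("statistic") == statistic:
            return entry.get("value")
    return None
```

Recording `measurement(fetch_metric("elasticsearch.errors"), "COUNT")` at intervals gives a rough error trend without additional tooling.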
Best Practices
Pre-Migration
- Backup your data before changing search engine configuration
- Test in staging with representative data volumes
- Monitor resource usage patterns in current deployment
- Document current configuration for rollback scenarios
During Migration
- Enable auto-detection initially for smooth transition
- Monitor logs closely for connection and performance issues
- Test all search functionality after configuration changes
Post-Migration
- Update documentation with new configuration
- Monitor performance metrics for several days
- Plan for future upgrades (ES 8.x native support)
- Train team members on new configuration options
Future Enhancements
Planned Features
- OpenSearch 3.x support when available
- Enhanced AWS IAM authentication for all client types
- Advanced feature detection and capability querying
Contributing
To extend the shim for additional search engines:
- Implement the SearchClientShim interface
- Add the engine type to the SearchEngineType enum
- Update factory logic in SearchClientShimFactory
- Add configuration options to application.yaml
- Write tests and documentation
Support Matrix
| DataHub Version | ES 7.17 | ES 8.x | OpenSearch 2.x | 
|---|---|---|---|
| 0.3.15+ | ✅ Full | ✅ 8.17+ | ✅ Full | 
| Future | ✅ Full | ✅ Full | ✅ Full | 
FAQ
Q: Can I use the shim with existing deployments?
A: Yes, the shim is backward compatible. It is a thin abstraction layer over the existing code.
Q: Can I use multiple search engines simultaneously?
A: No, DataHub connects to one search cluster at a time. Use the shim to switch between different engine types.
For additional support, please refer to the DataHub community forums or file an issue in the GitHub repository.
