# Elasticsearch & OpenSearch Multi-Client Shim

This guide explains how to use DataHub's multi-client search engine shim to support different versions of Elasticsearch and OpenSearch through a unified interface.

## Overview

DataHub's search client shim provides seamless support for:

- **Elasticsearch 7.17**
- **Elasticsearch 8.17+**
- **OpenSearch 2.x** with full REST high-level client support

This enables smooth migrations between different search engine versions while maintaining backward compatibility with existing DataHub deployments.

## Architecture

### Core Components

The shim consists of several key components:

1. **`SearchClientShim`** - Main abstraction interface
2. **`SearchClientShimFactory`** - Factory that creates the appropriate client implementation
3. **Implementation Classes** - Concrete implementations for each search engine:
   - `Es7CompatibilitySearchClientShim` - ES 7.17
   - `Es8SearchClientShim` - ES 8.17+
   - `OpenSearch2SearchClientShim` - OpenSearch 2.x

### Supported Configurations

| Connection               | Target Engine  | Shim Implementation                | Status      |
| ------------------------ | -------------- | ---------------------------------- | ----------- |
| DataHub → ES 7.17        | ES 7.17        | `Es7CompatibilitySearchClientShim` | ✅ Complete |
| DataHub → ES 8.17+       | ES 8.17+       | `Es8SearchClientShim`              | ✅ Complete |
| DataHub → OpenSearch 2.x | OpenSearch 2.x | `OpenSearch2SearchClientShim`      | ✅ Complete |

## Configuration

### Environment Variables

Configure the shim using these environment variables:

```bash
# Enable the search client shim (required)
ELASTICSEARCH_SHIM_ENABLED=true

# Specify engine type (or use AUTO_DETECT)
ELASTICSEARCH_SHIM_ENGINE_TYPE=AUTO_DETECT
# Options: AUTO_DETECT, ELASTICSEARCH_7, ELASTICSEARCH_8, OPENSEARCH_2

# Enable auto-detection (recommended)
ELASTICSEARCH_SHIM_AUTO_DETECT=true
```
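
These values are plain strings, so a typo in the engine type is easy to miss until GMS starts. The function below is a hypothetical pre-flight check that is not shipped with DataHub. A container entrypoint wrapper could run it before launching GMS. It accepts only the options listed above. Treating an unset engine type as `AUTO_DETECT` is an assumption of this sketch, not documented DataHub behavior.

```bash
# Hypothetical pre-flight check (not part of DataHub): validates the shim
# environment variables before GMS starts. Accepted engine types are the
# options listed in this guide; defaulting an unset engine type to
# AUTO_DETECT is an assumption of this sketch.
validate_shim_config() {
  enabled="${ELASTICSEARCH_SHIM_ENABLED:-false}"
  engine="${ELASTICSEARCH_SHIM_ENGINE_TYPE:-AUTO_DETECT}"

  # This sketch treats any value other than "true" as disabled
  if [ "$enabled" != "true" ]; then
    echo "ELASTICSEARCH_SHIM_ENABLED is not 'true'" >&2
    return 1
  fi

  # Reject anything outside the documented option list
  case "$engine" in
    AUTO_DETECT | ELASTICSEARCH_7 | ELASTICSEARCH_8 | OPENSEARCH_2)
      echo "ok: engine type $engine"
      ;;
    *)
      echo "invalid ELASTICSEARCH_SHIM_ENGINE_TYPE: $engine" >&2
      return 1
      ;;
  esac
}
```

For example, a wrapper script could call `validate_shim_config || exit 1` before starting the service. A misconfigured container then fails fast with a readable message instead of failing later during client creation.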

### application.yaml Configuration

Alternatively, configure the shim in `application.yaml`:

```yaml
elasticsearch:
  host: localhost
  port: 9200
  username: ${ELASTICSEARCH_USERNAME:#{null}}
  password: ${ELASTICSEARCH_PASSWORD:#{null}}
  useSSL: false
  # Standard Elasticsearch configuration...

  # Multi-client shim configuration
  shim:
    enabled: true # Enable shim
    engineType: AUTO_DETECT # or specific type
    autoDetectEngine: true # Auto-detect cluster type
```

## Migration Scenarios

### Scenario 1: Elasticsearch 7.17 → Elasticsearch 8.x

This is the most common migration path. The target cluster must run Elasticsearch 8.17 or later, which is the minimum 8.x version the shim supports.

**Step 1: Enable the shim**

```bash
ELASTICSEARCH_SHIM_ENABLED=true
ELASTICSEARCH_SHIM_ENGINE_TYPE=ELASTICSEARCH_8
```

**Step 2: Verify connection**

```bash
# Check the GMS logs for the shim's startup message, e.g.
# "Creating SearchClientShim for engine type: ELASTICSEARCH_8"
docker logs datahub-gms 2>&1 | grep -i "SearchClientShim"
```

### Scenario 2: Elasticsearch 7.17 → OpenSearch 2.x

Direct migration from Elasticsearch to OpenSearch 2.x.

**Configuration:**

```bash
ELASTICSEARCH_SHIM_ENABLED=true
ELASTICSEARCH_SHIM_ENGINE_TYPE=OPENSEARCH_2
ELASTICSEARCH_SHIM_AUTO_DETECT=true
```

### Scenario 3: Auto-Detection (Recommended)

Let DataHub automatically detect your search engine type:

```bash
ELASTICSEARCH_SHIM_ENABLED=true
ELASTICSEARCH_SHIM_ENGINE_TYPE=AUTO_DETECT
ELASTICSEARCH_SHIM_AUTO_DETECT=true
```

The shim will:

1. Connect to your search cluster
2. Identify the engine type and version
3. Select the appropriate client implementation
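
The three detection steps can be approximated from the command line by using the cluster's root endpoint (`GET /`). Both engines report the version number there, and OpenSearch also adds `"distribution": "opensearch"` to its version block. The function below is a hypothetical sketch of that idea, not DataHub's detection code. It prints one of the engine type names used in this guide.

```bash
# Hypothetical sketch of auto-detection (not DataHub's Java implementation).
# Input: the JSON body returned by GET / on the search cluster.
detect_engine_type() {
  # Flatten pretty-printed JSON so sed/grep see the whole object on one line
  body=$(printf '%s' "$1" | tr -d '\n')

  # Major version from "number": "<major>.<minor>.<patch>"
  major=$(printf '%s' "$body" |
    sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([0-9][0-9]*\)\..*/\1/p')

  # OpenSearch reports "distribution": "opensearch"; Elasticsearch does not
  if printf '%s' "$body" |
    grep -Eq '"distribution"[[:space:]]*:[[:space:]]*"opensearch"'; then
    distro=opensearch
  else
    distro=elasticsearch
  fi

  case "$distro:$major" in
    opensearch:2) echo OPENSEARCH_2 ;;
    elasticsearch:7) echo ELASTICSEARCH_7 ;;
    elasticsearch:8) echo ELASTICSEARCH_8 ;;
    *)
      # Also reached when the body is empty (e.g. the request failed)
      echo "unsupported or unknown engine: $distro $major" >&2
      return 1
      ;;
  esac
}
```

For example, run `detect_engine_type "$(curl -s -u user:pass http://elasticsearch:9200/)"`. If GMS cannot auto-detect the engine but this function prints a name, you can pin the implementation by setting `ELASTICSEARCH_SHIM_ENGINE_TYPE` to that name.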

## Deployment Guide

### Docker Compose

Update your `docker-compose.yml`:

```yaml
services:
  datahub-gms:
    environment:
      - ELASTICSEARCH_SHIM_ENABLED=true
      - ELASTICSEARCH_SHIM_ENGINE_TYPE=AUTO_DETECT
      # ... other ES config
```

### Kubernetes

Update your deployment manifests:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: datahub-gms
spec:
  template:
    spec:
      containers:
        - name: datahub-gms
          env:
            - name: ELASTICSEARCH_SHIM_ENABLED
              value: "true"
            - name: ELASTICSEARCH_SHIM_ENGINE_TYPE
              value: "AUTO_DETECT"
          # ... other configuration
```

### Helm

Update your `values.yaml`:

```yaml
global:
  elasticsearch:
    shim:
      enabled: true
      engineType: "AUTO_DETECT"
      autoDetectEngine: true
```

## Validation and Testing

### Verify Shim Configuration

1. **Check logs** for shim initialization:

   ```bash
   docker logs datahub-gms 2>&1 | grep -iE "shim|search"
   ```

   Look for messages like:

   ```
   INFO  Creating SearchClientShim for engine type: ELASTICSEARCH_7
   INFO  Auto-detected search engine type: ELASTICSEARCH_7
   ```

2. **Test search functionality** in the DataHub UI:
   - Search for datasets
   - Browse data assets
   - Check that lineage is working

3. **Monitor performance** during the transition:
   - Watch for connection errors
   - Check response times
   - Monitor resource usage
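
For scripted checks, such as a post-deploy hook, the log check in step 1 can be automated by extracting the engine type from the two log messages shown there. The helper below is hypothetical. It recognizes only those two message formats, so adjust the patterns if your DataHub version words them differently.

```bash
# Hypothetical helper: read GMS log text on stdin and print the most recently
# reported engine type. Only the two message formats shown in this guide
# are matched; anything else is ignored.
shim_engine_from_logs() {
  sed -n \
    -e 's/.*Auto-detected search engine type: \([A-Z0-9_]*\).*/\1/p' \
    -e 's/.*Creating SearchClientShim for engine type: \([A-Z0-9_]*\).*/\1/p' |
    tail -n 1
}
```

For example, `docker logs datahub-gms 2>&1 | shim_engine_from_logs` prints the last reported type. An empty result means neither message was logged, which is worth investigating before you move on to the UI checks.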

### Common Validation Steps

```bash
# 1. Check DataHub health endpoint
curl http://localhost:8080/health

# 2. Verify search index access
curl -u user:pass "http://elasticsearch:9200/_cat/indices?v"

# 3. Test search functionality
# (if metadata service authentication is enabled, also pass
#  -H "Authorization: Bearer <access-token>")
curl -X POST "http://localhost:8080/api/graphql" \
  -H "Content-Type: application/json" \
  -d '{"query": "{ search(input: {type: DATASET, query: \"*\"}) { total }}"}'
```
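
Step 3 can be turned into a pass/fail check. The query requests only `total`, so a successful response has the shape `{"data":{"search":{"total":...}}}`. A failed GraphQL request carries an `errors` array instead. The hypothetical helper below reads the response on stdin. It fails when `total` is missing or zero. Note that a fresh instance with no ingested datasets also returns zero.

```bash
# Hypothetical helper: validate the GraphQL search response read on stdin.
check_search_total() {
  # Pull the first integer following "total":
  total=$(tr -d '\n' |
    sed -n 's/.*"total"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p')

  if [ -z "$total" ]; then
    echo "no total in response (check the errors field)" >&2
    return 1
  fi
  if [ "$total" -eq 0 ]; then
    echo "search returned 0 results (empty index or nothing ingested yet)" >&2
    return 1
  fi
  echo "search ok: $total results"
}
```

For example, pipe the output of the `curl -X POST ... /api/graphql` command above into `check_search_total`. A non-zero exit status can then fail a CI job or a deployment gate.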

## Troubleshooting

### Common Issues

#### 1. Connection Failures

```
ERROR: Unable to connect to search cluster
```

**Solutions:**

- Verify `ELASTICSEARCH_HOST` and `ELASTICSEARCH_PORT`
- Check network connectivity between DataHub and the search cluster
- Ensure credentials are correct
- Verify the SSL/TLS configuration: Elasticsearch 8 containers enable TLS by default, so a deployment that previously connected over plain HTTP may need `useSSL: true`
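
When you test connectivity by hand, build the cluster URL from the same values DataHub uses, so that `curl` exercises the same scheme, host, and port. The helper below is a sketch. It assumes the `useSSL` setting from `application.yaml` is supplied through an `ELASTICSEARCH_USE_SSL` environment variable. It falls back to the `localhost:9200` defaults shown in the configuration example.

```bash
# Hypothetical helper: print the cluster base URL implied by the environment.
# ELASTICSEARCH_USE_SSL is assumed to back the `useSSL` setting.
search_base_url() {
  host="${ELASTICSEARCH_HOST:-localhost}"
  port="${ELASTICSEARCH_PORT:-9200}"
  if [ "${ELASTICSEARCH_USE_SSL:-false}" = "true" ]; then
    scheme=https
  else
    scheme=http
  fi
  echo "${scheme}://${host}:${port}"
}
```

For example: `curl -sS -u "$ELASTICSEARCH_USERNAME:$ELASTICSEARCH_PASSWORD" "$(search_base_url)/"`. If the request fails over `http` but succeeds with `ELASTICSEARCH_USE_SSL=true`, the cluster expects TLS and DataHub's `useSSL` setting should match. Self-signed certificates additionally require `curl --cacert <ca.crt>`.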

#### 2. Auto-Detection Failures

```
ERROR: Unable to detect search engine type
```

**Solutions:**

- Manually specify engine type: `ELASTICSEARCH_SHIM_ENGINE_TYPE=ELASTICSEARCH_8`
- Check cluster health: `curl http://elasticsearch:9200/_cluster/health`
- Verify authentication credentials
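
The cluster health check can be made scriptable. On success, `_cluster/health` returns a JSON object whose `status` field is `green`, `yellow`, or `red`. A rejected request, such as one with missing credentials, instead returns an error object whose `status` is the numeric HTTP code. The hypothetical helper below separates these cases, which tells an unhealthy cluster apart from an authentication problem.

```bash
# Hypothetical helper: classify a _cluster/health response read on stdin.
cluster_health_status() {
  # Only a quoted lowercase status matches; error bodies use e.g. "status":401
  status=$(tr -d '\n' |
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p')

  case "$status" in
    green | yellow)
      # yellow (unassigned replicas) is common on single-node clusters
      echo "$status"
      ;;
    red)
      # At least one primary shard is unassigned
      echo "red"
      return 1
      ;;
    *)
      # Error object (e.g. 401) or no response at all
      echo "unknown"
      return 1
      ;;
  esac
}
```

For example: `curl -s -u user:pass "http://elasticsearch:9200/_cluster/health" | cluster_health_status`. An `unknown` result points at credentials or connectivity rather than at the cluster itself.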

#### 3. API Compatibility Issues

```
ERROR: Incompatible API version
```

**Solutions:**

- Check that the cluster version is listed in the Support Matrix below (Elasticsearch 8.x requires 8.17 or later)
- Review deprecation warnings in the search cluster logs

#### 4. Dependency Issues

```
ERROR: ClassNotFoundException for ES client
```

**Solutions:**

- Ensure the correct client dependencies are included in the classpath
- Check `build.gradle` for required dependencies
- Rebuild DataHub with the appropriate client libraries

### Debug Mode

Enable debug logging to troubleshoot issues:

```bash
# Add to environment
DATAHUB_LOG_LEVEL=DEBUG
ELASTICSEARCH_SHIM_DEBUG=true
```

### Performance Monitoring

Monitor key metrics during migration:

```bash
# Connection pool metrics
curl "http://localhost:8080/actuator/metrics/elasticsearch.connections"

# Search operation metrics
curl "http://localhost:8080/actuator/metrics/elasticsearch.search"

# Error rates
curl "http://localhost:8080/actuator/metrics/elasticsearch.errors"
```
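
Spring Boot Actuator metric endpoints return a `measurements` array of `{"statistic": ..., "value": ...}` pairs. To track a number over time, you can extract a single statistic, such as `COUNT`. The helper below is a sketch. The metric names are those given in this guide and may differ in your deployment, so list the available names with `/actuator/metrics` first.

```bash
# Hypothetical helper: print one statistic (COUNT, TOTAL_TIME, MAX, VALUE, ...)
# from an Actuator metric response read on stdin. $1 should be a plain
# uppercase statistic name, since it is embedded in a sed pattern.
metric_stat() {
  stat="$1"
  value=$(tr -d '\n' |
    sed -n "s/.*\"statistic\"[[:space:]]*:[[:space:]]*\"$stat\"[[:space:]]*,[[:space:]]*\"value\"[[:space:]]*:[[:space:]]*\([0-9.eE+-]*\).*/\1/p")
  if [ -z "$value" ]; then
    echo "statistic $stat not found" >&2
    return 1
  fi
  echo "$value"
}
```

For example: `curl -s "http://localhost:8080/actuator/metrics/elasticsearch.errors" | metric_stat COUNT`. Sampling this number before and after the switch gives a simple error-rate comparison.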

## Best Practices

### Pre-Migration

1. **Back up your data** before changing the search engine configuration
2. **Test in staging** with representative data volumes
3. **Monitor resource usage** patterns in the current deployment
4. **Document the current configuration** for rollback scenarios

### During Migration

1. **Enable auto-detection initially** for a smooth transition
2. **Monitor logs closely** for connection and performance issues
3. **Test all search functionality** after configuration changes

### Post-Migration

1. **Update documentation** with the new configuration
2. **Monitor performance metrics** for several days
3. **Plan for future upgrades** (ES 8.x native support)
4. **Train team members** on the new configuration options

## Future Enhancements

### Planned Features

1. **OpenSearch 3.x support** when available
2. **Enhanced AWS IAM authentication** for all client types
3. **Advanced feature detection** and capability querying

### Contributing

To extend the shim for additional search engines:

1. **Implement the `SearchClientShim`** interface
2. **Add the engine type** to the `SearchEngineType` enum
3. **Update the factory logic** in `SearchClientShimFactory`
4. **Add configuration options** to `application.yaml`
5. **Write tests** and documentation

## Support Matrix

| DataHub Version | ES 7.17 | ES 8.x   | OpenSearch 2.x |
| --------------- | ------- | -------- | -------------- |
| 0.3.15+         | ✅ Full | ✅ 8.17+ | ✅ Full        |
| Future          | ✅ Full | ✅ Full  | ✅ Full        |
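
You can check the matrix before starting a migration. The helper below is a hypothetical sketch that encodes the `0.3.15+` row: Elasticsearch 7.17.x, Elasticsearch 8.17 or later within 8.x, and any OpenSearch 2.x release. Pass it the distribution and the version number reported by the cluster's root endpoint.

```bash
# Hypothetical helper encoding the 0.3.15+ row of the support matrix.
# Usage: is_supported_version <elasticsearch|opensearch> <version>
is_supported_version() {
  distro="$1"
  version="$2"
  # Split "major.minor.patch" with POSIX parameter expansion
  major=${version%%.*}
  rest=${version#*.}
  minor=${rest%%.*}

  case "$distro" in
    elasticsearch)
      [ "$major" -eq 7 ] && [ "$minor" -eq 17 ] && return 0
      [ "$major" -eq 8 ] && [ "$minor" -ge 17 ] && return 0
      ;;
    opensearch)
      [ "$major" -eq 2 ] && return 0
      ;;
  esac
  return 1
}
```

For example, `is_supported_version elasticsearch 8.15.3 || echo "upgrade the cluster to 8.17+ first"` flags a cluster that must be upgraded before it can use `Es8SearchClientShim`.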

## FAQ

### Q: Can I use the shim with existing deployments?

**A:** Yes, the shim is backward compatible. It is a thin abstraction layer over the existing code.

### Q: Can I use multiple search engines simultaneously?

**A:** No, DataHub connects to one search cluster at a time. Use the shim to switch between different engine types.

For additional support, please refer to the [DataHub community forums](https://datahubproject.io/docs/community) or file an issue in the [GitHub repository](https://github.com/datahub-project/datahub).
