mirror of
				https://github.com/datahub-project/datahub.git
				synced 2025-10-31 02:37:05 +00:00 
			
		
		
		
	
		
			
				
	
	
		
			75 lines
		
	
	
		
			2.5 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
			
		
		
	
	
			75 lines
		
	
	
		
			2.5 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
| # SchemaTron (Incubating)
 | |
| 
 | |
| > ⚠️ This is an incubating project in draft status. APIs and functionality may change significantly between releases.
 | |
| 
 | |
| SchemaTron is a schema translation toolkit that converts between various schema formats and DataHub's native schema representation. It currently provides robust support for Apache Avro schema translation with a focus on complex schema structures including unions, arrays, maps, and nested records.
 | |
| 
 | |
| ## Modules
 | |
| 
 | |
| ### CLI Module
 | |
| 
 | |
| Command-line interface for converting schemas and emitting them to DataHub.
 | |
| 
 | |
| ```bash
 | |
| # Execute from this directory
 | |
| ../../../gradlew :metadata-integration:java:datahub-schematron:cli:run --args="-i cli/src/test/resources/FlatUser.avsc"
 | |
| ```
 | |
| 
 | |
| #### CLI Options
 | |
| 
 | |
| - `-i, --input`: Input schema file or directory path
 | |
| - `-p, --platform`: Data platform name (default: "avro")
 | |
| - `-s, --server`: DataHub server URL (default: "http://localhost:8080")
 | |
| - `-t, --token`: DataHub access token
 | |
| - `--sink`: Output sink - "rest" or "file" (default: "rest")
 | |
| - `--output-file`: Output file path when using file sink (default: "metadata.json")
 | |
| 
 | |
| ### Library Module
 | |
| 
 | |
| Core translation logic and models for schema conversion. Features include:
 | |
| 
 | |
| - Support for complex Avro schema structures:
 | |
| 
 | |
|   - Union types with multiple record options
 | |
|   - Nested records and arrays
 | |
|   - Optional fields with defaults
 | |
|   - Logical types (date, timestamp, etc.)
 | |
|   - Maps with various value types
 | |
|   - Enum types
 | |
|   - Custom metadata and documentation
 | |
| 
 | |
| - Comprehensive path handling for schema fields
 | |
| - DataHub-compatible metadata generation
 | |
| - Schema fingerprinting and versioning
 | |
| 
 | |
| ## Example Schema Support
 | |
| 
 | |
| The library can handle sophisticated schema structures including:
 | |
| 
 | |
| - Customer profiles with multiple identification types (passport, driver's license, national ID)
 | |
| - Contact information with primary and alternative contact methods
 | |
| - Address validation with verification metadata
 | |
| - Subscription history tracking
 | |
| - Flexible preference and metadata storage
 | |
| - Tagged customer attributes
 | |
| 
 | |
| ## Development
 | |
| 
 | |
| The project includes extensive test coverage through:
 | |
| 
 | |
| - Unit tests for field path handling
 | |
| - Schema translation comparison tests
 | |
| - Integration tests with Python reference implementation
 | |
| 
 | |
| Test resources include example schemas demonstrating various Avro schema features and edge cases.
 | |
| 
 | |
| ## Contributing
 | |
| 
 | |
| As this is an incubating project, we welcome contributions and feedback on:
 | |
| 
 | |
| - Additional schema format support
 | |
| - Improved handling of complex schema patterns
 | |
| - Enhanced metadata translation
 | |
| - Documentation and examples
 | |
| - Test coverage
 | 
