2024-11-26 14:03:22 -08:00

# SchemaTron (Incubating)

> ⚠️ This is an incubating project in draft status. APIs and functionality may change significantly between releases.

SchemaTron is a schema translation toolkit that converts between various schema formats and DataHub's native schema representation. It currently supports Apache Avro, with a focus on complex schema structures such as unions, arrays, maps, and nested records.

## Modules

### CLI Module

Command-line interface for converting schemas and emitting them to DataHub.

```bash
# Execute from this directory
../../../gradlew :metadata-integration:java:datahub-schematron:cli:run --args="-i cli/src/test/resources/FlatUser.avsc"
```

#### CLI Options

- `-i, --input`: Input schema file or directory path
- `-p, --platform`: Data platform name (default: "avro")
- `-s, --server`: DataHub server URL (default: "http://localhost:8080")
- `-t, --token`: DataHub access token
- `--sink`: Output sink, either "rest" or "file" (default: "rest")
- `--output-file`: Output file path when using the file sink (default: "metadata.json")
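
The options above can also be assembled from a script. The following is a minimal sketch in Python, assuming the Gradle task and relative `gradlew` path from the invocation shown earlier. `build_command` is a hypothetical helper, not part of SchemaTron; options left unset are simply omitted so the CLI's own defaults apply. It also assumes Gradle accepts shell-style quoting inside the `--args` string.

```python
import shlex

# Wrapper path and task name are copied from the invocation in this README.
GRADLEW = "../../../gradlew"
GRADLE_TASK = ":metadata-integration:java:datahub-schematron:cli:run"


def build_command(input_path, *, platform=None, server=None, token=None,
                  sink=None, output_file=None):
    """Return an argv list for the CLI (hypothetical helper, not part of SchemaTron)."""
    cli_args = ["-i", input_path]
    optional = [
        ("-p", platform),
        ("-s", server),
        ("-t", token),
        ("--sink", sink),
        ("--output-file", output_file),
    ]
    for flag, value in optional:
        if value is not None:  # unset options keep the CLI's own defaults
            cli_args += [flag, value]
    # Gradle receives all CLI arguments as one string after --args=
    return [GRADLEW, GRADLE_TASK, "--args=" + shlex.join(cli_args)]


# Write metadata to a local file instead of sending it to a DataHub server.
command = build_command(
    "cli/src/test/resources/FlatUser.avsc",
    sink="file",
    output_file="metadata.json",
)
print(command)
```

Passing the resulting list to `subprocess.run(command, check=True)` from this directory would start the conversion. Building a list rather than a single shell string avoids a second layer of shell quoting.
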
### Library Module

Core translation logic and models for schema conversion. Features include:

- Support for complex Avro schema structures:
  - Union types with multiple record options
  - Nested records and arrays
  - Optional fields with defaults
  - Logical types (date, timestamp, etc.)
  - Maps with various value types
  - Enum types
  - Custom metadata and documentation
- Comprehensive path handling for schema fields
- DataHub-compatible metadata generation
- Schema fingerprinting and versioning
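
To make the path-handling and fingerprinting features more concrete, here is a minimal sketch in Python. It is not SchemaTron's implementation: it assumes plain dotted paths and a SHA-256 hash over key-sorted JSON, both simplifications chosen for illustration. DataHub's actual field path encoding and the library's fingerprint algorithm are not described in this README. The `User` schema and both function names are hypothetical.

```python
import hashlib
import json

# Hypothetical Avro schema with optional fields, a nested record, an array,
# and a logical type.
SCHEMA = json.loads("""
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "email", "type": ["null", "string"], "default": null},
    {"name": "address", "type": {
      "type": "record",
      "name": "Address",
      "fields": [
        {"name": "city", "type": "string"},
        {"name": "zip", "type": ["null", "string"], "default": null}
      ]
    }},
    {"name": "tags", "type": {"type": "array", "items": "string"}},
    {"name": "created", "type": {"type": "long", "logicalType": "timestamp-millis"}}
  ]
}
""")


def field_paths(schema, prefix=""):
    """List dotted field paths, descending into records and non-null union branches."""
    if isinstance(schema, list):
        # Union: follow every branch except "null".
        paths = []
        for branch in schema:
            if branch != "null":
                paths.extend(field_paths(branch, prefix))
        return paths
    if isinstance(schema, dict) and schema.get("type") == "record":
        # A nested record contributes its own path, then one path per child field.
        paths = [prefix] if prefix else []
        for field in schema["fields"]:
            child = f"{prefix}.{field['name']}" if prefix else field["name"]
            paths.extend(field_paths(field["type"], child))
        return paths
    # Primitives, arrays, maps, enums and logical types are treated as leaves here.
    return [prefix]


def fingerprint(schema):
    """Hash a key-order-independent JSON rendering of the schema (assumed scheme)."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


print(field_paths(SCHEMA))
print(fingerprint(SCHEMA))
```

Because the keys are sorted before hashing, reordering the keys of a JSON object does not change the fingerprint, while adding or renaming a field does. The Avro specification defines its own canonical form and fingerprint algorithms, which this sketch does not attempt to reproduce.
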
## Example Schema Support

The library can handle sophisticated schema structures including:

- Customer profiles with multiple identification types (passport, driver's license, national ID)
- Contact information with primary and alternative contact methods
- Address validation with verification metadata
- Subscription history tracking
- Flexible preference and metadata storage
- Tagged customer attributes
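
As an illustration of what such a schema might look like, the sketch below defines a small Avro schema and inspects it with Python's standard library. Every record, field, and namespace name here is hypothetical and chosen to mirror the list above; it is not one of the project's test resources.

```python
import json

# Hypothetical customer profile schema: a union of identification records,
# an enum, an array of subscription records with date logical types,
# a map for preferences, and an array of tags.
CUSTOMER_PROFILE = json.loads("""
{
  "type": "record",
  "name": "CustomerProfile",
  "namespace": "com.example.customer",
  "fields": [
    {"name": "customerId", "type": "string", "doc": "Unique customer identifier"},
    {"name": "identification", "type": [
      {"type": "record", "name": "Passport", "fields": [
        {"name": "number", "type": "string"},
        {"name": "country", "type": "string"}]},
      {"type": "record", "name": "DriversLicense", "fields": [
        {"name": "number", "type": "string"},
        {"name": "state", "type": "string"}]},
      {"type": "record", "name": "NationalId", "fields": [
        {"name": "number", "type": "string"}]}
    ]},
    {"name": "preferredContact", "type": {
      "type": "enum", "name": "ContactMethod",
      "symbols": ["EMAIL", "PHONE", "MAIL"]}},
    {"name": "subscriptions", "type": {"type": "array", "items": {
      "type": "record", "name": "Subscription", "fields": [
        {"name": "plan", "type": "string"},
        {"name": "startedOn", "type": {"type": "int", "logicalType": "date"}},
        {"name": "endedOn",
         "type": ["null", {"type": "int", "logicalType": "date"}],
         "default": null}
      ]}}},
    {"name": "preferences", "type": {"type": "map", "values": "string"}, "default": {}},
    {"name": "tags", "type": {"type": "array", "items": "string"}, "default": []}
  ]
}
""")


def union_branches(schema, field_name):
    """Return the record names offered by a union-typed field."""
    field = next(f for f in schema["fields"] if f["name"] == field_name)
    return [
        branch["name"]
        for branch in field["type"]
        if isinstance(branch, dict) and branch.get("type") == "record"
    ]


print(union_branches(CUSTOMER_PROFILE, "identification"))
# prints ['Passport', 'DriversLicense', 'NationalId']
```

Saving the JSON text to an `.avsc` file would give an input of the kind the CLI's `-i` option expects.
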
## Development

The project's test coverage includes:

- Unit tests for field path handling
- Schema translation comparison tests
- Integration tests against the Python reference implementation

Test resources include example schemas demonstrating various Avro schema features and edge cases.
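
A comparison test of the kind listed above could be structured roughly as follows. This is a sketch in Python with made-up field data: the `{field_path: type}` mapping and the `diff_fields` helper are assumptions for illustration, and the project's real tests may compare different attributes.

```python
def diff_fields(candidate, reference):
    """Compare two {field_path: type} mappings and describe each mismatch."""
    problems = []
    for path in sorted(set(candidate) | set(reference)):
        if path not in candidate:
            problems.append(f"missing in candidate: {path}")
        elif path not in reference:
            problems.append(f"unexpected in candidate: {path}")
        elif candidate[path] != reference[path]:
            problems.append(
                f"type mismatch at {path}: {candidate[path]} != {reference[path]}"
            )
    return problems


# Made-up outputs standing in for two translations of the same schema.
reference_fields = {"id": "string", "email": "string", "address.city": "string"}
candidate_fields = {"id": "string", "email": "bytes", "address.zip": "string"}

for problem in diff_fields(candidate_fields, reference_fields):
    print(problem)
```

An empty result means the two translations agree on every path and type, so a test can assert `diff_fields(...) == []` and get a readable list of differences on failure.
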
## Contributing

As this is an incubating project, we welcome contributions and feedback on:

- Additional schema format support
- Improved handling of complex schema patterns
- Enhanced metadata translation
- Documentation and examples
- Test coverage