Mirror of https://github.com/open-metadata/OpenMetadata.git, synced 2025-10-31 02:29:03 +00:00

# OpenMetadata SDK Coverage Analysis Report

## Executive Summary
This report compares the Java and Python SDK implementations to identify coverage gaps and missing features in the Python SDK.

### Key Findings
- **Java SDK**: 68 API client classes providing comprehensive coverage
- **Python SDK**: 13 entity classes + 23 OMeta mixins providing partial coverage
- **Coverage Gap**: The Python SDK is missing 40+ APIs that are available in the Java SDK

## Java SDK API Coverage (68 APIs)

### Data Assets (16 APIs)
- ✅ Tables
- ✅ Databases
- ✅ DatabaseSchemas
- ✅ Containers
- ✅ Topics
- ✅ Dashboards
- ✅ Charts
- ✅ Pipelines
- ✅ MlModels
- ✅ Metrics
- ✅ StoredProcedures
- ✅ DashboardDataModels
- ✅ SearchIndex
- ✅ ApiEndpoint
- ✅ Queries
- ✅ Spreadsheets

### Services (10 APIs)
- ✅ DatabaseServices
- ✅ DashboardServices
- ✅ PipelineServices
- ✅ MessagingServices
- ✅ MlModelServices
- ✅ MetadataServices
- ✅ ObjectStoreServices
- ✅ SearchServices
- ✅ ApiServices
- ✅ DriveServices

### Governance & Quality (8 APIs)
- ✅ Glossaries
- ✅ Classifications
- ✅ Domains
- ✅ DataContracts
- ✅ TestCases
- ✅ TestSuites
- ✅ TestDefinitions
- ✅ TestCaseResults

### Security & Access (7 APIs)
- ✅ Users
- ✅ Teams
- ✅ Roles
- ✅ Policies
- ✅ Bots
- ✅ Permissions
- ✅ SecurityServices

### Operations (8 APIs)
- ✅ IngestionPipelines
- ✅ WorkflowDefinitions
- ✅ WorkflowInstances
- ✅ WorkflowInstanceStates
- ✅ Events
- ✅ Feeds
- ✅ Usage
- ✅ Suggestions

### Platform Features (8 APIs)
- ✅ Lineage
- ✅ Search
- ✅ Metadata
- ✅ System
- ✅ DocumentStore
- ✅ Files
- ✅ Directories
- ✅ Apps

### Advanced Features (11 APIs)
- ✅ ApiCollections
- ✅ Personas
- ✅ Columns
- ✅ Worksheets
- ✅ ReportsBeta
- ✅ Rdf
- ✅ RdfSql
- ✅ Scim
- ✅ TestCaseIncidentManager
- ✅ QueryCostRecordManager
- ✅ Default

## Python SDK Coverage

### Entity Classes (13 entities)
- ✅ Table
- ✅ Database
- ✅ DatabaseSchema
- ✅ Dashboard
- ✅ Pipeline
- ✅ Container
- ✅ Topic
- ✅ Team
- ✅ User (user_improved)
- ✅ Glossary
- ✅ GlossaryTerm
- ✅ TableImproved
- ⚠️ BaseEntity (abstract base)

### API Modules (3 modules)
- ✅ Search
- ✅ Lineage
- ✅ Bulk

### OMeta Mixins (23 mixins)
- ✅ custom_property_mixin
- ✅ dashboard_mixin
- ✅ data_contract_mixin
- ✅ data_insight_mixin
- ✅ domain_mixin
- ✅ es_mixin
- ✅ ingestion_pipeline_mixin
- ✅ lineage_mixin
- ✅ mlmodel_mixin
- ✅ patch_mixin
- ✅ pipeline_mixin
- ✅ query_mixin
- ✅ role_policy_mixin
- ✅ search_index_mixin
- ✅ server_mixin
- ✅ service_mixin
- ✅ suggestions_mixin
- ✅ table_mixin
- ✅ tests_mixin
- ✅ topic_mixin
- ✅ user_mixin
- ✅ version_mixin

## Coverage Gap Analysis

### Missing Entity Classes in Python SDK (High Priority)
1. **Charts** - Dashboard visualization components
2. **Metrics** - Business metrics and KPIs
3. **MLModels** - Machine learning model entities
4. **StoredProcedures** - Database stored procedures
5. **DashboardDataModels** - Dashboard data modeling
6. **SearchIndex** - Search index management
7. **ApiEndpoint** - API endpoint documentation
8. **Queries** - SQL query management
9. **Spreadsheets** - Spreadsheet assets
10. **Worksheets** - Worksheet management

### Missing Service Classes in Python SDK
1. **MessagingServices** - Kafka, Pulsar, etc.
2. **MLModelServices** - ML platform services
3. **ObjectStoreServices** - S3, GCS, Azure Blob
4. **SearchServices** - Elasticsearch, OpenSearch
5. **ApiServices** - REST API services
6. **DriveServices** - Google Drive, SharePoint

### Missing Governance Features
1. **Classifications** - Data classification tags
2. **Domains** - Business domain management
3. **DataContracts** - Data contract definitions
4. **TestCases** - Individual test case management
5. **TestSuites** - Test suite orchestration
6. **TestDefinitions** - Test definition templates

### Missing Security & Access Features
1. **Roles** - Role management API
2. **Policies** - Policy definition and enforcement
3. **Bots** - Bot user management
4. **Permissions** - Fine-grained permissions API
5. **SecurityServices** - Security service integrations

### Missing Operational Features
1. **WorkflowDefinitions** - Workflow template management
2. **WorkflowInstances** - Running workflow instances
3. **WorkflowInstanceStates** - Workflow state tracking
4. **Events** - Event stream management
5. **Feeds** - Activity feed management

### Missing Advanced Features
1. **ApiCollections** - API collection management
2. **Personas** - User persona definitions
3. **Columns** - Column-level operations
4. **ReportsBeta** - Reporting features
5. **Rdf/RdfSql** - RDF data management
6. **Scim** - SCIM user provisioning
7. **TestCaseIncidentManager** - Incident management
8. **QueryCostRecordManager** - Query cost tracking
9. **Files/Directories** - File system management
10. **Apps** - Application management

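The gap lists above amount to a set difference between the API names exposed by each SDK. The following is a minimal sketch of that comparison, assuming both inventories are available as plain lists of names. The `normalize` rule and the short sample lists are illustrative assumptions, not code from either SDK.

```python
def normalize(name: str) -> str:
    """Lowercase a name and drop one trailing 's' so 'Tables' matches 'Table'.

    This is a deliberately naive rule: irregular plurals such as
    'Queries' vs 'Query' would need an explicit mapping.
    """
    key = name.lower()
    return key[:-1] if key.endswith("s") else key


def coverage_gap(java_apis, python_entities):
    """Return the Java API names without a Python counterpart and the covered fraction."""
    covered = {normalize(name) for name in python_entities}
    missing = sorted(api for api in java_apis if normalize(api) not in covered)
    ratio = 1 - len(missing) / len(java_apis)
    return missing, ratio


# Sample inventories: a small subset of the Data Assets names in this report.
java_apis = ["Tables", "Databases", "Charts", "Metrics", "Topics", "Queries"]
python_entities = ["Table", "Database", "Topic"]

missing, ratio = coverage_gap(java_apis, python_entities)
print(missing)          # Java APIs with no Python entity class
print(f"{ratio:.0%}")   # share of the sample that is covered
```

Running the same comparison over the full inventories would give a reproducible basis for the coverage estimate, provided the name matching is made robust first.
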
## Recommendations

### Priority 1: Core Data Assets (Q1 2025)
- Implement Chart, Metric, MLModel entity classes
- Add StoredProcedure and Query management
- Complete SearchIndex integration

### Priority 2: Service Integration (Q2 2025)
- Add MessagingService support (Kafka, Pulsar)
- Implement ObjectStoreService (S3, GCS)
- Add MLModelService integrations

### Priority 3: Governance & Quality (Q2 2025)
- Implement Classification and Domain APIs
- Add DataContract support
- Complete TestCase/TestSuite framework

### Priority 4: Security & Operations (Q3 2025)
- Add Role and Policy management
- Implement Workflow APIs
- Add Event and Feed management

### Priority 5: Advanced Features (Q4 2025)
- Add Persona management
- Implement SCIM support
- Add cost tracking features

## Implementation Strategy

### 1. Extend BaseEntity Pattern
All new entities should follow the established pattern:
```python
from typing import List, Optional

# BaseEntity, NewEntityClass, and CreateNewEntityRequest are placeholders for
# the SDK base class and the generated models of the entity being added.
class NewEntity(BaseEntity):
    @classmethod
    def entity_type(cls):
        return NewEntityClass

    @classmethod
    def create(cls, request: CreateNewEntityRequest):
        raise NotImplementedError  # Implementation

    @classmethod
    def retrieve(cls, entity_id: str, fields: Optional[List[str]] = None):
        raise NotImplementedError  # Implementation
```

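To make the pattern concrete, here is a self-contained sketch for a hypothetical `Charts` entity. Everything in it is an assumption made for illustration: the `Chart` and `CreateChartRequest` dataclasses stand in for generated models, `InMemoryClient` stands in for the real REST client, and this simplified `BaseEntity` may differ from the one in the SDK.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Chart:
    """Stand-in for a generated entity model (hypothetical fields)."""
    id: str
    name: str


@dataclass
class CreateChartRequest:
    """Stand-in for a generated create-request model."""
    name: str


class InMemoryClient:
    """Hypothetical client; a real one would issue REST calls to the server."""

    def __init__(self) -> None:
        self._store: Dict[str, Chart] = {}

    def create(self, request: CreateChartRequest) -> Chart:
        chart = Chart(id=str(uuid.uuid4()), name=request.name)
        self._store[chart.id] = chart
        return chart

    def get_by_id(self, entity_id: str, fields: Optional[List[str]] = None) -> Chart:
        # 'fields' is accepted for signature parity but ignored by this fake.
        return self._store[entity_id]


class BaseEntity:
    """Simplified base class: all entity classes share one client."""
    _client: Optional[InMemoryClient] = None

    @classmethod
    def set_client(cls, client: InMemoryClient) -> None:
        BaseEntity._client = client

    @classmethod
    def _get_client(cls) -> InMemoryClient:
        if BaseEntity._client is None:
            raise RuntimeError("Call BaseEntity.set_client() first")
        return BaseEntity._client


class Charts(BaseEntity):
    @classmethod
    def entity_type(cls):
        return Chart

    @classmethod
    def create(cls, request: CreateChartRequest) -> Chart:
        return cls._get_client().create(request)

    @classmethod
    def retrieve(cls, entity_id: str, fields: Optional[List[str]] = None) -> Chart:
        return cls._get_client().get_by_id(entity_id, fields=fields)


BaseEntity.set_client(InMemoryClient())
created = Charts.create(CreateChartRequest(name="revenue_by_region"))
fetched = Charts.retrieve(created.id)
print(fetched.name)
```

Keeping the client on the base class means each new entity only has to declare its model type and delegate its operations, which is what keeps the per-entity cost low.
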
### 2. Add Corresponding Mixins
For complex operations, add mixins to OMeta:
```python
class NewEntityMixin:
    def get_new_entity_by_name(self, fqn: str):
        raise NotImplementedError  # Implementation
```

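The sketch below shows one way mixins can be composed over a shared client, which is the general shape the mixin approach implies. It is an assumption-laden illustration: `FakeRestClient`, `SketchOMeta`, both mixins, and the URL paths are hypothetical and are not the SDK's actual classes or routes.

```python
from typing import Any, Dict, Optional


class FakeRestClient:
    """Hypothetical transport that returns canned JSON instead of calling a server."""

    def __init__(self, responses: Dict[str, Dict[str, Any]]) -> None:
        self._responses = responses

    def get(self, path: str) -> Optional[Dict[str, Any]]:
        return self._responses.get(path)


class ChartMixin:
    """Chart helpers; expects the host class to provide self.client."""

    client: FakeRestClient

    def get_chart_by_name(self, fqn: str) -> Optional[Dict[str, Any]]:
        # The path is illustrative only.
        return self.client.get(f"/charts/name/{fqn}")


class MetricMixin:
    """Metric helpers; same contract as ChartMixin."""

    client: FakeRestClient

    def get_metric_by_name(self, fqn: str) -> Optional[Dict[str, Any]]:
        return self.client.get(f"/metrics/name/{fqn}")


class SketchOMeta(ChartMixin, MetricMixin):
    """Composes the mixins over a single shared client."""

    def __init__(self, client: FakeRestClient) -> None:
        self.client = client


ometa = SketchOMeta(FakeRestClient({"/charts/name/svc.sales": {"name": "sales"}}))
chart = ometa.get_chart_by_name("svc.sales")       # found in the canned responses
metric = ometa.get_metric_by_name("svc.unknown")   # not present, so None
```

Each mixin stays small and independently testable because its only dependency is the `client` attribute supplied by the composing class.
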
### 3. Test Coverage
Each new entity needs comprehensive unit tests:
- Create operations
- Retrieve by ID/name
- Update/Patch operations
- Delete operations
- List operations

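As a rough template for the five operation groups listed above, the following sketch uses `unittest` against a hypothetical in-memory store. `InMemoryEntityStore` and its method names are assumptions made so the example runs on its own; real tests would target the new entity class with a mocked or test server client.

```python
import unittest
from typing import Dict, List


class InMemoryEntityStore:
    """Hypothetical stand-in for an entity class backed by a server."""

    def __init__(self) -> None:
        self._items: Dict[str, dict] = {}
        self._next_id = 1

    def create(self, name: str) -> dict:
        entity = {"id": str(self._next_id), "name": name, "description": None}
        self._items[entity["id"]] = entity
        self._next_id += 1
        return dict(entity)

    def retrieve(self, entity_id: str) -> dict:
        return dict(self._items[entity_id])

    def retrieve_by_name(self, name: str) -> dict:
        return next(dict(e) for e in self._items.values() if e["name"] == name)

    def patch(self, entity_id: str, **changes) -> dict:
        self._items[entity_id].update(changes)
        return dict(self._items[entity_id])

    def delete(self, entity_id: str) -> None:
        del self._items[entity_id]

    def list_all(self) -> List[dict]:
        return [dict(e) for e in self._items.values()]


class EntityCrudTest(unittest.TestCase):
    def setUp(self) -> None:
        self.store = InMemoryEntityStore()
        self.chart = self.store.create("sales_chart")

    def test_create_assigns_id(self):
        self.assertEqual(self.chart["name"], "sales_chart")
        self.assertTrue(self.chart["id"])

    def test_retrieve_by_id_and_name(self):
        self.assertEqual(self.store.retrieve(self.chart["id"]), self.chart)
        self.assertEqual(self.store.retrieve_by_name("sales_chart"), self.chart)

    def test_patch_updates_field(self):
        patched = self.store.patch(self.chart["id"], description="Monthly sales")
        self.assertEqual(patched["description"], "Monthly sales")

    def test_delete_removes_entity(self):
        self.store.delete(self.chart["id"])
        with self.assertRaises(KeyError):
            self.store.retrieve(self.chart["id"])

    def test_list_returns_all(self):
        self.store.create("cost_chart")
        names = [e["name"] for e in self.store.list_all()]
        self.assertEqual(names, ["sales_chart", "cost_chart"])


# Run the suite explicitly so the script does not exit the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(EntityCrudTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```
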
### 4. Documentation
Update SDK documentation with:
- Usage examples
- API reference
- Migration guides from direct OMeta usage

## Conclusion

The Python SDK currently covers approximately **30%** of the Java SDK's functionality. The main gaps are in:
1. Advanced data asset types (Charts, Metrics, MLModels)
2. Service integrations (Messaging, ML, Object Storage)
3. Governance features (Classifications, Domains, Data Contracts)
4. Operational APIs (Workflows, Events, Feeds)

Implementing the missing features would require:
- ~50 new entity classes
- ~20 additional OMeta mixins
- 200+ unit tests
- Comprehensive documentation updates

The recommended approach is to prioritize based on user demand and implement in phases over 2025.