mirror of
https://github.com/open-metadata/OpenMetadata.git
synced 2025-11-03 03:59:12 +00:00
279 lines
7.1 KiB
Markdown
279 lines
7.1 KiB
Markdown
|
|
# OpenMetadata SDK Coverage Analysis Report
|
||
|
|
|
||
|
|
## Executive Summary
|
||
|
|
This report compares the Java and Python SDK implementations to identify coverage gaps and missing features in the Python SDK.
|
||
|
|
|
||
|
|
### Key Findings
|
||
|
|
- **Java SDK**: 68 API client classes providing comprehensive coverage
|
||
|
|
- **Python SDK**: 13 entity classes + 23 OMeta mixins providing partial coverage
|
||
|
|
- **Coverage Gap**: Python SDK missing ~40+ API endpoints available in Java
|
||
|
|
|
||
|
|
## Java SDK API Coverage (68 APIs)
|
||
|
|
|
||
|
|
### Data Assets (16 APIs)
|
||
|
|
- ✅ Tables
|
||
|
|
- ✅ Databases
|
||
|
|
- ✅ DatabaseSchemas
|
||
|
|
- ✅ Containers
|
||
|
|
- ✅ Topics
|
||
|
|
- ✅ Dashboards
|
||
|
|
- ✅ Charts
|
||
|
|
- ✅ Pipelines
|
||
|
|
- ✅ MlModels
|
||
|
|
- ✅ Metrics
|
||
|
|
- ✅ StoredProcedures
|
||
|
|
- ✅ DashboardDataModels
|
||
|
|
- ✅ SearchIndex
|
||
|
|
- ✅ ApiEndpoint
|
||
|
|
- ✅ Queries
|
||
|
|
- ✅ Spreadsheets
|
||
|
|
|
||
|
|
### Services (10 APIs)
|
||
|
|
- ✅ DatabaseServices
|
||
|
|
- ✅ DashboardServices
|
||
|
|
- ✅ PipelineServices
|
||
|
|
- ✅ MessagingServices
|
||
|
|
- ✅ MlModelServices
|
||
|
|
- ✅ MetadataServices
|
||
|
|
- ✅ ObjectStoreServices
|
||
|
|
- ✅ SearchServices
|
||
|
|
- ✅ ApiServices
|
||
|
|
- ✅ DriveServices
|
||
|
|
|
||
|
|
### Governance & Quality (8 APIs)
|
||
|
|
- ✅ Glossaries
|
||
|
|
- ✅ Classifications
|
||
|
|
- ✅ Domains
|
||
|
|
- ✅ DataContracts
|
||
|
|
- ✅ TestCases
|
||
|
|
- ✅ TestSuites
|
||
|
|
- ✅ TestDefinitions
|
||
|
|
- ✅ TestCaseResults
|
||
|
|
|
||
|
|
### Security & Access (7 APIs)
|
||
|
|
- ✅ Users
|
||
|
|
- ✅ Teams
|
||
|
|
- ✅ Roles
|
||
|
|
- ✅ Policies
|
||
|
|
- ✅ Bots
|
||
|
|
- ✅ Permissions
|
||
|
|
- ✅ SecurityServices
|
||
|
|
|
||
|
|
### Operations (8 APIs)
|
||
|
|
- ✅ IngestionPipelines
|
||
|
|
- ✅ WorkflowDefinitions
|
||
|
|
- ✅ WorkflowInstances
|
||
|
|
- ✅ WorkflowInstanceStates
|
||
|
|
- ✅ Events
|
||
|
|
- ✅ Feeds
|
||
|
|
- ✅ Usage
|
||
|
|
- ✅ Suggestions
|
||
|
|
|
||
|
|
### Platform Features (8 APIs)
|
||
|
|
- ✅ Lineage
|
||
|
|
- ✅ Search
|
||
|
|
- ✅ Metadata
|
||
|
|
- ✅ System
|
||
|
|
- ✅ DocumentStore
|
||
|
|
- ✅ Files
|
||
|
|
- ✅ Directories
|
||
|
|
- ✅ Apps
|
||
|
|
|
||
|
|
### Advanced Features (11 APIs)
|
||
|
|
- ✅ ApiCollections
|
||
|
|
- ✅ Personas
|
||
|
|
- ✅ Columns
|
||
|
|
- ✅ Worksheets
|
||
|
|
- ✅ ReportsBeta
|
||
|
|
- ✅ Rdf
|
||
|
|
- ✅ RdfSql
|
||
|
|
- ✅ Scim
|
||
|
|
- ✅ TestCaseIncidentManager
|
||
|
|
- ✅ QueryCostRecordManager
|
||
|
|
- ✅ Default
|
||
|
|
|
||
|
|
## Python SDK Coverage
|
||
|
|
|
||
|
|
### Entity Classes (13 entities)
|
||
|
|
✅ Table
|
||
|
|
✅ Database
|
||
|
|
✅ DatabaseSchema
|
||
|
|
✅ Dashboard
|
||
|
|
✅ Pipeline
|
||
|
|
✅ Container
|
||
|
|
✅ Topic
|
||
|
|
✅ Team
|
||
|
|
✅ User (user_improved)
|
||
|
|
✅ Glossary
|
||
|
|
✅ GlossaryTerm
|
||
|
|
✅ TableImproved
|
||
|
|
⚠️ BaseEntity (abstract base)
|
||
|
|
|
||
|
|
### API Modules (3 modules)
|
||
|
|
✅ Search
|
||
|
|
✅ Lineage
|
||
|
|
✅ Bulk
|
||
|
|
|
||
|
|
### OMeta Mixins (23 mixins)
|
||
|
|
✅ custom_property_mixin
|
||
|
|
✅ dashboard_mixin
|
||
|
|
✅ data_contract_mixin
|
||
|
|
✅ data_insight_mixin
|
||
|
|
✅ domain_mixin
|
||
|
|
✅ es_mixin
|
||
|
|
✅ ingestion_pipeline_mixin
|
||
|
|
✅ lineage_mixin
|
||
|
|
✅ mlmodel_mixin
|
||
|
|
✅ patch_mixin
|
||
|
|
✅ pipeline_mixin
|
||
|
|
✅ query_mixin
|
||
|
|
✅ role_policy_mixin
|
||
|
|
✅ search_index_mixin
|
||
|
|
✅ server_mixin
|
||
|
|
✅ service_mixin
|
||
|
|
✅ suggestions_mixin
|
||
|
|
✅ table_mixin
|
||
|
|
✅ tests_mixin
|
||
|
|
✅ topic_mixin
|
||
|
|
✅ user_mixin
|
||
|
|
✅ version_mixin
|
||
|
|
|
||
|
|
## Coverage Gap Analysis
|
||
|
|
|
||
|
|
### Missing Entity Classes in Python SDK (High Priority)
|
||
|
|
1. **Charts** - Dashboard visualization components
|
||
|
|
2. **Metrics** - Business metrics and KPIs
|
||
|
|
3. **MLModels** - Machine learning model entities
|
||
|
|
4. **StoredProcedures** - Database stored procedures
|
||
|
|
5. **DashboardDataModels** - Dashboard data modeling
|
||
|
|
6. **SearchIndex** - Search index management
|
||
|
|
7. **ApiEndpoint** - API endpoint documentation
|
||
|
|
8. **Queries** - SQL query management
|
||
|
|
9. **Spreadsheets** - Spreadsheet assets
|
||
|
|
10. **Worksheets** - Worksheet management
|
||
|
|
|
||
|
|
### Missing Service Classes in Python SDK
|
||
|
|
1. **MessagingServices** - Kafka, Pulsar, etc.
|
||
|
|
2. **MLModelServices** - ML platform services
|
||
|
|
3. **ObjectStoreServices** - S3, GCS, Azure Blob
|
||
|
|
4. **SearchServices** - Elasticsearch, OpenSearch
|
||
|
|
5. **ApiServices** - REST API services
|
||
|
|
6. **DriveServices** - Google Drive, SharePoint
|
||
|
|
|
||
|
|
### Missing Governance Features
|
||
|
|
1. **Classifications** - Data classification tags
|
||
|
|
2. **Domains** - Business domain management
|
||
|
|
3. **DataContracts** - Data contract definitions
|
||
|
|
4. **TestCases** - Individual test case management
|
||
|
|
5. **TestSuites** - Test suite orchestration
|
||
|
|
6. **TestDefinitions** - Test definition templates
|
||
|
|
|
||
|
|
### Missing Security & Access Features
|
||
|
|
1. **Roles** - Role management API
|
||
|
|
2. **Policies** - Policy definition and enforcement
|
||
|
|
3. **Bots** - Bot user management
|
||
|
|
4. **Permissions** - Fine-grained permissions API
|
||
|
|
5. **SecurityServices** - Security service integrations
|
||
|
|
|
||
|
|
### Missing Operational Features
|
||
|
|
1. **WorkflowDefinitions** - Workflow template management
|
||
|
|
2. **WorkflowInstances** - Running workflow instances
|
||
|
|
3. **WorkflowInstanceStates** - Workflow state tracking
|
||
|
|
4. **Events** - Event stream management
|
||
|
|
5. **Feeds** - Activity feed management
|
||
|
|
|
||
|
|
### Missing Advanced Features
|
||
|
|
1. **ApiCollections** - API collection management
|
||
|
|
2. **Personas** - User persona definitions
|
||
|
|
3. **Columns** - Column-level operations
|
||
|
|
4. **ReportsBeta** - Reporting features
|
||
|
|
5. **Rdf/RdfSql** - RDF data management
|
||
|
|
6. **Scim** - SCIM user provisioning
|
||
|
|
7. **TestCaseIncidentManager** - Incident management
|
||
|
|
8. **QueryCostRecordManager** - Query cost tracking
|
||
|
|
9. **Files/Directories** - File system management
|
||
|
|
10. **Apps** - Application management
|
||
|
|
|
||
|
|
## Recommendations
|
||
|
|
|
||
|
|
### Priority 1: Core Data Assets (Q1 2025)
|
||
|
|
- Implement Chart, Metric, MLModel entity classes
|
||
|
|
- Add StoredProcedure and Query management
|
||
|
|
- Complete SearchIndex integration
|
||
|
|
|
||
|
|
### Priority 2: Service Integration (Q2 2025)
|
||
|
|
- Add MessagingService support (Kafka, Pulsar)
|
||
|
|
- Implement ObjectStoreService (S3, GCS)
|
||
|
|
- Add MLModelService integrations
|
||
|
|
|
||
|
|
### Priority 3: Governance & Quality (Q2 2025)
|
||
|
|
- Implement Classification and Domain APIs
|
||
|
|
- Add DataContract support
|
||
|
|
- Complete TestCase/TestSuite framework
|
||
|
|
|
||
|
|
### Priority 4: Security & Operations (Q3 2025)
|
||
|
|
- Add Role and Policy management
|
||
|
|
- Implement Workflow APIs
|
||
|
|
- Add Event and Feed management
|
||
|
|
|
||
|
|
### Priority 5: Advanced Features (Q4 2025)
|
||
|
|
- Add Persona management
|
||
|
|
- Implement SCIM support
|
||
|
|
- Add cost tracking features
|
||
|
|
|
||
|
|
## Implementation Strategy
|
||
|
|
|
||
|
|
### 1. Extend BaseEntity Pattern
|
||
|
|
All new entities should follow the established pattern:
|
||
|
|
```python
|
||
|
|
class NewEntity(BaseEntity):
|
||
|
|
@classmethod
|
||
|
|
def entity_type(cls):
|
||
|
|
return NewEntityClass
|
||
|
|
|
||
|
|
@classmethod
|
||
|
|
def create(cls, request: CreateNewEntityRequest):
|
||
|
|
# Implementation
|
||
|
|
|
||
|
|
@classmethod
|
||
|
|
def retrieve(cls, entity_id: str, fields: List[str] = None):
|
||
|
|
# Implementation
|
||
|
|
```
|
||
|
|
|
||
|
|
### 2. Add Corresponding Mixins
|
||
|
|
For complex operations, add mixins to OMeta:
|
||
|
|
```python
|
||
|
|
class NewEntityMixin:
|
||
|
|
def get_new_entity_by_name(self, fqn: str):
|
||
|
|
# Implementation
|
||
|
|
```
|
||
|
|
|
||
|
|
### 3. Test Coverage
|
||
|
|
Each new entity needs comprehensive unit tests:
|
||
|
|
- Create operations
|
||
|
|
- Retrieve by ID/name
|
||
|
|
- Update/Patch operations
|
||
|
|
- Delete operations
|
||
|
|
- List operations
|
||
|
|
|
||
|
|
### 4. Documentation
|
||
|
|
Update SDK documentation with:
|
||
|
|
- Usage examples
|
||
|
|
- API reference
|
||
|
|
- Migration guides from direct OMeta usage
|
||
|
|
|
||
|
|
## Conclusion
|
||
|
|
|
||
|
|
The Python SDK currently covers approximately **30%** of the Java SDK's functionality. The main gaps are in:
|
||
|
|
1. Advanced data asset types (Charts, Metrics, MLModels)
|
||
|
|
2. Service integrations (Messaging, ML, Object Storage)
|
||
|
|
3. Governance features (Classifications, Domains, Data Contracts)
|
||
|
|
4. Operational APIs (Workflows, Events, Feeds)
|
||
|
|
|
||
|
|
Implementing the missing features would require:
|
||
|
|
- ~50 new entity classes
|
||
|
|
- ~20 additional OMeta mixins
|
||
|
|
- ~200+ unit tests
|
||
|
|
- Comprehensive documentation updates
|
||
|
|
|
||
|
|
The recommended approach is to prioritize based on user demand and implement in phases over 2025.
|