OpenMetadata SDK Coverage Analysis Report
Executive Summary
This report compares the Java and Python SDK implementations to identify coverage gaps and missing features in the Python SDK.
Key Findings
- Java SDK: 68 API client classes providing comprehensive coverage
- Python SDK: 13 entity classes, 3 API modules, and 23 OMeta mixins providing partial coverage
- Coverage Gap: the Python SDK lacks dedicated classes for an estimated 40+ API areas available in the Java SDK
Java SDK API Coverage (68 APIs)
Data Assets (16 APIs)
- ✅ Tables
- ✅ Databases
- ✅ DatabaseSchemas
- ✅ Containers
- ✅ Topics
- ✅ Dashboards
- ✅ Charts
- ✅ Pipelines
- ✅ MlModels
- ✅ Metrics
- ✅ StoredProcedures
- ✅ DashboardDataModels
- ✅ SearchIndex
- ✅ ApiEndpoint
- ✅ Queries
- ✅ Spreadsheets
Services (10 APIs)
- ✅ DatabaseServices
- ✅ DashboardServices
- ✅ PipelineServices
- ✅ MessagingServices
- ✅ MlModelServices
- ✅ MetadataServices
- ✅ ObjectStoreServices
- ✅ SearchServices
- ✅ ApiServices
- ✅ DriveServices
Governance & Quality (8 APIs)
- ✅ Glossaries
- ✅ Classifications
- ✅ Domains
- ✅ DataContracts
- ✅ TestCases
- ✅ TestSuites
- ✅ TestDefinitions
- ✅ TestCaseResults
Security & Access (7 APIs)
- ✅ Users
- ✅ Teams
- ✅ Roles
- ✅ Policies
- ✅ Bots
- ✅ Permissions
- ✅ SecurityServices
Operations (8 APIs)
- ✅ IngestionPipelines
- ✅ WorkflowDefinitions
- ✅ WorkflowInstances
- ✅ WorkflowInstanceStates
- ✅ Events
- ✅ Feeds
- ✅ Usage
- ✅ Suggestions
Platform Features (8 APIs)
- ✅ Lineage
- ✅ Search
- ✅ Metadata
- ✅ System
- ✅ DocumentStore
- ✅ Files
- ✅ Directories
- ✅ Apps
Advanced Features (11 APIs)
- ✅ ApiCollections
- ✅ Personas
- ✅ Columns
- ✅ Worksheets
- ✅ ReportsBeta
- ✅ Rdf
- ✅ RdfSql
- ✅ Scim
- ✅ TestCaseIncidentManager
- ✅ QueryCostRecordManager
- ✅ Default
Python SDK Coverage
Entity Classes (13 entities)
- ✅ Table
- ✅ Database
- ✅ DatabaseSchema
- ✅ Dashboard
- ✅ Pipeline
- ✅ Container
- ✅ Topic
- ✅ Team
- ✅ User (user_improved)
- ✅ Glossary
- ✅ GlossaryTerm
- ✅ TableImproved
- ⚠️ BaseEntity (abstract base)
API Modules (3 modules)
- ✅ Search
- ✅ Lineage
- ✅ Bulk
OMeta Mixins (23 mixins)
- ✅ custom_property_mixin
- ✅ dashboard_mixin
- ✅ data_contract_mixin
- ✅ data_insight_mixin
- ✅ domain_mixin
- ✅ es_mixin
- ✅ ingestion_pipeline_mixin
- ✅ lineage_mixin
- ✅ mlmodel_mixin
- ✅ patch_mixin
- ✅ pipeline_mixin
- ✅ query_mixin
- ✅ role_policy_mixin
- ✅ search_index_mixin
- ✅ server_mixin
- ✅ service_mixin
- ✅ suggestions_mixin
- ✅ table_mixin
- ✅ tests_mixin
- ✅ topic_mixin
- ✅ user_mixin
- ✅ version_mixin
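To keep these counts reproducible, the Java and Python inventories can be diffed mechanically. The sketch below is a minimal illustration, not project tooling: it transcribes only the Data Assets names from this report, and its `normalize` rule (lowercase, strip one trailing "s") is an assumption that would need an alias table for irregular plurals. It counts dedicated entity classes only, so resources reachable through the mixins above still show up as gaps.

```python
# Names transcribed from this report's Data Assets lists; other categories
# can be added the same way.
JAVA_DATA_ASSETS = {
    "Tables", "Databases", "DatabaseSchemas", "Containers", "Topics",
    "Dashboards", "Charts", "Pipelines", "MlModels", "Metrics",
    "StoredProcedures", "DashboardDataModels", "SearchIndex",
    "ApiEndpoint", "Queries", "Spreadsheets",
}

# Python entity classes from this report. BaseEntity (abstract) and
# TableImproved (assumed to be a Table variant) are omitted.
PYTHON_ENTITIES = {
    "Table", "Database", "DatabaseSchema", "Dashboard", "Pipeline",
    "Container", "Topic", "Team", "User", "Glossary", "GlossaryTerm",
}


def normalize(name: str) -> str:
    """Map a Java plural API name and a Python singular class name to one key.

    Assumption: lowercasing and stripping one trailing "s" is good enough here;
    irregular plurals such as "Queries" would need an explicit alias table.
    """
    key = name.lower()
    return key[:-1] if key.endswith("s") else key


def missing_in_python(java_apis: set, python_entities: set) -> list:
    """Return Java API names with no matching Python entity class, sorted."""
    covered = {normalize(name) for name in python_entities}
    return sorted(name for name in java_apis if normalize(name) not in covered)


gaps = missing_in_python(JAVA_DATA_ASSETS, PYTHON_ENTITIES)
print(f"{len(JAVA_DATA_ASSETS) - len(gaps)}/{len(JAVA_DATA_ASSETS)} Data Assets covered")
print(gaps)
```

With these inputs it reports 7 of 16 Data Assets covered, and the nine gaps match the Data Assets entries in the missing-class list that follows.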
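Coverage Gap Analysis
The gaps below refer to dedicated Python SDK classes. Some of these resources are already reachable through the OMeta mixins listed above (for example data_contract_mixin, domain_mixin, role_policy_mixin, mlmodel_mixin, query_mixin, search_index_mixin, and tests_mixin), so a missing class does not always mean a missing API call.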
Missing Entity Classes in Python SDK (High Priority)
- Charts - Dashboard visualization components
- Metrics - Business metrics and KPIs
- MLModels - Machine learning model entities
- StoredProcedures - Database stored procedures
- DashboardDataModels - Dashboard data modeling
- SearchIndex - Search index management
- ApiEndpoint - API endpoint documentation
- Queries - SQL query management
- Spreadsheets - Spreadsheet assets
- Worksheets - Worksheet management
Missing Service Classes in Python SDK
- MessagingServices - Kafka, Pulsar, etc.
- MLModelServices - ML platform services
- ObjectStoreServices - S3, GCS, Azure Blob
- SearchServices - Elasticsearch, OpenSearch
- ApiServices - REST API services
- DriveServices - Google Drive, SharePoint
Missing Governance Features
- Classifications - Data classification tags
- Domains - Business domain management
- DataContracts - Data contract definitions
- TestCases - Individual test case management
- TestSuites - Test suite orchestration
- TestDefinitions - Test definition templates
Missing Security & Access Features
- Roles - Role management API
- Policies - Policy definition and enforcement
- Bots - Bot user management
- Permissions - Fine-grained permissions API
- SecurityServices - Security service integrations
Missing Operational Features
- WorkflowDefinitions - Workflow template management
- WorkflowInstances - Running workflow instances
- WorkflowInstanceStates - Workflow state tracking
- Events - Event stream management
- Feeds - Activity feed management
Missing Advanced Features
- ApiCollections - API collection management
- Personas - User persona definitions
- Columns - Column-level operations
- ReportsBeta - Reporting features
- Rdf/RdfSql - RDF data management
- Scim - SCIM user provisioning
- TestCaseIncidentManager - Incident management
- QueryCostRecordManager - Query cost tracking
- Files/Directories - File system management
- Apps - Application management
Recommendations
Priority 1: Core Data Assets (Q1 2025)
- Implement Chart, Metric, MLModel entity classes
- Add StoredProcedure and Query management
- Complete SearchIndex integration
Priority 2: Service Integration (Q2 2025)
- Add MessagingService support (Kafka, Pulsar)
- Implement ObjectStoreService (S3, GCS)
- Add MLModelService integrations
Priority 3: Governance & Quality (Q2 2025)
- Implement Classification and Domain APIs
- Add DataContract support
- Complete TestCase/TestSuite framework
Priority 4: Security & Operations (Q3 2025)
- Add Role and Policy management
- Implement Workflow APIs
- Add Event and Feed management
Priority 5: Advanced Features (Q4 2025)
- Add Persona management
- Implement SCIM support
- Add cost tracking features
Implementation Strategy
1. Extend BaseEntity Pattern
All new entities should follow the established pattern:
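from typing import List, Optional

class NewEntity(BaseEntity):
    @classmethod
    def entity_type(cls):
        return NewEntityClass  # generated model for the new entity type

    @classmethod
    def create(cls, request: CreateNewEntityRequest):
        ...  # build and send the create request

    @classmethod
    def retrieve(cls, entity_id: str, fields: Optional[List[str]] = None):
        ...  # fetch by ID, optionally requesting extra fields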
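The pattern above can be exercised end to end without a server. The following is a minimal, self-contained sketch under stated assumptions: `ChartModel`, `CreateChartRequest`, `InMemoryClient`, and `set_default_client` are hypothetical stand-ins invented for illustration, not the SDK's actual models or client, and the real `BaseEntity` may wire its client differently.

```python
import uuid
from dataclasses import dataclass
from typing import ClassVar, Dict, List, Optional, Type


@dataclass
class ChartModel:
    """Hypothetical stand-in for a generated Chart entity model."""
    id: str
    name: str
    fully_qualified_name: str
    description: Optional[str] = None


@dataclass
class CreateChartRequest:
    """Hypothetical stand-in for a generated create-request model."""
    name: str
    service: str
    description: Optional[str] = None


class InMemoryClient:
    """Hypothetical fake client standing in for OMeta; keeps entities in a dict."""

    def __init__(self) -> None:
        self.store: Dict[str, ChartModel] = {}

    def create(self, request: CreateChartRequest) -> ChartModel:
        entity = ChartModel(
            id=str(uuid.uuid4()),
            name=request.name,
            fully_qualified_name=f"{request.service}.{request.name}",
            description=request.description,
        )
        self.store[entity.id] = entity
        return entity

    def get(self, entity_id: str) -> ChartModel:
        return self.store[entity_id]


class BaseEntity:
    """Hypothetical base class: every subclass shares one client."""

    _client: ClassVar[Optional[InMemoryClient]] = None

    @classmethod
    def set_default_client(cls, client: InMemoryClient) -> None:
        # Assign on BaseEntity, not cls, so every subclass sees the same client.
        BaseEntity._client = client

    @classmethod
    def client(cls) -> InMemoryClient:
        if BaseEntity._client is None:
            raise RuntimeError("call BaseEntity.set_default_client() first")
        return BaseEntity._client

    @classmethod
    def entity_type(cls) -> Type:
        raise NotImplementedError


class Chart(BaseEntity):
    @classmethod
    def entity_type(cls) -> Type[ChartModel]:
        return ChartModel

    @classmethod
    def create(cls, request: CreateChartRequest) -> ChartModel:
        return cls.client().create(request)

    @classmethod
    def retrieve(cls, entity_id: str, fields: Optional[List[str]] = None) -> ChartModel:
        # A real client would forward `fields`; the in-memory fake ignores them.
        return cls.client().get(entity_id)


BaseEntity.set_default_client(InMemoryClient())
chart = Chart.create(CreateChartRequest(name="revenue", service="superset_prod"))
print(Chart.retrieve(chart.id).fully_qualified_name)  # superset_prod.revenue
```

Setting the client once on the base class keeps each entity class a thin, stateless wrapper, and swapping in a fake client is what makes the unit tests described under Test Coverage cheap to write.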
2. Add Corresponding Mixins
For complex operations, add mixins to OMeta:
class NewEntityMixin:
    def get_new_entity_by_name(self, fqn: str):
        ...  # look up the entity by its fully qualified name (FQN)
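A mixin only adds entity-specific helpers on top of a generic lookup that the host class already provides. The sketch below shows that composition with a fake core so it runs standalone. `FakeOMetaBase`, `ChartMixin`, and `get_chart_by_name` are hypothetical; the `get_by_name(entity=..., fqn=...)` call shape is modeled on OMeta's generic lookup, but the fake's behavior is simplified.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


@dataclass
class Chart:
    """Hypothetical entity model used only for this illustration."""
    name: str
    fully_qualified_name: str


class FakeOMetaBase:
    """Stand-in for the OMeta core: a generic lookup keyed by (type, FQN)."""

    def __init__(self) -> None:
        self._entities: Dict[Tuple[type, str], object] = {}

    def add(self, entity: object) -> None:
        self._entities[(type(entity), entity.fully_qualified_name)] = entity

    def get_by_name(self, entity: Type[T], fqn: str) -> Optional[T]:
        # Returns None for unknown names instead of raising.
        return self._entities.get((entity, fqn))


class ChartMixin:
    """Hypothetical mixin: assumes the host class defines get_by_name()."""

    def get_chart_by_name(self, fqn: str) -> Optional[Chart]:
        return self.get_by_name(entity=Chart, fqn=fqn)


class FakeOMeta(ChartMixin, FakeOMetaBase):
    """Client class assembled from the mixin plus the generic core."""


client = FakeOMeta()
client.add(Chart(name="revenue", fully_qualified_name="superset_prod.revenue"))
print(client.get_chart_by_name("superset_prod.revenue"))
```

Because the mixin holds no state and defines no `__init__`, several such mixins can be stacked on one client class and the method resolution order still routes `get_by_name` to the shared core.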
3. Test Coverage
Each new entity needs comprehensive unit tests:
- Create operations
- Retrieve by ID/name
- Update/Patch operations
- Delete operations
- List operations
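The checklist above maps naturally onto one test per operation. A minimal sketch using `unittest` and `unittest.mock.MagicMock`, assuming a hypothetical `Chart` fluent class that delegates every call to an injected client. The client method names (`create_or_update`, `get_by_id`, `get_by_name`, `patch`, `delete`, `list_entities`) loosely echo OMeta's, but the simplified signatures here are assumptions, not OMeta's real ones.

```python
import unittest
from typing import List, Optional
from unittest.mock import MagicMock


class Chart:
    """Hypothetical fluent entity class; every call delegates to `client`."""

    client = None  # injected by tests or by application setup

    @classmethod
    def create(cls, request):
        return cls.client.create_or_update(request)

    @classmethod
    def retrieve(cls, entity_id: str, fields: Optional[List[str]] = None):
        return cls.client.get_by_id(entity_id, fields=fields)

    @classmethod
    def retrieve_by_name(cls, fqn: str):
        return cls.client.get_by_name(fqn)

    @classmethod
    def patch(cls, entity_id: str, changes: dict):
        return cls.client.patch(entity_id, changes)

    @classmethod
    def delete(cls, entity_id: str, hard_delete: bool = False):
        cls.client.delete(entity_id, hard_delete=hard_delete)

    @classmethod
    def list_all(cls, limit: int = 10):
        return cls.client.list_entities(limit=limit)


class ChartCrudTest(unittest.TestCase):
    def setUp(self):
        # A fresh mock per test so call assertions never leak between tests.
        self.client = MagicMock()
        Chart.client = self.client

    def test_create(self):
        self.client.create_or_update.return_value = {"name": "revenue"}
        self.assertEqual(Chart.create({"name": "revenue"}), {"name": "revenue"})
        self.client.create_or_update.assert_called_once_with({"name": "revenue"})

    def test_retrieve_by_id_and_name(self):
        Chart.retrieve("id-1", fields=["owners"])
        self.client.get_by_id.assert_called_once_with("id-1", fields=["owners"])
        Chart.retrieve_by_name("svc.revenue")
        self.client.get_by_name.assert_called_once_with("svc.revenue")

    def test_patch(self):
        Chart.patch("id-1", {"description": "new"})
        self.client.patch.assert_called_once_with("id-1", {"description": "new"})

    def test_delete(self):
        Chart.delete("id-1", hard_delete=True)
        self.client.delete.assert_called_once_with("id-1", hard_delete=True)

    def test_list(self):
        self.client.list_entities.return_value = ["a", "b"]
        self.assertEqual(Chart.list_all(limit=2), ["a", "b"])
        self.client.list_entities.assert_called_once_with(limit=2)
```

Run it with `python -m unittest` or with pytest, which also collects `unittest.TestCase` classes. Mocking the client keeps these tests fast and offline; integration tests against a running server are still needed to catch request-shape mismatches.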
4. Documentation
Update SDK documentation with:
- Usage examples
- API reference
- Migration guides from direct OMeta usage
Conclusion
The Python SDK currently covers approximately 30% of the Java SDK's functionality. The main gaps are in:
- Advanced data asset types (Charts, Metrics, MLModels)
- Service integrations (Messaging, ML, Object Storage)
- Governance features (Classifications, Domains, Data Contracts)
- Operational APIs (Workflows, Events, Feeds)
Implementing the missing features would require:
- ~50 new entity classes
- ~20 additional OMeta mixins
- 200+ unit tests
- Comprehensive documentation updates
The recommended approach is to prioritize based on user demand and implement in phases over 2025.