OpenMetadata/ingestion/SDK_COVERAGE_REPORT.md
Sriharsha Chintalapani bb1395fc72
Implement Modern Fluent API Pattern for OpenMetadata Java Client (#23239)
2025-09-29 16:07:02 -07:00


OpenMetadata SDK Coverage Analysis Report

Executive Summary

This report compares the Java and Python SDK implementations to identify coverage gaps and missing features in the Python SDK.

Key Findings

  • Java SDK: 68 API client classes providing comprehensive coverage
  • Python SDK: 13 entity classes + 23 OMeta mixins providing partial coverage
  • Coverage Gap: the Python SDK is missing 40+ API endpoints that are available in Java

Java SDK API Coverage (68 APIs)

Data Assets (16 APIs)

  • Tables
  • Databases
  • DatabaseSchemas
  • Containers
  • Topics
  • Dashboards
  • Charts
  • Pipelines
  • MlModels
  • Metrics
  • StoredProcedures
  • DashboardDataModels
  • SearchIndex
  • ApiEndpoint
  • Queries
  • Spreadsheets

Services (10 APIs)

  • DatabaseServices
  • DashboardServices
  • PipelineServices
  • MessagingServices
  • MlModelServices
  • MetadataServices
  • ObjectStoreServices
  • SearchServices
  • ApiServices
  • DriveServices

Governance & Quality (8 APIs)

  • Glossaries
  • Classifications
  • Domains
  • DataContracts
  • TestCases
  • TestSuites
  • TestDefinitions
  • TestCaseResults

Security & Access (7 APIs)

  • Users
  • Teams
  • Roles
  • Policies
  • Bots
  • Permissions
  • SecurityServices

Operations (8 APIs)

  • IngestionPipelines
  • WorkflowDefinitions
  • WorkflowInstances
  • WorkflowInstanceStates
  • Events
  • Feeds
  • Usage
  • Suggestions

Platform Features (8 APIs)

  • Lineage
  • Search
  • Metadata
  • System
  • DocumentStore
  • Files
  • Directories
  • Apps

Advanced Features (11 APIs)

  • ApiCollections
  • Personas
  • Columns
  • Worksheets
  • ReportsBeta
  • Rdf
  • RdfSql
  • Scim
  • TestCaseIncidentManager
  • QueryCostRecordManager
  • Default

Python SDK Coverage

Entity Classes (13 entities)

  • Table
  • Database
  • DatabaseSchema
  • Dashboard
  • Pipeline
  • Container
  • Topic
  • Team
  • User (user_improved)
  • Glossary
  • GlossaryTerm
  • TableImproved ⚠️
  • BaseEntity (abstract base)

API Modules (3 modules)

  • Search
  • Lineage
  • Bulk

OMeta Mixins (23 mixins)

  • custom_property_mixin
  • dashboard_mixin
  • data_contract_mixin
  • data_insight_mixin
  • domain_mixin
  • es_mixin
  • ingestion_pipeline_mixin
  • lineage_mixin
  • mlmodel_mixin
  • patch_mixin
  • pipeline_mixin
  • query_mixin
  • role_policy_mixin
  • search_index_mixin
  • server_mixin
  • service_mixin
  • suggestions_mixin
  • table_mixin
  • tests_mixin
  • topic_mixin
  • user_mixin
  • version_mixin

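Coverage Gap Analysis

The gaps below refer to the fluent entity classes and API modules. Some of these areas already have partial support through OMeta mixins, for example domain_mixin, data_contract_mixin, role_policy_mixin and tests_mixin.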

Missing Entity Classes in Python SDK (High Priority)

  1. Charts - Dashboard visualization components
  2. Metrics - Business metrics and KPIs
  3. MLModels - Machine learning model entities
  4. StoredProcedures - Database stored procedures
  5. DashboardDataModels - Dashboard data modeling
  6. SearchIndex - Search index management
  7. ApiEndpoint - API endpoint documentation
  8. Queries - SQL query management
  9. Spreadsheets - Spreadsheet assets
  10. Worksheets - Worksheet management
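The data-asset part of this list can be reproduced as a simple set difference between the Java data-asset APIs and the Python entity classes. This sketch rests on one assumption: Java's plural API names are mapped by hand to the singular class names the Python SDK uses (for example, Queries becomes Query). Worksheets comes from the Advanced Features category and is not covered here.

```python
# Java data-asset APIs, hand-mapped from plural API names to singular class names
JAVA_DATA_ASSETS = [
    "Table", "Database", "DatabaseSchema", "Container", "Topic", "Dashboard",
    "Chart", "Pipeline", "MlModel", "Metric", "StoredProcedure",
    "DashboardDataModel", "SearchIndex", "ApiEndpoint", "Query", "Spreadsheet",
]

# Entity classes currently exposed by the Python SDK (BaseEntity excluded)
PYTHON_ENTITY_CLASSES = {
    "Table", "Database", "DatabaseSchema", "Dashboard", "Pipeline", "Container",
    "Topic", "Team", "User", "Glossary", "GlossaryTerm", "TableImproved",
}


def missing_entities(java_assets, python_classes):
    """Return Java data assets with no Python entity class, keeping Java's order."""
    return [name for name in java_assets if name not in python_classes]


missing = missing_entities(JAVA_DATA_ASSETS, PYTHON_ENTITY_CLASSES)
```

The result contains the nine data-asset gaps listed above. Adding Worksheets produces the full ten-item list.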

Missing Service Classes in Python SDK

  1. MessagingServices - Kafka, Pulsar, etc.
  2. MLModelServices - ML platform services
  3. ObjectStoreServices - S3, GCS, Azure Blob
  4. SearchServices - Elasticsearch, OpenSearch
  5. ApiServices - REST API services
  6. DriveServices - Google Drive, SharePoint

Missing Governance Features

  1. Classifications - Data classification tags
  2. Domains - Business domain management
  3. DataContracts - Data contract definitions
  4. TestCases - Individual test case management
  5. TestSuites - Test suite orchestration
  6. TestDefinitions - Test definition templates

Missing Security & Access Features

  1. Roles - Role management API
  2. Policies - Policy definition and enforcement
  3. Bots - Bot user management
  4. Permissions - Fine-grained permissions API
  5. SecurityServices - Security service integrations

Missing Operational Features

  1. WorkflowDefinitions - Workflow template management
  2. WorkflowInstances - Running workflow instances
  3. WorkflowInstanceStates - Workflow state tracking
  4. Events - Event stream management
  5. Feeds - Activity feed management

Missing Advanced Features

  1. ApiCollections - API collection management
  2. Personas - User persona definitions
  3. Columns - Column-level operations
  4. ReportsBeta - Reporting features
  5. Rdf/RdfSql - RDF data management
  6. Scim - SCIM user provisioning
  7. TestCaseIncidentManager - Incident management
  8. QueryCostRecordManager - Query cost tracking
  9. Files/Directories - File system management
  10. Apps - Application management

Recommendations

Priority 1: Core Data Assets (Q1 2025)

  • Implement Chart, Metric, MLModel entity classes
  • Add StoredProcedure and Query management
  • Complete SearchIndex integration

Priority 2: Service Integration (Q2 2025)

  • Add MessagingService support (Kafka, Pulsar)
  • Implement ObjectStoreService (S3, GCS)
  • Add MLModelService integrations

Priority 3: Governance & Quality (Q2 2025)

  • Implement Classification and Domain APIs
  • Add DataContract support
  • Complete TestCase/TestSuite framework

Priority 4: Security & Operations (Q3 2025)

  • Add Role and Policy management
  • Implement Workflow APIs
  • Add Event and Feed management

Priority 5: Advanced Features (Q4 2025)

  • Add Persona management
  • Implement SCIM support
  • Add cost tracking features

Implementation Strategy

1. Extend BaseEntity Pattern

All new entities should follow the established pattern:

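from typing import List, Optional

# NewEntity, NewEntityClass and CreateNewEntityRequest are placeholders for the
# concrete wrapper, its generated schema model and its create request.
class NewEntity(BaseEntity):
    @classmethod
    def entity_type(cls):
        # Generated model class that this wrapper manages
        return NewEntityClass

    @classmethod
    def create(cls, request: CreateNewEntityRequest):
        # Send the create request through the OMeta client and return the entity
        ...

    @classmethod
    def retrieve(cls, entity_id: str, fields: Optional[List[str]] = None):
        # Fetch the entity by ID, optionally including extra fields
        ...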
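BaseEntity and the generated models cannot be imported outside the SDK, so the following is a self-contained sketch of the same pattern for a Chart entity (the first Priority 1 item). Every name in it is hypothetical: Chart, CreateChartRequest, ChartEntity and set_default_client are illustrative only. FakeOMeta is an in-memory stand-in that borrows the OMeta method names create_or_update and get_by_id but uses simplified signatures. The real BaseEntity may wire up its client differently.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional, Type


# Hypothetical stand-ins for a generated entity model and its create request
@dataclass
class Chart:
    id: str
    name: str


@dataclass
class CreateChartRequest:
    name: str


class FakeOMeta:
    """In-memory stand-in reusing two OMeta method names with simplified signatures."""

    def __init__(self) -> None:
        self._store: Dict[str, Chart] = {}

    def create_or_update(self, data: CreateChartRequest) -> Chart:
        # Upsert keyed on name: return the existing entity if one has this name
        for existing in self._store.values():
            if existing.name == data.name:
                return existing
        entity = Chart(id=str(uuid.uuid4()), name=data.name)
        self._store[entity.id] = entity
        return entity

    def get_by_id(self, entity: Type[Chart], entity_id: str,
                  fields: Optional[List[str]] = None) -> Optional[Chart]:
        # fields is accepted for signature parity but ignored by this fake
        found = self._store.get(entity_id)
        return found if isinstance(found, entity) else None


class BaseEntity:
    """Hypothetical minimal base: holds one client shared by all entity wrappers."""

    _client: Optional[FakeOMeta] = None

    @classmethod
    def set_default_client(cls, client: FakeOMeta) -> None:
        BaseEntity._client = client


class ChartEntity(BaseEntity):
    @classmethod
    def entity_type(cls) -> Type[Chart]:
        return Chart

    @classmethod
    def create(cls, request: CreateChartRequest) -> Chart:
        return cls._client.create_or_update(request)

    @classmethod
    def retrieve(cls, entity_id: str,
                 fields: Optional[List[str]] = None) -> Optional[Chart]:
        return cls._client.get_by_id(
            entity=cls.entity_type(), entity_id=entity_id, fields=fields
        )
```

After a client is registered, `ChartEntity.create(CreateChartRequest(name="sales_chart"))` returns a Chart whose `id` can be passed to `ChartEntity.retrieve`. In a real implementation, a configured OMeta client would replace the fake.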

2. Add Corresponding Mixins

For complex operations, add mixins to OMeta:

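class NewEntityMixin:
    # Mixed into OMeta, so self provides get_by_name(); NewEntityClass is a placeholder
    def get_new_entity_by_name(self, fqn: str):
        return self.get_by_name(entity=NewEntityClass, fqn=fqn)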
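OMeta mixins are combined through multiple inheritance: each mixin's methods call generic helpers such as get_by_name, which the client class supplies. The sketch below illustrates that composition with hypothetical names (Chart, CoreClient, ChartMixin, MiniOMeta). CoreClient is a toy in-memory lookup and not the real OMeta REST layer.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Type, TypeVar

T = TypeVar("T")


@dataclass
class Chart:
    fullyQualifiedName: str
    description: str = ""


class CoreClient:
    """Hypothetical core that provides the generic lookup mixins build on."""

    def __init__(self) -> None:
        self._by_fqn: Dict[str, object] = {}

    def add(self, entity) -> None:
        self._by_fqn[entity.fullyQualifiedName] = entity

    def get_by_name(self, entity: Type[T], fqn: str) -> Optional[T]:
        found = self._by_fqn.get(fqn)
        return found if isinstance(found, entity) else None


class ChartMixin:
    """Entity-specific helpers; relies on get_by_name from the class it is mixed into."""

    def get_chart_by_name(self, fqn: str) -> Optional[Chart]:
        return self.get_by_name(entity=Chart, fqn=fqn)

    def get_chart_description(self, fqn: str) -> Optional[str]:
        chart = self.get_chart_by_name(fqn)
        return chart.description if chart else None


class MiniOMeta(ChartMixin, CoreClient):
    """Combines the mixin with the core client through multiple inheritance."""
```

Keeping entity-specific helpers in their own mixin lets each one stay small while the composed client exposes all of them from a single object.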

3. Test Coverage

Each new entity needs comprehensive unit tests:

  • Create operations
  • Retrieve by ID/name
  • Update/Patch operations
  • Delete operations
  • List operations
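The sketch below shows one possible unittest layout that covers these five operations. InMemoryCharts is a hypothetical dict-backed stand-in that lets the tests run without a server. Real tests would exercise the actual entity class against a mocked or live OMeta client.

```python
import unittest
import uuid
from typing import Dict, List, Optional


class InMemoryCharts:
    """Hypothetical dict-backed wrapper standing in for a real entity class."""

    def __init__(self) -> None:
        self._items: Dict[str, dict] = {}

    def create(self, name: str) -> dict:
        item = {"id": str(uuid.uuid4()), "name": name, "description": None}
        self._items[item["id"]] = item
        return item

    def retrieve(self, entity_id: str) -> Optional[dict]:
        return self._items.get(entity_id)

    def retrieve_by_name(self, name: str) -> Optional[dict]:
        return next((i for i in self._items.values() if i["name"] == name), None)

    def patch(self, entity_id: str, **changes) -> dict:
        # Apply partial updates in place and return the updated item
        self._items[entity_id].update(changes)
        return self._items[entity_id]

    def delete(self, entity_id: str) -> None:
        self._items.pop(entity_id, None)

    def list_all(self) -> List[dict]:
        return list(self._items.values())


class ChartCrudTest(unittest.TestCase):
    """One test per operation: create, retrieve, patch, delete, list."""

    def setUp(self) -> None:
        self.charts = InMemoryCharts()
        self.chart = self.charts.create("sales_chart")

    def test_create(self) -> None:
        self.assertEqual(self.chart["name"], "sales_chart")

    def test_retrieve_by_id_and_name(self) -> None:
        self.assertEqual(self.charts.retrieve(self.chart["id"]), self.chart)
        self.assertEqual(self.charts.retrieve_by_name("sales_chart"), self.chart)

    def test_patch(self) -> None:
        patched = self.charts.patch(self.chart["id"], description="Monthly sales")
        self.assertEqual(patched["description"], "Monthly sales")

    def test_delete(self) -> None:
        self.charts.delete(self.chart["id"])
        self.assertIsNone(self.charts.retrieve(self.chart["id"]))

    def test_list(self) -> None:
        self.charts.create("revenue_chart")
        self.assertEqual(len(self.charts.list_all()), 2)
```

Save it as a module and run it with `python -m unittest`. Because setUp creates a fresh store for every test, each test stays independent of the others.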

4. Documentation

Update SDK documentation with:

  • Usage examples
  • API reference
  • Migration guides from direct OMeta usage

Conclusion

The Python SDK currently covers approximately 30% of the Java SDK's functionality. The main gaps are in:

  1. Advanced data asset types (Charts, Metrics, MLModels)
  2. Service integrations (Messaging, ML, Object Storage)
  3. Governance features (Classifications, Domains, Data Contracts)
  4. Operational APIs (Workflows, Events, Feeds)

Implementing the missing features would require:

  • ~50 new entity classes
  • ~20 additional OMeta mixins
  • 200+ unit tests
  • Comprehensive documentation updates

The recommended approach is to prioritize based on user demand and implement in phases over 2025.