From 9bbc676c3b43f1b5b718b6d07defc406e5b936aa Mon Sep 17 00:00:00 2001
From: pmbrull
Date: Sun, 5 Dec 2021 16:38:58 +0000
Subject: [PATCH] GitBook: [#65] Python API

---
 docs/SUMMARY.md                               |   1 +
 .../open-source-community/developer/README.md |   4 +
 .../developer/python-api.md                   | 345 ++++++++++++++++++
 3 files changed, 350 insertions(+)
 create mode 100644 docs/open-source-community/developer/python-api.md

diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md
index 7e7622c1b0e..a3d0bee5849 100644
--- a/docs/SUMMARY.md
+++ b/docs/SUMMARY.md
@@ -134,3 +134,4 @@
   * [UX Style Guide](open-source-community/developer/ux-style-guide.md)
   * [Generate Typescript Types From JSON Schema](open-source-community/developer/generate-typescript-types-from-json-schema.md)
   * [Solution Design](open-source-community/developer/solution-design.md)
+  * [Python API](open-source-community/developer/python-api.md)
diff --git a/docs/open-source-community/developer/README.md b/docs/open-source-community/developer/README.md
index d5e915e27f7..f403060dbfa 100644
--- a/docs/open-source-community/developer/README.md
+++ b/docs/open-source-community/developer/README.md
@@ -37,3 +37,7 @@ This document summarizes information relevant to OpenMetadata committers and con
 {% content-ref url="solution-design.md" %}
 [solution-design.md](solution-design.md)
 {% endcontent-ref %}
+
+{% content-ref url="python-api.md" %}
+[python-api.md](python-api.md)
+{% endcontent-ref %}
diff --git a/docs/open-source-community/developer/python-api.md b/docs/open-source-community/developer/python-api.md
new file mode 100644
index 00000000000..a69e3bc4729
--- /dev/null
+++ b/docs/open-source-community/developer/python-api.md
@@ -0,0 +1,345 @@
+---
+description: >-
+  We are now going to present a high-level Python API as a type-safe and gentle
+  wrapper for the OpenMetadata backend.
+---
+
+# Python API
+
+In the [Solution Design](solution-design.md), we have been dissecting the internals of OpenMetadata.
+The main conclusion here is twofold:
+
+* **Everything** is handled via the API, and
+* **Data structures** (Entity definitions) are at the heart of the solution.
+
+This means that whenever we need to interact with the metadata system or develop a new connector or logic, we have to make sure that we pass the proper inputs and handle the types of outputs.
+
+## Introducing the Python API
+
+Let's suppose that we have our local OpenMetadata server running at `http://localhost:8585`. We can interact with it using simple `cURL` or `httpie` commands, and if we just want to take a look at the Entity instances we have lying around, that might be enough.
+
+However, let's imagine that we want to create or update an ML Model Entity with a `PUT`. To do so, we need to make sure that we are providing a valid JSON payload, covering all the attributes and types required by the Entity definition.
+
+By reviewing the [JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/api/data/createMlModel.json) for the create operation and the [field definitions](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/entity/data/mlmodel.json) of the Entity, we could come up with a rather simple description of a toy ML Model:
+
+```json
+{
+  "name": "my-model",
+  "description": "sample ML Model",
+  "algorithm": "regression",
+  "mlFeatures": [
+    {
+      "name": "age",
+      "dataType": "numerical",
+      "featureSources": [
+        {
+          "name": "age",
+          "dataType": "integer"
+        }
+      ]
+    },
+    {
+      "name": "persona",
+      "dataType": "categorical",
+      "featureSources": [
+        {
+          "name": "age",
+          "dataType": "integer"
+        },
+        {
+          "name": "education",
+          "dataType": "string"
+        }
+      ],
+      "featureAlgorithm": "PCA"
+    }
+  ],
+  "mlHyperParameters": [
+    {
+      "name": "regularisation",
+      "value": "0.5"
+    }
+  ]
+}
+```
+
+If we needed to repeat this process with a full-fledged model that is built ad hoc and updated during the CI/CD process, we would be adding a hard-to-maintain, error-prone requirement to our production deployment pipelines.
+
+The same would happen if, inside the actual OpenMetadata code, there were no easy way to interact with the API, make sure that we send proper data, and safely process the outputs.
+
+## Using Generated Sources
+
+As OpenMetadata is a data-centric solution, we need to make sure we have the right ingredients at all times. That is why we have developed a high-level Python API, using `pydantic` models automatically generated from the JSON Schemas.
+
+> Note: If you are using a [published](https://pypi.org/project/openmetadata-ingestion/) version of the Ingestion Framework, you are already good to go, as the generated code is packaged in the `metadata.generated` module. If you are developing a new feature, you can get more information [here](build-a-connector/setup.md).
+
+This API wrapper helps developers and consumers by:
+
+* Validating data both during development and at runtime, with specific error messages,
+* Returning typed responses to ease further processing.
+
+Thanks to `pydantic`'s support for nested models, the example above can be expressed using only Python classes, with help from IDEs and the Python interpreter. The previous JSON becomes:
+
+```python
+from metadata.generated.schema.api.data.createMlModel import CreateMlModelEntityRequest
+
+from metadata.generated.schema.entity.data.mlmodel import (
+    FeatureSource,
+    FeatureSourceDataType,
+    FeatureType,
+    MlFeature,
+    MlHyperParameter,
+)
+
+# Nested pydantic models mirror the structure of the JSON above
+model = CreateMlModelEntityRequest(
+    name="my-model",
+    description="sample ML Model",
+    algorithm="regression",
+    mlFeatures=[
+        MlFeature(
+            name="age",
+            dataType=FeatureType.numerical,
+            featureSources=[
+                FeatureSource(
+                    name="age",
+                    dataType=FeatureSourceDataType.integer,
+                )
+            ],
+        ),
+        MlFeature(
+            name="persona",
+            dataType=FeatureType.categorical,
+            featureSources=[
+                FeatureSource(
+                    name="age",
+                    dataType=FeatureSourceDataType.integer,
+                ),
+                FeatureSource(
+                    name="education",
+                    dataType=FeatureSourceDataType.string,
+                ),
+            ],
+            featureAlgorithm="PCA",
+        ),
+    ],
+    mlHyperParameters=[
+        MlHyperParameter(name="regularisation", value="0.5"),
+    ],
+)
+```
+
+## One syntax to rule them all
+
+Now that we know how to use the `pydantic` models directly, we can start showcasing the solution. This module has been built with two main principles in mind:
+
+* **Reusability**: We should be able to support existing and new entities with minimum effort,
+* **Extensibility**: Not all Entities are the same, though. Some of them may require specific functionalities or slight variations (such as `Lineage` or `Location`), so it should be easy to identify those special methods and create new ones when needed.
+
+To this end, the main class `OpenMetadata` ([source](https://github.com/open-metadata/OpenMetadata/blob/main/ingestion/src/metadata/ingestion/ometa/ometa_api.py)) is built on Python's `TypeVar`. This lets us exploit the full power of the `pydantic` models: its methods are generic over the Entity type and know how to respond to each Entity.
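To make the `TypeVar` idea concrete, here is a minimal, dependency-free sketch of a client whose methods are generic over the Entity class. Everything in it is hypothetical and for illustration only: `ToyClient`, its in-memory store, the `Table`/`Topic` dataclasses (standing in for the generated `pydantic` models) and the `/tables`/`/topics` suffixes are assumptions, not the actual `OpenMetadata` implementation.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Optional, Type, TypeVar


@dataclass
class Entity:
    """Hypothetical base class standing in for the generated pydantic models."""
    name: str


@dataclass
class Table(Entity):
    database: str


@dataclass
class Topic(Entity):
    partitions: int


# T binds to whichever Entity subclass is passed in, so return types follow it
T = TypeVar("T", bound=Entity)


class ToyClient:
    """Hypothetical in-memory stand-in for a typed REST client (not the real OpenMetadata class)."""

    # Assumed mapping from Entity class to REST collection suffix
    _suffixes: Dict[type, str] = {Table: "/tables", Topic: "/topics"}

    def __init__(self) -> None:
        # Simulated backend: suffix -> entity name -> raw JSON-like payload
        self._store: Dict[str, Dict[str, dict]] = {}

    def get_suffix(self, entity: Type[T]) -> str:
        # Raises KeyError for classes missing from the mapping
        return self._suffixes[entity]

    def create_or_update(self, data: T) -> T:
        entity = type(data)
        payload = asdict(data)  # what would be serialized and sent with a PUT
        self._store.setdefault(self.get_suffix(entity), {})[data.name] = payload
        # Parse the "response" back into the same class that was sent
        return entity(**payload)

    def get_by_name(self, entity: Type[T], name: str) -> Optional[T]:
        payload = self._store.get(self.get_suffix(entity), {}).get(name)
        # Rebuild a typed instance of the requested class, or None if missing
        return entity(**payload) if payload is not None else None


client = ToyClient()
client.create_or_update(Table(name="users", database="shop"))
table = client.get_by_name(Table, "users")  # a type checker infers Optional[Table]
print(table)  # Table(name='users', database='shop')
```

Because `get_by_name` takes a `Type[T]` and returns `Optional[T]`, a type checker such as mypy knows that asking for a `Table` yields an `Optional[Table]`, which is the kind of IDE help this design aims for.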
+
+At the same time, the Mixins module ([source](https://github.com/open-metadata/OpenMetadata/tree/main/ingestion/src/metadata/ingestion/ometa/mixins)) provides special extensions for some Entities.
+
+## Walkthrough
+
+Let's use the Python API to create, update and delete a `Table` Entity. The `Table` is a nice starting point, as its attributes define the following hierarchy:
+
+```
+DatabaseService -> Database -> Table
+```
+
+This will help us showcase how we can reuse the same syntax with the three different Entities.
+
+### 1. Initialize OpenMetadata
+
+`OpenMetadata` is the class holding the connection to the API and handling the requests. We can instantiate it by passing the configuration needed to reach the server API:
+
+```python
+from metadata.ingestion.ometa.ometa_api import OpenMetadata
+from metadata.ingestion.ometa.openmetadata_rest import MetadataServerConfig
+
+# Point the client at the local server's API root
+server_config = MetadataServerConfig(api_endpoint="http://localhost:8585/api")
+metadata = OpenMetadata(server_config)
+```
+
+As this is just a local development setup, the `MetadataServerConfig` is rather simple. In other environments, this is where we would prepare settings such as `auth_provider_type` or `secret_key`.
+
+From this point onwards, we will interact with the API by using `OpenMetadata` methods.
+
+A useful check we can already run at this point is verifying that the service is reachable and healthy. To do so, we can inspect the boolean returned by:
+
+```python
+metadata.health_check()  # `True` means we are alright :)
+```
+
+### 2. Create the DatabaseService
+
+Following the hierarchy, we need to start by defining a `DatabaseService`. This will be the system hosting our `Database`, which will contain the `Table`.
+
+Recall that there are two main types of models:
+
+* Entity definitions, such as `Table`, `MlModel` or `Topic`,
+* API definitions, useful when running a `PUT`, `POST` or `PATCH` request: `CreateTable`, `CreateMlModel` or `CreateTopic`.
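The split between the two kinds of models can be sketched without any dependencies. In this hypothetical example, plain dataclasses stand in for the generated `pydantic` classes: the request carries only what the client supplies, while the Entity also holds fields assigned by the server. The names `CreateTopicRequest`, `TopicEntity` and `fake_server_create`, the choice of server-side fields (an `id` and an `href`), and the `href` format are all assumptions for illustration.

```python
import uuid
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class CreateTopicRequest:
    """Hypothetical API definition: only the fields a client sends in a PUT/POST."""
    name: str
    partitions: int
    description: Optional[str] = None


@dataclass
class TopicEntity:
    """Hypothetical Entity definition: request fields plus server-assigned ones."""
    id: str
    href: str
    name: str
    partitions: int
    description: Optional[str] = None


def fake_server_create(request: CreateTopicRequest) -> TopicEntity:
    """Simulate the backend turning a create request into a stored Entity."""
    entity_id = str(uuid.uuid4())  # assigned by the server, never by the client
    return TopicEntity(
        id=entity_id,
        href=f"http://localhost:8585/api/v1/topics/{entity_id}",  # assumed URL layout
        **asdict(request),
    )


request = CreateTopicRequest(name="orders", partitions=3)
topic = fake_server_create(request)
print(topic.name, topic.partitions)  # orders 3
```

This mirrors the flow of the walkthrough: we build a request model, and the server answers with the corresponding Entity, enriched with the fields only it can fill in.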
+
+As we are just creating Entities right now, we'll use the `pydantic` models for the API definitions.
+
+Let's imagine that we are defining a MySQL service:
+
+```python
+from metadata.generated.schema.api.services.createDatabaseService import (
+    CreateDatabaseServiceEntityRequest,
+)
+from metadata.generated.schema.entity.services.databaseService import (
+    DatabaseService,
+    DatabaseServiceType,
+)
+from metadata.generated.schema.type.jdbcConnection import JdbcInfo
+
+create_service = CreateDatabaseServiceEntityRequest(
+    name="test-service-table",
+    serviceType=DatabaseServiceType.MySQL,
+    jdbc=JdbcInfo(driverClass="jdbc", connectionUrl="jdbc://localhost"),
+)
+```
+
+Note how we can use plain strings for the attributes as well as specific types when possible, such as `serviceType=DatabaseServiceType.MySQL`. The less information we need to hardcode, the better.
+
+We can review the information that will be passed to the API by inspecting the JSON representation of the instance we just created. As all these models are powered by `pydantic`, this conversion is transparent to us:
+
+```python
+create_service.json()
+# '{"name": "test-service-table", "description": null, "serviceType": "MySQL", "jdbc": {"driverClass": "jdbc", "connectionUrl": "jdbc://localhost"}, "ingestionSchedule": null}'
+```
+
+Executing the actual creation is easy! As our `create_service` variable already holds the proper datatype, there is a single line to execute:
+
+```python
+service_entity = metadata.create_or_update(data=create_service)
+```
+
+Moreover, `create_or_update` returns an instance of the corresponding Entity type, so we can explore its attributes easily:
+
+```python
+type(service_entity)
+#