Hyejin Yoon 25450ac82c
feat(docs): add guide on integration ML system via SDKs (#8029)
Co-authored-by: socar-dini <dini@socar.kr>
2023-05-17 10:21:39 +09:00

import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem';

ML System

Why Would You Integrate ML Systems with DataHub?

Machine learning systems have become a crucial part of modern data stacks. However, the relationships between their components, such as features, models, and feature tables, can be complex. Making these components discoverable in DataHub lets other members of the organization find and reuse them easily.

For more information on ML entities, please refer to the following docs:

Goal Of This Guide

This guide will show you how to

  • Create ML entities: MlFeature, MlFeatureTable, MlModel, MlModelGroup
  • Read ML entities: MlFeature, MlFeatureTable, MlModel, MlModelGroup
  • Attach MlFeature to MlFeatureTable or MlModel, and MlModelGroup to MlModel

Prerequisites

For this tutorial, you need to deploy DataHub Quickstart and ingest sample data. For detailed steps, please refer to the DataHub Quickstart Guide.

Create ML Entities

Create MlFeature

{{ inline /metadata-ingestion/examples/library/create_mlfeature.py show_path_as_comment }}

Note that when creating a feature, you can access a list of data sources using sources.

Create MlFeatureTable

{{ inline /metadata-ingestion/examples/library/create_mlfeature_table.py show_path_as_comment }}

Note that when creating a feature table, you can access a list of features using mlFeatures.

Create MlModel

Please note that an MlModel represents the outcome of a single training run for a model, not the collective results of all model runs.

{{ inline /metadata-ingestion/examples/library/create_mlmodel.py show_path_as_comment }}

Note that when creating a model, you can access a list of features using mlFeatures. Additionally, you can access the relationship to model groups with groups.

Create MlModelGroup

Please note that an MlModelGroup serves as a container for all the runs of a single ML model.

{{ inline /metadata-ingestion/examples/library/create_mlmodel_group.py show_path_as_comment }}
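
Each entity created above is identified by a URN, and the queries later in this guide address entities by those URNs. As a minimal sketch, the string shapes that appear in this guide can be reproduced with plain Python. The helper names below are hypothetical and only mirror the URNs shown in this guide; in real code, prefer the URN builders that ship with the DataHub SDK.

```python
# Hypothetical helpers that mirror the URN shapes shown in this guide.
# They are plain string formatting and are not part of the DataHub SDK.


def data_platform_urn(platform: str) -> str:
    """URN of a data platform, e.g. 'feast' or 'science'."""
    return f"urn:li:dataPlatform:{platform}"


def ml_feature_urn(feature_table_name: str, feature_name: str) -> str:
    """A feature is addressed by its feature table (namespace) and its own name."""
    return f"urn:li:mlFeature:({feature_table_name},{feature_name})"


def ml_feature_table_urn(platform: str, table_name: str) -> str:
    """A feature table is addressed by its platform and its name."""
    return f"urn:li:mlFeatureTable:({data_platform_urn(platform)},{table_name})"


def ml_model_urn(platform: str, model_name: str, env: str = "PROD") -> str:
    """A model is addressed by platform, name, and environment."""
    return f"urn:li:mlModel:({data_platform_urn(platform)},{model_name},{env})"


def ml_model_group_urn(platform: str, group_name: str, env: str = "PROD") -> str:
    """A model group uses the same three parts as a model."""
    return f"urn:li:mlModelGroup:({data_platform_urn(platform)},{group_name},{env})"


print(ml_feature_urn("test_feature_table_all_feature_dtypes", "test_BOOL_LIST_feature"))
print(ml_feature_table_urn("feast", "test_feature_table_all_feature_dtypes"))
print(ml_model_urn("science", "scienceModel"))
print(ml_model_group_urn("science", "my-model-group"))
```

The four printed values match the URNs used in the read queries of this guide.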

Expected Outcome of creating entities

You can search for the entities in the DataHub UI.

feature-table-created

model-group-created

Read ML Entities

Read MLFeature

query {
  mlFeature(urn: "urn:li:mlFeature:(test_feature_table_all_feature_dtypes,test_BOOL_LIST_feature)"){
    name
    featureNamespace
    description
    properties {
      description
      dataType
      version {
        versionTag
      }
    }
  }
}

Expected response:

{
  "data": {
    "mlFeature": {
      "name": "test_BOOL_LIST_feature",
      "featureNamespace": "test_feature_table_all_feature_dtypes",
      "description": null,
      "properties": {
        "description": null,
        "dataType": "SEQUENCE",
        "version": null
      }
    }
  },
  "extensions": {}
}

curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{
    "query": "{ mlFeature(urn: \"urn:li:mlFeature:(test_feature_table_all_feature_dtypes,test_BOOL_LIST_feature)\") { name featureNamespace description properties { description dataType version { versionTag } } } }"
}'

Expected response:

{
  "data": {
    "mlFeature": {
      "name": "test_BOOL_LIST_feature",
      "featureNamespace": "test_feature_table_all_feature_dtypes",
      "description": null,
      "properties": {
        "description": null,
        "dataType": "SEQUENCE",
        "version": null
      }
    }
  },
  "extensions": {}
}

{{ inline /metadata-ingestion/examples/library/read_mlfeature.py show_path_as_comment }}
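
The curl request above can also be issued from Python without extra dependencies. The following is a minimal sketch using only the standard library. The function names are hypothetical, and the endpoint and the `<my-access-token>` placeholder are taken from the curl example, so adjust both for your deployment.

```python
import json
import urllib.request

# Endpoint taken from the curl example; change it for your deployment.
GRAPHQL_URL = "http://localhost:8080/api/graphql"

ML_FEATURE_QUERY = """
query {
  mlFeature(urn: "urn:li:mlFeature:(test_feature_table_all_feature_dtypes,test_BOOL_LIST_feature)") {
    name
    featureNamespace
    description
    properties { description dataType version { versionTag } }
  }
}
"""


def build_graphql_request(
    query: str, token: str, url: str = GRAPHQL_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a GraphQL query."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_graphql_request(ML_FEATURE_QUERY, token="<my-access-token>")
# result = send(request)  # needs a running DataHub instance and a valid token
```

Keeping request construction separate from sending makes the payload easy to inspect before any network call is made.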

Read MLFeatureTable

query {
  mlFeatureTable(urn: "urn:li:mlFeatureTable:(urn:li:dataPlatform:feast,test_feature_table_all_feature_dtypes)"){
    name
    description
    platform {
      name
    }
    properties {
      description
      mlFeatures {
        name
      }
    }
  }
}

Expected Response:

{
  "data": {
    "mlFeatureTable": {
      "name": "test_feature_table_all_feature_dtypes",
      "description": null,
      "platform": {
        "name": "feast"
      },
      "properties": {
        "description": null,
        "mlFeatures": [
          {
            "name": "test_BOOL_LIST_feature"
          },
          ...
          {
            "name": "test_STRING_feature"
          }
        ]
      }
    }
  },
  "extensions": {}
}

curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{
    "query": "{ mlFeatureTable(urn: \"urn:li:mlFeatureTable:(urn:li:dataPlatform:feast,test_feature_table_all_feature_dtypes)\") { name description platform { name } properties { description mlFeatures { name } } } }"
}'

Expected Response:

{
  "data": {
    "mlFeatureTable": {
      "name": "test_feature_table_all_feature_dtypes",
      "description": null,
      "platform": {
        "name": "feast"
      },
      "properties": {
        "description": null,
        "mlFeatures": [
          {
            "name": "test_BOOL_LIST_feature"
          },
          ...
          {
            "name": "test_STRING_feature"
          }
        ]
      }
    }
  },
  "extensions": {}
}

{{ inline /metadata-ingestion/examples/library/read_mlfeature_table.py show_path_as_comment }}
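
Once a response like the one above has been decoded, the feature names can be pulled out of the nested structure. This is a minimal sketch, assuming the response shape shown above. The helper name is hypothetical, the sample payload is abbreviated to the two features visible in the expected response, and null values are treated as "no features" because several fields in the sample responses are null.

```python
from typing import List


def feature_names(response: dict) -> List[str]:
    """Return the feature names of an mlFeatureTable response, or [] if absent."""
    table = (response.get("data") or {}).get("mlFeatureTable") or {}
    properties = table.get("properties") or {}
    features = properties.get("mlFeatures") or []
    return [feature["name"] for feature in features]


# Abbreviated copy of the expected response shown in this guide.
sample_response = {
    "data": {
        "mlFeatureTable": {
            "name": "test_feature_table_all_feature_dtypes",
            "description": None,
            "platform": {"name": "feast"},
            "properties": {
                "description": None,
                "mlFeatures": [
                    {"name": "test_BOOL_LIST_feature"},
                    {"name": "test_STRING_feature"},
                ],
            },
        }
    },
    "extensions": {},
}

print(feature_names(sample_response))
```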

Read MLModel

query {
  mlModel(urn: "urn:li:mlModel:(urn:li:dataPlatform:science,scienceModel,PROD)"){
    name
    description
    properties {
      description
      version
      type
      mlFeatures
      groups {
        urn
        name
      }
    }
  }
}

Expected Response:

{
  "data": {
    "mlModel": {
      "name": "scienceModel",
      "description": "A sample model for predicting some outcome.",
      "properties": {
        "description": "A sample model for predicting some outcome.",
        "version": null,
        "type": "Naive Bayes classifier",
        "mlFeatures": null,
        "groups": []
      }
    }
  },
  "extensions": {}
}

curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{
    "query": "{ mlModel(urn: \"urn:li:mlModel:(urn:li:dataPlatform:science,scienceModel,PROD)\") { name description properties { description version type mlFeatures groups { urn name } } } }"
}'

Expected Response:

{
  "data": {
    "mlModel": {
      "name": "scienceModel",
      "description": "A sample model for predicting some outcome.",
      "properties": {
        "description": "A sample model for predicting some outcome.",
        "version": null,
        "type": "Naive Bayes classifier",
        "mlFeatures": null,
        "groups": []
      }
    }
  },
  "extensions": {}
}

{{ inline /metadata-ingestion/examples/library/read_mlmodel.py show_path_as_comment }}

Read MLModelGroup

query {
  mlModelGroup(urn: "urn:li:mlModelGroup:(urn:li:dataPlatform:science,my-model-group,PROD)"){
    name
    description
    platform {
      name
    }
    properties {
      description
    }
  }
}

Expected Response: (Note that this entity is not part of the sample ingestion, so you may need to create it first.)

{
  "data": {
    "mlModelGroup": {
      "name": "my-model-group",
      "description": "my model group",
      "platform": {
        "name": "science"
      },
      "properties": {
        "description": "my model group"
      }
    }
  },
  "extensions": {}
}

curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{
    "query": "{ mlModelGroup(urn: \"urn:li:mlModelGroup:(urn:li:dataPlatform:science,my-model-group,PROD)\") { name description platform { name } properties { description } } }"
}'

Expected Response: (Note that this entity is not part of the sample ingestion, so you may need to create it first.)

{
  "data": {
    "mlModelGroup": {
      "name": "my-model-group",
      "description": "my model group",
      "platform": {
        "name": "science"
      },
      "properties": {
        "description": "my model group"
      }
    }
  },
  "extensions": {}
}

{{ inline /metadata-ingestion/examples/library/read_mlmodel_group.py show_path_as_comment }}
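
Because this entity may not exist yet, client code should be prepared for a response that does not contain it. The sketch below assumes that a missing entity comes back either as a null field or together with a top-level `errors` list, which the GraphQL response format permits. How DataHub reports this exact case is not confirmed here, and the helper name is hypothetical.

```python
def extract_entity(response: dict, field: str) -> dict:
    """Return response['data'][field], raising if the request failed or the entity is missing."""
    errors = response.get("errors")
    if errors:
        messages = "; ".join(
            str(error.get("message")) if isinstance(error, dict) else str(error)
            for error in errors
        )
        raise RuntimeError(f"GraphQL request failed: {messages}")

    entity = (response.get("data") or {}).get(field)
    if entity is None:
        raise LookupError(f"{field} not found; create the entity first")
    return entity


# Response shaped like the expected response in this guide.
found = {
    "data": {
        "mlModelGroup": {
            "name": "my-model-group",
            "description": "my model group",
            "platform": {"name": "science"},
            "properties": {"description": "my model group"},
        }
    },
    "extensions": {},
}
group = extract_entity(found, "mlModelGroup")
print(group["name"])
```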

Add ML Entities

Add MlFeature to MlFeatureTable

{{ inline /metadata-ingestion/examples/library/add_mlfeature_to_mlfeature_table.py show_path_as_comment }}

Add MlFeature to MlModel

{{ inline /metadata-ingestion/examples/library/add_mlfeature_to_mlmodel.py show_path_as_comment }}

Add MlModelGroup to MlModel

{{ inline /metadata-ingestion/examples/library/add_mlgroup_to_mlmodel.py show_path_as_comment }}
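
Conceptually, each of these operations extends a list of URNs that an entity already holds, such as the features of a feature table or the groups of a model. The sketch below illustrates one way to merge new URNs without dropping existing entries or adding duplicates. It is an assumption about the general pattern, not a copy of the SDK examples referenced above, and the helper name is hypothetical.

```python
from typing import Iterable, List, Optional


def merge_urns(existing: Optional[Iterable[str]], new: Iterable[str]) -> List[str]:
    """Append new URNs to the existing ones, keeping order and skipping duplicates."""
    merged = list(existing or [])  # a null list is treated as empty
    seen = set(merged)
    for urn in new:
        if urn not in seen:
            merged.append(urn)
            seen.add(urn)
    return merged


# The sample model in this guide has no groups and a null feature list.
group_urn = "urn:li:mlModelGroup:(urn:li:dataPlatform:science,my-model-group,PROD)"
feature_urn = "urn:li:mlFeature:(test_feature_table_all_feature_dtypes,test_BOOL_LIST_feature)"

groups = merge_urns([], [group_urn])
features = merge_urns(None, [feature_urn, feature_urn])
print(groups)
print(features)
```

Merging instead of overwriting matters when the entity already has attached features or groups that should be preserved.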

Expected Outcome of Adding ML Entities

You can open the Features or Group tab of each entity to view the added entities.

feature-added-to-model

model-group-added-to-model