datahub/docs/api/tutorials/custom-assertions.md

8.0 KiB

import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem';

Custom Assertions

This guide specifically covers how to create and report results for custom assertions in DataHub. Custom Assertions are those not natively run or directly modeled by DataHub, and managed by a 3rd party framework or tool.

To create native assertions using the API (e.g. for DataHub to manage), please refer to the Assertions API.

This guide may be used as reference for partners seeking to integrate their own monitoring tools with DataHub.

Goal Of This Guide

In this guide, you will learn how to

  1. Create and update custom assertions via GraphQL and Python APIs
  2. Report results for custom assertions via GraphQL and Python APIs
  3. Retrieve results for custom assertions via GraphQL and Python APIs
  4. Delete custom assertions via GraphQL and Python APIs

Prerequisites

The actor making API calls must have the Edit Assertions and Edit Monitors privileges for the Tables being monitored.

Create And Update Custom Assertions

You may create custom assertions using the following APIs for a Dataset in DataHub.

To create a new assertion, use the upsertCustomAssertion GraphQL Mutation. This mutation both allows you to create and update a given assertion.

mutation upsertCustomAssertion {
  upsertCustomAssertion(
    urn: "urn:li:assertion:my-custom-assertion-id" # Optional: if you want to provide a custom id. If not, one will be generated for you.
    input: {
      entityUrn: "<urn of entity being monitored>"
      type: "My Custom Category" # This is how your assertion will appear categorized in DataHub.
      description: "The description of my external assertion for my dataset"
      platform: {
        urn: "urn:li:dataPlatform:great-expectations" # OR you can provide name: "My Custom Platform" if you do not have an URN for the platform.
      }
      fieldPath: "field_foo" # Optional: if you want to associated with a specific field,
      externalUrl: "https://my-monitoring-tool.com/result-for-this-assertion" # Optional: if you want to provide a link to the monitoring tool
      # Optional: If you want to provide a custom SQL query for the assertion. This will be rendered as a query in the UI.
      # logic: "SELECT * FROM X WHERE Y"
    }
  ) {
    urn
  }
}

Note that you can either provide a unique urn for the assertion, which will be used to generate the corresponding assertion urn in the following format:

urn:li:assertion:<your-new-assertion-id>

or a random urn will be created and returned for you. This id should be stable over time and unique for each assertion.

The upsert API will return the unique identifier (URN) for the the assertion if you were successful:

{
  "data": {
    "upsertExternalAssertion": {
      "urn": "urn:li:assertion:your-new-assertion-id"
    }
  },
  "extensions": {}
}

To upsert an assertion in Python, simply use the upsert_external_assertion method on the DataHub Client object.

{{ inline /metadata-ingestion/examples/library/upsert_custom_assertion.py show_path_as_comment }}

Report Results For Custom Assertions

When an assertion is evaluated against a Dataset, or a new result is available, you can report the result to DataHub using the following APIs.

Once reported, these will appear in the evaluation history of the assertion and will be used to determine whether the assertion is displayed as passing or failing in the DataHub UI.

To report results for a custom, use the reportAssertionResult GraphQL Mutation. This mutation both allows you to create and update a given assertion.

mutation reportAssertionResult {
  reportAssertionResult(
    urn: "urn:li:assertion:<your-new-assertion-id>"
    result: {
      timestampMillis: 1620000000 # Unix timestamp in millis. If not provided, the current time will be used.
      type: SUCCESS # or FAILURE or ERROR or INIT
      properties: [{ key: "my_custom_key", value: "my_custom_value" }]
      externalUrl: "https://my-great-expectations.com/results/1234" # Optional: URL to the results in the external tool
      # Optional: If the type is ERROR, you can provide additional context. See full list of error types below.
      # error: {
      #    type: UNKNOWN_ERROR,
      #    message: "The assertion failed due to an unknown error"
      # }
    }
  )
}

The type field is used to communicate the latest health status of the assertion.

The properties field is used to provide additional key-value pair context that will be displayed alongside the result in DataHub's UI.

The full list of supported error types include:

  • SOURCE_CONNECTION_ERROR
  • SOURCE_QUERY_FAILED
  • INSUFFICIENT_DATA
  • INVALID_PARAMETERS
  • INVALID_SOURCE_TYPE
  • UNSUPPORTED_PLATFORM
  • CUSTOM_SQL_ERROR
  • FIELD_ASSERTION_ERROR
  • UNKNOWN_ERROR
{
  "data": {
    "reportAssertionResult": true
  },
  "extensions": {}
}

If the result is true, the result was successfully reported.

To report an assertion result in Python, simply use the report_assertion_result method on the DataHub Client object.

{{ inline /metadata-ingestion/examples/library/report_assertion_result.py show_path_as_comment }}

Retrieve Results For Custom Assertions

After an assertion has been created and run, it will appear in the set of assertions associated with a given dataset urn. You can retrieve the results of these assertions using the following APIs.

Get Assertions for Dataset

To retrieve all the assertions for a table / dataset, you can use the following GraphQL Query.

query dataset {
  dataset(
    urn: "urn:li:dataset:(urn:li:dataPlatform:snowflake,purchases,PROD)"
  ) {
    assertions(start: 0, count: 1000) {
      start
      count
      total
      assertions {
        urn
        # Fetch the last run of each associated assertion.
        runEvents(status: COMPLETE, limit: 1) {
          total
          failed
          succeeded
          runEvents {
            timestampMillis
            status
            result {
              type
              nativeResults {
                key
                value
              }
            }
          }
        }
        info {
          type # Will be CUSTOM
          customType # Will be your custom type.
          description
          lastUpdated {
            time
            actor
          }
          customAssertion {
            entityUrn
            fieldPath
            externalUrl
            logic
          }
          source {
            type
            created {
              time
              actor
            }
          }
        }
      }
    }
  }
}

Get Assertion Details

You can use the following GraphQL query to fetch the details for an assertion along with its evaluation history by URN.

query getAssertion {
  assertion(urn: "urn:li:assertion:my-custom-assertion-id") {
    urn
    # Fetch the last 10 runs for the assertion.
    runEvents(status: COMPLETE, limit: 10) {
      total
      failed
      succeeded
      runEvents {
        timestampMillis
        status
        result {
          type
          nativeResults {
            key
            value
          }
        }
      }
    }
    info {
      type # Will be CUSTOM
      customType # Will be your custom type.
      description
      lastUpdated {
        time
        actor
      }
      customAssertion {
        entityUrn
        fieldPath
        externalUrl
        logic
      }
      source {
        type
        created {
          time
          actor
        }
      }
    }
    # Fetch what entities have the assertion attached to it
    relationships(input: { types: ["Asserts"], direction: OUTGOING }) {
      total
      relationships {
        entity {
          urn
        }
      }
    }
  }
}