From 1b5d93dc778ea5ed38cd7c7d9d0077bf385fe762 Mon Sep 17 00:00:00 2001
From: Pere Miquel Brull
Date: Mon, 12 Sep 2022 11:59:52 +0200
Subject: [PATCH] Docs - Add Airflow execution for test suite (#7381)

* Add Airflow execution for test suite

* Add types
---
 .../ingestion/workflows/data-quality/index.md | 84 ++++++++++++++++---
 1 file changed, 73 insertions(+), 11 deletions(-)

diff --git a/openmetadata-docs/content/openmetadata/ingestion/workflows/data-quality/index.md b/openmetadata-docs/content/openmetadata/ingestion/workflows/data-quality/index.md
index 6977e0e4a03..8edfbf180fb 100644
--- a/openmetadata-docs/content/openmetadata/ingestion/workflows/data-quality/index.md
+++ b/openmetadata-docs/content/openmetadata/ingestion/workflows/data-quality/index.md
@@ -92,7 +92,7 @@ After clicking `Add Ingestion` you will be able to select an execution schedule
 ## Adding Tests with the YAML Config
 When creating a JSON config for a test workflow the source configuration is very simple.
-```
+```yaml
 source:
   type: TestSuite
   serviceName:
@@ -103,7 +103,7 @@ source:
 The only section you need to modify here is the `serviceName` key. Note that this name needs to be unique across OM platform Test Suite name. Once you have defined your source configuration you'll need to define te processor configuration.
-```
+```yaml
 processor:
   type: "orm-test-runner"
   config:
@@ -127,7 +127,7 @@ The processor type should be set to ` "orm-test-runner"`. For accepted test defi
 ### Full `yaml` config example
-```
+```yaml
 source:
   type: TestSuite
   serviceName: MyAwesomeTestSuite
@@ -162,11 +162,73 @@ workflowConfig:
 ```
 ### How to Run Tests
+
 To run the tests from the CLI execute the following command
 ```
 metadata test -c /path/to/my/config.yaml
 ```
+### Schedule Test Suite runs with Airflow
+
+As with the Ingestion or Profiler workflow, you can as well execute a Test Suite directly from Python. We are
+going to use Airflow as an example, but any orchestrator would achieve the same goal.
+
+Let's prepare the DAG as usual, but importing a different Workflow class:
+
+```python
+import pathlib
+import yaml
+from datetime import timedelta
+from airflow import DAG
+
+try:
+    from airflow.operators.python import PythonOperator
+except ModuleNotFoundError:
+    from airflow.operators.python_operator import PythonOperator
+
+from metadata.config.common import load_config_file
+from metadata.test_suite.api.workflow import TestSuiteWorkflow
+from airflow.utils.dates import days_ago
+
+default_args = {
+    "owner": "user_name",
+    "email": ["username@org.com"],
+    "email_on_failure": False,
+    "retries": 3,
+    "retry_delay": timedelta(minutes=5),
+    "execution_timeout": timedelta(minutes=60)
+}
+
+config = """
+
+"""
+
+def metadata_ingestion_workflow():
+    workflow_config = yaml.safe_load(config)
+    workflow = TestSuiteWorkflow.create(workflow_config)
+    workflow.execute()
+    workflow.raise_from_status()
+    workflow.print_status()
+    workflow.stop()
+
+with DAG(
+    "test_suite_workflow",
+    default_args=default_args,
+    description="An example DAG which runs a OpenMetadata ingestion workflow",
+    start_date=days_ago(1),
+    is_paused_upon_creation=False,
+    schedule_interval='*/5 * * * *',
+    catchup=False,
+) as dag:
+    ingest_task = PythonOperator(
+        task_id="test_using_recipe",
+        python_callable=metadata_ingestion_workflow,
+    )
+```
+
+Note how we are using the `TestSuiteWorkflow` class to load and execute the tests based on the YAML
+configurations specified above.
+
 ## How to Visualize Test Results
 ### From the Test Suite View
 From the home page click on the Test Suite menu in the left pannel.
@@ -213,7 +275,7 @@ While OpenMetadata provides out of the box tests, you may want to write your tes
 ### Creating a `TestDefinition`
 First, you'll need to create a Test Definition for your test. You can use the following endpoint `/api/v1/testDefinition` using a POST protocol to create your Test Definition. You will need to pass the following data in the body your request at minimum.
-```
+```json
 {
     "description": "",
     "entityType": "",
@@ -232,7 +294,7 @@ First, you'll need to create a Test Definition for your test. You can use the fo
 Here is a complete CURL request
-```
+```bash
 curl --request POST 'http://localhost:8585/api/v1/testDefinition' \
 --header 'Content-Type: application/json' \
 --data-raw '{
@@ -251,7 +313,7 @@ Make sure to keep the `UUID` from the response as you will need it to create the
 ### Creating a `TestSuite`
 You'll also need to create a Test Suite for your Test Case -- note that you can also use an existing one if you want to. You can use the following endpoint `/api/v1/testSuite` using a POST protocol to create your Test Definition. You will need to pass the following data in the body your request at minimum.
-```
+```json
 {
     "name": "",
     "description": ""
@@ -260,7 +322,7 @@ You'll also need to create a Test Suite for your Test Case -- note that you can
 Here is a complete CURL request
-```
+```bash
 curl --request POST 'http://localhost:8585/api/v1/testSuite' \
 --header 'Content-Type: application/json' \
 --data-raw '{
@@ -275,7 +337,7 @@ Make sure to keep the `UUID` from the response as you will need it to create the
 ### Creating a `TestCase`
 Once you have your Test Definition created you can create a Test Case -- which is a specification of your Test Definition. You can use the following endpoint `/api/v1/testCase` using a POST protocol to create your Test Case. You will need to pass the following data in the body your request at minimum.
-```
+```json
 {
     "entityLink": "<#E::table::fqn> or <#E::table::fqn::columns::column name>",
     "name": "",
@@ -293,7 +355,7 @@ Once you have your Test Definition created you can create a Test Case -- which i
 Here is a complete CURL request
-```
+```bash
 curl --request POST 'http://localhost:8585/api/v1/testCase' \
 --header 'Content-Type: application/json' \
 --data-raw '{
@@ -322,7 +384,7 @@ Make sure to keep the `UUID` from the response as you will need it to create the
 ### Writing `TestCaseResults`
 Once you have your Test Case created you can write your results to it. You can use the following endpoint `/api/v1/testCase/{test FQN}/testCaseResult` using a PUT protocol to add Test Case Results. You will need to pass the following data in the body your request at minimum.
-```
+```json
 {
     "result": "",
     "testCaseStatus": "",
@@ -337,7 +399,7 @@ Once you have your Test Case created you can write your results to it. You can u
 Here is a complete CURL request
-```
+```bash
 curl --location --request PUT 'http://localhost:8585/api/v1/testCase/local_redshift.dev.dbt_jaffle.customers.custom_test_Case/testCaseResult' \
 --header 'Content-Type: application/json' \
 --data-raw '{