Here you can see all the supported tests definitions and how to configure them in the YAML config file.
A **Test Definition** is a generic definition of a test. This Test Definition then gets specified in a Test Case. This Test Case is where the parameter(s) of a Test Definition are specified.
In this section, you will learn what tests we currently support and how to configure them in the YAML/JSON config file.
- [Table Tests](#table-tests)
- [Column Tests](#column-tests)
## Table Tests
Tests applied on top of a Table. Here is the list of all table tests:
- [Table Row Count to Equal](#table-row-count-to-equal)
- [Table Row Count to be Between](#table-row-count-to-be-between)
- [Table Column Count to Equal](#table-column-count-to-equal)
- [Table Column Count to be Between](#table-column-count-to-be-between)
- [Table Column Name to Exist](#table-column-name-to-exist)
- [Table Column to Match Set](#table-column-to-match-set)
- [Table Custom SQL Test](#table-custom-sql-test)
- [Table Row Inserted Count To Be Between](#table-row-inserted-count-to-be-between)
- [Compare 2 tables for differences](#compare-2-tables-for-differences)
### Table Row Count to Equal
Validate the total row count in the table is equal to the given value.
*`columnName`: the name of the column to check for
**Behavior**
| Condition | Status |
| ----------- | ----------- |
|`columnName`**exists** in the set of column name for the table| Success ✅|
|`columnName`**does not exists** in the set of column name for the table|Failed ❌|
**YAML Config**
```yaml
- name: myTestName
description: test description
testDefinitionName: tableColumnNameToExist
parameterValues:
- name: columnName
value: order_id
```
**JSON Config**
```json
{
"myTestName": "myTestName",
"testDefinitionName": "tableColumnNameToExist",
"parameterValues": [
{
"name": "columnName",
"value": "order_id"
}
]
}
```
### Table Column to Match Set
Validate a list of table column name matches an expected set of columns
**Properties**
*`columnNames`: comma separated string of column name
*`ordered`: whether the test should check for column ordering. Default to False
**Behavior**
| Condition | Status |
| ----------- | ----------- |
|[`ordered=False`] `columnNames`**matches** the list of column names in the table **regardless of the order**|Success ✅|
|[`ordered=True`] `columnNames`**matches** the list of column names in the table **in the corresponding order** (e.g. `["a","b"] == ["a","b"]`| Success ✅|
|[`ordered=FALSE`] `columnNames`**does no match** the list of column names in the table **regardless of the order**|Failed ❌|
|[`ordered=True`] `columnNames`**does no match** the list of column names in the table **and/or the corresponding order** (e.g. `["a","b"] != ["b","a"]`|Failed ❌|
**YAML Config**
```yaml
- name: myTestName
description: test description
testDefinitionName: tableColumnToMatchSet
parameterValues:
- name: columnNames
value: "col1, col2, col3"
- name: ordered
value: true
```
**JSON Config**
```json
{
"myTestName": "myTestName",
"testDefinitionName": "tableColumnToMatchSet",
"parameterValues": [
{
"name": "columnNames",
"value": "col1, col2, col3"
},
{
"name": "ordered",
"value": true
}
]
}
```
### Table Custom SQL Test
Write you own SQL test. When writting your query you can use 2 strategies:
-`ROWS` (default): expects the query to be written as `SELECT <field>, <field> FROM <foo> WHERE <condition>`. **Note** if your query returns a large amount of rows it might cause an "Out Of Memeory" error. In this case we recomend you to use the `COUNT` strategy.
-`COUNT`: expects the query to be written as `SELECT COUNT(<field>) FROM <foo> WHERE <condition>`.
**How to use the Threshold Parameter?**
The threshold allows you to define a limit for which you test should pass or fail - by defaut this number is 0. For example if my custom SQL query test returns 10 rows (or a COUNT value of 10) and my threshold is 5 the test will fail. If I update my threshold to 11 on my next run my test will pass.
**Properties**
*`sqlExpression`: SQL expression
*`strategy`: one of `ROWS` or `COUNT`
*`threshold`: an integer defining the threshold above which the test should fail (default to 0 if not specified)
**Behavior**
| Condition | Status |
| ----------- | ----------- |
|`sqlExpression` returns **row <= threshold (default to 0)**|Success ✅|
|`sqlExpression` returns **row > threshold (default to 0)**|Failed ❌|
**Example**
```sql
SELECT
customer_id
FROM DUAL
WHERE lifetime_value <0;
```
```sql
SELECT
customer_id
FROM DUAL d
INNER JOIN OTHER o ON d.id = o.id
WHERE lifetime_value <0;
```
**YAML Config**
```yaml
- name: myTestName
description: test description
testDefinitionName: tableCustomSQLQuery
parameterValues:
- name: sqlExpression
value: >
SELECT
customer_tier
FROM DUAL
WHERE customer_tier = 'GOLD' and lifetime_value <10000;
```
**JSON Config**
```json
{
"name": "myTestName",
"description": "test description",
"testDefinitionName": "tableCustomSQLQuery",
"parameterValues": [
{
"name": "sqlExpression",
"value": "SELECT customer_tier FROM DUAL WHERE customer_tier = 'GOLD' and lifetime_value <10000;"
}
]
}
```
### Table Row Inserted Count To Be Between
Validate the number of rows inserted for the defined period is between the expected range
{% note %}
The Table Row Inserted Count To Be Between cannot be executed against tables that have configured a partition in OpenMetadata. The logic of the test performed will be similar to executing a Table Row Count to be Between test against a table with a partition configured.
{% /note %}
**Properties**
*`Min Row Count`: Lower bound
*`Max Row Count`: Upper bound
*`Column Name`: The name of the column used to apply the range filter
*`Range Type`: One of `HOUR`, `DAY`, `MONTH`, `YEAR`
*`Interval`: The range interval (e.g. 1,2,3,4,5, etc)
**Behavior**
| Condition | Status |
| ----------- | ----------- |
|Number of rows **is between**`Min Row Count` and `Max Row Count`| Success ✅|
|Number of rows **is not between**`Min Row Count` and `Max Row Count|Failed ❌|
Compare 2 tables and report any column or row differences. The test will fail for any column differences or row differences that
are over the threshold.
The comparison will be scoped by:
1. The partition configuration (as in other test cases).
2. The "use columns" properties.
3. The "where" can be used to filter the rows to compare.
**Properties**
*`Table 2`: Fully qualified name of the table to compare against.
*`Key Columns`: The columns to use as the key for the comparison. If not provided, it will be resolved from the primary key or unique columns. The tuples created from the key columns must be unique.
*`Threshold`: Threshold to use to determine if the test passes or fails (defaults to 0).
*`Use Columns`:Limits the scope of the test to this list of columns. If not provided, all columns will be used except the key columns.
*`Where`: Use this where clause to filter the rows to compare.
| Number of row differences is **greater** than the threshold | Failed ❌ |
| Column diff is **equal** to 0 and number of row differences is **less than or equal to** the threshold | Success ✅ |
**YAML Config**
```yaml
- name: myTableDiff
description: compare table a and b
testDefinitionName: tableDiff
parameterValues:
- name: table2
value: dbservice.database.schema.my_table
# optional parameters:
- name: threshold
value: "10"
- name: keyColumns
value: '["id"]'
- name: useColumns
value: '["first_name","last_name"]'
- name: where
value: "country = 'US'"
```
**JSON Config**
```json
[
{
"name": "myTableDiff",
"description": "compare table a and b",
"testDefinitionName": "tableDiff",
"parameterValues": [
{
"name": "table2",
"value": "dbservice.database.schema.my_table"
},
{
"name": "threshold",
"value": "10"
},
{
"name": "keyColumns",
"value": "[\"id\"]"
},
{
"name": "useColumns",
"value": "[\"first_name\",\"last_name\"]"
},
{
"name": "where",
"value": "country = 'US'"
}
]
}
]
```
## Column Tests
Tests applied on top of Column metrics. Here is the list of all column tests:
- [Column Values to Be Unique](#column-values-to-be-unique)
- [Column Values to Be Not Null](#column-values-to-be-not-null)
- [Column Values to Match Regex](#column-values-to-match-regex)
- [Column Values to not Match Regex](#column-values-to-not-match-regex)
- [Column Values to Be in Set](#column-values-to-be-in-set)
- [Column Values to Be Not In Set](#column-values-to-be-not-in-set)
- [Column Values to Be Between](#column-values-to-be-between)
- [Column Values Missing Count to Be Equal](#column-values-missing-count-to-be-equal)
- [Column Values Lengths to Be Between](#column-values-lengths-to-be-between)
- [Column Value Max to Be Between](#column-value-max-to-be-between)
- [Column Value Min to Be Between](#column-value-min-to-be-between)
- [Column Value Mean to Be Between](#column-value-mean-to-be-between)
- [Column Value Median to Be Between](#column-value-median-to-be-between)
- [Column Values Sum to Be Between](#column-values-sum-to-be-between)
- [Column Values Standard Deviation to Be Between](#column-values-standard-deviation-to-be-between)
### Column Values to Be Unique
Makes sure that there are no duplicate values in a given column.
**Behavior**
| Condition | Status |
| ----------- | ----------- |
|column values are unique|Success ✅|
|column values are not unique|Failed ❌|
**Properties**
*`columnValuesToBeUnique`: To be set as `true`. This is required for proper JSON parsing in the profiler module.
**YAML Config**
```yaml
- name: myTestName
description: test description
columnName: columnName
testDefinitionName: columnValuesToBeUnique
computePassedFailedRowCount: <trueorfalse>
parameterValues:
- name: columnNames
value: true
```
**JSON Config**
```json
{
"name": "myTestName",
"description": "test description",
"columnName": "columnName",
"testDefinitionName": "columnValuesToBeUnique",
"parameterValues": [
{
"name": "columnNames",
"value": true
}
]
}
```
### Column Values to Be Not Null
Validates that there are no null values in the column.
**Properties**
*`columnValuesToBeNotNull`: To be set as `true`. This is required for proper JSON parsing in the profiler module.
**Behavior**
| Condition | Status |
| ----------- | ----------- |
|No `NULL` values are present in the column|Success ✅|
|1 or more `NULL` values are present in the column|Failed ❌|
**YAML Config**
```yaml
- name: myTestName
description: test description
columnName: columnName
testDefinitionName: columnValuesToBeNotNull
computePassedFailedRowCount: <trueorfalse>
parameterValues:
- name: columnValuesToBeNotNull
value: true
```
**JSON Config**
```json
{
"name": "myTestName",
"description": "test description",
"columnName": "columnName",
"testDefinitionName": "columnValuesToBeNotNull",
"parameterValues": [
{
"name": "columnValuesToBeNotNull",
"value": true
}
]
}
```
### Column Values to Match Regex
This test allows us to specify how many values in a column we expect that will match a certain regex expression. Please note that for certain databases we will fall back to SQL `LIKE` expression. The databases supporting regex pattern as of 0.13.2 are:
- redshift
- postgres
- oracle
- mysql
- mariaDB
- sqlite
- clickhouse
- snowflake
The other databases will fall back to the `LIKE` expression
**Properties**
*`regex`: expression to match a regex pattern. E.g., `[a-zA-Z0-9]{5}`.
**Behavior**
| Condition | Status |
| ----------- | ----------- |
|All column values match `regex`|Success ✅|
|1 or more column values do not match `regex`|Failed ❌|
**YAML Config**
```yaml
- name: myTestName
description: test description
columnName: columnName
testDefinitionName: columnValuesToMatchRegex
computePassedFailedRowCount: <trueorfalse>
parameterValues:
- name: regex
value: "%something%"
```
**JSON Config**
```json
{
"name": "myTestName",
"description": "test description",
"columnName": "columnName",
"testDefinitionName": "columnValuesToMatchRegex",
"parameterValues": [
{
"name": "regex",
"value": "%something%"
}
]
}
```
### Column Values to not Match Regex
This test allows us to specify values in a column we expect that will not match a certain regex expression. If the test find values matching the `forbiddenRegex` the test will fail. Please note that for certain databases we will fall back to SQL `LIKE` expression. The databases supporting regex pattern as of 0.13.2 are:
- redshift
- postgres
- oracle
- mysql
- mariaDB
- sqlite
- clickhouse
- snowflake
The other databases will fall back to the `LIKE` expression
**Properties**
*`regex`: expression to match a regex pattern. E.g., `[a-zA-Z0-9]{5}`.
**Behavior**
| Condition | Status |
| ----------- | ----------- |
|0 column value match `regex`|Success ✅|
|1 or more column values match `regex`|Failed ❌|
**YAML Config**
```yaml
- name: myTestName
description: test description
columnName: columnName
testDefinitionName: columnValuesToMatchRegex
computePassedFailedRowCount: <trueorfalse>
parameterValues:
- name: forbiddenRegex
value: "%something%"
```
**JSON Config**
```json
{
"name": "myTestName",
"description": "test description",
"columnName": "columnName",
"testDefinitionName": "columnValuesToMatchRegex",
"parameterValues": [
{
"name": "forbiddenRegex",
"value": "%something%"
}
]
}
```
### Column Values to Be in Set
Validate values form a set are present in a column.
**Properties**
*`allowedValues`: List of allowed strings or numbers.
| `matchEnum` is `false` and 1 or more values from `allowedValues` is found in the column | Success ✅ |
| `matchEnum` is `true` and all columns have a value from `allowedValues` | Success ✅ |
| `matchEnum` is `false` 0 value from `allowedValues` is found in the column | Failed ❌ |
| `matchEnum` is `true` and 1 or more columns does not have a vluae from `allowedValues` | Failed ❌ |
**YAML Config**
```yaml
- name: myTestName
testDefinitionName: columnValuesToBeInSet
columnName: columnName
computePassedFailedRowCount: <trueorfalse>
parameterValues:
- name: allowedValues
value: '["forbidden1", "forbidden2"]'
- name: matchEnum
value: "" # or true
```
**JSON Config**
```json
{
"name": "myTestName",
"description": "test description",
"columnName": "columnName",
"testDefinitionName": "columnValuesToBeInSet",
"parameterValues": [
{
"name": "allowedValues",
"value": [
"forbidden1",
"forbidden2"
]
},
{
"name": "matchEnum",
"value": ""
}
]
}
```
**JSON Config**
```json
{
"name": "myTestName",
"description": "test description",
"columnName": "columnName",
"testDefinitionName": "columnValuesToBeInSet",
"parameterValues": [
{
"name": "allowedValues",
"value": [
"forbidden1",
"forbidden2"
]
}
]
}
```
### Column Values to Be Not In Set
Validate that there are no values in a column in a set of forbidden values.
**Properties**
*`forbiddenValues`: List of forbidden strings or numbers.
**Behavior**
| Condition | Status |
| ----------- | ----------- |
|0 value from `forbiddenValues` is found in the column|Success ✅|
|1 or more values from `forbiddenValues` is found in the column|Failed ❌|
**YAML Config**
```yaml
- name: myTestName
description: test description
columnName: columnName
testDefinitionName: columnValuesToBeNotInSet
computePassedFailedRowCount: <trueorfalse>
parameterValues:
- name: forbiddenValues
value: ["forbidden1", "forbidden2"]
```
**JSON Config**
```json
{
"name": "myTestName",
"description": "test description",
"columnName": "columnName",
"testDefinitionName": "columnValuesToBeNotInSet",
"parameterValues": [
{
"name": "forbiddenValues",
"value": [
"forbidden1",
"forbidden2"
]
}
]
}
```
### Column Values to Be Between
Validate that the values of a column are within a given range.
> Only supports numerical types.
**Properties**
*`minValue`: Lower bound of the interval. If informed, the column values should be bigger than this number.
*`maxValue`: Upper bound of the interval. If informed, the column values should be lower than this number.
Any of those two need to be informed.
**Behavior**
| Condition | Status |
| ----------- | ----------- |
|value is **between**`minValue` and `maxValue`|Success ✅|
|value is **greater** than `minValue` if only `minValue` is specified|Success ✅|
|value is **less** then `maxValue` if only `maxValue` is specified|Success ✅|
|value is **not between**`minValue` and `maxValue`|Failed ❌|
|value is **less** than `minValue` if only `minValue` is specified|Failed ❌|
|value is **greater** then `maxValue` if only `maxValue` is specified|Failed ❌|
**YAML Config**
```yaml
- name: myTestName
description: test description
columnName: columnName
testDefinitionName: columnValuesToBeBetween
computePassedFailedRowCount: <trueorfalse>
parameterValues:
- name: minValue
value: ["forbidden1", "forbidden2"]
```
**JSON Config**
```json
{
"name": "myTestName",
"description": "test description",
"columnName": "columnName",
"testDefinitionName": "columnValuesToBeBetween",
"parameterValues": [
{
"name": "minValue",
"value": [
"forbidden1",
"forbidden2"
]
}
]
}
```
### Column Values Missing Count to Be Equal
Validates that the number of missing values matches a given number. Missing values are the sum of nulls, plus the sum of values in a given list which we need to consider as missing data. A clear example of that would be `NA` or `N/A`.
**Properties**
*`missingCountValue`: The number of missing values needs to be equal to this. This field is mandatory.
*`missingValueMatch` (Optional): A list of strings to consider as missing values.
**Behavior**
| Condition | Status |
| ----------- | ----------- |
|Number of missing value is **equal** to `missingCountValue`|Success ✅|
|Number of missing value is **not equal** to `missingCountValue`|Failed ❌|