Here you can see all the supported tests definitions and how to configure them in the YAML config file.
A **Test Definition** is a generic definition of a test. This Test Definition then gets specified in a Test Case. This Test Case is where the parameter(s) of a Test Definition are specified.
In this section, you will learn what tests we currently support and how to configure them in the YAML/JSON config file.
*`columnName`: the name of the column to check for
**Behavior**
| Condition | Status |
| ----------- | ----------- |
|`columnName`**exists** in the set of column name for the table| Success ✅|
|`columnName`**does not exists** in the set of column name for the table|Failed ❌|
**YAML Config**
```yaml
- name: myTestName
description: test description
testDefinitionName: tableColumnNameToExist
parameterValues:
- name: columnName
value: order_id
```
**JSON Config**
```json
{
"myTestName": "myTestName",
"testDefinitionName": "tableColumnNameToExist",
"parameterValues": [
{
"name": "columnName",
"value": "order_id"
}
]
}
```
### Table Column to Match Set
Validate a list of table column name matches an expected set of columns
**Properties**
*`columnNames`: comma separated string of column name
*`ordered`: whether the test should check for column ordering. Default to False
**Behavior**
| Condition | Status |
| ----------- | ----------- |
|[`ordered=False`] `columnNames`**matches** the list of column names in the table **regardless of the order**|Success ✅|
|[`ordered=True`] `columnNames`**matches** the list of column names in the table **in the corresponding order** (e.g. `["a","b"] == ["a","b"]`| Success ✅|
|[`ordered=FALSE`] `columnNames`**does no match** the list of column names in the table **regardless of the order**|Failed ❌|
|[`ordered=True`] `columnNames`**does no match** the list of column names in the table **and/or the corresponding order** (e.g. `["a","b"] != ["b","a"]`|Failed ❌|
**YAML Config**
```yaml
- name: myTestName
description: test description
testDefinitionName: tableColumnToMatchSet
parameterValues:
- name: columnNames
value: "col1, col2, col3"
- name: ordered
value: true
```
**JSON Config**
```json
{
"myTestName": "myTestName",
"testDefinitionName": "tableColumnToMatchSet",
"parameterValues": [
{
"name": "columnNames",
"value": "col1, col2, col3"
},
{
"name": "ordered",
"value": true
}
]
}
```
### Table Custom SQL Test
Write you own SQL test. When writting your query you can use 2 strategies:
-`ROWS` (default): expects the query to be written as `SELECT <field>, <field> FROM <foo> WHERE <condition>`. **Note** if your query returns a large amount of rows it might cause an "Out Of Memeory" error. In this case we recomend you to use the `COUNT` strategy.
-`COUNT`: expects the query to be written as `SELECT COUNT(<field>) FROM <foo> WHERE <condition>`.
**How to use the Threshold Parameter?**
The threshold allows you to define a limit for which you test should pass or fail - by defaut this number is 0. For example if my custom SQL query test returns 10 rows (or a COUNT value of 10) and my threshold is 5 the test will fail. If I update my threshold to 11 on my next run my test will pass.
**Properties**
*`sqlExpression`: SQL expression
*`strategy`: one of `ROWS` or `COUNT`
*`threshold`: an integer defining the threshold above which the test should fail (default to 0 if not specified)
**Behavior**
| Condition | Status |
| ----------- | ----------- |
|`sqlExpression` returns **row <= threshold (default to 0)**|Success ✅|
|`sqlExpression` returns **row > threshold (default to 0)**|Failed ❌|
**Example**
```sql
SELECT
customer_id
FROM DUAL
WHERE lifetime_value <0;
```
```sql
SELECT
customer_id
FROM DUAL d
INNER JOIN OTHER o ON d.id = o.id
WHERE lifetime_value <0;
```
**YAML Config**
```yaml
- name: myTestName
description: test description
testDefinitionName: tableCustomSQLQuery
parameterValues:
- name: sqlExpression
value: >
SELECT
customer_tier
FROM DUAL
WHERE customer_tier = 'GOLD' and lifetime_value <10000;
```
**JSON Config**
```json
{
"name": "myTestName",
"description": "test description",
"testDefinitionName": "tableCustomSQLQuery",
"parameterValues": [
{
"name": "sqlExpression",
"value": "SELECT customer_tier FROM DUAL WHERE customer_tier = 'GOLD' and lifetime_value <10000;"
}
]
}
```
### Table Row Inserted Count To Be Between
Validate the number of rows inserted for the defined period is between the expected range
{% note %}
The Table Row Inserted Count To Be Between cannot be executed against tables that have configured a partition in OpenMetadata. The logic of the test performed will be similar to executing a Table Row Count to be Between test against a table with a partition configured.
{% /note %}
**Properties**
*`Min Row Count`: Lower bound
*`Max Row Count`: Upper bound
*`Column Name`: The name of the column used to apply the range filter
*`Range Type`: One of `HOUR`, `DAY`, `MONTH`, `YEAR`
*`Interval`: The range interval (e.g. 1,2,3,4,5, etc)
**Behavior**
| Condition | Status |
| ----------- | ----------- |
|Number of rows **is between**`Min Row Count` and `Max Row Count`| Success ✅|
|Number of rows **is not between**`Min Row Count` and `Max Row Count|Failed ❌|
Tests applied on top of Column metrics. Here is the list of all column tests:
- [Column Values to Be Unique](#column-values-to-be-unique)
- [Column Values to Be Not Null](#column-values-to-be-not-null)
- [Column Values to Match Regex](#column-values-to-match-regex)
- [Column Values to not Match Regex](#column-values-to-not-match-regex)
- [Column Values to Be in Set](#column-values-to-be-in-set)
- [Column Values to Be Not In Set](#column-values-to-be-not-in-set)
- [Column Values to Be Between](#column-values-to-be-between)
- [Column Values Missing Count to Be Equal](#column-values-missing-count-to-be-equal)
- [Column Values Lengths to Be Between](#column-values-lengths-to-be-between)
- [Column Value Max to Be Between](#column-value-max-to-be-between)
- [Column Value Min to Be Between](#column-value-min-to-be-between)
- [Column Value Mean to Be Between](#column-value-mean-to-be-between)
- [Column Value Median to Be Between](#column-value-median-to-be-between)
- [Column Values Sum to Be Between](#column-values-sum-to-be-between)
- [Column Values Standard Deviation to Be Between](#column-values-standard-deviation-to-be-between)
### Column Values to Be Unique
Makes sure that there are no duplicate values in a given column.
**Behavior**
| Condition | Status |
| ----------- | ----------- |
|column values are unique|Success ✅|
|column values are not unique|Failed ❌|
**Properties**
*`columnValuesToBeUnique`: To be set as `true`. This is required for proper JSON parsing in the profiler module.
**YAML Config**
```yaml
- name: myTestName
description: test description
columnName: columnName
testDefinitionName: columnValuesToBeUnique
computePassedFailedRowCount: <trueorfalse>
parameterValues:
- name: columnNames
value: true
```
**JSON Config**
```json
{
"name": "myTestName",
"description": "test description",
"columnName": "columnName",
"testDefinitionName": "columnValuesToBeUnique",
"parameterValues": [
{
"name": "columnNames",
"value": true
}
]
}
```
### Column Values to Be Not Null
Validates that there are no null values in the column.
**Properties**
*`columnValuesToBeNotNull`: To be set as `true`. This is required for proper JSON parsing in the profiler module.
**Behavior**
| Condition | Status |
| ----------- | ----------- |
|No `NULL` values are present in the column|Success ✅|
|1 or more `NULL` values are present in the column|Failed ❌|
**YAML Config**
```yaml
- name: myTestName
description: test description
columnName: columnName
testDefinitionName: columnValuesToBeNotNull
computePassedFailedRowCount: <trueorfalse>
parameterValues:
- name: columnValuesToBeNotNull
value: true
```
**JSON Config**
```json
{
"name": "myTestName",
"description": "test description",
"columnName": "columnName",
"testDefinitionName": "columnValuesToBeNotNull",
"parameterValues": [
{
"name": "columnValuesToBeNotNull",
"value": true
}
]
}
```
### Column Values to Match Regex
This test allows us to specify how many values in a column we expect that will match a certain regex expression. Please note that for certain databases we will fall back to SQL `LIKE` expression. The databases supporting regex pattern as of 0.13.2 are:
- redshift
- postgres
- oracle
- mysql
- mariaDB
- sqlite
- clickhouse
- snowflake
The other databases will fall back to the `LIKE` expression
**Properties**
*`regex`: expression to match a regex pattern. E.g., `[a-zA-Z0-9]{5}`.
**Behavior**
| Condition | Status |
| ----------- | ----------- |
|All column values match `regex`|Success ✅|
|1 or more column values do not match `regex`|Failed ❌|
**YAML Config**
```yaml
- name: myTestName
description: test description
columnName: columnName
testDefinitionName: columnValuesToMatchRegex
computePassedFailedRowCount: <trueorfalse>
parameterValues:
- name: regex
value: "%something%"
```
**JSON Config**
```json
{
"name": "myTestName",
"description": "test description",
"columnName": "columnName",
"testDefinitionName": "columnValuesToMatchRegex",
"parameterValues": [
{
"name": "regex",
"value": "%something%"
}
]
}
```
### Column Values to not Match Regex
This test allows us to specify values in a column we expect that will not match a certain regex expression. If the test find values matching the `forbiddenRegex` the test will fail. Please note that for certain databases we will fall back to SQL `LIKE` expression. The databases supporting regex pattern as of 0.13.2 are:
- redshift
- postgres
- oracle
- mysql
- mariaDB
- sqlite
- clickhouse
- snowflake
The other databases will fall back to the `LIKE` expression
**Properties**
*`regex`: expression to match a regex pattern. E.g., `[a-zA-Z0-9]{5}`.
**Behavior**
| Condition | Status |
| ----------- | ----------- |
|0 column value match `regex`|Success ✅|
|1 or more column values match `regex`|Failed ❌|
**YAML Config**
```yaml
- name: myTestName
description: test description
columnName: columnName
testDefinitionName: columnValuesToMatchRegex
computePassedFailedRowCount: <trueorfalse>
parameterValues:
- name: forbiddenRegex
value: "%something%"
```
**JSON Config**
```json
{
"name": "myTestName",
"description": "test description",
"columnName": "columnName",
"testDefinitionName": "columnValuesToMatchRegex",
"parameterValues": [
{
"name": "forbiddenRegex",
"value": "%something%"
}
]
}
```
### Column Values to Be in Set
Validate values form a set are present in a column.
**Properties**
*`allowedValues`: List of allowed strings or numbers.
Validate that there are no values in a column in a set of forbidden values.
**Properties**
*`forbiddenValues`: List of forbidden strings or numbers.
**Behavior**
| Condition | Status |
| ----------- | ----------- |
|0 value from `forbiddenValues` is found in the column|Success ✅|
|1 or more values from `forbiddenValues` is found in the column|Failed ❌|
**YAML Config**
```yaml
- name: myTestName
description: test description
columnName: columnName
testDefinitionName: columnValuesToBeNotInSet
computePassedFailedRowCount: <trueorfalse>
parameterValues:
- name: forbiddenValues
value: ["forbidden1", "forbidden2"]
```
**JSON Config**
```json
{
"name": "myTestName",
"description": "test description",
"columnName": "columnName",
"testDefinitionName": "columnValuesToBeNotInSet",
"parameterValues": [
{
"name": "forbiddenValues",
"value": [
"forbidden1",
"forbidden2"
]
}
]
}
```
### Column Values to Be Between
Validate that the values of a column are within a given range.
> Only supports numerical types.
**Properties**
*`minValue`: Lower bound of the interval. If informed, the column values should be bigger than this number.
*`maxValue`: Upper bound of the interval. If informed, the column values should be lower than this number.
Any of those two need to be informed.
**Behavior**
| Condition | Status |
| ----------- | ----------- |
|value is **between**`minValue` and `maxValue`|Success ✅|
|value is **greater** than `minValue` if only `minValue` is specified|Success ✅|
|value is **less** then `maxValue` if only `maxValue` is specified|Success ✅|
|value is **not between**`minValue` and `maxValue`|Failed ❌|
|value is **less** than `minValue` if only `minValue` is specified|Failed ❌|
|value is **greater** then `maxValue` if only `maxValue` is specified|Failed ❌|
**YAML Config**
```yaml
- name: myTestName
description: test description
columnName: columnName
testDefinitionName: columnValuesToBeBetween
computePassedFailedRowCount: <trueorfalse>
parameterValues:
- name: minValue
value: ["forbidden1", "forbidden2"]
```
**JSON Config**
```json
{
"name": "myTestName",
"description": "test description",
"columnName": "columnName",
"testDefinitionName": "columnValuesToBeBetween",
"parameterValues": [
{
"name": "minValue",
"value": [
"forbidden1",
"forbidden2"
]
}
]
}
```
### Column Values Missing Count to Be Equal
Validates that the number of missing values matches a given number. Missing values are the sum of nulls, plus the sum of values in a given list which we need to consider as missing data. A clear example of that would be `NA` or `N/A`.
**Properties**
*`missingCountValue`: The number of missing values needs to be equal to this. This field is mandatory.
*`missingValueMatch` (Optional): A list of strings to consider as missing values.
**Behavior**
| Condition | Status |
| ----------- | ----------- |
|Number of missing value is **equal** to `missingCountValue`|Success ✅|
|Number of missing value is **not equal** to `missingCountValue`|Failed ❌|