---
title: Tests
slug: /connectors/ingestion/workflows/data-quality/tests
---
# Tests

This page lists all the supported test definitions and shows how to configure them in a YAML or JSON config file.

A **Test Definition** is a generic definition of a test. A **Test Case** references a Test Definition and specifies the value(s) of its parameter(s).
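To make the relationship concrete, here is a minimal Python sketch that binds parameter values to a named test definition and prints the resulting JSON config. The helper `make_test_case` is hypothetical and only mirrors the `testDefinitionName` / `parameterValues` layout used in the configs on this page.

```python
import json


def make_test_case(test_definition_name, **parameters):
    """Bind parameter values to a named test definition (hypothetical helper)."""
    return {
        "testDefinitionName": test_definition_name,
        "parameterValues": [
            {"name": name, "value": value} for name, value in parameters.items()
        ],
    }


# The generic definition is "tableRowCountToEqual"; the test case fixes `value`.
test_case = make_test_case("tableRowCountToEqual", value=2)
print(json.dumps(test_case, indent=2))
```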
## Table Tests
Tests applied on top of a Table. Here is the list of all table tests:

- [Table Row Count to Equal](#table-row-count-to-equal)
- [Table Row Count to be Between](#table-row-count-to-be-between)
- [Table Column Count to Equal](#table-column-count-to-equal)
- [Table Column Count to be Between](#table-column-count-to-be-between)
- [Table Column Name to Exist](#table-column-name-to-exist)
- [Table Column to Match Set](#table-column-to-match-set)
- [Table Custom SQL Test](#table-custom-sql-test)

### Table Row Count to Equal
Validate that the total row count in the table is equal to the given value.

**Properties**:

* `value`: Expected number of rows.

**YAML Config**

```yaml
testDefinitionName: tableRowCountToEqual
parameterValues:
  - name: value
    value: 2
```
**JSON Config**

```json
{
  "testDefinitionName": "tableRowCountToEqual",
  "parameterValues": [
    {
      "name": "value",
      "value": 2
    }
  ]
}
```
### Table Row Count to be Between
Validate that the total row count is within a given range of values.

**Properties**:

* `minValue`: Lower bound of the interval. If provided, the number of rows should be greater than this number.
* `maxValue`: Upper bound of the interval. If provided, the number of rows should be lower than this number.

At least one of the two must be provided.

**YAML Config**

```yaml
testDefinitionName: tableRowCountToBeBetween
parameterValues:
  - name: minValue
    value: 10
  - name: maxValue
    value: 10
```
**JSON Config**

```json
{
  "testDefinitionName": "tableRowCountToBeBetween",
  "parameterValues": [
    {
      "name": "minValue",
      "value": 10
    },
    {
      "name": "maxValue",
      "value": 10
    }
  ]
}
```
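
As a rough illustration of the range check with optional bounds, the following Python sketch treats a missing bound as "no limit". It assumes inclusive bounds, which is the only reading under which the example above (`minValue` and `maxValue` both set to 10) can pass; the function name and the inclusive comparison are assumptions, not the tool's actual implementation.

```python
def row_count_is_between(row_count, min_value=None, max_value=None):
    """Return True if row_count lies within the given (assumed inclusive) bounds."""
    if min_value is None and max_value is None:
        raise ValueError("at least one of min_value or max_value must be provided")
    if min_value is not None and row_count < min_value:
        return False
    if max_value is not None and row_count > max_value:
        return False
    return True


print(row_count_is_between(10, min_value=10, max_value=10))
print(row_count_is_between(500, min_value=10))
```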
### Table Column Count to Equal
Validate that the number of columns in a table is equal to a given value.

**Properties**

* `columnCount`: Expected number of columns.

**YAML Config**

```yaml
testDefinitionName: tableColumnCountToEqual
parameterValues:
  - name: columnCount
    value: 5
```
**JSON Config**

```json
{
  "testDefinitionName": "tableColumnCountToEqual",
  "parameterValues": [
    {
      "name": "columnCount",
      "value": 5
    }
  ]
}
```
### Table Column Count to be Between
Validate that the number of columns in a table is within a given range.

**Properties**

* `minColValue`: Lower bound.
* `maxColValue`: Upper bound.

**YAML Config**

```yaml
testDefinitionName: tableColumnCountToBeBetween
parameterValues:
  - name: minColValue
    value: 5
  - name: maxColValue
    value: 10
```
**JSON Config**

```json
{
  "testDefinitionName": "tableColumnCountToBeBetween",
  "parameterValues": [
    {
      "name": "minColValue",
      "value": 5
    },
    {
      "name": "maxColValue",
      "value": 10
    }
  ]
}
```
### Table Column Name to Exist
Validate that a column name is present in the table.

**Properties**

* `columnName`: The name of the column to check for.

**YAML Config**

```yaml
testDefinitionName: tableColumnNameToExist
parameterValues:
  - name: columnName
    value: order_id
```
**JSON Config**

```json
{
  "testDefinitionName": "tableColumnNameToExist",
  "parameterValues": [
    {
      "name": "columnName",
      "value": "order_id"
    }
  ]
}
```
### Table Column to Match Set
Validate that the table's column names match an expected set of columns.

**Properties**

* `columnNames`: Comma-separated string of column names.
* `ordered`: Whether the test should check for column ordering. Defaults to `false`.

**YAML Config**

```yaml
testDefinitionName: tableColumnToMatchSet
parameterValues:
  - name: columnNames
    value: "col1, col2, col3"
  - name: ordered
    value: true
```
**JSON Config**

```json
{
  "testDefinitionName": "tableColumnToMatchSet",
  "parameterValues": [
    {
      "name": "columnNames",
      "value": "col1, col2, col3"
    },
    {
      "name": "ordered",
      "value": true
    }
  ]
}
```
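
The effect of the `ordered` flag can be sketched in a few lines of Python. This is an illustrative reading only: the function name is hypothetical, and it assumes the table must contain exactly the listed columns, with whitespace around each name ignored.

```python
def columns_match_set(table_columns, column_names, ordered=False):
    """Compare a table's columns to a comma-separated string of expected names."""
    expected = [name.strip() for name in column_names.split(",")]
    if ordered:
        # Same names in the same positions.
        return list(table_columns) == expected
    # Same names, order ignored.
    return sorted(table_columns) == sorted(expected)


table_columns = ["col2", "col1", "col3"]
print(columns_match_set(table_columns, "col1, col2, col3"))
print(columns_match_set(table_columns, "col1, col2, col3", ordered=True))
```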
### Table Custom SQL Test
Write your own SQL test. The test will pass if the following condition is met:

- The query returns 0 rows.

**Properties**

* `sqlExpression`: SQL expression.

**Example**

```sql
SELECT
  customer_id
FROM DUAL
WHERE lifetime_value < 0;
```

```sql
SELECT
  customer_id
FROM DUAL d
INNER JOIN OTHER o ON d.id = o.id
WHERE lifetime_value < 0;
```
**YAML Config**
```yaml
testDefinitionName: tableCustomSQLQuery
parameterValues:
  - name: sqlExpression
    value: >
      SELECT
      customer_tier
      FROM DUAL
      WHERE customer_tier = 'GOLD' and lifetime_value < 10000;
```
**JSON Config**
```json
{
  "testDefinitionName": "tableCustomSQLQuery",
  "parameterValues": [
    {
      "name": "sqlExpression",
      "value": "SELECT customer_tier FROM DUAL WHERE customer_tier = 'GOLD' and lifetime_value < 10000;"
    }
  ]
}
```
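
The "passes when the query returns 0 rows" rule can be tried out locally with Python's built-in `sqlite3` module. This is a sketch under assumptions: the `customers` table, its columns, and the helper `custom_sql_test_passes` are made up for illustration and stand in for whatever table the real test would target.

```python
import sqlite3


def custom_sql_test_passes(connection, sql_expression):
    """The test passes only if the query returns no rows."""
    rows = connection.execute(sql_expression).fetchall()
    return len(rows) == 0


# Hypothetical in-memory table used only for this illustration.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE customers (customer_id INTEGER, lifetime_value REAL)")
connection.executemany("INSERT INTO customers VALUES (?, ?)", [(1, 120.0), (2, 35.5)])

query = "SELECT customer_id FROM customers WHERE lifetime_value < 0"
print(custom_sql_test_passes(connection, query))  # no offending rows yet

connection.execute("INSERT INTO customers VALUES (3, -10.0)")
print(custom_sql_test_passes(connection, query))  # one offending row now exists
```

Writing the query so that it selects the *offending* rows keeps the pass condition simple: an empty result means nothing violated the rule.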
## Column Tests
Tests applied on top of Column metrics. Here is the list of all column tests:

- [Column Values to Be Unique](#column-values-to-be-unique)
- [Column Values to Be Not Null](#column-values-to-be-not-null)
- [Column Values to Match Regex](#column-values-to-match-regex)
- [Column Values to not Match Regex](#column-values-to-not-match-regex)
- [Column Values to Be in Set](#column-values-to-be-in-set)
- [Column Values to Be Not In Set](#column-values-to-be-not-in-set)
- [Column Values to Be Between](#column-values-to-be-between)
- [Column Values Missing Count to Be Equal](#column-values-missing-count-to-be-equal)
- [Column Values Lengths to Be Between](#column-values-lengths-to-be-between)
- [Column Value Max to Be Between](#column-value-max-to-be-between)
- [Column Value Min to Be Between](#column-value-min-to-be-between)
- [Column Value Mean to Be Between](#column-value-mean-to-be-between)
- [Column Value Median to Be Between](#column-value-median-to-be-between)
- [Column Values Sum to Be Between](#column-values-sum-to-be-between)
- [Column Values Standard Deviation to Be Between](#column-values-standard-deviation-to-be-between)

### Column Values to Be Unique
Makes sure that there are no duplicate values in a given column.

**Properties**

* `columnValuesToBeUnique`: To be set as `true`. This is required for proper JSON parsing in the profiler module.

**YAML Config**

```yaml
testDefinitionName: columnValuesToBeUnique
parameterValues:
  - name: columnValuesToBeUnique
    value: true
```

**JSON Config**

```json
{
  "testDefinitionName": "columnValuesToBeUnique",
  "parameterValues": [
    {
      "name": "columnValuesToBeUnique",
      "value": true
    }
  ]
}
```
### Column Values to Be Not Null
Validates that there are no null values in the column.

**Properties**

* `columnValuesToBeNotNull`: To be set as `true`. This is required for proper JSON parsing in the profiler module.

**YAML Config**

```yaml
testDefinitionName: columnValuesToBeNotNull
parameterValues:
  - name: columnValuesToBeNotNull
    value: true
```
**JSON Config**

```json
{
  "testDefinitionName": "columnValuesToBeNotNull",
  "parameterValues": [
    {
      "name": "columnValuesToBeNotNull",
      "value": true
    }
  ]
}
```
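
For intuition, the uniqueness and not-null checks can be expressed over an in-memory list of column values, with `None` standing in for SQL `NULL`. Both function names are hypothetical, and ignoring nulls in the uniqueness check is an assumption made here for illustration; the real checks run against the database.

```python
def values_are_unique(values):
    """True if no non-null value appears more than once (nulls ignored by assumption)."""
    non_null = [value for value in values if value is not None]
    return len(non_null) == len(set(non_null))


def values_are_not_null(values):
    """True if the column contains no null (None) values."""
    return all(value is not None for value in values)


print(values_are_unique([1, 2, 2]))
print(values_are_not_null([1, None, 3]))
```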
### Column Values to Match Regex
This test lets us specify that the values in a column are expected to match a certain SQL `LIKE` expression.

**Properties**

* `regex`: SQL `LIKE` expression to match. E.g., `%something%`.

**YAML Config**

```yaml
testDefinitionName: columnValuesToMatchRegex
parameterValues:
  - name: regex
    value: "%something%"
```
**JSON Config**

```json
{
  "testDefinitionName": "columnValuesToMatchRegex",
  "parameterValues": [
    {
      "name": "regex",
      "value": "%something%"
    }
  ]
}
```
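
Because the parameter is a SQL `LIKE` expression rather than a regular expression, it can help to see how the two relate. The sketch below translates a `LIKE` pattern into a Python regex (`%` matches any run of characters, `_` matches exactly one) and checks a list of values against it. It assumes case-sensitive matching and no `ESCAPE` clause; actual `LIKE` behavior depends on the database, and both function names are hypothetical.

```python
import re


def like_to_regex(like_pattern):
    """Translate a SQL LIKE pattern into a compiled Python regex."""
    parts = []
    for char in like_pattern:
        if char == "%":
            parts.append(".*")  # any sequence of characters, including none
        elif char == "_":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(char))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def all_values_match(values, like_pattern):
    """True if every value matches the whole LIKE pattern."""
    regex = like_to_regex(like_pattern)
    return all(regex.fullmatch(value) is not None for value in values)


print(all_values_match(["has something here", "something"], "%something%"))
print(all_values_match(["nothing"], "%something%"))
```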
### Column Values to not Match Regex
This test lets us specify that the values in a column are expected not to match a certain SQL `LIKE` expression. If the test finds values matching the `forbiddenRegex`, the test will fail.

**Properties**

* `forbiddenRegex`: SQL `LIKE` expression to match. E.g., `%something%`.

**YAML Config**

```yaml
testDefinitionName: columnValuesToNotMatchRegex
parameterValues:
  - name: forbiddenRegex
    value: "%something%"
```

**JSON Config**

```json
{
  "testDefinitionName": "columnValuesToNotMatchRegex",
  "parameterValues": [
    {
      "name": "forbiddenRegex",
      "value": "%something%"
    }
  ]
}
```
### Column Values to Be in Set
Validate that values from a set are present in a column.

**Properties**

* `allowedValues`: List of allowed strings or numbers.

**YAML Config**

```yaml
testDefinitionName: columnValuesToBeInSet
parameterValues:
  - name: allowedValues
    value: ["allowed1", "allowed2"]
```

**JSON Config**

```json
{
  "testDefinitionName": "columnValuesToBeInSet",
  "parameterValues": [
    {
      "name": "allowedValues",
      "value": [
        "allowed1",
        "allowed2"
      ]
    }
  ]
}
```
### Column Values to Be Not In Set
Validate that there are no values in a column in a set of forbidden values.

**Properties**

* `forbiddenValues`: List of forbidden strings or numbers.

**YAML Config**

```yaml
testDefinitionName: columnValuesToBeNotInSet
parameterValues:
  - name: forbiddenValues
    value: ["forbidden1", "forbidden2"]
```
**JSON Config**

```json
{
  "testDefinitionName": "columnValuesToBeNotInSet",
  "parameterValues": [
    {
      "name": "forbiddenValues",
      "value": [
        "forbidden1",
        "forbidden2"
      ]
    }
  ]
}
```
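
The forbidden-values check amounts to a set-membership test. A minimal Python sketch, with a hypothetical function name and an in-memory list standing in for the column:

```python
def values_not_in_set(values, forbidden_values):
    """True if none of the column values appears in the forbidden set."""
    forbidden = set(forbidden_values)
    return not any(value in forbidden for value in values)


print(values_not_in_set(["ok", "fine"], ["forbidden1", "forbidden2"]))
print(values_not_in_set(["ok", "forbidden2"], ["forbidden1", "forbidden2"]))
```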
### Column Values to Be Between
Validate that the values of a column are within a given range.

> Only supports numerical types.

**Properties**

* `minValue`: Lower bound of the interval. If provided, the column values should be greater than this number.
* `maxValue`: Upper bound of the interval. If provided, the column values should be lower than this number.

At least one of the two must be provided.

**YAML Config**

```yaml
testDefinitionName: columnValuesToBeBetween
parameterValues:
  - name: minValue
    value: 10
```

**JSON Config**

```json
{
  "testDefinitionName": "columnValuesToBeBetween",
  "parameterValues": [
    {
      "name": "minValue",
      "value": 10
    }
  ]
}
```
### Column Values Missing Count to Be Equal
Validates that the number of missing values matches a given number. Missing values are the sum of nulls, plus the sum of values in a given list which we need to consider as missing data. A clear example of that would be `NA` or `N/A`.

**Properties**

* `missingCountValue`: The number of missing values needs to be equal to this. This field is mandatory.
* `missingValueMatch`: A list of strings to consider as missing values. Optional.

**YAML Config**

```yaml
testDefinitionName: columnValuesMissingCountToBeEqual
parameterValues:
  - name: missingValueMatch
    value: ["NA", "N/A"]
  - name: missingCountValue
    value: 100
```
**JSON Config**
```json
{
  "testDefinitionName": "columnValuesMissingCountToBeEqual",
  "parameterValues": [
    {
      "name": "missingValueMatch",
      "value": [
        "NA",
        "N/A"
      ]
    },
    {
      "name": "missingCountValue",
      "value": 100
    }
  ]
}
```
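
The way nulls and placeholder strings add up to the missing count can be sketched in Python as follows. The function name is hypothetical, `None` stands in for SQL `NULL`, and exact string comparison against `missingValueMatch` is an assumption.

```python
def missing_count_equals(values, missing_count_value, missing_value_match=()):
    """True if nulls plus listed placeholder values add up to the expected count."""
    placeholders = set(missing_value_match)
    missing = sum(1 for value in values if value is None or value in placeholders)
    return missing == missing_count_value


column = ["a", None, "NA", "N/A", "b"]
print(missing_count_equals(column, 3, ["NA", "N/A"]))  # one null + two placeholders
print(missing_count_equals(column, 1))                 # only the null counts
```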
### Column Values Lengths to Be Between
Validates that the lengths of the strings in a column are within a given range.

> Only supports concatenable types.

**Properties**

* `minLength`: Lower bound of the interval. If provided, the string length should be greater than this number.
* `maxLength`: Upper bound of the interval. If provided, the string length should be lower than this number.

At least one of the two must be provided.

**YAML Config**

```yaml
testDefinitionName: columnValueLengthsToBeBetween
parameterValues:
  - name: minLength
    value: 50
  - name: maxLength
    value: 100
```
**JSON Config**

```json
{
  "testDefinitionName": "columnValueLengthsToBeBetween",
  "parameterValues": [
    {
      "name": "minLength",
      "value": 50
    },
    {
      "name": "maxLength",
      "value": 100
    }
  ]
}
```
### Column Value Max to Be Between
Validate that the maximum value of a column is within a given range.

> Only supports numerical types.

**Properties**

* `minValueForMaxInCol`: Lower bound.
* `maxValueForMaxInCol`: Upper bound.

**YAML Config**

```yaml
testDefinitionName: columnValueMaxToBeBetween
parameterValues:
  - name: minValueForMaxInCol
    value: 50
  - name: maxValueForMaxInCol
    value: 100
```
**JSON Config**

```json
{
  "testDefinitionName": "columnValueMaxToBeBetween",
  "parameterValues": [
    {
      "name": "minValueForMaxInCol",
      "value": 50
    },
    {
      "name": "maxValueForMaxInCol",
      "value": 100
    }
  ]
}
```
### Column Value Min to Be Between
Validate that the minimum value of a column is within a given range.

> Only supports numerical types.

**Properties**

* `minValueForMinInCol`: Lower bound.
* `maxValueForMinInCol`: Upper bound.

**YAML Config**

```yaml
testDefinitionName: columnValueMinToBeBetween
parameterValues:
  - name: minValueForMinInCol
    value: 10
  - name: maxValueForMinInCol
    value: 50
```
**JSON Config**

```json
{
  "testDefinitionName": "columnValueMinToBeBetween",
  "parameterValues": [
    {
      "name": "minValueForMinInCol",
      "value": 10
    },
    {
      "name": "maxValueForMinInCol",
      "value": 50
    }
  ]
}
```
### Column Value Mean to Be Between
Validate that the mean of a column is within a given range.

> Only supports numerical types.

**Properties**

* `minValueForMeanInCol`: Lower bound.
* `maxValueForMeanInCol`: Upper bound.

**YAML Config**

```yaml
testDefinitionName: columnValueMeanToBeBetween
parameterValues:
  - name: minValueForMeanInCol
    value: 5
  - name: maxValueForMeanInCol
    value: 10
```
**JSON Config**

```json
{
  "testDefinitionName": "columnValueMeanToBeBetween",
  "parameterValues": [
    {
      "name": "minValueForMeanInCol",
      "value": 5
    },
    {
      "name": "maxValueForMeanInCol",
      "value": 10
    }
  ]
}
```
### Column Value Median to Be Between
Validate that the median of a column is within a given range.

> Only supports numerical types.

**Properties**

* `minValueForMedianInCol`: Lower bound.
* `maxValueForMedianInCol`: Upper bound.

**YAML Config**

```yaml
testDefinitionName: columnValueMedianToBeBetween
parameterValues:
  - name: minValueForMedianInCol
    value: 5
  - name: maxValueForMedianInCol
    value: 10
```
**JSON Config**

```json
{
  "testDefinitionName": "columnValueMedianToBeBetween",
  "parameterValues": [
    {
      "name": "minValueForMedianInCol",
      "value": 5
    },
    {
      "name": "maxValueForMedianInCol",
      "value": 10
    }
  ]
}
```
### Column Values Sum to Be Between
Validate that the sum of a column is within a given range.

> Only supports numerical types.

**Properties**

* `minValueForColSum`: Lower bound.
* `maxValueForColSum`: Upper bound.

**YAML Config**

```yaml
testDefinitionName: columnValuesSumToBeBetween
parameterValues:
  - name: minValueForColSum
    value: 5
  - name: maxValueForColSum
    value: 10
```

**JSON Config**

```json
{
  "testDefinitionName": "columnValuesSumToBeBetween",
  "parameterValues": [
    {
      "name": "minValueForColSum",
      "value": 5
    },
    {
      "name": "maxValueForColSum",
      "value": 10
    }
  ]
}
```
### Column Values Standard Deviation to Be Between
Validate that the standard deviation of a column is within a given range.

> Only supports numerical types.

**Properties**

* `minValueForStdDevInCol`: Lower bound.
* `maxValueForStdDevInCol`: Upper bound.

**YAML Config**

```yaml
testDefinitionName: columnValueStdDevToBeBetween
parameterValues:
  - name: minValueForStdDevInCol
    value: 5
  - name: maxValueForStdDevInCol
    value: 10
```
**JSON Config**

```json
{
  "testDefinitionName": "columnValueStdDevToBeBetween",
  "parameterValues": [
    {
      "name": "minValueForStdDevInCol",
      "value": 5
    },
    {
      "name": "maxValueForStdDevInCol",
      "value": 10
    }
  ]
}
```
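
The max, min, mean, median, sum, and standard deviation tests all follow the same shape: compute one aggregate over the column, then compare it to a lower and an upper bound. The Python sketch below shows that shape with the standard `statistics` module. It assumes inclusive bounds and a population standard deviation; databases differ on both points, and the `AGGREGATES` mapping and function name are hypothetical.

```python
import statistics

# Hypothetical mapping from an aggregate label to the function that computes it.
AGGREGATES = {
    "max": max,
    "min": min,
    "mean": statistics.mean,
    "median": statistics.median,
    "sum": sum,
    "stddev": statistics.pstdev,  # population standard deviation (assumption)
}


def aggregate_is_between(values, aggregate, lower, upper):
    """True if the chosen aggregate of the column lies within [lower, upper]."""
    result = AGGREGATES[aggregate](values)
    return lower <= result <= upper


column = [2, 4, 4, 4, 5, 5, 7, 9]
print(aggregate_is_between(column, "mean", 5, 10))    # mean is 5
print(aggregate_is_between(column, "median", 5, 10))  # median is 4.5
print(aggregate_is_between(column, "stddev", 1, 3))   # population std dev is 2.0
```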