17 KiB
title | slug |
---|---|
Tests | /connectors/ingestion/workflows/data-quality/tests |
Test
Here you can see all the supported tests definitions and how to configure them in the YAML config file.
A Test Definition is a generic definition of a test. This Test Definition then gets specified in a Test Case. This Test Case is where the parameter(s) of a Test Definition are specified.
In this section, you will learn what tests we currently support and how to configure them in the YAML/JSON config file.
Table Tests
Tests applied on top of a Table. Here is the list of all table tests:
- Table Row Count to Equal
- Table Row Count to be Between
- Table Column Count to Equal
- Table Column Count to be Between
- Table Column Name to Exist
- Table Column to Match Set
- Table Custom SQL Test
Table Row Count to Equal
Validate the total row count in the table is equal to the given value.
Properties:
value
: Expected number of rows.
YAML Config
testDefinitionName: tableRowCountToEqual
parameterValues:
- name: value
value: 2
JSON Config
{
"testDefinitionName": "tableRowCountToEqual",
"parameterValues": [
{
"name": "value",
"value": 2
}
]
}
Table Row Count to be Between
Validate the total row count is within a given range of values.
Properties:
minValue
: Lower bound of the interval. If informed, the number of rows should be bigger than this number.maxValue
: Upper bound of the interval. If informed, the number of rows should be lower than this number.
Any of those two need to be informed.
YAML Config
testDefinitionName: tableRowCountToBeBetween
parameterValues:
- name: minValue
value: 10
- name: maxValue
value: 10
JSON Config
{
"testDefinitionName": "tableRowCountToBeBetween",
"parameterValues": [
{
"name": "minValue",
"value": 10
},
{
"name": "maxValue",
"value": 10
}
]
}
Table Column Count to Equal
Validate that the number of columns in a table is equal to a given value.
Properties
columnCount
: Expected number of columns.
YAML Config
testDefinitionName: tableColumnCountToEqual
parameterValues:
- name: columnCount
value: 5
JSON Config
{
"testDefinitionName": "tableColumnCountToEqual",
"parameterValues": [
{
"name": "columnCount",
"value": 5
}
]
}
Table Column Count to be Between
Validate the number of columns in a table is between the given value
Properties
minColValue
: lower boundmaxColValue
: upper bound
YAML Config
testDefinitionName: tableColumnCountToBeBetween
parameterValues:
- name: minColValue
value: 5
- name: maxColValue
value: 10
JSON Config
{
"testDefinitionName": "tableColumnCountToBeBetween",
"parameterValues": [
{
"name": "minColValue",
"value": 5
},
{
"name": "maxColValue",
"value": 10
}
]
}
Table Column Name to Exist
Validate a column name is present in the table
Properties
columnName
: the name of the column to check for
YAML Config
testDefinitionName: tableColumnNameToExist
parameterValues:
- name: columnName
value: order_id
JSON Config
{
"testDefinitionName": "tableColumnNameToExist",
"parameterValues": [
{
"name": "columnName",
"value": "order_id"
}
]
}
Table Column to Match Set
Validate a list of table column name matches an expected set of columns
Properties
columnNames
: comma separated string of column nameordered
: whether the test should check for column ordering. Default to False
YAML Config
testDefinitionName: tableColumnToMatchSet
parameterValues:
- name: columnNames
value: "col1, col2, col3"
- name: ordered
value: true
JSON Config
{
"testDefinitionName": "tableColumnToMatchSet",
"parameterValues": [
{
"name": "columnNames",
"value": "col1, col2, col3"
},
{
"name": "ordered",
"value": true
}
]
}
Table Custom SQL Test
Write you own SQL test. The test will pass if the following condition is met:
- The query result return 0 row
Properties
sqlExpression
: SQL expression
Example
SELECT
customer_id
FROM DUAL
WHERE lifetime_value < 0;
SELECT
customer_id
FROM DUAL d
INNER JOIN OTHER o ON d.id = o.id
WHERE lifetime_value < 0;
YAML Config
testDefinitionName: tableCustomSQLQuery
parameterValues:
- name: sqlExpression
value: >
SELECT
customer_tier
FROM DUAL
WHERE customer_tier = 'GOLD' and lifetime_value < 10000;
JSON Config
{
"testDefinitionName": "tableCustomSQLQuery",
"parameterValues": [
{
"name": "sqlExpression",
"value": "SELECT customer_tier FROM DUAL WHERE customer_tier = 'GOLD' and lifetime_value < 10000;"
}
]
}
Column Tests
Tests applied on top of Column metrics. Here is the list of all column tests:
- Column Values to Be Unique
- Column Values to Be Not Null
- Column Values to Match Regex
- Column Values to not Match Regex
- Column Values to Be in Set
- Column Values to Be Not In Set
- Column Values to Be Between
- Column Values Missing Count to Be Equal
- Column Values Lengths to Be Between
- Column Value Max to Be Between
- Column Value Min to Be Between
- Column Value Mean to Be Between
- Column Value Median to Be Between
- Column Values Sum to Be Between
- Column Values Standard Deviation to Be Between
Column Values to Be Unique
Makes sure that there are no duplicate values in a given column.
Properties
columnValuesToBeUnique
: To be set astrue
. This is required for proper JSON parsing in the profiler module.
YAML Config
testDefinitionName: columnValuesToBeUnique
parameterValues:
- name: columnNames
value: true
JSON Config
{
"testDefinitionName": "columnValuesToBeUnique",
"parameterValues": [
{
"name": "columnNames",
"value": true
}
]
}
Column Values to Be Not Null
Validates that there are no null values in the column.
Properties
columnValuesToBeNotNull
: To be set astrue
. This is required for proper JSON parsing in the profiler module.
YAML Config
testDefinitionName: columnValuesToBeNotNull
parameterValues:
- name: columnValuesToBeNotNull
value: true
JSON Config
{
"testDefinitionName": "columnValuesToBeNotNull",
"parameterValues": [
{
"name": "columnValuesToBeNotNull",
"value": true
}
]
}
Column Values to Match Regex
This test allows us to specify how many values in a column we expect that will match a certain SQL LIKE
expression.
Properties
regex
: SQLLIKE
expression to match. E.g.,%something%
.
YAML Config
testDefinitionName: columnValuesToMatchRegex
parameterValues:
- name: regex
value: "%something%"
JSON Config
{
"testDefinitionName": "columnValuesToMatchRegex",
"parameterValues": [
{
"name": "regex",
"value": "%something%"
}
]
}
Column Values to not Match Regex
This test allows us to specify values in a column we expect that will not match a certain SQL LIKE
expression. If the test find values matching the forbiddenRegex
the test will fail.
Properties
forbiddenRegex
: SQL LIKE expression to match. E.g.,%something%
.
YAML Config
testDefinitionName: columnValuesToMatchRegex
parameterValues:
- name: forbiddenRegex
value: "%something%"
JSON Config
{
"testDefinitionName": "columnValuesToMatchRegex",
"parameterValues": [
{
"name": "forbiddenRegex",
"value": "%something%"
}
]
}
Column Values to Be in Set
Validate values form a set are present in a column.
Properties
allowedValues
: List of allowed strings or numbers.
YAML Config
testDefinitionName: columnValuesToBeInSet
parameterValues:
- name: allowedValues
value: ["forbidden1", "forbidden2"]
JSON Config
{
"testDefinitionName": "columnValuesToBeInSet",
"parameterValues": [
{
"name": "allowedValues",
"value": [
"forbidden1",
"forbidden2"
]
}
]
}
Column Values to Be Not In Set
Validate that there are no values in a column in a set of forbidden values.
Properties
forbiddenValues
: List of forbidden strings or numbers.
YAML Config
testDefinitionName: columnValuesToBeNotInSet
parameterValues:
- name: forbiddenValues
value: ["forbidden1", "forbidden2"]
JSON Config
{
"testDefinitionName": "columnValuesToBeNotInSet",
"parameterValues": [
{
"name": "forbiddenValues",
"value": [
"forbidden1",
"forbidden2"
]
}
]
}
Column Values to Be Between
Validate that the values of a column are within a given range.
Only supports numerical types.
Properties
minValue
: Lower bound of the interval. If informed, the column values should be bigger than this number.maxValue
: Upper bound of the interval. If informed, the column values should be lower than this number.
Any of those two need to be informed.
YAML Config
testDefinitionName: columnValuesToBeBetween
parameterValues:
- name: minValue
value: ["forbidden1", "forbidden2"]
JSON Config
{
"testDefinitionName": "columnValuesToBeBetween",
"parameterValues": [
{
"name": "minValue",
"value": [
"forbidden1",
"forbidden2"
]
}
]
}
Column Values Missing Count to Be Equal
Validates that the number of missing values matches a given number. Missing values are the sum of nulls, plus the sum of values in a given list which we need to consider as missing data. A clear example of that would be NA
or N/A
.
Properties
missingCountValue
: The number of missing values needs to be equal to this. This field is mandatory.missingValueMatch
: A list of strings to consider as missing values. Optional.
YAML Config
testDefinitionName: columnValuesMissingCountToBeEqual
parameterValues:
- name: missingValueMatch
value: ["NA", "N/A"]
- name: missingCountValue
value: 100
JSON Config
{
"testDefinitionName": "columnValuesMissingCountToBeEqual",
"parameterValues": [
{
"name": "missingValueMatch",
"value": [
"NA",
"N/A"
]
},
{
"name": "missingCountValue",
"value": 100
}
]
}
JSON Config
{
"testDefinitionName": "columnValuesMissingCountToBeEqual",
"parameterValues": [
{
"name": "missingValueMatch",
"value": [
"NA",
"N/A"
]
},
{
"name": "missingCountValue",
"value": 100
}
]
}
Column Values Lengths to Be Between
Validates that the lengths of the strings in a column are within a given range.
Only supports concatenable types.
Properties
minLength
: Lower bound of the interval. If informed, the string length should be bigger than this number.maxLength
: Upper bound of the interval. If informed, the string length should be lower than this number.
Any of those two need to be informed.
YAML Config
testDefinitionName: columnValueLengthsToBeBetween
parameterValues:
- name: minLength
value: 50
- name: maxLength
value: 100
JSON Config
{
"testDefinitionName": "columnValueLengthsToBeBetween",
"parameterValues": [
{
"name": "minLength",
"value": 50
},
{
"name": "maxLength",
"value": 100
}
]
}
Column Value Max to Be Between
Validate the maximum value of a column is between a specific range
Only supports numerical types.
Properties
minValueForMaxInCol
: lower boundmaxValueForMaxInCol
: upper bound
YAML Config
testDefinitionName: columnValueMaxToBeBetween
parameterValues:
- name: minValueForMaxInCol
value: 50
- name: maxValueForMaxInCol
value: 100
JSON Config
{
"testDefinitionName": "columnValueMaxToBeBetween",
"parameterValues": [
{
"name": "minValueForMaxInCol",
"value": 50
},
{
"name": "maxValueForMaxInCol",
"value": 100
}
]
}
Column Value Min to Be Between
Validate the minimum value of a column is between a specific range
Only supports numerical types.
Properties
minValueForMinInCol
: lower boundmaxValueForMinInCol
: upper bound
YAML Config
testDefinitionName: columnValueMinToBeBetween
parameterValues:
- name: minValueForMinInCol
value: 10
- name: maxValueForMinInCol
value: 50
JSON Config
{
"testDefinitionName": "columnValueMinToBeBetween",
"parameterValues": [
{
"name": "minValueForMinInCol",
"value": 10
},
{
"name": "maxValueForMinInCol",
"value": 50
}
]
}
Column Value Mean to Be Between
Validate the mean of a column is between a specific range
Only supports numerical types.
Properties
minValueForMeanInCol
: lower boundmaxValueForMeanInCol
: upper bound
YAML Config
testDefinitionName: columnValueMeanToBeBetween
parameterValues:
- name: minValueForMeanInCol
value: 5
- name: maxValueForMeanInCol
value: 10
JSON Config
{
"testDefinitionName": "columnValueMeanToBeBetween",
"parameterValues": [
{
"name": "minValueForMeanInCol",
"value": 5
},
{
"name": "maxValueForMeanInCol",
"value": 10
}
]
}
Column Value Median to Be Between
Validate the median of a column is between a specific range
Only supports numerical types.
Properties
minValueForMedianInCol
: lower boundmaxValueForMedianInCol
: upper bound
YAML Config
testDefinitionName: columnValueMedianToBeBetween
parameterValues:
- name: minValueForMedianInCol
value: 5
- name: maxValueForMedianInCol
value: 10
JSON Config
{
"testDefinitionName": "columnValueMedianToBeBetween",
"parameterValues": [
{
"name": "minValueForMedianInCol",
"value": 5
},
{
"name": "maxValueForMedianInCol",
"value": 10
}
]
}
Column Values Sum to Be Between
Validate the sum of a column is between a specific range
Only supports numerical types.
Properties
minValueForColSum
: lower boundmaxValueForColSum
: upper bound
YAML Config
testDefinitionName: columnValueMedianToBeBetween
parameterValues:
- name: minValueForMedianInCol
value: 5
- name: maxValueForMedianInCol
value: 10
JSON Config
{
"testDefinitionName": "columnValueMedianToBeBetween",
"parameterValues": [
{
"name": "minValueForMedianInCol",
"value": 5
},
{
"name": "maxValueForMedianInCol",
"value": 10
}
]
}
Column Values Standard Deviation to Be Between
Validate the standard deviation of a column is between a specific range
Only supports numerical types.
Properties
minValueForStdDevInCol
: lower boundminValueForStdDevInCol
: upper bound
YAML Config
testDefinitionName: columnValueStdDevToBeBetween
parameterValues:
- name: minValueForStdDevInCol
value: 5
- name: maxValueForStdDevInCol
value: 10
JSON Config
{
"testDefinitionName": "columnValueStdDevToBeBetween",
"parameterValues": [
{
"name": "minValueForStdDevInCol",
"value": 5
},
{
"name": "maxValueForStdDevInCol",
"value": 10
}
]
}