	| title | slug | 
|---|---|
| Tests | /connectors/ingestion/workflows/data-quality/tests | 
Test
This page lists all the supported test definitions and shows how to configure them in the YAML/JSON config file.
A Test Definition is a generic definition of a test. A Test Definition is then specified in a Test Case, which is where the parameter(s) of the Test Definition are set.
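The sketch below makes the split between a Test Definition and a Test Case concrete. It reads a Test Case written in the JSON shape used throughout this page and turns its `parameterValues` list into a name-to-value lookup. The `ParsedTestCase` class and the `load_test_case` function are hypothetical illustrations. They are not part of the OpenMetadata API.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ParsedTestCase:
    """Hypothetical container: a Test Definition name plus its Test Case parameters."""
    definition: str
    params: Dict[str, Any] = field(default_factory=dict)


def load_test_case(raw: str) -> ParsedTestCase:
    """Parse a JSON config that has `testDefinitionName` and `parameterValues`."""
    data = json.loads(raw)
    # Flatten [{"name": ..., "value": ...}, ...] into {name: value}
    params = {p["name"]: p["value"] for p in data.get("parameterValues", [])}
    return ParsedTestCase(definition=data["testDefinitionName"], params=params)


raw_config = """
{
    "testDefinitionName": "tableRowCountToEqual",
    "parameterValues": [{"name": "value", "value": 2}]
}
"""
case = load_test_case(raw_config)
print(case.definition)  # tableRowCountToEqual
print(case.params)      # {'value': 2}
```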
Table Tests
Tests applied on top of a Table. Here is the list of all table tests:
- Table Row Count to Equal
- Table Row Count to be Between
- Table Column Count to Equal
- Table Column Count to be Between
- Table Column Name to Exist
- Table Column to Match Set
- Table Custom SQL Test
Table Row Count to Equal
Validate that the total row count of the table equals the given value.
Properties:
- value: Expected number of rows.
Behavior
| Condition | Status | 
|---|---|
| `value` matches the number of rows in the table | Success ✅ | 
| `value` does not match the number of rows in the table | Failed ❌ | 
YAML Config
testDefinitionName: tableRowCountToEqual
parameterValues:
    - name: value
      value: 2
JSON Config
{
    "testDefinitionName": "tableRowCountToEqual",
    "parameterValues": [
        {
            "name": "value",
            "value": 2
        }
    ]
}
Table Row Count to be Between
Validate that the total row count is within a given range of values.
Properties:
- minValue: Lower bound of the interval. If provided, the number of rows should be greater than this number.
- maxValue: Upper bound of the interval. If provided, the number of rows should be less than this number.
At least one of the two must be provided.
Behavior
| Condition | Status | 
|---|---|
| The number of rows in the table is between `minValue` and `maxValue` | Success ✅ | 
| The number of rows in the table is not between `minValue` and `maxValue` | Failed ❌ | 
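Here is a minimal Python sketch of this range check. It assumes both bounds are inclusive, because the example config below sets both `minValue` and `maxValue` to 10, which only makes sense with inclusive bounds. It also assumes an omitted bound is simply not enforced. `row_count_between` is a hypothetical name, not the profiler's code.

```python
from typing import Optional


def row_count_between(row_count: int,
                      min_value: Optional[int] = None,
                      max_value: Optional[int] = None) -> bool:
    """Check row_count against optional bounds (inclusivity is an assumption)."""
    if min_value is None and max_value is None:
        # At least one bound must be provided
        raise ValueError("provide minValue, maxValue, or both")
    if min_value is not None and row_count < min_value:
        return False
    if max_value is not None and row_count > max_value:
        return False
    return True


print(row_count_between(10, min_value=10, max_value=10))  # True
print(row_count_between(7, min_value=10))                 # False
print(row_count_between(7, max_value=10))                 # True
```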
YAML Config
testDefinitionName: tableRowCountToBeBetween
parameterValues:
    - name: minValue
      value: 10
    - name: maxValue
      value: 10
JSON Config
{
    "testDefinitionName": "tableRowCountToBeBetween",
    "parameterValues": [
        {
            "name": "minValue",
            "value": 10
        },
        {
            "name": "maxValue",
            "value": 10
        }
    ]
}
Table Column Count to Equal
Validate that the number of columns in a table is equal to a given value.
Properties
- columnCount: Expected number of columns.
Behavior
| Condition | Status | 
|---|---|
| `columnCount` matches the number of columns in the table | Success ✅ | 
| `columnCount` does not match the number of columns in the table | Failed ❌ | 
YAML Config
testDefinitionName: tableColumnCountToEqual
parameterValues:
    - name: columnCount
      value: 5
JSON Config
{
    "testDefinitionName": "tableColumnCountToEqual",
    "parameterValues": [
        {
            "name": "columnCount",
            "value": 5
        }
    ]
}
Table Column Count to be Between
Validate that the number of columns in a table is between the given values.
Properties
- minColValue: lower bound
- maxColValue: upper bound
Behavior
| Condition | Status | 
|---|---|
| The number of columns in the table is between `minColValue` and `maxColValue` | Success ✅ | 
| The number of columns in the table is not between `minColValue` and `maxColValue` | Failed ❌ | 
YAML Config
testDefinitionName: tableColumnCountToBeBetween
parameterValues:
    - name: minColValue
      value: 5
    - name: maxColValue
      value: 10
JSON Config
{
    "testDefinitionName": "tableColumnCountToBeBetween",
    "parameterValues": [
        {
            "name": "minColValue",
            "value": 5
        },
        {
            "name": "maxColValue",
            "value": 10
        }
    ]
}
Table Column Name to Exist
Validate that a column name is present in the table.
Properties
- columnName: the name of the column to check for
Behavior
| Condition | Status | 
|---|---|
| `columnName` exists in the set of column names for the table | Success ✅ | 
| `columnName` does not exist in the set of column names for the table | Failed ❌ | 
YAML Config
testDefinitionName: tableColumnNameToExist
parameterValues:
    - name: columnName
      value: order_id
JSON Config
{
    "testDefinitionName": "tableColumnNameToExist",
    "parameterValues": [
        {
            "name": "columnName",
            "value": "order_id"
        }
    ]
}
Table Column to Match Set
Validate that the table's column names match an expected set of columns.
Properties
- columnNames: comma-separated string of column names
- ordered: whether the test should check the column order. Defaults to False
Behavior
| Condition | Status | 
|---|---|
| [`ordered=False`] `columnNames` matches the list of column names in the table, regardless of order | Success ✅ | 
| [`ordered=True`] `columnNames` matches the list of column names in the table in the same order (e.g. `["a","b"] == ["a","b"]`) | Success ✅ | 
| [`ordered=False`] `columnNames` does not match the list of column names in the table, regardless of order | Failed ❌ | 
| [`ordered=True`] `columnNames` does not match the list of column names in the table and/or their order (e.g. `["a","b"] != ["b","a"]`) | Failed ❌ | 
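The rules in the table can be sketched as follows. The sketch assumes `columnNames` is split on commas with surrounding whitespace trimmed, so `"col1, col2, col3"` yields three names. It also assumes the unordered case is a set comparison. `columns_match` is a hypothetical helper, not OpenMetadata's implementation.

```python
from typing import List


def columns_match(table_columns: List[str], column_names: str,
                  ordered: bool = False) -> bool:
    """Compare a table's columns with a comma-separated expected list."""
    expected = [name.strip() for name in column_names.split(",")]
    if ordered:
        # Same names in the same positions
        return table_columns == expected
    # Same names, in any order
    return set(table_columns) == set(expected)


cols = ["col2", "col1", "col3"]
print(columns_match(cols, "col1, col2, col3"))                # True
print(columns_match(cols, "col1, col2, col3", ordered=True))  # False
print(columns_match(cols, "col2, col1, col3", ordered=True))  # True
```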
YAML Config
testDefinitionName: tableColumnToMatchSet
parameterValues:
    - name: columnNames
      value: "col1, col2, col3"
    - name: ordered
      value: true
JSON Config
{
    "testDefinitionName": "tableColumnToMatchSet",
    "parameterValues": [
        {
            "name": "columnNames",
            "value": "col1, col2, col3"
        },
        {
            "name": "ordered",
            "value": true
        }
    ]
}
Table Custom SQL Test
Write your own SQL test. The test passes if the following condition is met:
- The query returns 0 rows
Properties
- sqlExpression: SQL expression
Behavior
| Condition | Status | 
|---|---|
| `sqlExpression` returns 0 rows | Success ✅ | 
| `sqlExpression` returns 1 or more rows | Failed ❌ | 
Example
SELECT 
customer_id
FROM DUAL 
WHERE lifetime_value < 0;
SELECT 
customer_id
FROM DUAL d
INNER JOIN OTHER o ON d.id = o.id
WHERE lifetime_value < 0;
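You can try the pass/fail rule locally with Python's built-in `sqlite3` module. The sketch runs the query and passes only if it returns no rows. The in-memory `customers` table and its rows are made up for illustration and stand in for the table queried in the examples above. `custom_sql_passes` is a hypothetical name.

```python
import sqlite3


def custom_sql_passes(conn: sqlite3.Connection, sql_expression: str) -> bool:
    """Pass when the query returns no rows; fetching one row is enough to decide."""
    return conn.execute(sql_expression).fetchone() is None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, lifetime_value REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, 120.0), (2, 35.5)])

query = "SELECT customer_id FROM customers WHERE lifetime_value < 0"
print(custom_sql_passes(conn, query))  # True: no negative lifetime values

conn.execute("INSERT INTO customers VALUES (3, -10.0)")
print(custom_sql_passes(conn, query))  # False: customer 3 is returned
```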
YAML Config
testDefinitionName: tableCustomSQLQuery
parameterValues:
    - name: sqlExpression
      value: >
        SELECT 
        customer_tier
        FROM DUAL 
        WHERE customer_tier = 'GOLD' and lifetime_value < 10000;
JSON Config
{
    "testDefinitionName": "tableCustomSQLQuery",
    "parameterValues": [
        {
            "name": "sqlExpression",
            "value": "SELECT  customer_tier FROM DUAL  WHERE customer_tier = 'GOLD' and lifetime_value < 10000;"
        }
    ]
}
Column Tests
Tests applied on top of Column metrics. Here is the list of all column tests:
- Column Values to Be Unique
- Column Values to Be Not Null
- Column Values to Match Regex
- Column Values to not Match Regex
- Column Values to Be in Set
- Column Values to Be Not In Set
- Column Values to Be Between
- Column Values Missing Count to Be Equal
- Column Values Lengths to Be Between
- Column Value Max to Be Between
- Column Value Min to Be Between
- Column Value Mean to Be Between
- Column Value Median to Be Between
- Column Values Sum to Be Between
- Column Values Standard Deviation to Be Between
Column Values to Be Unique
Makes sure that there are no duplicate values in a given column.
Behavior
| Condition | Status | 
|---|---|
| column values are unique | Success ✅ | 
| column values are not unique | Failed ❌ | 
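One way to express this check in SQL is to compare the number of non-null values with the number of distinct values. The sketch below does this with Python's `sqlite3` module. The `orders` table and the `values_are_unique` helper are hypothetical. Ignoring NULLs is an assumption, not documented behavior; it follows from `COUNT(column)` skipping them.

```python
import sqlite3


def values_are_unique(conn: sqlite3.Connection, table: str, column: str) -> bool:
    """Unique when the non-null count equals the distinct count."""
    # Identifiers are interpolated directly, so only pass trusted names here
    count, distinct = conn.execute(
        f"SELECT COUNT({column}), COUNT(DISTINCT {column}) FROM {table}"
    ).fetchone()
    return count == distinct


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,), (3,)])
print(values_are_unique(conn, "orders", "order_id"))  # True

conn.execute("INSERT INTO orders VALUES (2)")
print(values_are_unique(conn, "orders", "order_id"))  # False: 2 appears twice
```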
Properties
- columnValuesToBeUnique: Must be set to true. This is required for proper JSON parsing in the profiler module.
YAML Config
testDefinitionName: columnValuesToBeUnique
parameterValues:
    - name: columnValuesToBeUnique
      value: true
JSON Config
{
    "testDefinitionName": "columnValuesToBeUnique",
    "parameterValues": [
        {
            "name": "columnValuesToBeUnique",
            "value": true
        }
    ]
}
Column Values to Be Not Null
Validates that there are no null values in the column.
Properties
- columnValuesToBeNotNull: Must be set to true. This is required for proper JSON parsing in the profiler module.
Behavior
| Condition | Status | 
|---|---|
| No `NULL` values are present in the column | Success ✅ | 
| 1 or more `NULL` values are present in the column | Failed ❌ | 
YAML Config
testDefinitionName: columnValuesToBeNotNull
parameterValues:
    - name: columnValuesToBeNotNull
      value: true
JSON Config
{
    "testDefinitionName": "columnValuesToBeNotNull",
    "parameterValues": [
        {
            "name": "columnValuesToBeNotNull",
            "value": true
        }
    ]
}
Column Values to Match Regex
This test validates that the values in a column match a given regex pattern. Please note that for certain databases we fall back to a SQL LIKE expression. The databases supporting regex patterns as of 0.13.2 are:
- redshift
- postgres
- oracle
- mysql
- mariaDB
- sqlite
- clickhouse
- snowflake
Other databases fall back to the LIKE expression.
Properties
- regex: expression to match a regex pattern. E.g., `[a-zA-Z0-9]{5}`.
Behavior
| Condition | Status | 
|---|---|
| All column values match regex | Success ✅ | 
| 1 or more column values do not match regex | Failed ❌ | 
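The difference between the two modes matters for patterns like the `%something%` example below. That pattern is written with `LIKE` wildcards: `%` for any run of characters and `_` for a single character. The Python sketch contrasts the two interpretations under a few assumptions. Regex mode matches anywhere in the value, as PostgreSQL's `~` and MySQL's `REGEXP` do. `LIKE` must cover the whole value. The sketch also ignores `LIKE` escape characters and the case-insensitive collations some databases use. Both helper names are hypothetical.

```python
import re


def like_matches(value: str, pattern: str) -> bool:
    """LIKE-style match: '%' is any run of characters, '_' is one character,
    and the pattern must cover the whole value."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def regex_matches(value: str, pattern: str) -> bool:
    """Regex-style match: the pattern may match anywhere in the value."""
    return re.search(pattern, value) is not None


values = ["has something here", "something else"]
print(all(like_matches(v, "%something%") for v in values))      # True
print(all(regex_matches(v, "%something%") for v in values))     # False: '%' is literal in a regex
print(all(regex_matches(v, "[a-zA-Z0-9]{5}") for v in values))  # True
```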
YAML Config
testDefinitionName: columnValuesToMatchRegex
parameterValues:
    - name: regex
      value: "%something%"
JSON Config
{
    "testDefinitionName": "columnValuesToMatchRegex",
    "parameterValues": [
        {
            "name": "regex",
            "value": "%something%"
        }
    ]
}
Column Values to not Match Regex
This test validates that the values in a column do not match a given regex pattern: if the test finds values matching `forbiddenRegex`, it fails. Please note that for certain databases we fall back to a SQL LIKE expression. The databases supporting regex patterns as of 0.13.2 are:
- redshift
- postgres
- oracle
- mysql
- mariaDB
- sqlite
- clickhouse
- snowflake
Other databases fall back to the LIKE expression.
Properties
- forbiddenRegex: regex pattern that the column values must not match. E.g., `[a-zA-Z0-9]{5}`.
Behavior
| Condition | Status | 
|---|---|
| No column values match the regex | Success ✅ | 
| 1 or more column values match regex | Failed ❌ | 
YAML Config
testDefinitionName: columnValuesToNotMatchRegex
parameterValues:
    - name: forbiddenRegex
      value: "%something%"
JSON Config
{
    "testDefinitionName": "columnValuesToNotMatchRegex",
    "parameterValues": [
        {
            "name": "forbiddenRegex",
            "value": "%something%"
        }
    ]
}
Column Values to Be in Set
Validate that values from a set are present in a column.
Properties
- allowedValues: List of allowed strings or numbers.
Behavior
| Condition | Status | 
|---|---|
| 1 or more values from `allowedValues` are found in the column | Success ✅ | 
| No values from `allowedValues` are found in the column | Failed ❌ | 
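Read literally, the table means the test passes as soon as at least one value from `allowedValues` appears in the column. It does not require every column value to be allowed. Here is a minimal Python sketch of that rule as written, using a hypothetical function name:

```python
from typing import Iterable, List, Union

Value = Union[str, int, float]


def any_allowed_value_present(column_values: Iterable[Value],
                              allowed_values: List[Value]) -> bool:
    """Pass as soon as one column value is in allowed_values."""
    allowed = set(allowed_values)
    return any(v in allowed for v in column_values)


print(any_allowed_value_present(["a", "x", "y"], ["a", "b"]))  # True
print(any_allowed_value_present(["x", "y"], ["a", "b"]))       # False
```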
YAML Config
testDefinitionName: columnValuesToBeInSet
parameterValues:
    - name: allowedValues
      value: ["allowed1", "allowed2"]
JSON Config
{
    "testDefinitionName": "columnValuesToBeInSet",
    "parameterValues": [
        {
            "name": "allowedValues",
            "value": [
                "allowed1",
                "allowed2"
            ]
        }
    ]
}
Column Values to Be Not In Set
Validate that none of the values in a column belong to a set of forbidden values.
Properties
- forbiddenValues: List of forbidden strings or numbers.
Behavior
| Condition | Status | 
|---|---|
| No values from `forbiddenValues` are found in the column | Success ✅ | 
| 1 or more values from `forbiddenValues` are found in the column | Failed ❌ | 
YAML Config
testDefinitionName: columnValuesToBeNotInSet
parameterValues:
    - name: forbiddenValues
      value: ["forbidden1", "forbidden2"]
JSON Config
{
    "testDefinitionName": "columnValuesToBeNotInSet",
    "parameterValues": [
        {
            "name": "forbiddenValues",
            "value": [
                "forbidden1",
                "forbidden2"
            ]
        }
    ]
}
Column Values to Be Between
Validate that the values of a column are within a given range.
Only supports numerical types.
Properties
- minValue: Lower bound of the interval. If provided, the column values should be greater than this number.
- maxValue: Upper bound of the interval. If provided, the column values should be less than this number.
At least one of the two must be provided.
Behavior
| Condition | Status | 
|---|---|
| value is between `minValue` and `maxValue` | Success ✅ | 
| value is greater than `minValue` if only `minValue` is specified | Success ✅ | 
| value is less than `maxValue` if only `maxValue` is specified | Success ✅ | 
| value is not between `minValue` and `maxValue` | Failed ❌ | 
| value is less than `minValue` if only `minValue` is specified | Failed ❌ | 
| value is greater than `maxValue` if only `maxValue` is specified | Failed ❌ | 
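Because every value has to respect the bounds, it is enough to compare the column's minimum and maximum with `minValue` and `maxValue`. The Python sketch below makes three assumptions: the bounds are inclusive, NULLs are skipped, and an all-NULL column passes. `values_between` is a hypothetical name.

```python
from typing import Iterable, Optional


def values_between(values: Iterable[Optional[float]],
                   min_value: Optional[float] = None,
                   max_value: Optional[float] = None) -> bool:
    """Every non-null value must lie within the given (assumed inclusive) bounds."""
    if min_value is None and max_value is None:
        raise ValueError("provide minValue, maxValue, or both")
    present = [v for v in values if v is not None]  # assumption: NULLs are ignored
    if not present:
        return True  # assumption: nothing can violate the bounds
    # Only the extremes matter: if they fit, every value fits
    if min_value is not None and min(present) < min_value:
        return False
    if max_value is not None and max(present) > max_value:
        return False
    return True


print(values_between([12, 40, None, 99], min_value=10))  # True
print(values_between([12, 40, 99], max_value=50))        # False
```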
YAML Config
testDefinitionName: columnValuesToBeBetween
parameterValues:
    - name: minValue
      value: 10
JSON Config
{
    "testDefinitionName": "columnValuesToBeBetween",
    "parameterValues": [
        {
            "name": "minValue",
            "value": 10
        }
    ]
}
Column Values Missing Count to Be Equal
Validates that the number of missing values matches a given number. Missing values are the count of nulls plus the count of values that appear in a given list of strings to treat as missing data. A typical example is NA or N/A.
Properties
- missingCountValue: The number of missing values must be equal to this. This field is mandatory.
- missingValueMatch (Optional): A list of strings to consider as missing values.
Behavior
| Condition | Status | 
|---|---|
| Number of missing values is equal to `missingCountValue` | Success ✅ | 
| Number of missing values is not equal to `missingCountValue` | Failed ❌ | 
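The counting rule can be sketched in Python as follows, with `None` standing in for `NULL`. `missing_count` is a hypothetical helper. The comparison with `missingCountValue` is a plain equality check, as the table describes.

```python
from typing import Iterable, List, Optional


def missing_count(values: Iterable[Optional[str]],
                  missing_value_match: Optional[List[str]] = None) -> int:
    """Count NULLs (None) plus values listed in missingValueMatch."""
    markers = set(missing_value_match or [])
    return sum(1 for v in values if v is None or v in markers)


column = ["a", None, "NA", "N/A", "b", None]
count = missing_count(column, ["NA", "N/A"])
print(count)                  # 4
print(count == 100)           # False: with missingCountValue = 100 the test fails
print(missing_count(column))  # 2: only NULLs when no list is given
```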
YAML Config
testDefinitionName: columnValuesMissingCountToBeEqual
parameterValues:
    - name: missingValueMatch
      value: ["NA", "N/A"]
    - name: missingCountValue
      value: 100
JSON Config
{
    "testDefinitionName": "columnValuesMissingCountToBeEqual",
    "parameterValues": [
        {
            "name": "missingValueMatch",
            "value": [
                "NA",
                "N/A"
            ]
        },
        {
            "name": "missingCountValue",
            "value": 100
        }
    ]
}
Column Values Lengths to Be Between
Validates that the lengths of the strings in a column are within a given range.
Only supports concatenable types.
Properties
- minLength: Lower bound of the interval. If provided, the string length should be greater than this number.
- maxLength: Upper bound of the interval. If provided, the string length should be less than this number.
At least one of the two must be provided.
Behavior
| Condition | Status | 
|---|---|
| value length is between `minLength` and `maxLength` | Success ✅ | 
| value length is greater than `minLength` if only `minLength` is specified | Success ✅ | 
| value length is less than `maxLength` if only `maxLength` is specified | Success ✅ | 
| value length is not between `minLength` and `maxLength` | Failed ❌ | 
| value length is less than `minLength` if only `minLength` is specified | Failed ❌ | 
| value length is greater than `maxLength` if only `maxLength` is specified | Failed ❌ | 
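Whether "length" means characters or bytes depends on the SQL function a database uses. MySQL's `LENGTH()`, for example, returns bytes, while `CHAR_LENGTH()` returns characters, which matters for non-ASCII text. The Python sketch below counts characters, as `len` does on a `str`. It assumes inclusive bounds and skips NULLs. Which function OpenMetadata calls for each database is not covered here, and `lengths_between` is a hypothetical name.

```python
from typing import Iterable, Optional


def lengths_between(values: Iterable[Optional[str]],
                    min_length: Optional[int] = None,
                    max_length: Optional[int] = None) -> bool:
    """Every non-null string length must lie within the (assumed inclusive) bounds."""
    lengths = [len(v) for v in values if v is not None]  # character count
    if min_length is not None and any(n < min_length for n in lengths):
        return False
    if max_length is not None and any(n > max_length for n in lengths):
        return False
    return True


print(lengths_between(["abc", "abcd"], min_length=3, max_length=4))  # True
print(lengths_between(["abc", "abcdef"], max_length=4))             # False
# Characters vs bytes: the same string can have two different "lengths"
print(len("caf\u00e9"), len("caf\u00e9".encode("utf-8")))  # 4 5
```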
YAML Config
testDefinitionName: columnValueLengthsToBeBetween
parameterValues:
    - name: minLength
      value: 50
    - name: maxLength
      value: 100
JSON Config
{
    "testDefinitionName": "columnValueLengthsToBeBetween",
    "parameterValues": [
        {
            "name": "minLength",
            "value": 50
        },
        {
            "name": "maxLength",
            "value": 100
        }
    ]
}
Column Value Max to Be Between
Validate that the maximum value of a column is within a specific range.
Only supports numerical types.
Properties
- minValueForMaxInCol: lower bound
- maxValueForMaxInCol: upper bound
Behavior
| Condition | Status | 
|---|---|
| column max value is between `minValueForMaxInCol` and `maxValueForMaxInCol` | Success ✅ | 
| column max value is greater than `minValueForMaxInCol` if only `minValueForMaxInCol` is specified | Success ✅ | 
| column max value is less than `maxValueForMaxInCol` if only `maxValueForMaxInCol` is specified | Success ✅ | 
| column max value is not between `minValueForMaxInCol` and `maxValueForMaxInCol` | Failed ❌ | 
| column max value is less than `minValueForMaxInCol` if only `minValueForMaxInCol` is specified | Failed ❌ | 
| column max value is greater than `maxValueForMaxInCol` if only `maxValueForMaxInCol` is specified | Failed ❌ | 
YAML Config
testDefinitionName: columnValueMaxToBeBetween
parameterValues:
    - name: minValueForMaxInCol
      value: 50
    - name: maxValueForMaxInCol
      value: 100
JSON Config
{
    "testDefinitionName": "columnValueMaxToBeBetween",
    "parameterValues": [
        {
            "name": "minValueForMaxInCol",
            "value": 50
        },
        {
            "name": "maxValueForMaxInCol",
            "value": 100
        }
    ]
}
Column Value Min to Be Between
Validate that the minimum value of a column is within a specific range.
Only supports numerical types.
Properties
- minValueForMinInCol: lower bound
- maxValueForMinInCol: upper bound
Behavior
| Condition | Status | 
|---|---|
| column min value is between `minValueForMinInCol` and `maxValueForMinInCol` | Success ✅ | 
| column min value is greater than `minValueForMinInCol` if only `minValueForMinInCol` is specified | Success ✅ | 
| column min value is less than `maxValueForMinInCol` if only `maxValueForMinInCol` is specified | Success ✅ | 
| column min value is not between `minValueForMinInCol` and `maxValueForMinInCol` | Failed ❌ | 
| column min value is less than `minValueForMinInCol` if only `minValueForMinInCol` is specified | Failed ❌ | 
| column min value is greater than `maxValueForMinInCol` if only `maxValueForMinInCol` is specified | Failed ❌ | 
YAML Config
testDefinitionName: columnValueMinToBeBetween
parameterValues:
    - name: minValueForMinInCol
      value: 10
    - name: maxValueForMinInCol
      value: 50
JSON Config
{
    "testDefinitionName": "columnValueMinToBeBetween",
    "parameterValues": [
        {
            "name": "minValueForMinInCol",
            "value": 10
        },
        {
            "name": "maxValueForMinInCol",
            "value": 50
        }
    ]
}
Column Value Mean to Be Between
Validate that the mean of a column is within a specific range.
Only supports numerical types.
Properties
- minValueForMeanInCol: lower bound
- maxValueForMeanInCol: upper bound
Behavior
| Condition | Status | 
|---|---|
| column mean value is between `minValueForMeanInCol` and `maxValueForMeanInCol` | Success ✅ | 
| column mean value is greater than `minValueForMeanInCol` if only `minValueForMeanInCol` is specified | Success ✅ | 
| column mean value is less than `maxValueForMeanInCol` if only `maxValueForMeanInCol` is specified | Success ✅ | 
| column mean value is not between `minValueForMeanInCol` and `maxValueForMeanInCol` | Failed ❌ | 
| column mean value is less than `minValueForMeanInCol` if only `minValueForMeanInCol` is specified | Failed ❌ | 
| column mean value is greater than `maxValueForMeanInCol` if only `maxValueForMeanInCol` is specified | Failed ❌ | 
YAML Config
testDefinitionName: columnValueMeanToBeBetween
parameterValues:
    - name: minValueForMeanInCol
      value: 5
    - name: maxValueForMeanInCol
      value: 10
JSON Config
{
    "testDefinitionName": "columnValueMeanToBeBetween",
    "parameterValues": [
        {
            "name": "minValueForMeanInCol",
            "value": 5
        },
        {
            "name": "maxValueForMeanInCol",
            "value": 10
        }
    ]
}
Column Value Median to Be Between
Validate that the median of a column is within a specific range.
Only supports numerical types.
Properties
- minValueForMedianInCol: lower bound
- maxValueForMedianInCol: upper bound
Behavior
| Condition | Status | 
|---|---|
| column median value is between `minValueForMedianInCol` and `maxValueForMedianInCol` | Success ✅ | 
| column median value is greater than `minValueForMedianInCol` if only `minValueForMedianInCol` is specified | Success ✅ | 
| column median value is less than `maxValueForMedianInCol` if only `maxValueForMedianInCol` is specified | Success ✅ | 
| column median value is not between `minValueForMedianInCol` and `maxValueForMedianInCol` | Failed ❌ | 
| column median value is less than `minValueForMedianInCol` if only `minValueForMedianInCol` is specified | Failed ❌ | 
| column median value is greater than `maxValueForMedianInCol` if only `maxValueForMedianInCol` is specified | Failed ❌ | 
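For an even number of values, the median is conventionally the mean of the two middle values, which is what Python's `statistics.median` returns. Some databases compute a percentile-based or approximate median instead, so results can differ slightly. The sketch below uses a hypothetical `median_between` function and assumes inclusive bounds. It also shows why a median check is robust to a single outlier.

```python
import statistics
from typing import List, Optional


def median_between(values: List[float],
                   min_value: Optional[float] = None,
                   max_value: Optional[float] = None) -> bool:
    """Check the column median against optional (assumed inclusive) bounds."""
    med = statistics.median(values)  # even count: mean of the two middle values
    if min_value is not None and med < min_value:
        return False
    if max_value is not None and med > max_value:
        return False
    return True


print(statistics.median([1, 3, 5, 7]))        # 4.0
print(median_between([1, 3, 5, 7], 5, 10))    # False: 4.0 < 5
print(median_between([4, 6, 8, 100], 5, 10))  # True: median 7.0 despite the outlier
```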
YAML Config
testDefinitionName: columnValueMedianToBeBetween
parameterValues:
    - name: minValueForMedianInCol
      value: 5
    - name: maxValueForMedianInCol
      value: 10
JSON Config
{
    "testDefinitionName": "columnValueMedianToBeBetween",
    "parameterValues": [
        {
            "name": "minValueForMedianInCol",
            "value": 5
        },
        {
            "name": "maxValueForMedianInCol",
            "value": 10
        }
    ]
}
Column Values Sum to Be Between
Validate that the sum of a column is within a specific range.
Only supports numerical types.
Properties
- minValueForColSum: lower bound
- maxValueForColSum: upper bound
Behavior
| Condition | Status | 
|---|---|
| Sum of the column values is between `minValueForColSum` and `maxValueForColSum` | Success ✅ | 
| Sum of the column values is greater than `minValueForColSum` if only `minValueForColSum` is specified | Success ✅ | 
| Sum of the column values is less than `maxValueForColSum` if only `maxValueForColSum` is specified | Success ✅ | 
| Sum of the column values is not between `minValueForColSum` and `maxValueForColSum` | Failed ❌ | 
| Sum of the column values is less than `minValueForColSum` if only `minValueForColSum` is specified | Failed ❌ | 
| Sum of the column values is greater than `maxValueForColSum` if only `maxValueForColSum` is specified | Failed ❌ | 
YAML Config
testDefinitionName: columnValuesSumToBeBetween
parameterValues:
    - name: minValueForColSum
      value: 5
    - name: maxValueForColSum
      value: 10
JSON Config
{
    "testDefinitionName": "columnValuesSumToBeBetween",
    "parameterValues": [
        {
            "name": "minValueForColSum",
            "value": 5
        },
        {
            "name": "maxValueForColSum",
            "value": 10
        }
    ]
}
Column Values Standard Deviation to Be Between
Validate that the standard deviation of a column is within a specific range.
Only supports numerical types.
Properties
- minValueForStdDevInCol: lower bound
- maxValueForStdDevInCol: upper bound
Behavior
| Condition | Status | 
|---|---|
| column values standard deviation is between `minValueForStdDevInCol` and `maxValueForStdDevInCol` | Success ✅ | 
| column values standard deviation is greater than `minValueForStdDevInCol` if only `minValueForStdDevInCol` is specified | Success ✅ | 
| column values standard deviation is less than `maxValueForStdDevInCol` if only `maxValueForStdDevInCol` is specified | Success ✅ | 
| column values standard deviation is not between `minValueForStdDevInCol` and `maxValueForStdDevInCol` | Failed ❌ | 
| column values standard deviation is less than `minValueForStdDevInCol` if only `minValueForStdDevInCol` is specified | Failed ❌ | 
| column values standard deviation is greater than `maxValueForStdDevInCol` if only `maxValueForStdDevInCol` is specified | Failed ❌ | 
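Standard deviation comes in two common flavors: population (divide by n) and sample (divide by n - 1). Database functions differ here. PostgreSQL's `stddev` is an alias for `stddev_samp`, while MySQL's `STDDEV()` is the population version. This page does not say which one the test uses. The Python sketch below therefore exposes both, through a hypothetical `stddev_between` function with inclusive bounds assumed, and shows that they can disagree for tight ranges.

```python
import statistics
from typing import List, Optional


def stddev_between(values: List[float],
                   min_value: Optional[float] = None,
                   max_value: Optional[float] = None,
                   population: bool = True) -> bool:
    """Check the standard deviation against optional (assumed inclusive) bounds."""
    # pstdev divides by n (population); stdev divides by n - 1 (sample)
    std = statistics.pstdev(values) if population else statistics.stdev(values)
    if min_value is not None and std < min_value:
        return False
    if max_value is not None and std > max_value:
        return False
    return True


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(statistics.pstdev(data))                            # 2.0
print(round(statistics.stdev(data), 3))                   # 2.138
print(stddev_between(data, 1.9, 2.05))                    # True
print(stddev_between(data, 1.9, 2.05, population=False))  # False
```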
YAML Config
testDefinitionName: columnValueStdDevToBeBetween
parameterValues:
    - name: minValueForStdDevInCol
      value: 5
    - name: maxValueForStdDevInCol
      value: 10
JSON Config
{
    "testDefinitionName": "columnValueStdDevToBeBetween",
    "parameterValues": [
        {
            "name": "minValueForStdDevInCol",
            "value": 5
        },
        {
            "name": "maxValueForStdDevInCol",
            "value": 10
        }
    ]
}
