> The **Column Assertions** feature is available as part of the **DataHub Cloud Observe** module of DataHub Cloud.
> If you are interested in learning more about **DataHub Cloud Observe** or trying it out, please [visit our website](https://datahub.com/products/data-observability/).
Can you remember a time when an important warehouse table column changed dramatically, with little or no notice? Perhaps the number of null values suddenly spiked, or a new value was added to a fixed set of possible values. If the answer is yes, how did you initially find out? We'll take a guess - someone looking at an internal reporting dashboard or worse, a user of your product, sounded an alarm when a number looked a bit out of the ordinary.
There are many reasons why important columns in your Snowflake, Redshift, BigQuery, or Databricks tables may change - application code bugs, new feature rollouts, etc. Oftentimes, these changes break important assumptions made about the data used in building key downstream data products like reporting dashboards or data-driven product features.
What if you could reduce the time to detect these incidents, so that the people responsible for the data were made aware of data issues before anyone else? With DataHub Cloud Column Assertions, you can.
With DataHub Cloud, you can define **Column Value** assertions to ensure each value in a column matches specific constraints, and **Column Metric** assertions to ensure that computed metrics from columns align with your expectations. As soon as things go wrong, your team will be the first to know, before the data issue becomes a larger data incident.
In this guide, we'll cover the basics of Column Assertions - what they are, how to configure them, and more - so that you and your team can start building trust in your most important data assets.
Column Assertions can be particularly useful for documenting and enforcing column-level "contracts", i.e. formal specifications about the expected contents of a particular column that can be used for coordinating among producers and consumers of the data.
For **Column Metric Assertions**, you will be able to choose from a list of common column metrics - MAX, MIN, MEAN, NULL COUNT, etc. - and then compare the computed metric against an expected value. The list of metrics will vary based on the type of the selected column. For example,
if you've selected a numeric column, you can choose to compute the MEAN value of the column, and then assert that it is greater than a
specific number. For string types, you can choose to compute the MAX LENGTH of the string across all column values, and then assert that it
is less than a specific number.
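To make the two examples above concrete, here is an illustrative sketch of how a Column Metric check conceptually works - hypothetical helper functions, not the DataHub implementation: reduce the column to a single metric value, then compare that value against the expected threshold.

```python
# Illustrative sketch of Column Metric checks (hypothetical helpers,
# not the DataHub API): reduce a column to one metric, then compare
# the metric value against an expected threshold.

def metric_mean(values):
    """MEAN metric for a numeric column."""
    return sum(values) / len(values)

def metric_max_length(values):
    """MAX LENGTH metric for a string column."""
    return max(len(v) for v in values)

def evaluate_metric_assertion(metric_fn, values, operator, threshold):
    """Return True (pass) or False (fail) for a metric assertion."""
    metric = metric_fn(values)
    if operator == "greater_than":
        return metric > threshold
    if operator == "less_than":
        return metric < threshold
    raise ValueError(f"unsupported operator: {operator}")

# MEAN of a numeric column must be greater than 10 -> passes (mean is 13)
print(evaluate_metric_assertion(metric_mean, [12, 18, 9], "greater_than", 10))  # True

# MAX LENGTH of a string column must be less than 16 -> passes
print(evaluate_metric_assertion(metric_max_length, ["us-east-1", "eu-west-2"], "less_than", 16))  # True
```

In practice, DataHub Cloud computes these metrics for you by querying the source table on the schedule you configure.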
#### 4. Row Selection Set
The **Row Selection Set** defines which rows in the table the Column Assertion will be evaluated across. You can choose
from the following options:
- **All Table Rows**: Evaluate the Column Assertion across all rows in the table. This is the default option. Note that
  this may be expensive for very large tables.
- **Only Rows That Have Changed**: Evaluate the Column Assertion only across rows that have changed since the previous
  evaluation of the assertion. If you choose this option, you will need to specify a **High Watermark Column** to help determine which rows
  have changed. A **High Watermark Column** is a column that contains a constantly incrementing value - a date, a time, or
  another always-increasing number - that can be used to find the "new rows" that were added since the previous evaluation. When selected, a query will be issued to the table to find only the rows that have changed since the previous assertion evaluation.
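The high-watermark mechanism can be sketched as follows - a hypothetical example assuming a `row_added_at` timestamp serves as the High Watermark Column (the actual query is generated by DataHub Cloud, not written by you):

```python
# Hypothetical sketch of high-watermark row selection (not the query
# DataHub Cloud actually generates): only rows whose watermark value
# exceeds the last recorded watermark are evaluated, and the watermark
# advances after each run.

def select_changed_rows(rows, watermark_column, last_watermark):
    """Return rows added since the previous evaluation, plus the new watermark."""
    changed = [r for r in rows if r[watermark_column] > last_watermark]
    new_watermark = max((r[watermark_column] for r in changed), default=last_watermark)
    return changed, new_watermark

rows = [
    {"id": 1, "row_added_at": 100},
    {"id": 2, "row_added_at": 200},
    {"id": 3, "row_added_at": 300},
]

# The previous run recorded watermark 100, so only rows 2 and 3 are "new".
changed, watermark = select_changed_rows(rows, "row_added_at", last_watermark=100)
print([r["id"] for r in changed], watermark)  # [2, 3] 300
```

Because each run only scans rows past the stored watermark, this option keeps evaluation cheap on large, append-only tables.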
2. (Optional) **Data Platform Connection**: In order to create a Column Assertion that queries the data source directly (instead of DataHub metadata), you'll need to have an **Ingestion Source** configured for your data platform.
5. Configure the evaluation **schedule**. This is the frequency at which the assertion will be evaluated to produce a
pass or fail result, and the times when the column values will be checked.
6. Configure the **column assertion type**. You can choose from **Column Value** or **Column Metric**.
**Column Value** assertions are used to monitor the value of a specific column in a table, and ensure that every row
adheres to a specific condition. **Column Metric** assertions are used to compute a metric for that column, and then compare the value of that metric to your expectations.
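The difference between the two types can be illustrated with a small sketch - hypothetical helpers, not the DataHub API: a Column Value assertion checks a condition against every individual row, while a Column Metric assertion checks a single aggregate computed from the column.

```python
# Illustrative contrast (hypothetical helpers, not the DataHub API):
# a Column Value assertion checks a condition on every row, while a
# Column Metric assertion checks one aggregate computed from the column.

def column_value_assertion(values, condition):
    """Pass only if every individual value satisfies the condition."""
    return all(condition(v) for v in values)

def column_metric_assertion(values, metric_fn, condition):
    """Pass only if the single computed metric satisfies the condition."""
    return condition(metric_fn(values))

ages = [25, 31, 17, 42]

# Column Value: every age must be non-negative -> passes
print(column_value_assertion(ages, lambda v: v >= 0))         # True

# Column Metric: the MIN age must be at least 18 -> fails (17 < 18)
print(column_metric_assertion(ages, min, lambda m: m >= 18))  # False
```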
As part of the **DataHub Cloud Observe** module, DataHub Cloud also provides [Smart Assertions](./smart-assertions.md) out of the box. These are dynamic, AI-powered Column Metric Assertions that you can use to monitor anomalies on column metrics of important warehouse Tables, without requiring any manual setup.
You can create smart assertions by simply selecting the column and the metric you wish to monitor, and then clicking the `Detect with AI` option in the UI:
_Coming soon: we're making it easier to create Smart Assertions for multiple fields on a table, across multiple metrics, all in one go. If you're interested in this today, please let your DataHub representative know._
- **Assertion**: The specific expectation for the column metric. e.g. "The value of an integer column is greater than 10 for all rows in the table." This is the "what".
- **Monitor**: The process responsible for evaluating the Assertion on a given evaluation schedule and using specific
mechanisms. This is the "how".
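The split between the "what" and the "how" can be sketched like this - illustrative names only, not DataHub's actual data model:

```python
# Hypothetical sketch of the Assertion / Monitor split (illustrative
# names, not DataHub's data model): the Assertion holds the "what",
# the Monitor holds the "how" and "when".

from dataclasses import dataclass

@dataclass
class Assertion:
    column: str
    metric: str        # e.g. "MIN", "MEAN", "NULL COUNT"
    operator: str      # e.g. "greater_than"
    threshold: float   # the expectation: the "what"

@dataclass
class Monitor:
    assertion: Assertion
    schedule_cron: str  # when to evaluate
    source: str         # how to evaluate, e.g. query the source directly

check = Assertion(column="age", metric="MIN", operator="greater_than", threshold=10)
monitor = Monitor(assertion=check, schedule_cron="0 * * * *", source="query")
print(monitor.assertion.threshold)  # 10
```

Keeping the two separate means the same expectation can be re-evaluated on different schedules or against different evaluation mechanisms without being redefined.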
Note that to create or delete Assertions and Monitors for a specific entity on DataHub, you'll need the
`Edit Assertions` and `Edit Monitors` privileges for it.