mirror of
				https://github.com/open-metadata/OpenMetadata.git
				synced 2025-10-26 16:22:09 +00:00 
			
		
		
		
	 c512ad6bc1
			
		
	
	
		
			
				
	
	
		
			116 lines
		
	
	
		
			3.5 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
			
		
		
	
	
| ---
 | |
| title: Metrics
 | |
| slug: /openmetadata/data-quality/metrics
 | |
| ---
 | |
| 
 | |
| # Metrics
 | |
| 
 | |
| This page lists the supported metrics and the data types each one applies to.
 | |
| 
 | |
| A Metric is a computation that we can run on top of a Table or Column to receive a value back. Metrics are the primary **building block** of OpenMetadata's Profiler.
 | |
| 
 | |
| * **Metrics** define the queries and computations generically. They do not aim at specific columns or database dialects. Instead, they are expressions built with SQLAlchemy that should run everywhere.
 | |
| * A **Profiler** is the binding between a set of metrics and the external world. The Profiler contains the Table and Session information and is in charge of executing the metrics.
 | |
| * A **Test Case** adds logic to the Metrics results. A Metric is neither good nor bad, so we need the Test definitions to map results into Success or Failure.
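The split between the three concepts above can be sketched in a few lines. This is only an illustration, not the OpenMetadata implementation: it uses the standard library's `sqlite3` and raw SQL strings instead of SQLAlchemy expressions, and the names `METRICS`, `Profiler`, and `check_no_nulls` are hypothetical.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical metric definitions: each maps a column name to a SQL
# expression and knows nothing about the table it will run against.
METRICS: Dict[str, Callable[[str], str]] = {
    "valuesCount": lambda col: f"COUNT({col})",
    "nullCount": lambda col: f"SUM(CASE WHEN {col} IS NULL THEN 1 ELSE 0 END)",
}


@dataclass
class Profiler:
    """Binds metrics to a concrete connection and table, then executes them."""

    conn: sqlite3.Connection
    table: str

    def run(self, column: str, metric_names: List[str]) -> Dict[str, int]:
        select = ", ".join(METRICS[name](column) for name in metric_names)
        row = self.conn.execute(f"SELECT {select} FROM {self.table}").fetchone()
        return dict(zip(metric_names, row))


def check_no_nulls(results: Dict[str, int]) -> str:
    """Test case: turns a raw metric value into a verdict."""
    return "Success" if results["nullCount"] == 0 else "Failure"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("a@x.io",), (None,), ("b@x.io",)])

results = Profiler(conn, "users").run("email", ["valuesCount", "nullCount"])
verdict = check_no_nulls(results)
```

The metric values carry no judgment on their own; only `check_no_nulls` decides that one null is a problem.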
 | |
| 
 | |
| On this page, you will learn about all the metrics that we currently support and what they mean. All names follow the definitions in the JSON Schemas.
 | |
| 
 | |
| <Note>
 | |
| 
 | |
| You can check the definition of the `columnProfile` [here](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/entity/data/table.json#L271). The metrics themselves are implemented [here](https://github.com/open-metadata/OpenMetadata/tree/main/ingestion/src/metadata/orm_profiler/metrics).
 | |
| 
 | |
| </Note>
 | |
| 
 | |
| ## Table Metrics
 | |
| 
 | |
| These are the metrics computed at the Table level.
 | |
| 
 | |
| ### Row Count
 | |
| 
 | |
| Returns the number of rows in the Table.
 | |
| 
 | |
| ### Column Count
 | |
| 
 | |
| Returns the number of columns in the Table.
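As a rough illustration of the two table-level metrics, the sketch below computes them against an in-memory SQLite database. The `orders` table and its contents are made up for the example, and `PRAGMA table_info` is SQLite-specific; other dialects expose column metadata differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, note TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 9.5, "first"), (2, 3.0, None)],
)

# Row Count: every row counts, whether or not its columns hold nulls.
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# Column Count: PRAGMA table_info returns one row per column.
column_count = len(conn.execute("PRAGMA table_info(orders)").fetchall())
```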
 | |
| 
 | |
| ## Column Metrics
 | |
| 
 | |
| List of Metrics that we run for all the columns.
 | |
| 
 | |
| > Note that complex types such as ARRAY or STRUCT are not supported yet. The implementation will come down the road.
 | |
| 
 | |
| ### Values Count
 | |
| 
 | |
| The total count of values in the column. Nulls are ignored.
 | |
| 
 | |
| ### Values Percentage
 | |
| 
 | |
| Percentage of values in this column vs. the Row Count.
 | |
| 
 | |
| ### Duplicate Count
 | |
| 
 | |
| Reports the number of rows that have duplicated values in a column. We compute it as `count(col) - count(distinct(col))`.
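The formula above can be tried out directly in SQL. The following sketch runs it on an in-memory SQLite table; the table `t` and its sample data are assumptions made for the example. Because `COUNT(col)` skips nulls, the same query also yields the Values Count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [(1,), (2,), (2,), (3,), (3,), (4,), (None,)],
)

# COUNT(col) ignores the null row; COUNT(DISTINCT col) collapses repeats.
values_count, duplicate_count = conn.execute(
    "SELECT COUNT(col), COUNT(col) - COUNT(DISTINCT col) FROM t"
).fetchone()
```

Here six non-null values collapse to four distinct ones, so the Duplicate Count is 2.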
 | |
| 
 | |
| ### Null Count
 | |
| 
 | |
| The number of null values in a column.
 | |
| 
 | |
| ### Null Proportion
 | |
| 
 | |
| It shows the ratio of null values vs. the total number of values in a column.
 | |
| 
 | |
| ### Unique Count
 | |
| 
 | |
| The number of unique values in a column, those that appear only once. E.g., `[1, 2, 2, 3, 3, 4] => [1, 4] => count = 2`.
 | |
| 
 | |
| ### Unique Proportion
 | |
| 
 | |
| Unique Count / Values Count
 | |
| 
 | |
| ### Distinct Count
 | |
| 
 | |
| The number of different items in a column. E.g., `[1, 2, 2, 3, 3, 4] => [1, 2, 3, 4] => count = 4`.
 | |
| 
 | |
| ### Distinct Proportion
 | |
| 
 | |
| Distinct Count / Values Count
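Unique and distinct counts are easy to mix up, so here is a small plain-Python sketch that reuses the sample list from the definitions above. It only illustrates the arithmetic; the variable names are chosen for the example and this is not how the Profiler computes the values.

```python
from collections import Counter

values = [1, 2, 2, 3, 3, 4]
frequencies = Counter(values)

values_count = len(values)

# Unique: values that appear exactly once (here 1 and 4).
unique_count = sum(1 for n in frequencies.values() if n == 1)

# Distinct: how many different values exist (here 1, 2, 3 and 4).
distinct_count = len(frequencies)

unique_proportion = unique_count / values_count
distinct_proportion = distinct_count / values_count
```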
 | |
| 
 | |
| ### Min
 | |
| 
 | |
| Only for numerical values. Returns the minimum.
 | |
| 
 | |
| ### Max
 | |
| 
 | |
| Only for numerical values. Returns the maximum.
 | |
| 
 | |
| ### Min Length
 | |
| 
 | |
| Only for concatenable values. Returns the minimum length of the values in a column.
 | |
| 
 | |
| ### Max Length
 | |
| 
 | |
| Only for concatenable values. Returns the maximum length of the values in a column.
 | |
| 
 | |
| ### Mean
 | |
| 
 | |
| * Numerical values: returns the average of the values.
 | |
| * Concatenable values: returns the average length of the values.
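The two behaviours of Mean can be illustrated with a short plain-Python sketch. The helper `column_mean` is hypothetical, and skipping nulls before averaging is an assumption of this example rather than something stated on this page.

```python
from statistics import mean


def column_mean(values):
    """Average of the values, or of their lengths when they are strings."""
    non_null = [v for v in values if v is not None]  # assumption: nulls are skipped
    if all(isinstance(v, str) for v in non_null):
        return mean(len(v) for v in non_null)
    return mean(non_null)


numeric_mean = column_mean([1, 2, 3, None])
text_mean = column_mean(["ab", "abcd"])
```

The numeric column averages the values themselves, while the text column averages the lengths 2 and 4.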
 | |
| 
 | |
| ### Sum
 | |
| 
 | |
| Only for numerical values. Returns the sum of all values in a column.
 | |
| 
 | |
| ### Standard Deviation
 | |
| 
 | |
| Only for numerical values. Returns the standard deviation.
 | |
| 
 | |
| ### Histogram
 | |
| 
 | |
| The histogram returns a dictionary that maps each bin to the number of values that fall into it.
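To make the shape of such a dictionary concrete, here is a minimal sketch that uses equal-width bins. The number of bins, the bin boundaries, and the label format are all assumptions made for the example; the Profiler's actual binning may differ.

```python
def histogram(values, bins=3):
    """Count values per equal-width bin; the last bin includes the maximum."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1  # avoid a zero width when all values are equal
    counts = {}
    for i in range(bins):
        start = lo + i * width
        counts[f"{start:.1f} to {start + width:.1f}"] = 0
    labels = list(counts)
    for v in values:
        index = min(int((v - lo) / width), bins - 1)
        counts[labels[index]] += 1
    return counts


result = histogram([1, 2, 2, 3, 5, 8, 9, 10])
```

With this sample the range 1 to 10 is split into three bins of width 3, and every value lands in exactly one of them.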
 | |
| 
 | |
| ## Reach out!
 | |
| 
 | |
| Is there any metric you'd like to see? Open an [issue](https://github.com/open-metadata/OpenMetadata/issues/new/choose) or reach out on [Slack](https://slack.open-metadata.org).
 |