Imri Paran cdaa5c10af
[GEN-1996] feat(data-quality): use sampling config in data diff (#18532)
* feat(data-quality): use sampling config in data diff

- get the table profiling config
- use hashing to deterministically sample the same ids from each table
- use dirty-equals to assert results of stochastic processes
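The hashing bullet above can be illustrated with a minimal sketch. It assumes that each key value is hashed with MD5 and that the first 8 hex digits are mapped to a bucket in [0, 100). A row is kept when its bucket is below the sampling percentage. `in_sample` and `sample_ids` are hypothetical helpers. They are not the data-diff or OpenMetadata API.

```python
import hashlib
from typing import Iterable, List


def in_sample(key: str, percentage: float) -> bool:
    """Return True if `key` lands in the deterministic sample.

    Assumption: MD5 of the key, first 8 hex digits read as an integer,
    reduced modulo 100. Identical keys always map to the same bucket, so
    every table containing a key makes the same keep/drop decision.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100
    return bucket < percentage


def sample_ids(ids: Iterable[str], percentage: float) -> List[str]:
    """Keep only the ids whose hash bucket falls below `percentage`."""
    return [i for i in ids if in_sample(i, percentage)]


# Hypothetical key values: the two tables overlap on ids 500..999.
table1_ids = [str(i) for i in range(1000)]
table2_ids = [str(i) for i in range(500, 1500)]

sample1 = set(sample_ids(table1_ids, 10))
sample2 = set(sample_ids(table2_ids, 10))

# Shared ids are either sampled in both tables or in neither.
shared = set(table1_ids) & set(table2_ids)
print(sample1 & shared == sample2 & shared)  # True
# The sample size is only approximately 10% of the ids.
print(len(sample1))
```

Because the decision depends only on the key value, the two samples can be diffed row by row. Independent random sampling on each side would instead report spurious missing rows. The exact sample size varies around the requested percentage, which is why approximate assertions (the commit's dirty-equals bullet) suit tests of this step.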

* - reverted missing md5
- added missing database service type

* - use a custom substr sql function

* fixed nonce

* added failure for mssql with sampling because it requires a larger change in the data-diff library

* fixed unit tests

* updated range for sampling
2024-11-11 10:07:23 +01:00

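"""Models for the TableDiff test case"""
from typing import List, Optional

from pydantic import BaseModel

from metadata.generated.schema.entity.data.table import Column, TableProfilerConfig
from metadata.generated.schema.entity.services.databaseService import (
    DatabaseServiceType,
)


class TableParameter(BaseModel):
    """Connection and column details for one of the two tables being compared."""

    serviceUrl: str
    path: str
    columns: List[Column]
    database_service_type: DatabaseServiceType


class TableDiffRuntimeParameters(BaseModel):
    """Runtime parameters for diffing table1 against table2."""

    table1: TableParameter
    table2: TableParameter
    keyColumns: List[str]  # columns used to match rows across the two tables
    extraColumns: List[str]  # additional columns whose values are compared
    whereClause: Optional[str]  # optional SQL filter on the compared rows
    # Profiler config of the table; supplies the sampling settings for the diff.
    table_profile_config: Optional[TableProfilerConfig]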
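`TableDiffRuntimeParameters` carries both an optional `whereClause` and an optional `table_profile_config`. A sampled diff might therefore need to apply the user's filter and a sampling filter together. The helper below is a hypothetical sketch of that combination. `combine_filters` and the predicate strings are illustrative only, and `bucket` is a made-up column name. None of this is OpenMetadata's implementation.

```python
from typing import Optional


def combine_filters(
    where_clause: Optional[str], sampling_predicate: Optional[str]
) -> Optional[str]:
    """AND the user filter with the sampling predicate, skipping absent parts.

    Each part is wrapped in parentheses so an OR inside the user's
    whereClause cannot change the precedence of the combined expression.
    """
    parts = [f"({p})" for p in (where_clause, sampling_predicate) if p]
    return " AND ".join(parts) if parts else None


# Hypothetical inputs: a user filter and an already-built sampling predicate.
print(combine_filters("status = 'active' OR status = 'new'", "bucket < 10"))
# (status = 'active' OR status = 'new') AND (bucket < 10)
print(combine_filters(None, None))
# None
```

Returning `None` when neither filter is present mirrors the `Optional[str]` type of `whereClause`. The caller can then omit the WHERE clause entirely instead of emitting an empty one.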