# SQL Profiling

SQL Profiling collects table-level and column-level statistics.
The SQL-based profiler does not run on its own; instead, it is enabled as part of other SQL-based sources.
Enabling profiling will slow down ingestion runs.

:::caution

Running profiling against many tables or over many rows can incur significant costs.
While we've done our best to limit the cost of the queries the profiler runs, you
should be prudent about the set of tables profiling is enabled on and the frequency
of the profiling runs.

:::
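
One way to act on the caution above is to restrict the set of tables that get profiled. The following is a minimal sketch of that idea using allow and deny regular expressions. The function name `should_profile` and the table names are hypothetical and are not part of DataHub's API; the individual source pages document the actual configuration options.

```python
import re


def should_profile(table, allow=(".*",), deny=()):
    """Return True if `table` matches an allow pattern and no deny pattern."""
    # Deny patterns take precedence over allow patterns.
    if any(re.fullmatch(p, table) for p in deny):
        return False
    return any(re.fullmatch(p, table) for p in allow)


# Hypothetical table names, used only for illustration.
tables = ["sales.orders", "sales.orders_tmp", "hr.salaries"]
selected = [
    t for t in tables
    if should_profile(t, allow=[r"sales\..*"], deny=[r".*_tmp"])
]
print(selected)  # ['sales.orders']
```

Giving deny patterns precedence makes it easy to carve temporary or very large tables out of an otherwise broad allow list.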

## Capabilities

Extracts:

- Row and column counts for each table
- For each column, if applicable:
  - null counts and proportions
  - distinct counts and proportions
  - minimum, maximum, mean, median, standard deviation, some quantile values
  - histograms or frequencies of unique values
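
To make the statistics listed above concrete, here is a rough sketch that computes a few of them with plain SQL against an in-memory SQLite table. It is not how DataHub's profiler is implemented. The `orders` table, the `profile_column` helper, and the choice to compute the distinct proportion over non-null values are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, 20.0), (3, None), (4, 20.0)],
)


def profile_column(conn, table, column):
    """Compute a few column-level statistics with one aggregate query."""
    # Identifiers are interpolated directly, so only pass trusted names.
    row_count, non_null, distinct, minimum, maximum, mean = conn.execute(
        f"SELECT COUNT(*), COUNT({column}), COUNT(DISTINCT {column}), "
        f"MIN({column}), MAX({column}), AVG({column}) FROM {table}"
    ).fetchone()
    null_count = row_count - non_null
    return {
        "row_count": row_count,
        "null_count": null_count,
        "null_proportion": null_count / row_count if row_count else None,
        "distinct_count": distinct,
        # Assumption: proportion of distinct values among non-null values.
        "distinct_proportion": distinct / non_null if non_null else None,
        "min": minimum,
        "max": maximum,
        "mean": mean,
    }


stats = profile_column(conn, "orders", "amount")
print(stats)
```

Even this small example needs a full scan of the table for each profiled column, which is why profiling many tables or very large tables can become expensive.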

## Supported Sources

SQL profiling is supported for all SQL sources. Check the individual source page to verify whether it supports profiling.

## Questions

If you've got any questions on configuring profiling, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!