fix(ingest): postgres - ignore information_schema tables by default (#4069)

Kevin Hu 2022-02-10 02:20:25 -05:00 committed by GitHub
parent 076848ff55
commit 9bdc9af7b9
2 changed files with 8 additions and 6 deletions

@@ -16,10 +16,10 @@ This plugin extracts the following:
   - database_alias (optional) can be used to change the name of database to be ingested
 - Table, row, and column statistics via optional [SQL profiling](./sql_profiles.md)
-| Capability | Status | Details |
-|-------------------|--------|------------------------------------------|
-| Data Containers | ✔️ | |
-| Data Domains | ✔️ | [link](../../docs/domains.md) |
+| Capability | Status | Details |
+| --------------- | ------ | ----------------------------- |
+| Data Containers | ✔️ | |
+| Data Domains | ✔️ | [link](../../docs/domains.md) |
 ## Quickstart recipe
@@ -53,10 +53,10 @@ Note that a `.` is used to denote nested fields in the YAML recipe.
 As a SQL-based service, the Athena integration is also supported by our SQL profiler. See [here](./sql_profiles.md) for more details on configuration.
 | Field | Required | Default | Description |
-|--------------------------------|----------|----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| ------------------------------ | -------- | -------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `username` | | | PostgreSQL username. |
 | `password` | | | PostgreSQL password. |
-| `host_port` | ✅ | | PostgreSQL host URL. |
+| `host_port` | ✅ | | PostgreSQL host URL. |
 | `database` | | | PostgreSQL database. |
 | `database_alias` | | | Alias to apply to database when ingesting. |
 | `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |

@@ -9,6 +9,7 @@ import sqlalchemy.dialects.postgresql as custom_types
 # https://geoalchemy-2.readthedocs.io/en/latest/core_tutorial.html#reflecting-tables.
 from geoalchemy2 import Geometry  # noqa: F401
+from datahub.configuration.common import AllowDenyPattern
 from datahub.ingestion.source.sql.sql_common import (
     BasicSQLAlchemyConfig,
     SQLAlchemySource,
@@ -29,6 +30,7 @@ register_custom_type(custom_types.HSTORE, MapTypeClass)
 class PostgresConfig(BasicSQLAlchemyConfig):
     # defaults
     scheme = "postgresql+psycopg2"
+    schema_pattern = AllowDenyPattern(deny=["information_schema"])
     def get_identifier(self: BasicSQLAlchemyConfig, schema: str, table: str) -> str:
         regular = f"{schema}.{table}"
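
The new `schema_pattern` default can be illustrated with a small, self-contained sketch. `SimpleAllowDeny` below is a hypothetical stand-in written for this example, not DataHub's `AllowDenyPattern`; it assumes that patterns are regular expressions matched from the start of the name, that matching is case-insensitive, and that a deny match overrides any allow match.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class SimpleAllowDeny:
    """Hypothetical stand-in for an allow/deny filter (not DataHub's class)."""

    # Assumption: with no explicit allow list, every name is allowed.
    allow: List[str] = field(default_factory=lambda: [".*"])
    deny: List[str] = field(default_factory=list)

    def allowed(self, name: str) -> bool:
        # Assumption: a deny match always wins over an allow match.
        if any(re.match(p, name, re.IGNORECASE) for p in self.deny):
            return False
        return any(re.match(p, name, re.IGNORECASE) for p in self.allow)


# Mirrors the default added in this commit: deny information_schema.
default_pattern = SimpleAllowDeny(deny=["information_schema"])

schemas = ["public", "information_schema", "analytics"]
kept = [s for s in schemas if default_pattern.allowed(s)]
print(kept)  # ['public', 'analytics']
```

Under these assumptions, `re.match` anchors only at the start of the string, so a schema named `information_schema_backup` would also be excluded by this stand-in; a stricter pattern such as `^information_schema$` would limit the match to the exact name.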