Database Filter Patterns Docs (#7901)

* Filter Pattern Docs

* Database Filter Patterns

* Resolve merge conflicts
Mayur Singal 2022-10-06 18:50:30 +05:30 committed by GitHub
parent d29721ab3d
commit f3e03b791c
13 changed files with 455 additions and 0 deletions


@ -447,6 +447,11 @@ site_menu:
- category: Connectors / Ingestion / Workflows/ Metadata / DBT / Ingest Owner from DBT
url: /connectors/ingestion/workflows/metadata/dbt/ingest-dbt-owner
- category: Connectors / Ingestion / Workflows/ Metadata / Filter Patterns
url: /connectors/ingestion/workflows/metadata/filter-patterns
- category: Connectors / Ingestion / Workflows/ Metadata / Filter Patterns / Database
url: /connectors/ingestion/workflows/metadata/filter-patterns/database
- category: Connectors / Ingestion / Workflows / Usage
url: /connectors/ingestion/workflows/usage
- category: Connectors / Ingestion / Workflows / Usage / Usage Workflow Through Query Logs
@ -490,6 +495,7 @@ site_menu:
url: /how-to-guides/cli-ingestion-with-basic-auth
- category: Features
url: /openmetadata
color: violet-70


@ -0,0 +1,423 @@
---
title: Database Filter Patterns
slug: /connectors/ingestion/workflows/metadata/filter-patterns/database
---
# Database Filter Patterns
## Configuring Filters
You can configure the metadata ingestion filter for a database source using four configuration fields: `Database Filter Pattern`,
`Schema Filter Pattern`, `Table Filter Pattern` & `Use FQN For Filtering`. In this document we will look at each field in detail,
along with several examples.
<Collapse title="Configuring Filters via UI">
Filters can be configured in the UI while adding an ingestion pipeline through the `Add Metadata Ingestion` page.
<Image
src="/images/openmetadata/ingestion/workflows/metadata/filter-patterns/database-filter-patterns.png"
alt="Database Filter Pattern Fields"
caption="Database Filter Pattern Fields"
/>
</Collapse>
<Collapse title="Configuring Filters via CLI">
Filters can be configured in the CLI, in the connection configuration under the `source.sourceConfig.config` field, as shown below.
```yaml
sourceConfig:
  config:
    ...
    useFqnForFiltering: false
    databaseFilterPattern:
      includes:
        - database1
        - database2
      excludes:
        - database3
        - database4
    schemaFilterPattern:
      includes:
        - schema1
        - schema2
      excludes:
        - schema3
        - schema4
    tableFilterPattern:
      includes:
        - table1
        - table2
      excludes:
        - table3
        - table4
```
</Collapse>
### Use FQN For Filtering
Set this flag when you want to apply the filter to the fully qualified name (e.g. `service_name.db_name.schema_name.table_name`)
instead of the raw name of the entity (e.g. `table_name`). This flag is useful when you have schemas
with the same name in different databases, or tables with the same name in different schemas, and you want to filter out only one of them. This is explained in more detail later in this document.
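As a rough illustration of the difference, the following Python sketch shows which string a pattern would be tested against under each setting. The helper `name_to_match` is hypothetical and not part of OpenMetadata; it only assumes that the FQN is the service, database, schema and table names joined with `.`, and that patterns are matched from the start of the string, as Python's `re.match` does.

```python
import re


def name_to_match(service, database, schema, table, use_fqn):
    """Return the string a table filter is tested against (illustrative helper)."""
    if use_fqn:
        return ".".join([service, database, schema, table])
    return table


raw = name_to_match("Snowflake_Prod", "SNOWFLAKE", "PUBLIC", "CUSTOMER", use_fqn=False)
fqn = name_to_match("Snowflake_Prod", "SNOWFLAKE", "PUBLIC", "CUSTOMER", use_fqn=True)

print(raw)  # CUSTOMER
print(fqn)  # Snowflake_Prod.SNOWFLAKE.PUBLIC.CUSTOMER

# A pattern written for raw names stops matching once the FQN is used.
raw_pattern_hits_raw = bool(re.match(r"^CUSTOMER$", raw))
raw_pattern_hits_fqn = bool(re.match(r"^CUSTOMER$", fqn))
print(raw_pattern_hits_raw, raw_pattern_hits_fqn)  # True False
```

In other words, a pattern has to be written for whichever form of the name the flag selects.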
### Database Filter Pattern
Database filter patterns are used to control whether or not to include databases as part of metadata ingestion.
- **Include**: Explicitly include databases by adding a list of comma-separated regular expressions to the Include field. OpenMetadata will include all databases with names matching one or more of the supplied regular expressions. All other databases will be excluded.
- **Exclude**: Explicitly exclude databases by adding a list of comma-separated regular expressions to the Exclude field. OpenMetadata will exclude all databases with names matching one or more of the supplied regular expressions. All other databases will be included.
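The include and exclude rules above can be sketched in a few lines of Python. This is only an approximation for reasoning about patterns, not OpenMetadata's actual code: `is_ingested` and the database names are made up for illustration, matching from the start of the name via `re.match` is an assumption, and so is the behaviour when both lists are supplied at once (here a name must match an include and must not match an exclude).

```python
import re


def is_ingested(name, includes=(), excludes=()):
    """Illustrative include/exclude check for a single entity name."""
    if includes and not any(re.match(p, name) for p in includes):
        return False  # an include list was given, but no pattern matched
    if any(re.match(p, name) for p in excludes):
        return False  # explicitly excluded
    return True


# Hypothetical database names, used only for this sketch.
databases = ["sales", "sales_archive", "hr", "tmp_scratch"]

only_included = [d for d in databases if is_ingested(d, includes=[r"^sales.*"])]
all_but_excluded = [d for d in databases if is_ingested(d, excludes=[r"^tmp_.*"])]
both = [
    d for d in databases
    if is_ingested(d, includes=[r"^sales.*"], excludes=[r".*_archive$"])
]

print(only_included)     # ['sales', 'sales_archive']
print(all_but_excluded)  # ['sales', 'sales_archive', 'hr']
print(both)              # ['sales']
```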
#### Example 1
```yaml
Snowflake_Prod # Snowflake Service Name
└─── SNOWFLAKE # DB Name
└─── SNOWFLAKE_SAMPLE_DATA # DB Name
└─── TEST_SNOWFLAKEDB # DB Name
└─── DUMMY_DB # DB Name
└─── ECOMMERCE # DB Name
```
Let's say we want to ingest metadata from a Snowflake instance which contains multiple databases, as described above.
In this example we want to ingest all databases whose names contain `SNOWFLAKE`, so the filter pattern
applied is `.*SNOWFLAKE.*` in the include field. This results in ingestion of the databases `SNOWFLAKE`, `SNOWFLAKE_SAMPLE_DATA`
and `TEST_SNOWFLAKEDB`.
<Collapse title="Configuring Filters via UI for Example 1">
<Image
src="/images/openmetadata/ingestion/workflows/metadata/filter-patterns/database-filter-example-1.png"
alt="Database Filter Pattern Example 1"
caption="Database Filter Pattern Example 1"
/>
</Collapse>
<Collapse title="Configuring Filters via CLI for Example 1">
```yaml
sourceConfig:
  config:
    ...
    useFqnForFiltering: false
    databaseFilterPattern:
      includes:
        - .*SNOWFLAKE.*
```
</Collapse>
#### Example 2
In this example we want to ingest all databases whose names start with `SNOWFLAKE`, so the filter pattern
applied is `^SNOWFLAKE.*` in the include field. This results in ingestion of the databases `SNOWFLAKE` & `SNOWFLAKE_SAMPLE_DATA`.
<Collapse title="Configuring Filters via UI for Example 2">
<Image
src="/images/openmetadata/ingestion/workflows/metadata/filter-patterns/database-filter-example-2.png"
alt="Database Filter Pattern Example 2"
caption="Database Filter Pattern Example 2"
/>
</Collapse>
<Collapse title="Configuring Filters via CLI for Example 2">
```yaml
sourceConfig:
  config:
    ...
    useFqnForFiltering: false
    databaseFilterPattern:
      includes:
        - ^SNOWFLAKE.*
```
</Collapse>
#### Example 3
In this example we want to ingest all databases whose names start with `SNOWFLAKE` or end with `DB`, so the filter patterns
applied are `^SNOWFLAKE.*` & `.*DB$` in the include field. This results in ingestion of the databases `SNOWFLAKE`, `SNOWFLAKE_SAMPLE_DATA`, `TEST_SNOWFLAKEDB` & `DUMMY_DB`.
<Collapse title="Configuring Filters via UI for Example 3">
<Image
src="/images/openmetadata/ingestion/workflows/metadata/filter-patterns/database-filter-example-3.png"
alt="Database Filter Pattern Example 3"
caption="Database Filter Pattern Example 3"
/>
</Collapse>
<Collapse title="Configuring Filters via CLI for Example 3">
```yaml
sourceConfig:
  config:
    ...
    useFqnForFiltering: false
    databaseFilterPattern:
      includes:
        - ^SNOWFLAKE.*
        - .*DB$
```
</Collapse>
#### Example 4
In this example we want to ingest only the `SNOWFLAKE` database, so the filter pattern applied is `^SNOWFLAKE$` in the include field.
<Collapse title="Configuring Filters via UI for Example 4">
<Image
src="/images/openmetadata/ingestion/workflows/metadata/filter-patterns/database-filter-example-4.png"
alt="Database Filter Pattern Example 4"
caption="Database Filter Pattern Example 4"
/>
</Collapse>
<Collapse title="Configuring Filters via CLI for Example 4">
```yaml
sourceConfig:
  config:
    ...
    useFqnForFiltering: false
    databaseFilterPattern:
      includes:
        - ^SNOWFLAKE$
```
</Collapse>
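The four database examples can be checked locally before running an ingestion. The sketch below is only an approximation: it assumes each include pattern is matched from the start of the raw database name (Python's `re.match`), and the helper `included` is hypothetical.

```python
import re

databases = [
    "SNOWFLAKE",
    "SNOWFLAKE_SAMPLE_DATA",
    "TEST_SNOWFLAKEDB",
    "DUMMY_DB",
    "ECOMMERCE",
]

# Include patterns taken from the CLI configuration of each example.
examples = {
    "Example 1": [r".*SNOWFLAKE.*"],
    "Example 2": [r"^SNOWFLAKE.*"],
    "Example 3": [r"^SNOWFLAKE.*", r".*DB$"],
    "Example 4": [r"^SNOWFLAKE$"],
}


def included(names, patterns):
    """Keep the names matched by at least one include pattern."""
    return [n for n in names if any(re.match(p, n) for p in patterns)]


results = {label: included(databases, patterns) for label, patterns in examples.items()}
for label, kept in results.items():
    print(label, kept)
# Example 1 ['SNOWFLAKE', 'SNOWFLAKE_SAMPLE_DATA', 'TEST_SNOWFLAKEDB']
# Example 2 ['SNOWFLAKE', 'SNOWFLAKE_SAMPLE_DATA']
# Example 3 ['SNOWFLAKE', 'SNOWFLAKE_SAMPLE_DATA', 'TEST_SNOWFLAKEDB', 'DUMMY_DB']
# Example 4 ['SNOWFLAKE']
```

Under this start-anchored assumption, a bare `DB$` would not match `DUMMY_DB`, which is why the "ends with" pattern is written with a leading `.*`.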
### Schema Filter Pattern
Schema filter patterns are used to control whether or not to include schemas as part of metadata ingestion.
- **Include**: Explicitly include schemas by adding a list of comma-separated regular expressions to the Include field. OpenMetadata will include all schemas with names matching one or more of the supplied regular expressions. All other schemas will be excluded.
- **Exclude**: Explicitly exclude schemas by adding a list of comma-separated regular expressions to the Exclude field. OpenMetadata will exclude all schemas with names matching one or more of the supplied regular expressions. All other schemas will be included.
#### Example 1
```yaml
Snowflake_Prod # Snowflake Service Name
└─── SNOWFLAKE # DB Name
│    │
│    └─── PUBLIC # Schema Name
│    │
│    └─── TPCH_SF1 # Schema Name
│    │
│    └─── INFORMATION_SCHEMA # Schema Name
└─── SNOWFLAKE_SAMPLE_DATA # DB Name
│    │
│    └─── PUBLIC # Schema Name
│    │
│    └─── INFORMATION_SCHEMA # Schema Name
│    │
│    └─── TPCH_SF1 # Schema Name
│    │
│    └─── TPCH_SF10 # Schema Name
│    │
│    └─── TPCH_SF100 # Schema Name
```
In this example we want to ingest every schema named `PUBLIC` within any database, so the schema filter pattern
applied is `^PUBLIC$` in the include field. This results in ingestion of the schemas `SNOWFLAKE.PUBLIC` & `SNOWFLAKE_SAMPLE_DATA.PUBLIC`.
<Collapse title="Configuring Filters via UI for Example 1">
<Image
src="/images/openmetadata/ingestion/workflows/metadata/filter-patterns/schema-filter-example-1.png"
alt="Schema Filter Pattern Example 1"
caption="Schema Filter Pattern Example 1"
/>
</Collapse>
<Collapse title="Configuring Filters via CLI for Example 1">
```yaml
sourceConfig:
  config:
    ...
    useFqnForFiltering: false
    schemaFilterPattern:
      includes:
        - ^PUBLIC$
```
</Collapse>
#### Example 2
In this example we want to ingest all schemas within any database except the schema named `PUBLIC` in `SNOWFLAKE_SAMPLE_DATA`.
Notice that two schemas are named `PUBLIC`: `SNOWFLAKE_SAMPLE_DATA.PUBLIC` and `SNOWFLAKE.PUBLIC`. Per the constraint of this example, every schema, including `SNOWFLAKE.PUBLIC`, must be ingested, and only `SNOWFLAKE_SAMPLE_DATA.PUBLIC` must be skipped. To do that, set the `useFqnForFiltering` flag to true so that the filter pattern is applied to the fully qualified name instead of the raw schema name. The fully qualified name (FQN) of a schema is the service name, database name & schema name joined with `.`. In this example the FQN of the `SNOWFLAKE_SAMPLE_DATA.PUBLIC` schema is `Snowflake_Prod.SNOWFLAKE_SAMPLE_DATA.PUBLIC`, so we apply the exclude filter pattern `^Snowflake_Prod\.SNOWFLAKE_SAMPLE_DATA\.PUBLIC$` and set `useFqnForFiltering` to true.
<Collapse title="Configuring Filters via UI for Example 2">
<Image
src="/images/openmetadata/ingestion/workflows/metadata/filter-patterns/schema-filter-example-2.png"
alt="Schema Filter Pattern Example 2"
caption="Schema Filter Pattern Example 2"
/>
</Collapse>
<Collapse title="Configuring Filters via CLI for Example 2">
```yaml
sourceConfig:
  config:
    ...
    useFqnForFiltering: true
    schemaFilterPattern:
      excludes:
        - ^Snowflake_Prod\.SNOWFLAKE_SAMPLE_DATA\.PUBLIC$
```
</Collapse>
#### Example 3
In this example we want to ingest `SNOWFLAKE.PUBLIC` & all the schemas in `SNOWFLAKE_SAMPLE_DATA` that start with `TPCH_`, i.e. `SNOWFLAKE_SAMPLE_DATA.TPCH_SF1`, `SNOWFLAKE_SAMPLE_DATA.TPCH_SF10` & `SNOWFLAKE_SAMPLE_DATA.TPCH_SF100`. To achieve this, apply an include schema filter with the patterns `^Snowflake_Prod\.SNOWFLAKE\.PUBLIC$` & `^Snowflake_Prod\.SNOWFLAKE_SAMPLE_DATA\.TPCH_.*`, and set `useFqnForFiltering` to true, since the filter must be applied to the FQN.
<Collapse title="Configuring Filters via UI for Example 3">
<Image
src="/images/openmetadata/ingestion/workflows/metadata/filter-patterns/schema-filter-example-3.png"
alt="Schema Filter Pattern Example 3"
caption="Schema Filter Pattern Example 3"
/>
</Collapse>
<Collapse title="Configuring Filters via CLI for Example 3">
```yaml
sourceConfig:
  config:
    ...
    useFqnForFiltering: true
    schemaFilterPattern:
      includes:
        - ^Snowflake_Prod\.SNOWFLAKE\.PUBLIC$
        - ^Snowflake_Prod\.SNOWFLAKE_SAMPLE_DATA\.TPCH_.*
```
</Collapse>
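The three schema examples can be reproduced with a short Python sketch. It rebuilds the schema tree from Example 1 and assumes, as an approximation of the real behaviour, that patterns are matched from the start of the string with `re.match` and that a schema FQN is `service.database.schema`. The variable names are illustrative only.

```python
import re

service = "Snowflake_Prod"
tree = {
    "SNOWFLAKE": ["PUBLIC", "TPCH_SF1", "INFORMATION_SCHEMA"],
    "SNOWFLAKE_SAMPLE_DATA": [
        "PUBLIC", "INFORMATION_SCHEMA", "TPCH_SF1", "TPCH_SF10", "TPCH_SF100",
    ],
}
# (fully qualified name, raw name) for every schema in the tree.
schemas = [(f"{service}.{db}.{name}", name) for db, names in tree.items() for name in names]

# Example 1: include pattern applied to the raw schema name.
example_1 = [fqn for fqn, raw in schemas if re.match(r"^PUBLIC$", raw)]

# Example 2: exclude pattern applied to the FQN.
exclude = r"^Snowflake_Prod\.SNOWFLAKE_SAMPLE_DATA\.PUBLIC$"
example_2 = [fqn for fqn, _ in schemas if not re.match(exclude, fqn)]

# Example 3: include patterns applied to the FQN.
includes = [
    r"^Snowflake_Prod\.SNOWFLAKE\.PUBLIC$",
    r"^Snowflake_Prod\.SNOWFLAKE_SAMPLE_DATA\.TPCH_.*",
]
example_3 = [fqn for fqn, _ in schemas if any(re.match(p, fqn) for p in includes)]

print(example_1)
print(len(example_2), "schemas kept in Example 2")
print(example_3)
```

Note that `Snowflake_Prod.SNOWFLAKE.TPCH_SF1` is not kept in Example 3, because the `TPCH_` pattern is tied to the `SNOWFLAKE_SAMPLE_DATA` database.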
### Table Filter Pattern
Table filter patterns are used to control whether or not to include tables as part of metadata ingestion.
- **Include**: Explicitly include tables by adding a list of comma-separated regular expressions to the Include field. OpenMetadata will include all tables with names matching one or more of the supplied regular expressions. All other tables will be excluded.
- **Exclude**: Explicitly exclude tables by adding a list of comma-separated regular expressions to the Exclude field. OpenMetadata will exclude all tables with names matching one or more of the supplied regular expressions. All other tables will be included.
#### Example 1
```yaml
Snowflake_Prod # Snowflake Service Name
└─── SNOWFLAKE_SAMPLE_DATA # DB Name
│    │
│    └─── PUBLIC # Schema Name
│    │    │
│    │    └─── CUSTOMER # Table Name
│    │    │
│    │    └─── CUSTOMER_ADDRESS # Table Name
│    │    │
│    │    └─── CUSTOMER_DEMOGRAPHICS # Table Name
│    │    │
│    │    └─── CALL_CENTER # Table Name
│    │
│    └─── INFORMATION # Schema Name
│         │
│         └─── ORDERS # Table Name
│         │
│         └─── REGION # Table Name
│         │
│         └─── CUSTOMER # Table Name
└─── SNOWFLAKE # DB Name
     └─── PUBLIC # Schema Name
          └─── CUSTOMER # Table Name
```
In this example we want to ingest every table named `CUSTOMER` within any schema and database. In this case we apply the include table filter pattern `^CUSTOMER$` to the raw table name. This results in ingestion of the tables `Snowflake_Prod.SNOWFLAKE_SAMPLE_DATA.PUBLIC.CUSTOMER`, `Snowflake_Prod.SNOWFLAKE_SAMPLE_DATA.INFORMATION.CUSTOMER` & `Snowflake_Prod.SNOWFLAKE.PUBLIC.CUSTOMER`.
<Collapse title="Configuring Filters via UI for Example 1">
<Image
src="/images/openmetadata/ingestion/workflows/metadata/filter-patterns/table-filter-example-1.png"
alt="Table Filter Pattern Example 1"
caption="Table Filter Pattern Example 1"
/>
</Collapse>
<Collapse title="Configuring Filters via CLI for Example 1">
```yaml
sourceConfig:
  config:
    ...
    useFqnForFiltering: false
    tableFilterPattern:
      includes:
        - ^CUSTOMER$
```
</Collapse>
#### Example 2
In this example we want to ingest the table named `CUSTOMER` within the `PUBLIC` schema of any database. In this case we apply the include table filter pattern `.*\.PUBLIC\.CUSTOMER$`, which also requires setting the `useFqnForFiltering` flag to true, since the filter must be applied to the FQN. This results in ingestion of the tables `Snowflake_Prod.SNOWFLAKE_SAMPLE_DATA.PUBLIC.CUSTOMER` & `Snowflake_Prod.SNOWFLAKE.PUBLIC.CUSTOMER`.
<Collapse title="Configuring Filters via UI for Example 2">
<Image
src="/images/openmetadata/ingestion/workflows/metadata/filter-patterns/table-filter-example-2.png"
alt="Table Filter Pattern Example 2"
caption="Table Filter Pattern Example 2"
/>
</Collapse>
<Collapse title="Configuring Filters via CLI for Example 2">
```yaml
sourceConfig:
  config:
    ...
    useFqnForFiltering: true
    tableFilterPattern:
      includes:
        - .*\.PUBLIC\.CUSTOMER$
```
</Collapse>
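The two table examples differ only in which string the pattern is applied to. The following Python sketch makes that visible on the table tree above; it is an approximation that assumes start-anchored matching (`re.match`) and a table FQN of the form `service.database.schema.table`, and all variable names are illustrative.

```python
import re

service = "Snowflake_Prod"
tree = {
    ("SNOWFLAKE_SAMPLE_DATA", "PUBLIC"): [
        "CUSTOMER", "CUSTOMER_ADDRESS", "CUSTOMER_DEMOGRAPHICS", "CALL_CENTER",
    ],
    ("SNOWFLAKE_SAMPLE_DATA", "INFORMATION"): ["ORDERS", "REGION", "CUSTOMER"],
    ("SNOWFLAKE", "PUBLIC"): ["CUSTOMER"],
}
# (fully qualified name, raw name) for every table in the tree.
tables = [
    (f"{service}.{db}.{schema}.{table}", table)
    for (db, schema), names in tree.items()
    for table in names
]

# Example 1: pattern applied to the raw table name.
example_1 = [fqn for fqn, raw in tables if re.match(r"^CUSTOMER$", raw)]

# Example 2: pattern applied to the FQN.
example_2 = [fqn for fqn, _ in tables if re.match(r".*\.PUBLIC\.CUSTOMER$", fqn)]

# The raw-name pattern from Example 1 matches nothing when tested against FQNs.
raw_pattern_on_fqn = [fqn for fqn, _ in tables if re.match(r"^CUSTOMER$", fqn)]

print(example_1)
print(example_2)
print(raw_pattern_on_fqn)  # []
```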


@ -0,0 +1,26 @@
---
title: Metadata Ingestion Filter Patterns
slug: /connectors/ingestion/workflows/metadata/filter-patterns
---
# Metadata Ingestion Filter Patterns
Ingestion filter patterns are useful when your data source contains a lot of metadata but
some of it is not useful or relevant for producing insights or discovering data. For example, you might want to
filter out log tables while ingesting metadata.
Metadata filters in OpenMetadata are easy to configure and use regular expressions to match and filter the metadata.
The following documents explain how to configure filters based on the type of data source.
<InlineCalloutContainer>
<InlineCallout
color="violet-70"
bold="Database Filter Patterns"
icon="cable"
href="/connectors/ingestion/workflows/metadata/filter-patterns/database"
>
Learn more about how to configure filters for database sources.
</InlineCallout>
</InlineCalloutContainer>
