Docs - Credentials, links and Roadmap (#6870)

* Managing credentials

* GCS creds info

* Links and roadmap update

* Add lineage in the menu
Pere Miquel Brull 2022-08-23 11:53:40 +02:00 committed by GitHub
parent 95b2ac276e
commit ceb9601c67
9 changed files with 232 additions and 26 deletions

View File

@ -392,6 +392,9 @@ site_menu:
- category: OpenMetadata / Connectors / Metadata / Amundsen
url: /openmetadata/connectors/metadata/amundsen
- category: OpenMetadata / Connectors / Managing Credentials
url: /openmetadata/connectors/credentials
- category: OpenMetadata / Ingestion
url: /openmetadata/ingestion
- category: OpenMetadata / Ingestion / Workflows
@ -403,13 +406,15 @@ site_menu:
url: /openmetadata/ingestion/workflows/metadata/dbt
- category: OpenMetadata / Ingestion / Workflows / Metadata / DBT / Ingest DBT UI
url: /openmetadata/ingestion/workflows/metadata/dbt/ingest-dbt-ui
- category: OpenMetadata / Ingestion / Workflows / Metadata / DBT / Ingest DBT CLI
url: /openmetadata/ingestion/workflows/metadata/dbt/ingest-dbt-cli
- category: OpenMetadata / Ingestion / Workflows / Metadata / DBT / Ingest DBT from Workflow Config
url: /openmetadata/ingestion/workflows/metadata/dbt/ingest-dbt-workflow-config
- category: OpenMetadata / Ingestion / Workflows / Usage
url: /openmetadata/ingestion/workflows/usage
- category: OpenMetadata / Ingestion / Workflows / Usage / Usage Workflow Through Query Logs
url: /openmetadata/ingestion/workflows/usage/usage-workflow-query-logs
- category: OpenMetadata / Ingestion / Workflows / Lineage
url: /openmetadata/ingestion/workflows/lineage
- category: OpenMetadata / Ingestion / Workflows / Profiler
url: /openmetadata/ingestion/workflows/profiler
- category: OpenMetadata / Ingestion / Workflows / Profiler / Metrics

View File

@ -0,0 +1,66 @@
---
title: Managing Credentials
slug: /openmetadata/connectors/credentials
---
# Managing Credentials in the CLI
When running a Workflow with the CLI or your favourite scheduler, it is safer not to have the services' credentials
in plain sight. For the CLI, the ingestion package can load sensitive information from environment variables.
For example, if you are using the [Glue](/openmetadata/connectors/database/glue) connector, you could specify the
AWS configurations as follows in a JSON config file:
```json
[...]
"awsConfig": {
"awsAccessKeyId": "${AWS_ACCESS_KEY_ID}",
"awsSecretAccessKey": "${AWS_SECRET_ACCESS_KEY}",
"awsRegion": "${AWS_REGION}",
"awsSessionToken": "${AWS_SESSION_TOKEN}"
},
[...]
```
Or
```yaml
[...]
awsConfig:
awsAccessKeyId: '${AWS_ACCESS_KEY_ID}'
awsSecretAccessKey: '${AWS_SECRET_ACCESS_KEY}'
awsRegion: '${AWS_REGION}'
awsSessionToken: '${AWS_SESSION_TOKEN}'
[...]
```
for a YAML configuration.
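A minimal sketch of running the ingestion this way (the config file name `glue.yaml` and all credential values below are placeholders, not real credentials):

```shell
# Export the variables referenced by the ${...} placeholders in the config.
export AWS_ACCESS_KEY_ID="my-access-key-id"
export AWS_SECRET_ACCESS_KEY="my-secret-access-key"
export AWS_REGION="eu-west-1"
export AWS_SESSION_TOKEN="my-session-token"

# Run the workflow; the CLI resolves the placeholders from the environment.
# Guarded so the snippet is safe to run where the CLI is not installed.
if command -v metadata >/dev/null 2>&1; then
  metadata ingest -c glue.yaml
fi
```

This keeps the secrets out of the config file itself, so the file can be checked into version control safely.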
# AWS Credentials
The AWS Credentials are based on the following [JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/security/credentials/awsCredentials.json).
Note that the only required field is `awsRegion`. This configuration is rather flexible: installations running
under AWS can rely directly on instance roles to authenticate against the target service, without having to
write the credentials down.
## AWS Vault
If using [aws-vault](https://github.com/99designs/aws-vault), running the CLI ingestion gets a bit more involved, as the credentials are not globally available in the terminal.
In that case, you could use the following command after setting up the ingestion configuration file:
```bash
aws-vault exec <role> -- $SHELL -c 'metadata ingest -c <path to connector>'
```
# GCS Credentials
The GCS Credentials are based on the following [JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/security/credentials/gcsCredentials.json).
These are the fields that you can export when preparing a Service Account.
Once the account is created, you can find these fields in the JSON file exported from:
```
IAM & Admin > Service Accounts > Keys
```
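For reference, a standard exported service account key file has the following shape (all values here are placeholders):

```json
{
  "type": "service_account",
  "project_id": "my-project",
  "private_key_id": "abc123",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "ingestion@my-project.iam.gserviceaccount.com",
  "client_id": "123456789",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/ingestion%40my-project.iam.gserviceaccount.com"
}
```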
You can validate the whole Google service account setup [here](deployment/security/google).

View File

@ -5,10 +5,66 @@ slug: /openmetadata/ingestion
# Metadata Ingestion
The goal of OpenMetadata is to serve as a centralised platform where users can gather and collaborate
around data. This is possible thanks to the different workflows that users can deploy and schedule, which
connect to the data sources to extract metadata.
The different types of metadata ingested into OpenMetadata include:
- Entities metadata, such as Tables, Dashboards, Topics...
- Query usage to rank the most used tables,
- Lineage between Entities,
- Data Profiles and Quality Tests.
In this section, we will explore the different workflows, how they work, and how to use them.
<InlineCalloutContainer>
<InlineCallout
color="violet-70"
bold="Metadata Ingestion"
icon="cable"
href="/openmetadata/ingestion/workflows/metadata"
>
Learn more about how to ingest metadata from dozens of connectors.
</InlineCallout>
<InlineCallout
color="violet-70"
bold="Metadata Profiler & Quality Tests"
icon="cable"
href="/openmetadata/ingestion/workflows/profiler"
>
Get metrics from your Tables and run automated Quality Tests!
</InlineCallout>
<InlineCallout
color="violet-70"
bold="Metadata Usage"
icon="cable"
href="/openmetadata/ingestion/workflows/usage"
>
To analyze popular entities.
</InlineCallout>
<InlineCallout
color="violet-70"
bold="Metadata Lineage"
icon="cable"
href="/openmetadata/ingestion/workflows/lineage"
>
To analyze relationships in your data platform.
</InlineCallout>
</InlineCalloutContainer>
## Metadata Versioning
One fundamental aspect of Metadata Ingestion is being able to analyze the evolution of your metadata. OpenMetadata
supports Metadata Versioning, maintaining the history of changes across all your assets.
<InlineCalloutContainer>
<InlineCallout
color="violet-70"
bold="Metadata Versioning"
icon="360"
href="/openmetadata/ingestion/versioning"
>
Learn how OpenMetadata keeps track of your metadata evolution.
</InlineCallout>
</InlineCalloutContainer>

View File

@ -0,0 +1,8 @@
---
title: Lineage Workflow
slug: /openmetadata/ingestion/workflows/lineage
---
# Lineage Workflow
Introduced in 0.12

View File

@ -5,6 +5,27 @@ slug: /openmetadata/ingestion/workflows/metadata/dbt
# DBT Integration
You can ingest DBT metadata either from the UI or by writing your Workflow configuration:
<InlineCalloutContainer>
<InlineCallout
color="violet-70"
bold="DBT UI ingestion"
icon="cable"
href="/openmetadata/ingestion/workflows/metadata/dbt/ingest-dbt-ui"
>
Configure the DBT ingestion directly in the UI.
</InlineCallout>
<InlineCallout
color="violet-70"
bold="DBT CLI ingestion"
icon="cable"
href="/openmetadata/ingestion/workflows/metadata/dbt/ingest-dbt-cli"
>
Prepare the DBT ingestion with the CLI or your favourite scheduler.
</InlineCallout>
</InlineCalloutContainer>
### What is DBT?
A DBT model provides transformation logic that creates a table from raw data.
@ -15,12 +36,12 @@ DBT does the T in [ELT](https://docs.getdbt.com/terms/elt) (Extract, Load, Trans
For information regarding setting up a DBT project and creating models please refer to the official DBT documentation [here](https://docs.getdbt.com/docs/introduction).
### DBT Integration in OpenMetadata
OpenMetadata includes an integration for DBT that enables you to see what models are being used to generate tables.
OpenMetadata parses the [manifest](https://docs.getdbt.com/reference/artifacts/manifest-json) and [catalog](https://docs.getdbt.com/reference/artifacts/catalog-json) json files and shows the queries from which the models are being generated.
Metadata regarding the tables and views generated via DBT is also ingested and can be seen.
![gif](/images/openmetadata/ingestion/workflows/metadata/dbt-integration.gif)

View File

@ -1,9 +1,9 @@
---
title: DBT Ingestion from Workflow config
slug: /openmetadata/ingestion/workflows/metadata/dbt/ingest-dbt-workflow-config
---
# Add DBT to your Workflow config
Provide and configure the DBT manifest and catalog file source locations.

View File

@ -4,3 +4,29 @@ slug: /openmetadata/ingestion/workflows/metadata
---
# Metadata Ingestion Workflow
The easiest way to extract metadata is to use any of our connectors!
<InlineCalloutContainer>
<InlineCallout
color="violet-70"
bold="Metadata Connectors"
icon="add_moderator"
href="/openmetadata/connectors"
>
Configure your automated Metadata extraction.
</InlineCallout>
</InlineCalloutContainer>
If you want to learn more about how to extract metadata from DBT, we have you covered:
<InlineCalloutContainer>
<InlineCallout
color="violet-70"
bold="DBT Ingestion"
icon="add_moderator"
href="/openmetadata/ingestion/workflows/metadata/dbt"
>
Extract Metadata and ingest your DBT models.
</InlineCallout>
</InlineCalloutContainer>

View File

@ -13,6 +13,21 @@ This workflow is available ONLY for the following connectors:
- [Redshift](/openmetadata/connectors/database/redshift)
- [Clickhouse](/openmetadata/connectors/database/clickhouse)
If your database service is not yet supported, you can use this same workflow by providing a Query Log file!
Learn how to do so 👇
<InlineCalloutContainer>
<InlineCallout
color="violet-70"
bold="Usage Workflow through Query Logs"
icon="add_moderator"
href="/openmetadata/ingestion/workflows/usage/usage-workflow-query-logs"
>
Configure the usage workflow by providing a Query Log file.
</InlineCallout>
</InlineCalloutContainer>
## UI Configuration
Once the metadata ingestion runs correctly and we are able to explore the service Entities, we can add Query Usage and Entity Lineage information.
@ -53,4 +68,4 @@ Set the limit for the query log results to be run at a time.
### 3. Schedule and Deploy
After clicking Next, you will be redirected to the Scheduling form. This will be the same as the Metadata Ingestion. Select your desired schedule and click on Deploy to find the usage pipeline being added to the Service Ingestions.
<Image src="/images/openmetadata/ingestion/workflows/usage/scheule-and-deploy.png" alt="schedule-and-deploy" caption="View Service Ingestion pipelines"/>

View File

@ -15,7 +15,7 @@ or ping us on [Slack](https://slack.open-metadata.org/) If you would like to pri
You can check the latest release [here](/overview/releases).
## 0.12.0 Release - Sept 7th, 2022
<TileContainer>
<Tile
@ -85,10 +85,9 @@ You can check the latest release [here](/overview/releases).
bordercolor="blue-70"
>
<li>Fivetran</li>
<li>Sagemaker</li>
<li>Mode</li>
<li>Redpanda</li>
<li>Prefect</li>
<li>Dagster</li>
</Tile>
<Tile
title="ML Features"
@ -98,7 +97,7 @@ You can check the latest release [here](/overview/releases).
/>
</TileContainer>
## 0.13.0 Release - Oct 12th, 2022
<TileContainer>
<Tile
@ -144,12 +143,10 @@ You can check the latest release [here](/overview/releases).
background="green-70"
bordercolor="green-70"
>
<li>Qwik</li>
<li>DataStudio</li>
<li>Trino Usage</li>
<li>LookML</li>
<li>Dagster</li>
<li>One click migration from Amundsen and Atlas.</li>
<li>Sagemaker</li>
</Tile>
<Tile
title="Data Quality"
@ -158,7 +155,7 @@ You can check the latest release [here](/overview/releases).
bordercolor="yellow-70"
link="https://github.com/open-metadata/OpenMetadata/issues/4652"
>
<li>Complex types</li>
<li>Improvements to data profiler metrics</li>
<li>Performance improvements to data quality</li>
</Tile>
@ -179,13 +176,16 @@ You can check the latest release [here](/overview/releases).
/>
<Tile
title="Lineage"
text=""
background="green-70"
bordercolor="green-70"
>
<li>Spark Lineage</li>
<li>Connector Lineage improvements</li>
</Tile>
</TileContainer>
## 0.14.0 Release - Nov 16th, 2022
<TileContainer>
<Tile
@ -233,6 +233,15 @@ You can check the latest release [here](/overview/releases).
<li>Microstrategy</li>
<li>Custom service integration - Users can integrate with their own service type</li>
</Tile>
<Tile
title="Data Quality"
text=""
background="purple-70"
bordercolor="purple-70"
link=""
>
<li>Custom SQL improvements: allow users to validate the SQL and run it</li>
</Tile>
</TileContainer>
## 1.0 Release - Dec 15th, 2022