diff --git a/openmetadata-docs/content/menu.md b/openmetadata-docs/content/menu.md
index b3ad07af145..104d1320482 100644
--- a/openmetadata-docs/content/menu.md
+++ b/openmetadata-docs/content/menu.md
@@ -392,6 +392,9 @@ site_menu:
   - category: OpenMetadata / Connectors / Metadata / Amundsen
     url: /openmetadata/connectors/metadata/amundsen
+  - category: OpenMetadata / Connectors / Managing Credentials
+    url: /openmetadata/connectors/credentials
+
   - category: OpenMetadata / Ingestion
     url: /openmetadata/ingestion
   - category: OpenMetadata / Ingestion / Workflows
@@ -403,13 +406,15 @@ site_menu:
     url: /openmetadata/ingestion/workflows/metadata/dbt
   - category: OpenMetadata / Ingestion / Workflows/ Metadata / DBT / Ingest DBT UI
     url: /openmetadata/ingestion/workflows/metadata/dbt/ingest-dbt-ui
-  - category: OpenMetadata / Ingestion / Workflows/ Metadata / DBT / Ingest DBT CLI
-    url: /openmetadata/ingestion/workflows/metadata/dbt/ingest-dbt-cli
+  - category: OpenMetadata / Ingestion / Workflows/ Metadata / DBT / Ingest DBT from Workflow Config
+    url: /openmetadata/ingestion/workflows/metadata/dbt/ingest-dbt-workflow-config
   - category: OpenMetadata / Ingestion / Workflows / Usage
     url: /openmetadata/ingestion/workflows/usage
   - category: OpenMetadata / Ingestion / Workflows / Usage / Usage Workflow Through Query Logs
     url: /openmetadata/ingestion/workflows/usage/usage-workflow-query-logs
+  - category: OpenMetadata / Ingestion / Workflows / Lineage
+    url: /openmetadata/ingestion/workflows/lineage
   - category: OpenMetadata / Ingestion / Workflows / Profiler
     url: /openmetadata/ingestion/workflows/profiler
   - category: OpenMetadata / Ingestion / Workflows / Profiler / Metrics
diff --git a/openmetadata-docs/content/openmetadata/connectors/credentials/index.md b/openmetadata-docs/content/openmetadata/connectors/credentials/index.md
new file mode 100644
index 00000000000..93703e8aefc
--- /dev/null
+++ b/openmetadata-docs/content/openmetadata/connectors/credentials/index.md
@@ -0,0 +1,66 @@
+---
+title: Managing Credentials
+slug: /openmetadata/connectors/credentials
+---
+
+# Managing Credentials in the CLI
+
+When running a Workflow with the CLI or your favourite scheduler, it's safer not to leave the services' credentials
+in plain sight. For the CLI, the ingestion package can load sensitive information from environment variables.
+
+For example, if you are using the [Glue](/openmetadata/connectors/database/glue) connector you could specify the
+AWS configurations as follows in the case of a JSON config file:
+
+```json
+[...]
+"awsConfig": {
+  "awsAccessKeyId": "${AWS_ACCESS_KEY_ID}",
+  "awsSecretAccessKey": "${AWS_SECRET_ACCESS_KEY}",
+  "awsRegion": "${AWS_REGION}",
+  "awsSessionToken": "${AWS_SESSION_TOKEN}"
+},
+[...]
+```
+
+Or
+
+```yaml
+[...]
+awsConfig:
+  awsAccessKeyId: '${AWS_ACCESS_KEY_ID}'
+  awsSecretAccessKey: '${AWS_SECRET_ACCESS_KEY}'
+  awsRegion: '${AWS_REGION}'
+  awsSessionToken: '${AWS_SESSION_TOKEN}'
+[...]
+```
+
+for a YAML configuration.
+
+# AWS Credentials
+
+The AWS Credentials are based on the following [JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/security/credentials/awsCredentials.json).
+Note that the only required field is the `awsRegion`. This configuration is rather flexible to allow installations under AWS
+that directly use instance roles for permissions to authenticate to whatever service we are pointing to without having to
+write the credentials down.
+
+## AWS Vault
+
+If using [aws-vault](https://github.com/99designs/aws-vault), it gets a bit more involved to run the CLI ingestion as the credentials are not globally available in the terminal.
+In that case, you could use the following command after setting up the ingestion configuration file:
+
+```bash
+aws-vault exec -- $SHELL -c 'metadata ingest -c '
+```
+
+# GCS Credentials
+
+The GCS Credentials are based on the following [JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/security/credentials/gcsCredentials.json).
+These are the fields that you can export when preparing a Service Account.
+
+Once the account is created, you can see the fields in the exported JSON file from:
+
+```
+IAM & Admin > Service Accounts > Keys
+```
+
+You can validate the whole Google service account setup [here](deployment/security/google).
diff --git a/openmetadata-docs/content/openmetadata/ingestion/index.md b/openmetadata-docs/content/openmetadata/ingestion/index.md
index 9e6495eb031..03bddf84f9a 100644
--- a/openmetadata-docs/content/openmetadata/ingestion/index.md
+++ b/openmetadata-docs/content/openmetadata/ingestion/index.md
@@ -5,10 +5,66 @@ slug: /openmetadata/ingestion
 # Metadata Ingestion
-Explain how we have different types of workflows and the metadata
-that we can ingest automatically:
+The goal of OpenMetadata is to serve as a centralised platform where users can gather and collaborate
+around data. This is possible thanks to different workflows that users can deploy and schedule, which will
+connect to the data sources to extract metadata.
-- e.g., table metadata
-- DBT
-- Lineage
-- Usage
+The metadata ingested into OpenMetadata can include:
+- Entity metadata, such as Tables, Dashboards, Topics...
+- Query usage to rank the most used tables,
+- Lineage between Entities,
+- Data Profiles and Quality Tests.
+
+In this section we will explore the different workflows, how they work and how to use them.
+
+
+  Learn more about how to ingest metadata from dozens of connectors.
+
+
+  Get metrics from your Tables and run automated Quality Tests!
+
+
+  To analyze popular entities.
+
+
+  To analyze relationships in your data platform.
+
+
+
+
+## Metadata Versioning
+
+One fundamental aspect of Metadata Ingestion is being able to analyze the evolution of your metadata. OpenMetadata
+supports Metadata Versioning, maintaining the history of changes of all your assets.
+
+
+  Learn how OpenMetadata keeps track of your metadata evolution.
+
+
diff --git a/openmetadata-docs/content/openmetadata/ingestion/workflows/lineage/index.md b/openmetadata-docs/content/openmetadata/ingestion/workflows/lineage/index.md
new file mode 100644
index 00000000000..a1b3958f531
--- /dev/null
+++ b/openmetadata-docs/content/openmetadata/ingestion/workflows/lineage/index.md
@@ -0,0 +1,8 @@
+---
+title: Lineage Workflow
+slug: /openmetadata/ingestion/workflows/lineage
+---
+
+# Lineage Workflow
+
+Introduced in 0.12
diff --git a/openmetadata-docs/content/openmetadata/ingestion/workflows/metadata/dbt/index.md b/openmetadata-docs/content/openmetadata/ingestion/workflows/metadata/dbt/index.md
index 96ffca8805b..0862405bb68 100644
--- a/openmetadata-docs/content/openmetadata/ingestion/workflows/metadata/dbt/index.md
+++ b/openmetadata-docs/content/openmetadata/ingestion/workflows/metadata/dbt/index.md
@@ -5,6 +5,27 @@ slug: /openmetadata/ingestion/workflows/metadata/dbt
 # DBT Integration
+You can ingest DBT Metadata either with the UI or by writing down your Workflow configuration:
+
+
+
+  Configure the DBT ingestion directly in the UI.
+
+
+  Prepare the DBT ingestion with the CLI or your favourite scheduler.
+
+
+
 ### What is DBT?
 A DBT model provides transformation logic that creates a table from raw data.
@@ -15,12 +36,12 @@ DBT does the T in [ELT](https://docs.getdbt.com/terms/elt) (Extract, Load, Trans
 For information regarding setting up a DBT project and creating models please refer to the official DBT documentation [here](https://docs.getdbt.com/docs/introduction).
-### DBT Integration in Openmetadata
+### DBT Integration in OpenMetadata
 OpenMetadata includes an integration for DBT that enables you to see what models are being used to generate tables.
-Openmetadata parses the [manifest](https://docs.getdbt.com/reference/artifacts/manifest-json) and [catalog](https://docs.getdbt.com/reference/artifacts/catalog-json) json files and shows the queries from which the models are being generated.
+OpenMetadata parses the [manifest](https://docs.getdbt.com/reference/artifacts/manifest-json) and [catalog](https://docs.getdbt.com/reference/artifacts/catalog-json) json files and shows the queries from which the models are being generated.
 Metadata regarding the tables and views generated via DBT is also ingested and can be seen.
-![gif](/images/openmetadata/ingestion/workflows/metadata/dbt-integration.gif)
\ No newline at end of file
+![gif](/images/openmetadata/ingestion/workflows/metadata/dbt-integration.gif)
diff --git a/openmetadata-docs/content/openmetadata/ingestion/workflows/metadata/dbt/ingest-dbt-cli.md b/openmetadata-docs/content/openmetadata/ingestion/workflows/metadata/dbt/ingest-dbt-workflow-config.md
similarity index 98%
rename from openmetadata-docs/content/openmetadata/ingestion/workflows/metadata/dbt/ingest-dbt-cli.md
rename to openmetadata-docs/content/openmetadata/ingestion/workflows/metadata/dbt/ingest-dbt-workflow-config.md
index 3892eeda209..da7990b707b 100644
--- a/openmetadata-docs/content/openmetadata/ingestion/workflows/metadata/dbt/ingest-dbt-cli.md
+++ b/openmetadata-docs/content/openmetadata/ingestion/workflows/metadata/dbt/ingest-dbt-workflow-config.md
@@ -1,9 +1,9 @@
 ---
-title: DBT Ingestion CLI
-slug: /openmetadata/ingestion/workflows/metadata/dbt/ingest-dbt-cli
+title: DBT Ingestion from Workflow config
+slug: /openmetadata/ingestion/workflows/metadata/dbt/ingest-dbt-workflow-config
 ---
-# Add DBT while ingesting from CLI
+# Add DBT to your Workflow config
 Provide and configure the DBT manifest and catalog file source locations.
diff --git a/openmetadata-docs/content/openmetadata/ingestion/workflows/metadata/index.md b/openmetadata-docs/content/openmetadata/ingestion/workflows/metadata/index.md
index 871e4024bcc..d9390708a02 100644
--- a/openmetadata-docs/content/openmetadata/ingestion/workflows/metadata/index.md
+++ b/openmetadata-docs/content/openmetadata/ingestion/workflows/metadata/index.md
@@ -4,3 +4,29 @@ slug: /openmetadata/ingestion/workflows/metadata
 ---
 # Metadata Ingestion Workflow
+
+The easiest way to extract metadata is to use any of our connectors!
+
+
+
+  Configure your automated Metadata extraction.
+
+
+
+If you want to learn more about how to extract metadata from DBT, we have you covered:
+
+
+
+  Extract Metadata and ingest your DBT models.
+
+
diff --git a/openmetadata-docs/content/openmetadata/ingestion/workflows/usage/index.md b/openmetadata-docs/content/openmetadata/ingestion/workflows/usage/index.md
index b24dfc1a02d..9efa59dc6c2 100644
--- a/openmetadata-docs/content/openmetadata/ingestion/workflows/usage/index.md
+++ b/openmetadata-docs/content/openmetadata/ingestion/workflows/usage/index.md
@@ -13,6 +13,21 @@ This workflow is available ONLY for the following connectors:
 - [Redshift](/openmetadata/connectors/database/redshift)
 - [Clickhouse](/openmetadata/connectors/database/clickhouse)
+If your database service is not yet supported, you can use this same workflow by providing a Query Log file!
+
+Learn how to do so 👇
+
+
+
+  Configure the usage workflow by providing a Query Log file.
+
+
+
 ## UI Configuration
 Once the metadata ingestion runs correctly and we are able to explore the service Entities, we can add Query Usage and Entity Lineage information.
@@ -53,4 +68,4 @@ Set the limit for the query log results to be run at a time.
 ### 3. Schedule and Deploy
 After clicking Next, you will be redirected to the Scheduling form. This will be the same as the Metadata Ingestion.
 Select your desired schedule and click on Deploy to find the usage pipeline being added to the Service Ingestions.
-schedule-and-deploy
\ No newline at end of file
+schedule-and-deploy
diff --git a/openmetadata-docs/content/overview/roadmap/index.md b/openmetadata-docs/content/overview/roadmap/index.md
index 3c30c1be5ad..58b65cdc0d4 100644
--- a/openmetadata-docs/content/overview/roadmap/index.md
+++ b/openmetadata-docs/content/overview/roadmap/index.md
@@ -15,7 +15,7 @@ or ping us on [Slack](https://slack.open-metadata.org/) If you would like to pri
 You can check the latest release [here](/overview/releases).
-## 0.12.0 Release - Aug 17th, 2022
+## 0.12.0 Release - Sept 7th, 2022
   • Fivetran
-  • Sagemaker
   • Mode
   • Redpanda
-  • Prefect
+  • Dagster
-## 0.13.0 Release - Sept 28th, 2022
+## 0.13.0 Release - Oct 12th, 2022
-  • Qwik
   • DataStudio
   • Trino Usage
   • LookML
-  • Dagster
-  • One click migration from Amundsen and Atlas.
+  • Sagemaker
-  • Custom SQL improvements, Allow users to validate the sql and run
+  • Complex types
   • Improvements to data profiler metrics
   • Performance improvements to data quality
  • @@ -179,13 +176,16 @@ You can check the latest release [here](/overview/releases). /> + > +
  • Spark Lineage
  • +
  • Connector Lineage improvements
  • +
    -## 0.14.0 Release - Nov 9th, 2022 +## 0.14.0 Release - Nov 16th, 2022 Microstrategy
  • Custom service integration - Users can integrate with their own service type
  • + +
  • Custom SQL improvements, Allow users to validate the sql and run
  • +
    ## 1.0 Release - Dec 15th, 2022