diff --git a/docs/.gitbook/assets/add-yourself.png b/docs/.gitbook/assets/add-yourself.png
new file mode 100644
index 00000000000..3f298c99ad1
Binary files /dev/null and b/docs/.gitbook/assets/add-yourself.png differ
diff --git a/docs/.gitbook/assets/dashboards.png b/docs/.gitbook/assets/dashboards.png
new file mode 100644
index 00000000000..e469553ce8d
Binary files /dev/null and b/docs/.gitbook/assets/dashboards.png differ
diff --git a/docs/.gitbook/assets/etl-description.png b/docs/.gitbook/assets/etl-description.png
new file mode 100644
index 00000000000..7b05850374e
Binary files /dev/null and b/docs/.gitbook/assets/etl-description.png differ
diff --git a/docs/.gitbook/assets/fact-order-description.jpeg b/docs/.gitbook/assets/fact-order-description.jpeg
new file mode 100644
index 00000000000..07f97bad164
Binary files /dev/null and b/docs/.gitbook/assets/fact-order-description.jpeg differ
diff --git a/docs/.gitbook/assets/fact-order.png b/docs/.gitbook/assets/fact-order.png
new file mode 100644
index 00000000000..fa098f2b58d
Binary files /dev/null and b/docs/.gitbook/assets/fact-order.png differ
diff --git a/docs/.gitbook/assets/fact-sale-description.jpeg b/docs/.gitbook/assets/fact-sale-description.jpeg
new file mode 100644
index 00000000000..f10b4112aac
Binary files /dev/null and b/docs/.gitbook/assets/fact-sale-description.jpeg differ
diff --git a/docs/.gitbook/assets/fact-sale-description.png b/docs/.gitbook/assets/fact-sale-description.png
new file mode 100644
index 00000000000..468457e40f4
Binary files /dev/null and b/docs/.gitbook/assets/fact-sale-description.png differ
diff --git a/docs/.gitbook/assets/fact-sale-fields.png b/docs/.gitbook/assets/fact-sale-fields.png
new file mode 100644
index 00000000000..5f830d362fb
Binary files /dev/null and b/docs/.gitbook/assets/fact-sale-fields.png differ
diff --git a/docs/.gitbook/assets/fact-sale.png b/docs/.gitbook/assets/fact-sale.png
new file mode 100644
index 00000000000..5ec22409694
Binary files /dev/null and b/docs/.gitbook/assets/fact-sale.png differ
diff --git a/docs/.gitbook/assets/frequently-joined-tables.png b/docs/.gitbook/assets/frequently-joined-tables.png
new file mode 100644
index 00000000000..9877e9955b6
Binary files /dev/null and b/docs/.gitbook/assets/frequently-joined-tables.png differ
diff --git a/docs/.gitbook/assets/link-fact-sale.png b/docs/.gitbook/assets/link-fact-sale.png
new file mode 100644
index 00000000000..241caf1bddb
Binary files /dev/null and b/docs/.gitbook/assets/link-fact-sale.png differ
diff --git a/docs/.gitbook/assets/location.png b/docs/.gitbook/assets/location.png
new file mode 100644
index 00000000000..365f50dd121
Binary files /dev/null and b/docs/.gitbook/assets/location.png differ
diff --git a/docs/.gitbook/assets/log-in-with-google.png b/docs/.gitbook/assets/log-in-with-google.png
new file mode 100644
index 00000000000..82636522e22
Binary files /dev/null and b/docs/.gitbook/assets/log-in-with-google.png differ
diff --git a/docs/.gitbook/assets/login-select-teams.png b/docs/.gitbook/assets/login-select-teams.png
new file mode 100644
index 00000000000..76428050b27
Binary files /dev/null and b/docs/.gitbook/assets/login-select-teams.png differ
diff --git a/docs/.gitbook/assets/my-data.png b/docs/.gitbook/assets/my-data.png
new file mode 100644
index 00000000000..e4e8c6684d5
Binary files /dev/null and b/docs/.gitbook/assets/my-data.png differ
diff --git a/docs/.gitbook/assets/owner.png b/docs/.gitbook/assets/owner.png
new file mode 100644
index 00000000000..ec87db822dd
Binary files /dev/null and b/docs/.gitbook/assets/owner.png differ
diff --git a/docs/.gitbook/assets/pipeline-description.png b/docs/.gitbook/assets/pipeline-description.png
new file mode 100644
index 00000000000..7c4f8ceeb36
Binary files /dev/null and b/docs/.gitbook/assets/pipeline-description.png differ
diff --git a/docs/.gitbook/assets/pipeline-visual.png b/docs/.gitbook/assets/pipeline-visual.png
new file mode 100644
index 00000000000..0a0a5e7f319
Binary files /dev/null and b/docs/.gitbook/assets/pipeline-visual.png differ
diff --git a/docs/.gitbook/assets/region.png b/docs/.gitbook/assets/region.png
new file mode 100644
index 00000000000..9abe85f5455
Binary files /dev/null and b/docs/.gitbook/assets/region.png differ
diff --git a/docs/.gitbook/assets/sales-search-v2.png b/docs/.gitbook/assets/sales-search-v2.png
new file mode 100644
index 00000000000..1b011cb6efb
Binary files /dev/null and b/docs/.gitbook/assets/sales-search-v2.png differ
diff --git a/docs/.gitbook/assets/sales-search.png b/docs/.gitbook/assets/sales-search.png
new file mode 100644
index 00000000000..d16dff4872d
Binary files /dev/null and b/docs/.gitbook/assets/sales-search.png differ
diff --git a/docs/.gitbook/assets/sample-data.png b/docs/.gitbook/assets/sample-data.png
new file mode 100644
index 00000000000..d647120d7dc
Binary files /dev/null and b/docs/.gitbook/assets/sample-data.png differ
diff --git a/docs/.gitbook/assets/sandbox (1).png b/docs/.gitbook/assets/sandbox (1).png
new file mode 100644
index 00000000000..6264385c1b8
Binary files /dev/null and b/docs/.gitbook/assets/sandbox (1).png differ
diff --git a/docs/.gitbook/assets/sandbox.png b/docs/.gitbook/assets/sandbox.png
new file mode 100644
index 00000000000..6264385c1b8
Binary files /dev/null and b/docs/.gitbook/assets/sandbox.png differ
diff --git a/docs/.gitbook/assets/search-results-v2.png b/docs/.gitbook/assets/search-results-v2.png
new file mode 100644
index 00000000000..b1d0eaba501
Binary files /dev/null and b/docs/.gitbook/assets/search-results-v2.png differ
diff --git a/docs/.gitbook/assets/search-results.png b/docs/.gitbook/assets/search-results.png
new file mode 100644
index 00000000000..c186b968b42
Binary files /dev/null and b/docs/.gitbook/assets/search-results.png differ
diff --git a/docs/.gitbook/assets/select-owner.png b/docs/.gitbook/assets/select-owner.png
new file mode 100644
index 00000000000..493efccb8e2
Binary files /dev/null and b/docs/.gitbook/assets/select-owner.png differ
diff --git a/docs/.gitbook/assets/settings-tags-menu.png b/docs/.gitbook/assets/settings-tags-menu.png
new file mode 100644
index 00000000000..d7e2847d1f1
Binary files /dev/null and b/docs/.gitbook/assets/settings-tags-menu.png differ
diff --git a/docs/.gitbook/assets/sort-by-weekly-usage.png b/docs/.gitbook/assets/sort-by-weekly-usage.png
new file mode 100644
index 00000000000..1679067de5f
Binary files /dev/null and b/docs/.gitbook/assets/sort-by-weekly-usage.png differ
diff --git a/docs/.gitbook/assets/sorted-by-weekly-usage (1).png b/docs/.gitbook/assets/sorted-by-weekly-usage (1).png
new file mode 100644
index 00000000000..8b0abeb314b
Binary files /dev/null and b/docs/.gitbook/assets/sorted-by-weekly-usage (1).png differ
diff --git a/docs/.gitbook/assets/sorted-by-weekly-usage.png b/docs/.gitbook/assets/sorted-by-weekly-usage.png
new file mode 100644
index 00000000000..ff872a9da54
Binary files /dev/null and b/docs/.gitbook/assets/sorted-by-weekly-usage.png differ
diff --git a/docs/.gitbook/assets/tier-1-documentation-v2.png b/docs/.gitbook/assets/tier-1-documentation-v2.png
new file mode 100644
index 00000000000..3d8e351379e
Binary files /dev/null and b/docs/.gitbook/assets/tier-1-documentation-v2.png differ
diff --git a/docs/.gitbook/assets/tier-1.png b/docs/.gitbook/assets/tier-1.png
new file mode 100644
index 00000000000..29aca55a0ba
Binary files /dev/null and b/docs/.gitbook/assets/tier-1.png differ
diff --git a/docs/.gitbook/assets/tier-documentation-v2.png b/docs/.gitbook/assets/tier-documentation-v2.png
new file mode 100644
index 00000000000..2a1bcd60aab
Binary files /dev/null and b/docs/.gitbook/assets/tier-documentation-v2.png differ
diff --git a/docs/.gitbook/assets/tier-documentation.png b/docs/.gitbook/assets/tier-documentation.png
new file mode 100644
index 00000000000..7edacc738db
Binary files /dev/null and b/docs/.gitbook/assets/tier-documentation.png differ
diff --git a/docs/.gitbook/assets/tier1-results.png b/docs/.gitbook/assets/tier1-results.png
new file mode 100644
index 00000000000..9e650612fd3
Binary files /dev/null and b/docs/.gitbook/assets/tier1-results.png differ
diff --git a/docs/.gitbook/assets/tiers.png b/docs/.gitbook/assets/tiers.png
new file mode 100644
index 00000000000..03b26d1ea4f
Binary files /dev/null and b/docs/.gitbook/assets/tiers.png differ
diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md
index e1c84e34eec..c68d1829b52 100644
--- a/docs/SUMMARY.md
+++ b/docs/SUMMARY.md
@@ -2,6 +2,8 @@
 
 * [Introduction](README.md)
 * [Try OpenMetadata](take-it-for-a-spin.md)
+* [Tutorials](tutorials/README.md)
+  * [Tutorial: Data Discovery with OpenMetadata](tutorials/tutorial-data-discovery-with-openmetadata.md)
 * [Roadmap](roadmap.md)
 
 ## OpenMetadata
diff --git a/docs/install/run-openmetadata.md b/docs/install/run-openmetadata.md
index 1234fd682a1..ec6d95fad23 100644
--- a/docs/install/run-openmetadata.md
+++ b/docs/install/run-openmetadata.md
@@ -14,7 +14,7 @@ description: >-
 **Prerequisites**
 
 * Docker >= 20.10.x
-* Minimum allocated memory to Docker >= 4GB (Preferences -> Advanced -> Resources)
+* Minimum allocated memory to Docker >= 4GB (Preferences -> Resources -> Advanced)
 {% endhint %}
 
 ```bash
diff --git a/docs/tutorials/README.md b/docs/tutorials/README.md
new file mode 100644
index 00000000000..713ae0815ea
--- /dev/null
+++ b/docs/tutorials/README.md
@@ -0,0 +1,11 @@
+---
+description: >-
+  These tutorials provide an overview of key features and functionality in
+  OpenMetadata.
+---
+
+# Tutorials
+
+{% content-ref url="tutorial-data-discovery-with-openmetadata.md" %}
+[tutorial-data-discovery-with-openmetadata.md](tutorial-data-discovery-with-openmetadata.md)
+{% endcontent-ref %}
diff --git a/docs/tutorials/tutorial-data-discovery-with-openmetadata.md b/docs/tutorials/tutorial-data-discovery-with-openmetadata.md
new file mode 100644
index 00000000000..056465d36ad
--- /dev/null
+++ b/docs/tutorials/tutorial-data-discovery-with-openmetadata.md
@@ -0,0 +1,114 @@
+# Tutorial: Data Discovery with OpenMetadata
+
+In this tutorial, we will explore key features of the OpenMetadata standard and its discovery and collaboration user interface. Specifically, we will demonstrate how to:
+
+* Find data using keyword search across services, databases, tables, tags, etc.
+* Use tags to identify the relative importance of different datasets.
+* Use data descriptions to pick the right data for your use case from among many candidates.
+
+For this tutorial, we will assume the role of data analysts who have been asked to analyze product sales by region. We will use the OpenMetadata sandbox, an environment in which you can explore OpenMetadata against data assets and the metadata with which a community of users has annotated them.
+
+![](../.gitbook/assets/sandbox.png)
+
+#### 1. Log in to the OpenMetadata sandbox using a Google account
+
+![](../.gitbook/assets/log-in-with-google.png)
+
+#### 2. Add yourself as a user and add yourself to several teams
+
+This is only necessary if you have not previously logged in to OpenMetadata.
+
+![](../.gitbook/assets/login-select-teams.png)
+
+Once logged in, your view of the sandbox should look something like the figure below.
+
+![](../.gitbook/assets/my-data.png)
+
+#### 3. Search for "sales"
+
+In the search box, enter the search term "sales". OpenMetadata will perform the search across all assets, regardless of type, and retrieve those that match by name or by the text of the metadata associated with the asset.
+
+Note that as we type the search term, OpenMetadata auto-suggests a number of matching assets, categorized by type, in a dropdown just below the search box. In this case, assets of type Table, Topic, and Dashboard are displayed. See the figure below for an example. OpenMetadata search also looks for pipelines, column names, tags, and other assets matching your query. Keyword search is, therefore, a powerful tool for locating relevant assets.
+
+![](../.gitbook/assets/sales-search-v2.png)
+
+#### 4. Explore the search results: Tables, Dashboards, Pipelines, etc.
+
+Having issued our search for "sales", we see results similar to those depicted below. This query matches 12 tables across the BigQuery and Redshift services.
+
+![](../.gitbook/assets/search-results-v2.png)
+
+In addition, we’ve identified four dashboards...
+
+![](../.gitbook/assets/dashboards.png)
+
+...and an ETL pipeline for sales data.
+
+![](../.gitbook/assets/pipeline-description.png)
+
+#### 5. Take note of descriptions and tags
+
+As we look through these results, it’s important to note the descriptions for these assets. For example, the description of the _fact\_order\_and\_sales\_etl_ pipeline identifies the _fact\_sale_ table as a critical reporting table.
+
+We also see tags that other users have applied to help identify data types of particular interest contained in each asset.
+
+![](../.gitbook/assets/etl-description.png)
+
+Finally, we see that some of the assets carry a tag specifying a tier, ranging from Tier1 to Tier5. Tiers are a means of identifying the relative importance of assets.
+
+#### 6. View in-product documentation for Tiers
+
+To learn more about Tiers and other tags, we can visit _Settings > Tags_.
+
+![](../.gitbook/assets/settings-tags-menu.png)
+
+Clicking _Tier_ under _Tag Categories_ shows a description of the Tier tag type as well as a detailed description of each tier.
+
+Note also that the description for each tier includes a _Usage_ label identifying the number of assets to which that tag has been applied. This number links to all assets carrying that tag. Usage data is maintained for Tier tags and for all other tags as well.
+
+![](../.gitbook/assets/tier-documentation-v2.png)
+
+#### 7. Focus on Tier1 (important) assets
+
+In general, for analyses that will drive business decisions, we want to ensure that the data we are using is important and already being used to drive other decisions. As we saw in the previous step, Tier1 assets meet this criterion.
+
+![](../.gitbook/assets/tier-1-documentation-v2.png)
+
+Based on our consideration of asset descriptions, tags, and tiers, we now have a better sense of how to locate the data we need in order to perform an analysis of sales by region.
+
+Let’s go back to the tables tab in our search results, since that’s where we’ll find the source data we need. Looking at the options for filtering search results, we can select Tier1 to limit results to just the most important tables among the assets matching our query.
+
+![](../.gitbook/assets/tier1-results.png)
+
+#### 8. Sort by usage frequency
+
+In addition to tiers, another indicator of importance is how frequently a table is used. The OpenMetadata search UI enables us to sort results by weekly usage. Let’s go ahead and do that.
+
+![](../.gitbook/assets/sort-by-weekly-usage.png)
+
+#### 9. Limit consideration to high usage, Tier1 assets
+
+Having sorted the Tier1 assets, we can see that there are probably only two tables that warrant further consideration: _fact\_sale_ and _fact\_order_. Both of these tables rank roughly in the top quarter of the most frequently used tables. Based on their names, either could serve our purpose, so we’ll need to dig deeper.
+
+![](<../.gitbook/assets/sorted-by-weekly-usage (1).png>)
+
+#### 10. Use descriptions to distinguish between candidate assets
+
+At this point, we need to compare _fact\_sale_ and _fact\_order_ to determine which best suits our needs. Looking at the descriptions for each table, we see a couple of statements that help clarify which table we should use.
+
+First, from the _fact\_sale_ description, we see a statement indicating that we should use _fact\_sale_.
+
+![](../.gitbook/assets/fact-sale-description.jpeg)
+
+Then, from the _fact\_order_ description, we see a statement that directs us to use the _fact\_sale_ table when computing financial metrics.
+
+![](../.gitbook/assets/fact-order-description.jpeg)
+
+As further evidence, recall that the description of the _fact\_order\_and\_sales\_etl_ pipeline that we reviewed in step 5 also calls out the use of _fact\_sale_ for critical reporting.
+
+Taken together, the Tier1 designation, the frequency of use, and the direction we’ve gleaned from three asset descriptions provide a high degree of confidence that _fact\_sale_ is the right table for us to use.
+
+In the next tutorial, we will explore how to assess an asset to learn what we need to know about the individual fields, related tables and other assets, and how to get help with specific questions about the asset.
+
+**Thanks for following along with this introduction to OpenMetadata! Have questions? Please join the** [**OpenMetadata Slack**](https://slack.open-metadata.org)**. We have an active and engaged community that is ready to help!**
+
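Review note: steps 7 to 9 of the added tutorial narrow the search results by filtering to Tier1 and then ranking by weekly usage. The sketch below restates that selection logic in Python. It is an illustration only and does not use any OpenMetadata API. The `TableResult` type, its `weekly_usage_percentile` field, the 75th-percentile cutoff (a stand-in for "roughly the top quarter"), the two `example_*` table names, and all usage values are assumptions invented for the example. Only the names `fact_sale` and `fact_order` and the Tier1 tag come from the tutorial.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableResult:
    """One table in a search result list (hypothetical shape, not an OpenMetadata type)."""
    name: str
    tier: str                       # tier tag, e.g. "Tier1"
    weekly_usage_percentile: float  # assumed 0-100 scale, higher means used more often


def shortlist(results, tier="Tier1", min_percentile=75.0):
    """Keep tables in the given tier at or above the usage cutoff, most used first."""
    kept = [
        r for r in results
        if r.tier == tier and r.weekly_usage_percentile >= min_percentile
    ]
    return sorted(kept, key=lambda r: r.weekly_usage_percentile, reverse=True)


# Invented sample values, for illustration only.
SAMPLE = [
    TableResult("fact_order", "Tier1", 80.0),
    TableResult("example_tier3_table", "Tier3", 95.0),  # hypothetical table name
    TableResult("fact_sale", "Tier1", 90.0),
    TableResult("example_low_usage_table", "Tier1", 30.0),  # hypothetical table name
]

if __name__ == "__main__":
    for table in shortlist(SAMPLE):
        print(table.name, table.weekly_usage_percentile)
```

With these invented values the shortlist contains `fact_sale` followed by `fact_order`. As in step 10 of the tutorial, the final choice between them still depends on reading the asset descriptions, which this sketch does not model.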