diff --git a/metadata-ingestion/docs/sources/business-glossary/datahub-business-glossary.md b/metadata-ingestion/docs/sources/business-glossary/datahub-business-glossary.md
index e9e12e3661..3d2a050949 100644
--- a/metadata-ingestion/docs/sources/business-glossary/datahub-business-glossary.md
+++ b/metadata-ingestion/docs/sources/business-glossary/datahub-business-glossary.md
@@ -1,46 +1,251 @@
 ### Business Glossary File Format
 
-The business glossary source file should be a `.yml` file with the following top-level keys:
+The business glossary source file should be a .yml file with the following top-level keys:
 
 **Glossary**: the top level keys of the business glossary file
 
-- **version**: the version of business glossary file config the config conforms to. Currently the only version released is `1`.
-- **source**: the source format of the terms. Currently only supports `DataHub`
-- **owners**: owners contains two nested fields
-  - **users**: (optional) a list of user ids
-  - **groups**: (optional) a list of group ids
-- **url**: (optional) external url pointing to where the glossary is defined externally, if applicable.
-- **nodes**: (optional) list of child **GlossaryNode** objects
-- **terms**: (optional) list of child **GlossaryTerm** objects
+Example **Glossary**:
+
+```yaml
+version: 1 # the version of business glossary file config the config conforms to. Currently the only version released is `1`.
+source: DataHub # the source format of the terms. Currently only supports `DataHub`
+owners: # owners contains two nested fields
+  users: # (optional) a list of user IDs
+    - njones
+  groups: # (optional) a list of group IDs
+    - logistics
+url: "https://github.com/datahub-project/datahub/" # (optional) external url pointing to where the glossary is defined externally, if applicable
+nodes: # list of child **GlossaryNode** objects. See **GlossaryNode** section below
+  ...
+```
 
 **GlossaryNode**: a container of **GlossaryNode** and **GlossaryTerm** objects
-- **name**: name of the node
-- **description**: description of the node
-- **id**: (optional) identifier of the node (normally inferred from the name, see `enable_auto_id` config. Use this if you need a stable identifier)
-- **owners**: (optional) owners contains two nested fields
-  - **users**: (optional) a list of user ids
-  - **groups**: (optional) a list of group ids
-- **terms**: (optional) list of child **GlossaryTerm** objects
-- **nodes**: (optional) list of child **GlossaryNode** objects
+
+Example **GlossaryNode**:
+
+```yaml
+- name: Shipping # name of the node
+  description: Provides terms related to the shipping domain # description of the node
+  owners: # (optional) owners contains 2 nested fields
+    users: # (optional) a list of user IDs
+      - njones
+    groups: # (optional) a list of group IDs
+      - logistics
+  nodes: # list of child **GlossaryNode** objects
+    ...
+  knowledge_links: # (optional) list of **KnowledgeCard** objects
+    - label: Wiki link for shipping
+      url: "https://en.wikipedia.org/wiki/Freight_transport"
+```
 
 **GlossaryTerm**: a term in your business glossary
-- **name**: name of the term
-- **description**: description of the term
-- **id**: (optional) identifier of the term (normally inferred from the name, see `enable_auto_id` config. Use this if you need a stable identifier)
-- **owners**: (optional) owners contains two nested fields
-  - **users**: (optional) a list of user ids
-  - **groups**: (optional) a list of group ids
-- **term_source**: One of `EXTERNAL` or `INTERNAL`. Whether the term is coming from an external glossary or one defined in your organization.
-- **source_ref**: (optional) If external, what is the name of the source the glossary term is coming from?
-- **source_url**: (optional) If external, what is the url of the source definition?
-- **inherits**: (optional) List of **GlossaryTerm** that this term inherits from
-- **contains**: (optional) List of **GlossaryTerm** that this term contains
-- **custom_properties**: A map of key/value pairs of arbitrary custom properties
-- **domain**: (optional) domain name or domain urn
 
-You can also view an example business glossary file checked in [here](https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/bootstrap_data/business_glossary.yml)
+Example **GlossaryTerm**:
+
+```yaml
+- name: FullAddress # name of the term
+  description: A collection of information to give the location of a building or plot of land. # description of the term
+  owners: # (optional) owners contains 2 nested fields
+    users: # (optional) a list of user IDs
+      - njones
+    groups: # (optional) a list of group IDs
+      - logistics
+  term_source: "EXTERNAL" # one of `EXTERNAL` or `INTERNAL`. Whether the term is coming from an external glossary or one defined in your organization.
+  source_ref: FIBO # (optional) if external, what is the name of the source the glossary term is coming from?
+  source_url: "https://www.google.com" # (optional) if external, what is the url of the source definition?
+  inherits: # (optional) list of **GlossaryTerm** that this term inherits from
+    - Privacy.PII
+  contains: # (optional) a list of **GlossaryTerm** that this term contains
+    - Shipping.ZipCode
+    - Shipping.CountryCode
+    - Shipping.StreetAddress
+  custom_properties: # (optional) a map of key/value pairs of arbitrary custom properties
+    - is_used_for_compliance_tracking: true
+  knowledge_links: # (optional) a list of **KnowledgeCard** related to this term. These appear as links on the glossary node's page
+    - url: "https://en.wikipedia.org/wiki/Address"
+      label: Wiki link
+  domain: "urn:li:domain:Logistics" # (optional) domain name or domain urn
+```
+
+To see how these all work together, check out this comprehensive example business glossary file below:
+
+
+Example business glossary file
+
+```yaml
+version: 1
+source: DataHub
+owners:
+  users:
+    - mjames
+url: "https://github.com/datahub-project/datahub/"
+nodes:
+  - name: Classification
+    description: A set of terms related to Data Classification
+    knowledge_links:
+      - label: Wiki link for classification
+        url: "https://en.wikipedia.org/wiki/Classification"
+    terms:
+      - name: Sensitive
+        description: Sensitive Data
+        custom_properties:
+          is_confidential: false
+      - name: Confidential
+        description: Confidential Data
+        custom_properties:
+          is_confidential: true
+      - name: HighlyConfidential
+        description: Highly Confidential Data
+        custom_properties:
+          is_confidential: true
+        domain: Marketing
+  - name: PersonalInformation
+    description: All terms related to personal information
+    owners:
+      users:
+        - mjames
+    terms:
+      - name: Email
+        ## An example of using an id to pin a term to a specific guid
+        ## See "how to generate custom IDs for your terms" section below
+        # id: "urn:li:glossaryTerm:41516e310acbfd9076fffc2c98d2d1a3"
+        description: An individual's email address
+        inherits:
+          - Classification.Confidential
+        owners:
+          groups:
+            - Trust and Safety
+      - name: Address
+        description: A physical address
+      - name: Gender
+        description: The gender identity of the individual
+        inherits:
+          - Classification.Sensitive
+  - name: Shipping
+    description: Provides terms related to the shipping domain
+    owners:
+      users:
+        - njones
+      groups:
+        - logistics
+    terms:
+      - name: FullAddress
+        description: A collection of information to give the location of a building or plot of land.
+        owners:
+          users:
+            - njones
+          groups:
+            - logistics
+        term_source: "EXTERNAL"
+        source_ref: FIBO
+        source_url: "https://www.google.com"
+        inherits:
+          - Privacy.PII
+        contains:
+          - Shipping.ZipCode
+          - Shipping.CountryCode
+          - Shipping.StreetAddress
+        related_terms:
+          - Housing.Kitchen.Cutlery
+        custom_properties:
+          - is_used_for_compliance_tracking: true
+        knowledge_links:
+          - url: "https://en.wikipedia.org/wiki/Address"
+            label: Wiki link
+        domain: "urn:li:domain:Logistics"
+    knowledge_links:
+      - label: Wiki link for shipping
+        url: "https://en.wikipedia.org/wiki/Freight_transport"
+  - name: ClientsAndAccounts
+    description: Provides basic concepts such as account, account holder, account provider, relationship manager that are commonly used by financial services providers to describe customers and to determine counterparty identities
+    owners:
+      groups:
+        - finance
+    terms:
+      - name: Account
+        description: Container for records associated with a business arrangement for regular transactions and services
+        term_source: "EXTERNAL"
+        source_ref: FIBO
+        source_url: "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Account"
+        inherits:
+          - Classification.HighlyConfidential
+        contains:
+          - ClientsAndAccounts.Balance
+      - name: Balance
+        description: Amount of money available or owed
+        term_source: "EXTERNAL"
+        source_ref: FIBO
+        source_url: "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Balance"
+  - name: Housing
+    description: Provides terms related to the housing domain
+    owners:
+      users:
+        - mjames
+      groups:
+        - interior
+    nodes:
+      - name: Colors
+        description: "Colors that are used in Housing construction"
+        terms:
+          - name: Red
+            description: "red color"
+            term_source: "EXTERNAL"
+            source_ref: FIBO
+            source_url: "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Account"
+
+          - name: Green
+            description: "green color"
+            term_source: "EXTERNAL"
+            source_ref: FIBO
+            source_url: "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Account"
+
+          - name: Pink
+            description: pink color
+            term_source: "EXTERNAL"
+            source_ref: FIBO
+            source_url: "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Account"
+    terms:
+      - name: WindowColor
+        description: Supported window colors
+        term_source: "EXTERNAL"
+        source_ref: FIBO
+        source_url: "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Account"
+        values:
+          - Housing.Colors.Red
+          - Housing.Colors.Pink
+
+      - name: Kitchen
+        description: a room or area where food is prepared and cooked.
+        term_source: "EXTERNAL"
+        source_ref: FIBO
+        source_url: "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Account"
+
+      - name: Spoon
+        description: an implement consisting of a small, shallow oval or round bowl on a long handle, used for eating, stirring, and serving food.
+        term_source: "EXTERNAL"
+        source_ref: FIBO
+        source_url: "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Account"
+        related_terms:
+          - Housing.Kitchen
+        knowledge_links:
+          - url: "https://en.wikipedia.org/wiki/Spoon"
+            label: Wiki link
+```
+
+
+Source file linked [here](https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/bootstrap_data/business_glossary.yml).
+
+## Generating custom IDs for your terms
+
+IDs are normally inferred from the glossary term/node's name, see the `enable_auto_id` config. But, if you need a stable
+identifier, you can generate a custom ID for your term. It should be unique across the entire Glossary.
+
+Here's an example ID:
+`id: "urn:li:glossaryTerm:41516e310acbfd9076fffc2c98d2d1a3"`
+
+A note of caution: once you select a custom ID, it cannot be easily changed.
 
 ## Compatibility
 
 Compatible with version 1 of business glossary format.
-The source will be evolved as we publish newer versions of this format.
+The source will be evolved as we publish newer versions of this format.
\ No newline at end of file
diff --git a/metadata-ingestion/examples/bootstrap_data/business_glossary.yml b/metadata-ingestion/examples/bootstrap_data/business_glossary.yml
index d35430d6d0..f2c7011032 100644
--- a/metadata-ingestion/examples/bootstrap_data/business_glossary.yml
+++ b/metadata-ingestion/examples/bootstrap_data/business_glossary.yml
@@ -45,6 +45,39 @@ nodes:
         description: The gender identity of the individual
         inherits:
           - Classification.Sensitive
+  - name: Shipping
+    description: Provides terms related to the shipping domain
+    owners:
+      users:
+        - njones
+      groups:
+        - logistics
+    terms:
+      - name: FullAddress
+        description: A collection of information to give the location of a building or plot of land.
+        owners:
+          users:
+            - njones
+          groups:
+            - logistics
+        term_source: "EXTERNAL"
+        source_ref: FIBO
+        source_url: "https://www.google.com"
+        inherits:
+          - Privacy.PII
+        contains:
+          - Shipping.ZipCode
+          - Shipping.CountryCode
+          - Shipping.StreetAddress
+        custom_properties:
+          - is_used_for_compliance_tracking: true
+        knowledge_links:
+          - url: "https://en.wikipedia.org/wiki/Address"
+            label: Wiki link
+        domain: "urn:li:domain:Logistics"
+    knowledge_links:
+      - label: Wiki link for shipping
+        url: "https://en.wikipedia.org/wiki/Freight_transport"
   - name: ClientsAndAccounts
     description: Provides basic concepts such as account, account holder, account provider, relationship manager that are commonly used by financial services providers to describe customers and to determine counterparty identities
     owners:
@@ -68,6 +101,8 @@ nodes:
   - name: Housing
     description: Provides terms related to the housing domain
     owners:
+      users:
+        - mjames
       groups:
         - interior
     nodes:
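
The "Generating custom IDs for your terms" section added by this patch shows an example ID but does not say how to produce one. The sketch below is one possible approach, not DataHub's own method: it derives a stable 32-character hex suffix from a term's dotted path. The helper name `make_custom_term_id` and the choice of a truncated SHA-256 digest are assumptions made for illustration; only the `urn:li:glossaryTerm:` prefix comes from the example ID in the patch.

```python
import hashlib

# Prefix copied from the example ID in the patched documentation.
URN_PREFIX = "urn:li:glossaryTerm:"


def make_custom_term_id(term_path: str) -> str:
    """Hypothetical helper: build a stable custom ID from a dotted term path.

    The same path always yields the same ID, so the ID does not change
    between runs as long as the path passed in stays the same.
    """
    # Assumption: any stable 32-hex-character string is acceptable as the
    # suffix; the first 32 hex characters of a SHA-256 digest are used here.
    digest = hashlib.sha256(term_path.encode("utf-8")).hexdigest()[:32]
    return URN_PREFIX + digest


if __name__ == "__main__":
    # Prints an ID suitable for pasting into the `id:` field of a term.
    print(make_custom_term_id("PersonalInformation.Email"))
```

Because the patch warns that a custom ID cannot easily be changed once chosen, generating the value once and pasting it into the YAML file is probably safer than recomputing it at ingestion time.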