docs(business glossary) Update business glossary docs (#8287)

Ellie O'Neil 2023-06-26 11:00:09 -07:00 committed by GitHub
parent 1343082535
commit 8880b47ca1
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 272 additions and 32 deletions


@ -1,46 +1,251 @@
### Business Glossary File Format
The business glossary source file should be a `.yml` file with the following top-level keys:
**Glossary**: the top-level keys of the business glossary file
- **version**: the version of the business glossary file format that this file conforms to. Currently the only released version is `1`.
- **source**: the source format of the terms. Currently only `DataHub` is supported.
- **owners**: owners contains two nested fields
  - **users**: (optional) a list of user IDs
  - **groups**: (optional) a list of group IDs
- **url**: (optional) external URL pointing to where the glossary is defined externally, if applicable
- **nodes**: (optional) list of child **GlossaryNode** objects
- **terms**: (optional) list of child **GlossaryTerm** objects
Example **Glossary**:
```yaml
version: 1                        # the version of the business glossary file format that this file conforms to. Currently the only released version is `1`.
source: DataHub                   # the source format of the terms. Currently only `DataHub` is supported
owners:                           # owners contains two nested fields
  users:                          # (optional) a list of user IDs
    - njones
  groups:                         # (optional) a list of group IDs
    - logistics
url: "https://github.com/datahub-project/datahub/" # (optional) external URL pointing to where the glossary is defined externally, if applicable
nodes:                            # list of child **GlossaryNode** objects. See **GlossaryNode** section below
  ...
```
**GlossaryNode**: a container of **GlossaryNode** and **GlossaryTerm** objects
- **name**: name of the node
- **description**: description of the node
- **id**: (optional) identifier of the node. It is normally inferred from the name (see the `enable_auto_id` config); set it explicitly if you need a stable identifier.
- **owners**: (optional) owners contains two nested fields
  - **users**: (optional) a list of user IDs
  - **groups**: (optional) a list of group IDs
- **terms**: (optional) list of child **GlossaryTerm** objects
- **nodes**: (optional) list of child **GlossaryNode** objects
- **knowledge_links**: (optional) list of **KnowledgeCard** objects
Example **GlossaryNode**:
```yaml
- name: Shipping                  # name of the node
  description: Provides terms related to the shipping domain # description of the node
  owners:                         # (optional) owners contains 2 nested fields
    users:                        # (optional) a list of user IDs
      - njones
    groups:                       # (optional) a list of group IDs
      - logistics
  nodes:                          # list of child **GlossaryNode** objects
    ...
  knowledge_links:                # (optional) list of **KnowledgeCard** objects
    - label: Wiki link for shipping
      url: "https://en.wikipedia.org/wiki/Freight_transport"
```
**GlossaryTerm**: a term in your business glossary
- **name**: name of the term
- **description**: description of the term
- **id**: (optional) identifier of the term. It is normally inferred from the name (see the `enable_auto_id` config); set it explicitly if you need a stable identifier.
- **owners**: (optional) owners contains two nested fields
  - **users**: (optional) a list of user IDs
  - **groups**: (optional) a list of group IDs
- **term_source**: one of `EXTERNAL` or `INTERNAL`. Indicates whether the term comes from an external glossary or is defined in your organization.
- **source_ref**: (optional) if external, the name of the source the glossary term comes from
- **source_url**: (optional) if external, the URL of the source definition
- **inherits**: (optional) list of **GlossaryTerm** that this term inherits from
- **contains**: (optional) list of **GlossaryTerm** that this term contains
- **custom_properties**: (optional) a map of key/value pairs of arbitrary custom properties
- **knowledge_links**: (optional) list of **KnowledgeCard** objects related to this term
- **domain**: (optional) domain name or domain URN
You can also view an example business glossary file checked in [here](https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/bootstrap_data/business_glossary.yml)
Example **GlossaryTerm**:
```yaml
- name: FullAddress               # name of the term
  description: A collection of information to give the location of a building or plot of land. # description of the term
  owners:                         # (optional) owners contains 2 nested fields
    users:                        # (optional) a list of user IDs
      - njones
    groups:                       # (optional) a list of group IDs
      - logistics
  term_source: "EXTERNAL"         # one of `EXTERNAL` or `INTERNAL`. Whether the term comes from an external glossary or is defined in your organization.
  source_ref: FIBO                # (optional) if external, the name of the source the glossary term comes from
  source_url: "https://www.google.com" # (optional) if external, the URL of the source definition
  inherits:                       # (optional) list of **GlossaryTerm** that this term inherits from
    - Privacy.PII
  contains:                       # (optional) a list of **GlossaryTerm** that this term contains
    - Shipping.ZipCode
    - Shipping.CountryCode
    - Shipping.StreetAddress
  custom_properties:              # (optional) a map of key/value pairs of arbitrary custom properties
    - is_used_for_compliance_tracking: true
  knowledge_links:                # (optional) a list of **KnowledgeCard** related to this term. These appear as links on the glossary term's page
    - url: "https://en.wikipedia.org/wiki/Address"
      label: Wiki link
  domain: "urn:li:domain:Logistics" # (optional) domain name or domain URN
```
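The `inherits` and `contains` entries above refer to other terms by dotted names. Judging from the complete example file in this document, each name reads as a path of node names followed by the term name. The fragment below is only a sketch of that reading, reusing names from the example file; it is not a statement of the exact resolution rules.

```yaml
nodes:
  - name: Housing
    description: Provides terms related to the housing domain
    nodes:
      - name: Colors
        description: "Colors that are used in Housing construction"
        terms:
          - name: Red                 # referenced elsewhere as Housing.Colors.Red
            description: "red color"
    terms:
      - name: WindowColor
        description: Supported window colors
        values:
          - Housing.Colors.Red        # node path (Housing, then Colors) followed by the term name (Red)
```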
To see how these all work together, check out this comprehensive example business glossary file below:
<details>
<summary>Example business glossary file</summary>
```yaml
version: 1
source: DataHub
owners:
  users:
    - mjames
url: "https://github.com/datahub-project/datahub/"
nodes:
  - name: Classification
    description: A set of terms related to Data Classification
    knowledge_links:
      - label: Wiki link for classification
        url: "https://en.wikipedia.org/wiki/Classification"
    terms:
      - name: Sensitive
        description: Sensitive Data
        custom_properties:
          is_confidential: false
      - name: Confidential
        description: Confidential Data
        custom_properties:
          is_confidential: true
      - name: HighlyConfidential
        description: Highly Confidential Data
        custom_properties:
          is_confidential: true
        domain: Marketing
  - name: PersonalInformation
    description: All terms related to personal information
    owners:
      users:
        - mjames
    terms:
      - name: Email
        ## An example of using an id to pin a term to a specific guid
        ## See "how to generate custom IDs for your terms" section below
        # id: "urn:li:glossaryTerm:41516e310acbfd9076fffc2c98d2d1a3"
        description: An individual's email address
        inherits:
          - Classification.Confidential
        owners:
          groups:
            - Trust and Safety
      - name: Address
        description: A physical address
      - name: Gender
        description: The gender identity of the individual
        inherits:
          - Classification.Sensitive
  - name: Shipping
    description: Provides terms related to the shipping domain
    owners:
      users:
        - njones
      groups:
        - logistics
    terms:
      - name: FullAddress
        description: A collection of information to give the location of a building or plot of land.
        owners:
          users:
            - njones
          groups:
            - logistics
        term_source: "EXTERNAL"
        source_ref: FIBO
        source_url: "https://www.google.com"
        inherits:
          - Privacy.PII
        contains:
          - Shipping.ZipCode
          - Shipping.CountryCode
          - Shipping.StreetAddress
        related_terms:
          - Housing.Kitchen.Cutlery
        custom_properties:
          - is_used_for_compliance_tracking: true
        knowledge_links:
          - url: "https://en.wikipedia.org/wiki/Address"
            label: Wiki link
        domain: "urn:li:domain:Logistics"
    knowledge_links:
      - label: Wiki link for shipping
        url: "https://en.wikipedia.org/wiki/Freight_transport"
  - name: ClientsAndAccounts
    description: Provides basic concepts such as account, account holder, account provider, relationship manager that are commonly used by financial services providers to describe customers and to determine counterparty identities
    owners:
      groups:
        - finance
    terms:
      - name: Account
        description: Container for records associated with a business arrangement for regular transactions and services
        term_source: "EXTERNAL"
        source_ref: FIBO
        source_url: "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Account"
        inherits:
          - Classification.HighlyConfidential
        contains:
          - ClientsAndAccounts.Balance
      - name: Balance
        description: Amount of money available or owed
        term_source: "EXTERNAL"
        source_ref: FIBO
        source_url: "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Balance"
  - name: Housing
    description: Provides terms related to the housing domain
    owners:
      users:
        - mjames
      groups:
        - interior
    nodes:
      - name: Colors
        description: "Colors that are used in Housing construction"
        terms:
          - name: Red
            description: "red color"
            term_source: "EXTERNAL"
            source_ref: FIBO
            source_url: "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Account"
          - name: Green
            description: "green color"
            term_source: "EXTERNAL"
            source_ref: FIBO
            source_url: "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Account"
          - name: Pink
            description: pink color
            term_source: "EXTERNAL"
            source_ref: FIBO
            source_url: "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Account"
    terms:
      - name: WindowColor
        description: Supported window colors
        term_source: "EXTERNAL"
        source_ref: FIBO
        source_url: "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Account"
        values:
          - Housing.Colors.Red
          - Housing.Colors.Pink
      - name: Kitchen
        description: a room or area where food is prepared and cooked.
        term_source: "EXTERNAL"
        source_ref: FIBO
        source_url: "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Account"
      - name: Spoon
        description: an implement consisting of a small, shallow oval or round bowl on a long handle, used for eating, stirring, and serving food.
        term_source: "EXTERNAL"
        source_ref: FIBO
        source_url: "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Account"
        related_terms:
          - Housing.Kitchen
        knowledge_links:
          - url: "https://en.wikipedia.org/wiki/Spoon"
            label: Wiki link
```
</details>
Source file linked [here](https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/bootstrap_data/business_glossary.yml).
## Generating custom IDs for your terms
IDs are normally inferred from the glossary term's or node's name (see the `enable_auto_id` config). If you need a stable
identifier, you can instead assign a custom ID to your term. It should be unique across the entire Glossary.
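For orientation, `enable_auto_id` is an option of the ingestion source, not a key of the glossary file itself. The fragment below is a minimal sketch of the source section of a recipe, assuming the `datahub-business-glossary` source type and a hypothetical file path. A complete recipe also needs a sink, which is omitted here.

```yaml
source:
  type: datahub-business-glossary
  config:
    file: ./business_glossary.yml   # hypothetical path to your glossary file
    enable_auto_id: true            # assumption: controls how IDs are derived from node/term names
```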
Here's an example ID:
`id: "urn:li:glossaryTerm:41516e310acbfd9076fffc2c98d2d1a3"`
A note of caution: once you select a custom ID, it cannot be easily changed.
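As a sketch of where a custom ID goes, the fragment below sets `id` on a term, reusing the example ID above and the `Email` term from the example file. Whether other ID forms are accepted in place of the full URN is not covered here.

```yaml
nodes:
  - name: PersonalInformation
    description: All terms related to personal information
    terms:
      - name: Email
        # pinned identifier, intended to stay the same even if the term's name changes
        id: "urn:li:glossaryTerm:41516e310acbfd9076fffc2c98d2d1a3"
        description: An individual's email address
```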
## Compatibility
Compatible with version 1 of the business glossary format.
The source will evolve as we publish newer versions of this format.


@ -45,6 +45,39 @@ nodes:
        description: The gender identity of the individual
        inherits:
          - Classification.Sensitive
  - name: Shipping
    description: Provides terms related to the shipping domain
    owners:
      users:
        - njones
      groups:
        - logistics
    terms:
      - name: FullAddress
        description: A collection of information to give the location of a building or plot of land.
        owners:
          users:
            - njones
          groups:
            - logistics
        term_source: "EXTERNAL"
        source_ref: FIBO
        source_url: "https://www.google.com"
        inherits:
          - Privacy.PII
        contains:
          - Shipping.ZipCode
          - Shipping.CountryCode
          - Shipping.StreetAddress
        custom_properties:
          - is_used_for_compliance_tracking: true
        knowledge_links:
          - url: "https://en.wikipedia.org/wiki/Address"
            label: Wiki link
        domain: "urn:li:domain:Logistics"
    knowledge_links:
      - label: Wiki link for shipping
        url: "https://en.wikipedia.org/wiki/Freight_transport"
  - name: ClientsAndAccounts
    description: Provides basic concepts such as account, account holder, account provider, relationship manager that are commonly used by financial services providers to describe customers and to determine counterparty identities
    owners:
@ -68,6 +101,8 @@ nodes:
  - name: Housing
    description: Provides terms related to the housing domain
    owners:
      users:
        - mjames
      groups:
        - interior
    nodes: