feat(ingestion/business-glossary): Automatically generate predictable glossary term and node URNs when incompatible URL characters are specified in term and node names. (#12673)

Jonny Dixon 2025-03-06 06:30:10 -08:00 committed by GitHub
parent 4714f46f11
commit a700448bad
13 changed files with 963 additions and 293 deletions

View File

@ -20,6 +20,8 @@ This file documents any backwards-incompatible changes in DataHub and assists pe
### Breaking Changes
- #12673: Business Glossary ID generation now cleans names so that the resulting URNs are URL-safe. When `enable_auto_id` is false (the default), IDs are generated by cleaning the name: spaces become hyphens and special characters are removed, except hyphens and periods (periods act as path separators); case is preserved. Terms and nodes whose names contain spaces or special characters may therefore receive different IDs than before.
- #12580: The OpenAPI source handled nesting incorrectly. #12580 fixes this by creating proper nested field paths; however, it will rewrite the incorrect schemas produced by existing OpenAPI runs.
- #12408: The `platform` field in the DataPlatformInstance GraphQL type is removed. Clients need to retrieve the platform via the optional `dataPlatformInstance` field.

View File

@ -24,7 +24,8 @@ nodes: # list of child **Glossa
Example **GlossaryNode**:
```yaml
- name: Shipping # name of the node
- name: "Shipping" # name of the node
id: "Shipping-Logistics" # (optional) custom identifier for the node
description: Provides terms related to the shipping domain # description of the node
owners: # (optional) owners contains 2 nested fields
users: # (optional) a list of user IDs
@ -43,7 +44,8 @@ Example **GlossaryNode**:
Example **GlossaryTerm**:
```yaml
- name: FullAddress # name of the term
- name: "Full Address" # name of the term
id: "Full-Address-Details" # (optional) custom identifier for the term
description: A collection of information to give the location of a building or plot of land. # description of the term
owners: # (optional) owners contains 2 nested fields
users: # (optional) a list of user IDs
@ -67,10 +69,86 @@ Example **GlossaryTerm**:
domain: "urn:li:domain:Logistics" # (optional) domain name or domain urn
```
To see how these all work together, check out this comprehensive example business glossary file below:
## ID Management and URL Generation
<details>
<summary>Example business glossary file</summary>
The business glossary provides two primary ways to manage term and node identifiers:
1. **Custom IDs**: You can explicitly specify an ID for any term or node using the `id` field. This is recommended for terms that need stable, predictable identifiers:
```yaml
terms:
  - name: "Response Time"
    id: "support-response-time" # Explicit ID
    description: "Target time to respond to customer inquiries"
```
2. **Automatic ID Generation**: When no ID is specified, the system generates one based on the `enable_auto_id` setting:
   - With `enable_auto_id: false` (default):
     - Node and term names are converted to a URL-friendly format
     - Spaces within names are replaced with hyphens
     - Special characters are removed (hyphens and periods are kept)
     - Case is preserved
     - Multiple hyphens are collapsed into one
     - Path components (node/term hierarchy) are joined with periods
     - Example: Node "Customer Support" with term "Response Time" → "Customer-Support.Response-Time"
   - With `enable_auto_id: true`:
     - Generates GUID-based IDs
     - Recommended for guaranteed uniqueness
     - Also applied automatically to any name containing non-ASCII characters, even when `enable_auto_id` is false
Here's how path-based ID generation works:
```yaml
nodes:
  - name: "Customer Support" # Node ID: Customer-Support
    terms:
      - name: "Response Time" # Term ID: Customer-Support.Response-Time
        description: "Response SLA"
      - name: "First Reply" # Term ID: Customer-Support.First-Reply
        description: "Initial response"
  - name: "Product Feedback" # Node ID: Product-Feedback
    terms:
      - name: "Response Time" # Term ID: Product-Feedback.Response-Time
        description: "Feedback response"
```
**Important Notes**:
- Periods (.) are used as path separators between nodes and terms
- Periods inside a term or node name are kept by the cleaning step, so they cannot be told apart from path separators in the generated ID; avoid periods in names, or set an explicit `id`
- The path components (node names, term names) are joined with periods and the result is cleaned:
  - Spaces to hyphens
  - Special characters removed
  - Case preserved
- Non-ASCII characters in any component trigger automatic GUID generation
- Once an ID is created (either manually or automatically), it cannot be easily changed
- All references to a term (in `inherits`, `contains`, etc.) must use its correct ID
- Moving terms in the hierarchy does NOT update their IDs:
  - The ID retains its original path components even after moving
  - This can lead to IDs that don't match the current location
  - Consider using `enable_auto_id: true` if you plan to reorganize your glossary
- For terms that other terms will reference, consider using explicit IDs or enabling `enable_auto_id`
Example of how different names are handled:
```yaml
nodes:
  - name: "Data Services" # Node ID: Data-Services
    terms:
      # Basic term name
      - name: "Response Time" # Term ID: Data-Services.Response-Time
        description: "SLA metrics"
      # Term name with special characters
      - name: "API @ Response" # Term ID: Data-Services.API-Response
        description: "API metrics"
      # Term with non-ASCII (triggers GUID)
      - name: "パフォーマンス" # Term ID will be a 32-character GUID
        description: "Performance"
```
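The cleaning rules above can be sketched in a few lines of Python. This is a minimal illustration that follows the `clean_url` helper shown in this commit; the names `sketch_clean` and `sketch_path_id` are hypothetical and exist only for this example.

```python
import re
from typing import List


def sketch_clean(text: str) -> str:
    """Approximate the documented cleaning rules (illustrative only)."""
    text = text.replace(" ", "-")                # spaces become hyphens
    text = re.sub(r"[^a-zA-Z0-9\-.]", "", text)  # drop other special characters
    text = re.sub(r"-+", "-", text)              # collapse repeated hyphens
    text = re.sub(r"\.+", ".", text)             # collapse repeated periods
    return text.strip("-.")                      # trim leading/trailing separators


def sketch_path_id(path: List[str]) -> str:
    """Join node/term names with periods, then clean the joined string."""
    return sketch_clean(".".join(path))


print(sketch_path_id(["Data Services", "Response Time"]))   # Data-Services.Response-Time
print(sketch_path_id(["Data Services", "API @ Response"]))  # Data-Services.API-Response
```

Running a glossary's names through a helper like this before ingestion is one way to preview the IDs that `inherits` and `contains` references would need to use.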
To see how these all work together, check out this comprehensive example business glossary file below:
```yaml
version: "1"
@ -80,172 +158,108 @@ owners:
- mjames
url: "https://github.com/datahub-project/datahub/"
nodes:
- name: Classification
- name: "Data Classification"
id: "Data-Classification" # Custom ID for stable references
description: A set of terms related to Data Classification
knowledge_links:
- label: Wiki link for classification
url: "https://en.wikipedia.org/wiki/Classification"
terms:
- name: Sensitive
- name: "Sensitive Data" # Will generate: Data-Classification.Sensitive-Data
description: Sensitive Data
custom_properties:
is_confidential: "false"
- name: Confidential
- name: "Confidential Information" # Will generate: Data-Classification.Confidential-Information
description: Confidential Data
custom_properties:
is_confidential: "true"
- name: HighlyConfidential
- name: "Highly Confidential" # Will generate: Data-Classification.Highly-Confidential
description: Highly Confidential Data
custom_properties:
is_confidential: "true"
domain: Marketing
- name: PersonalInformation
- name: "Personal Information"
description: All terms related to personal information
owners:
users:
- mjames
terms:
- name: Email
## An example of using an id to pin a term to a specific guid
## See "how to generate custom IDs for your terms" section below
# id: "urn:li:glossaryTerm:41516e310acbfd9076fffc2c98d2d1a3"
- name: "Email" # Will generate: Personal-Information.Email
description: An individual's email address
inherits:
- Classification.Confidential
- Data-Classification.Confidential-Information # References the generated ID of the term under the parent node
owners:
groups:
- Trust and Safety
- name: Address
- name: "Address" # Will generate: Personal-Information.Address
description: A physical address
- name: Gender
- name: "Gender" # Will generate: Personal-Information.Gender
description: The gender identity of the individual
inherits:
- Classification.Sensitive
- name: Shipping
description: Provides terms related to the shipping domain
owners:
users:
- njones
groups:
- logistics
terms:
- name: FullAddress
description: A collection of information to give the location of a building or plot of land.
owners:
users:
- njones
groups:
- logistics
term_source: "EXTERNAL"
source_ref: FIBO
source_url: "https://www.google.com"
inherits:
- Privacy.PII
contains:
- Shipping.ZipCode
- Shipping.CountryCode
- Shipping.StreetAddress
related_terms:
- Housing.Kitchen.Cutlery
custom_properties:
- is_used_for_compliance_tracking: "true"
knowledge_links:
- url: "https://en.wikipedia.org/wiki/Address"
label: Wiki link
domain: "urn:li:domain:Logistics"
knowledge_links:
- label: Wiki link for shipping
url: "https://en.wikipedia.org/wiki/Freight_transport"
- name: ClientsAndAccounts
- Data-Classification.Sensitive-Data # References the generated ID of the term under the parent node
- name: "Clients And Accounts"
description: Provides basic concepts such as account, account holder, account provider, relationship manager that are commonly used by financial services providers to describe customers and to determine counterparty identities
owners:
groups:
- finance
type: DATAOWNER
terms:
- name: Account
- name: "Account" # Will generate: Clients-And-Accounts.Account
description: Container for records associated with a business arrangement for regular transactions and services
term_source: "EXTERNAL"
source_ref: FIBO
source_url: "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Account"
inherits:
- Classification.HighlyConfidential
- Data-Classification.Highly-Confidential # References parent node path
contains:
- ClientsAndAccounts.Balance
- name: Balance
- Clients-And-Accounts.Balance # References term in same node
- name: "Balance" # Will generate: Clients-And-Accounts.Balance
description: Amount of money available or owed
term_source: "EXTERNAL"
source_ref: FIBO
source_url: "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Balance"
- name: Housing
description: Provides terms related to the housing domain
owners:
users:
- mjames
groups:
- interior
nodes:
- name: Colors
description: "Colors that are used in Housing construction"
terms:
- name: Red
description: "red color"
term_source: "EXTERNAL"
source_ref: FIBO
source_url: "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Account"
- name: Green
description: "green color"
term_source: "EXTERNAL"
source_ref: FIBO
source_url: "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Account"
- name: Pink
description: pink color
term_source: "EXTERNAL"
source_ref: FIBO
source_url: "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Account"
- name: "KPIs"
description: Common Business KPIs
terms:
- name: WindowColor
description: Supported window colors
term_source: "EXTERNAL"
source_ref: FIBO
source_url: "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Account"
values:
- Housing.Colors.Red
- Housing.Colors.Pink
- name: Kitchen
description: a room or area where food is prepared and cooked.
term_source: "EXTERNAL"
source_ref: FIBO
source_url: "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Account"
- name: Spoon
description: an implement consisting of a small, shallow oval or round bowl on a long handle, used for eating, stirring, and serving food.
term_source: "EXTERNAL"
source_ref: FIBO
source_url: "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Account"
related_terms:
- Housing.Kitchen
knowledge_links:
- url: "https://en.wikipedia.org/wiki/Spoon"
label: Wiki link
- name: "CSAT %" # Will generate: KPIs.CSAT
description: Customer Satisfaction Score
```
</details>
Source file linked [here](https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/bootstrap_data/business_glossary.yml).
## Custom ID Specification
## Generating custom IDs for your terms
Custom IDs can be specified in two ways, both of which are fully supported and acceptable:
IDs are normally inferred from the glossary term/node's name, see the `enable_auto_id` config. But, if you need a stable
identifier, you can generate a custom ID for your term. It should be unique across the entire Glossary.
1. Just the ID portion (simpler approach):
```yaml
terms:
  - name: "Email"
    id: "company-email" # Will become urn:li:glossaryTerm:company-email
    description: "Company email address"
```
Here's an example ID:
`id: "urn:li:glossaryTerm:41516e310acbfd9076fffc2c98d2d1a3"`
2. Full URN format:
```yaml
terms:
  - name: "Email"
    id: "urn:li:glossaryTerm:company-email"
    description: "Company email address"
```
A note of caution: once you select a custom ID, it cannot be easily changed.
Both methods are valid and will work correctly. The system will automatically handle the URN prefix if you specify just the ID portion.
The same applies for nodes:
```yaml
nodes:
  - name: "Communications"
    id: "internal-comms" # Will become urn:li:glossaryNode:internal-comms
    description: "Internal communication methods"
```
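The prefix handling described above can be pictured as follows. This is a rough sketch under the assumption that a bare ID simply gets the `urn:li:<entityType>:` prefix prepended; `sketch_to_urn` is a hypothetical helper, not a function from the ingestion source.

```python
def sketch_to_urn(entity_type: str, custom_id: str) -> str:
    """Return a full URN for either form of custom ID (illustrative only)."""
    prefix = f"urn:li:{entity_type}:"
    # A full URN is passed through unchanged; a bare ID gets the prefix added.
    return custom_id if custom_id.startswith(prefix) else prefix + custom_id


print(sketch_to_urn("glossaryTerm", "company-email"))
print(sketch_to_urn("glossaryTerm", "urn:li:glossaryTerm:company-email"))
print(sketch_to_urn("glossaryNode", "internal-comms"))
```

Under this assumption both forms of the `id` field resolve to the same URN, which is why either can be used in a glossary file.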
Note: Once you select a custom ID, it cannot be easily changed.
## Compatibility
Compatible with version 1 of business glossary format.
The source will be evolved as we publish newer versions of this format.
Compatible with version 1 of business glossary format. The source will be evolved as newer versions of this format are published.

View File

@ -1,5 +1,6 @@
import logging
import pathlib
import re
import time
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Optional, TypeVar, Union
@ -118,17 +119,58 @@ class BusinessGlossaryConfig(DefaultConfig):
return v
def clean_url(text: str) -> str:
    """
    Clean text for use in URLs by:
    1. Replacing spaces with hyphens
    2. Removing special characters (preserving hyphens and periods)
    3. Collapsing multiple hyphens and periods into single ones
    """
    # Replace spaces with hyphens
    text = text.replace(" ", "-")
    # Remove special characters except hyphens and periods
    text = re.sub(r"[^a-zA-Z0-9\-.]", "", text)
    # Collapse multiple hyphens into one
    text = re.sub(r"-+", "-", text)
    # Collapse multiple periods into one
    text = re.sub(r"\.+", ".", text)
    # Remove leading/trailing hyphens and periods
    text = text.strip("-.")
    return text
def create_id(path: List[str], default_id: Optional[str], enable_auto_id: bool) -> str:
    """
    Create an ID for a glossary node or term.
    Args:
        path: List of path components leading to this node/term
        default_id: Optional manually specified ID
        enable_auto_id: Whether to generate GUIDs
    """
    if default_id is not None:
        return default_id  # No need to create id from path as default_id is provided
        return default_id  # Use explicitly provided ID
    id_: str = ".".join(path)
    if UrnEncoder.contains_extended_reserved_char(id_):
        enable_auto_id = True
    # Check for non-ASCII characters before cleaning
    if any(ord(c) > 127 for c in id_):
        return datahub_guid({"path": id_})
    if enable_auto_id:
        # Generate GUID for auto_id mode
        id_ = datahub_guid({"path": id_})
    else:
        # Clean the URL for better readability when not using auto_id
        id_ = clean_url(id_)
        # Force auto_id if the cleaned URL still contains problematic characters
        if UrnEncoder.contains_extended_reserved_char(id_):
            logger.warning(
                f"ID '{id_}' contains problematic characters after URL cleaning. Falling back to GUID generation for stability."
            )
            id_ = datahub_guid({"path": id_})
    return id_
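The order of decisions in the new version of `create_id` (explicit ID first, then the non-ASCII check, then `enable_auto_id`, then cleaning) can be exercised with a self-contained sketch. Two assumptions apply: `sketch_guid` is a stand-in for `datahub_guid`, whose implementation is not part of this diff and is assumed here to be an MD5 hex digest of the JSON-serialised key, and the final `UrnEncoder` fallback is omitted because `UrnEncoder` is not shown either.

```python
import hashlib
import json
import re
from typing import List, Optional


def sketch_guid(obj: dict) -> str:
    """Stand-in for datahub_guid (assumption): MD5 hex digest of the JSON key."""
    payload = json.dumps(obj, separators=(",", ":"), sort_keys=True)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def sketch_create_id(
    path: List[str], default_id: Optional[str], enable_auto_id: bool
) -> str:
    """Follow the decision order of the new create_id (illustrative only)."""
    if default_id is not None:
        return default_id  # an explicit ID always wins
    joined = ".".join(path)
    if any(ord(c) > 127 for c in joined):
        return sketch_guid({"path": joined})  # non-ASCII names always get a GUID
    if enable_auto_id:
        return sketch_guid({"path": joined})  # GUID mode requested
    # Readable mode: same cleaning steps as clean_url
    cleaned = joined.replace(" ", "-")
    cleaned = re.sub(r"[^a-zA-Z0-9\-.]", "", cleaned)
    cleaned = re.sub(r"-+", "-", cleaned)
    cleaned = re.sub(r"\.+", ".", cleaned)
    return cleaned.strip("-.")


print(sketch_create_id(["Customer Support", "Response Time"], None, False))
print(sketch_create_id(["Customer Support", "Response Time"], "my-id", True))
print(len(sketch_create_id(["Data Services", "パフォーマンス"], None, False)))
```

The non-ASCII check runs on the joined path before any cleaning, so a single non-ASCII component anywhere in the hierarchy is enough to switch that entry to a GUID.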

View File

@ -2,7 +2,7 @@
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryNodeSnapshot": {
"urn": "urn:li:glossaryNode:Custom URN Types",
"urn": "urn:li:glossaryNode:Custom-URN-Types",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryNodeInfo": {
@ -42,21 +42,21 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-dlsmlo",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-ugsgt3",
"lastRunId": "no-run-id-provided"
}
},
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryTermSnapshot": {
"urn": "urn:li:glossaryTerm:Custom URN Types.Mixed URN Types",
"urn": "urn:li:glossaryTerm:Custom-URN-Types.Mixed-URN-Types",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryTermInfo": {
"customProperties": {},
"name": "Mixed URN Types",
"definition": "Term with custom URN types",
"parentNode": "urn:li:glossaryNode:Custom URN Types",
"parentNode": "urn:li:glossaryNode:Custom-URN-Types",
"termSource": "INTERNAL",
"sourceRef": "DataHub",
"sourceUrl": "https://github.com/datahub-project/datahub/"
@ -88,21 +88,21 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-dlsmlo",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-ugsgt3",
"lastRunId": "no-run-id-provided"
}
},
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryTermSnapshot": {
"urn": "urn:li:glossaryTerm:Custom URN Types.Mixed Standard and URN",
"urn": "urn:li:glossaryTerm:Custom-URN-Types.Mixed-Standard-and-URN",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryTermInfo": {
"customProperties": {},
"name": "Mixed Standard and URN",
"definition": "Term with both standard and URN types",
"parentNode": "urn:li:glossaryNode:Custom URN Types",
"parentNode": "urn:li:glossaryNode:Custom-URN-Types",
"termSource": "INTERNAL",
"sourceRef": "DataHub",
"sourceUrl": "https://github.com/datahub-project/datahub/"
@ -133,13 +133,13 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-dlsmlo",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-ugsgt3",
"lastRunId": "no-run-id-provided"
}
},
{
"entityType": "glossaryNode",
"entityUrn": "urn:li:glossaryNode:Custom URN Types",
"entityUrn": "urn:li:glossaryNode:Custom-URN-Types",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
@ -149,13 +149,13 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-dlsmlo",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-ugsgt3",
"lastRunId": "no-run-id-provided"
}
},
{
"entityType": "glossaryTerm",
"entityUrn": "urn:li:glossaryTerm:Custom URN Types.Mixed Standard and URN",
"entityUrn": "urn:li:glossaryTerm:Custom-URN-Types.Mixed-Standard-and-URN",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
@ -165,13 +165,13 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-dlsmlo",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-ugsgt3",
"lastRunId": "no-run-id-provided"
}
},
{
"entityType": "glossaryTerm",
"entityUrn": "urn:li:glossaryTerm:Custom URN Types.Mixed URN Types",
"entityUrn": "urn:li:glossaryTerm:Custom-URN-Types.Mixed-URN-Types",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
@ -181,7 +181,7 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-dlsmlo",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-ugsgt3",
"lastRunId": "no-run-id-provided"
}
}

View File

@ -21,6 +21,7 @@
"type": "DEVELOPER"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
@ -32,7 +33,7 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-h7iopd",
"lastRunId": "no-run-id-provided"
}
},
@ -58,7 +59,7 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-h7iopd",
"lastRunId": "no-run-id-provided"
}
},
@ -88,6 +89,7 @@
"type": "DEVELOPER"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
@ -99,7 +101,7 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-h7iopd",
"lastRunId": "no-run-id-provided"
}
},
@ -125,7 +127,7 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-h7iopd",
"lastRunId": "no-run-id-provided"
}
},
@ -155,6 +157,7 @@
"type": "DEVELOPER"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
@ -166,13 +169,13 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-h7iopd",
"lastRunId": "no-run-id-provided"
}
},
{
"entityType": "glossaryTerm",
"entityUrn": "urn:li:glossaryTerm:Classification.Highly Confidential",
"entityUrn": "urn:li:glossaryTerm:Classification.Highly-Confidential",
"changeType": "UPSERT",
"aspectName": "domains",
"aspect": {
@ -184,14 +187,14 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-h7iopd",
"lastRunId": "no-run-id-provided"
}
},
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryTermSnapshot": {
"urn": "urn:li:glossaryTerm:Classification.Highly Confidential",
"urn": "urn:li:glossaryTerm:Classification.Highly-Confidential",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryTermInfo": {
@ -214,6 +217,7 @@
"type": "DEVELOPER"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
@ -225,14 +229,14 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-h7iopd",
"lastRunId": "no-run-id-provided"
}
},
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryNodeSnapshot": {
"urn": "urn:li:glossaryNode:Personal Information",
"urn": "urn:li:glossaryNode:Personal-Information",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryNodeInfo": {
@ -249,6 +253,7 @@
"type": "DATAOWNER"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
@ -260,21 +265,21 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-h7iopd",
"lastRunId": "no-run-id-provided"
}
},
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryTermSnapshot": {
"urn": "urn:li:glossaryTerm:Personal Information.Email",
"urn": "urn:li:glossaryTerm:Personal-Information.Email",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryTermInfo": {
"customProperties": {},
"name": "Email",
"definition": "An individual's email address",
"parentNode": "urn:li:glossaryNode:Personal Information",
"parentNode": "urn:li:glossaryNode:Personal-Information",
"termSource": "INTERNAL",
"sourceRef": "DataHub",
"sourceUrl": "https://github.com/datahub-project/datahub/"
@ -295,6 +300,7 @@
"type": "DEVELOPER"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
@ -306,21 +312,21 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-h7iopd",
"lastRunId": "no-run-id-provided"
}
},
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryTermSnapshot": {
"urn": "urn:li:glossaryTerm:Personal Information.Address",
"urn": "urn:li:glossaryTerm:Personal-Information.Address",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryTermInfo": {
"customProperties": {},
"name": "Address",
"definition": "A physical address",
"parentNode": "urn:li:glossaryNode:Personal Information",
"parentNode": "urn:li:glossaryNode:Personal-Information",
"termSource": "INTERNAL",
"sourceRef": "DataHub",
"sourceUrl": "https://github.com/datahub-project/datahub/"
@ -334,6 +340,7 @@
"type": "DATAOWNER"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
@ -345,21 +352,21 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-h7iopd",
"lastRunId": "no-run-id-provided"
}
},
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryTermSnapshot": {
"urn": "urn:li:glossaryTerm:Personal Information.Gender",
"urn": "urn:li:glossaryTerm:Personal-Information.Gender",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryTermInfo": {
"customProperties": {},
"name": "Gender",
"definition": "The gender identity of the individual",
"parentNode": "urn:li:glossaryNode:Personal Information",
"parentNode": "urn:li:glossaryNode:Personal-Information",
"termSource": "INTERNAL",
"sourceRef": "DataHub",
"sourceUrl": "https://github.com/datahub-project/datahub/"
@ -380,6 +387,7 @@
"type": "DATAOWNER"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
@ -391,14 +399,14 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-h7iopd",
"lastRunId": "no-run-id-provided"
}
},
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryNodeSnapshot": {
"urn": "urn:li:glossaryNode:Clients And Accounts",
"urn": "urn:li:glossaryNode:Clients-And-Accounts",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryNodeInfo": {
@ -416,6 +424,7 @@
"typeUrn": "urn:li:ownershipType:my_cutom_type"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
@ -427,21 +436,21 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-h7iopd",
"lastRunId": "no-run-id-provided"
}
},
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryTermSnapshot": {
"urn": "urn:li:glossaryTerm:Clients And Accounts.Account",
"urn": "urn:li:glossaryTerm:Clients-And-Accounts.Account",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryTermInfo": {
"customProperties": {},
"name": "Account",
"definition": "Container for records associated with a business arrangement for regular transactions and services",
"parentNode": "urn:li:glossaryNode:Clients And Accounts",
"parentNode": "urn:li:glossaryNode:Clients-And-Accounts",
"termSource": "EXTERNAL",
"sourceRef": "FIBO",
"sourceUrl": "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Account"
@ -450,10 +459,10 @@
{
"com.linkedin.pegasus2avro.glossary.GlossaryRelatedTerms": {
"isRelatedTerms": [
"urn:li:glossaryTerm:Classification.Highly Confidential"
"urn:li:glossaryTerm:Classification.Highly-Confidential"
],
"hasRelatedTerms": [
"urn:li:glossaryTerm:Clients And Accounts.Balance"
"urn:li:glossaryTerm:Clients-And-Accounts.Balance"
]
}
},
@ -466,6 +475,7 @@
"typeUrn": "urn:li:ownershipType:my_cutom_type"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
@ -477,21 +487,21 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-h7iopd",
"lastRunId": "no-run-id-provided"
}
},
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryTermSnapshot": {
"urn": "urn:li:glossaryTerm:Clients And Accounts.Balance",
"urn": "urn:li:glossaryTerm:Clients-And-Accounts.Balance",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryTermInfo": {
"customProperties": {},
"name": "Balance",
"definition": "Amount of money available or owed",
"parentNode": "urn:li:glossaryNode:Clients And Accounts",
"parentNode": "urn:li:glossaryNode:Clients-And-Accounts",
"termSource": "EXTERNAL",
"sourceRef": "FIBO",
"sourceUrl": "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Balance"
@ -506,6 +516,7 @@
"typeUrn": "urn:li:ownershipType:my_cutom_type"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
@ -517,7 +528,7 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-h7iopd",
"lastRunId": "no-run-id-provided"
}
},
@ -541,6 +552,7 @@
"type": "DEVELOPER"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
@ -552,14 +564,14 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-h7iopd",
"lastRunId": "no-run-id-provided"
}
},
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryTermSnapshot": {
"urn": "urn:li:glossaryTerm:4faf1eed790370f65942f2998a7993d6",
"urn": "urn:li:glossaryTerm:KPIs.CSAT",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryTermInfo": {
@ -580,6 +592,7 @@
"type": "DEVELOPER"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
@ -591,7 +604,7 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-h7iopd",
"lastRunId": "no-run-id-provided"
}
},
@ -607,13 +620,13 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-h7iopd",
"lastRunId": "no-run-id-provided"
}
},
{
"entityType": "glossaryNode",
"entityUrn": "urn:li:glossaryNode:Clients And Accounts",
"entityUrn": "urn:li:glossaryNode:Clients-And-Accounts",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
@ -623,7 +636,7 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-h7iopd",
"lastRunId": "no-run-id-provided"
}
},
@ -639,13 +652,13 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-h7iopd",
"lastRunId": "no-run-id-provided"
}
},
{
"entityType": "glossaryNode",
"entityUrn": "urn:li:glossaryNode:Personal Information",
"entityUrn": "urn:li:glossaryNode:Personal-Information",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
@ -655,23 +668,7 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00",
"lastRunId": "no-run-id-provided"
}
},
{
"entityType": "glossaryTerm",
"entityUrn": "urn:li:glossaryTerm:4faf1eed790370f65942f2998a7993d6",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
"json": {
"removed": false
}
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-h7iopd",
"lastRunId": "no-run-id-provided"
}
},
@ -687,13 +684,13 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-h7iopd",
"lastRunId": "no-run-id-provided"
}
},
{
"entityType": "glossaryTerm",
"entityUrn": "urn:li:glossaryTerm:Classification.Highly Confidential",
"entityUrn": "urn:li:glossaryTerm:Classification.Highly-Confidential",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
@ -703,7 +700,7 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-h7iopd",
"lastRunId": "no-run-id-provided"
}
},
@ -719,13 +716,13 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-h7iopd",
"lastRunId": "no-run-id-provided"
}
},
{
"entityType": "glossaryTerm",
"entityUrn": "urn:li:glossaryTerm:Clients And Accounts.Account",
"entityUrn": "urn:li:glossaryTerm:Clients-And-Accounts.Account",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
@ -735,13 +732,13 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-h7iopd",
"lastRunId": "no-run-id-provided"
}
},
{
"entityType": "glossaryTerm",
"entityUrn": "urn:li:glossaryTerm:Clients And Accounts.Balance",
"entityUrn": "urn:li:glossaryTerm:Clients-And-Accounts.Balance",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
@ -751,13 +748,13 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-h7iopd",
"lastRunId": "no-run-id-provided"
}
},
{
"entityType": "glossaryTerm",
"entityUrn": "urn:li:glossaryTerm:Personal Information.Address",
"entityUrn": "urn:li:glossaryTerm:KPIs.CSAT",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
@ -767,13 +764,13 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-h7iopd",
"lastRunId": "no-run-id-provided"
}
},
{
"entityType": "glossaryTerm",
"entityUrn": "urn:li:glossaryTerm:Personal Information.Email",
"entityUrn": "urn:li:glossaryTerm:Personal-Information.Address",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
@ -783,13 +780,13 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-h7iopd",
"lastRunId": "no-run-id-provided"
}
},
{
"entityType": "glossaryTerm",
"entityUrn": "urn:li:glossaryTerm:Personal Information.Gender",
"entityUrn": "urn:li:glossaryTerm:Personal-Information.Email",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
@ -799,7 +796,23 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-h7iopd",
"lastRunId": "no-run-id-provided"
}
},
{
"entityType": "glossaryTerm",
"entityUrn": "urn:li:glossaryTerm:Personal-Information.Gender",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
"json": {
"removed": false
}
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-h7iopd",
"lastRunId": "no-run-id-provided"
}
}

View File

@ -2,7 +2,7 @@
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryNodeSnapshot": {
"urn": "urn:li:glossaryNode:Different Owner Types",
"urn": "urn:li:glossaryNode:Different-Owner-Types",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryNodeInfo": {
@ -47,21 +47,21 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-2te9j9",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-8vduoq",
"lastRunId": "no-run-id-provided"
}
},
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryTermSnapshot": {
"urn": "urn:li:glossaryTerm:Different Owner Types.Mixed Ownership",
"urn": "urn:li:glossaryTerm:Different-Owner-Types.Mixed-Ownership",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryTermInfo": {
"customProperties": {},
"name": "Mixed Ownership",
"definition": "Term with different owner types",
"parentNode": "urn:li:glossaryNode:Different Owner Types",
"parentNode": "urn:li:glossaryNode:Different-Owner-Types",
"termSource": "INTERNAL",
"sourceRef": "DataHub",
"sourceUrl": "https://github.com/datahub-project/datahub/"
@ -99,13 +99,13 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-2te9j9",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-8vduoq",
"lastRunId": "no-run-id-provided"
}
},
{
"entityType": "glossaryNode",
"entityUrn": "urn:li:glossaryNode:Different Owner Types",
"entityUrn": "urn:li:glossaryNode:Different-Owner-Types",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
@ -115,13 +115,13 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-2te9j9",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-8vduoq",
"lastRunId": "no-run-id-provided"
}
},
{
"entityType": "glossaryTerm",
"entityUrn": "urn:li:glossaryTerm:Different Owner Types.Mixed Ownership",
"entityUrn": "urn:li:glossaryTerm:Different-Owner-Types.Mixed-Ownership",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
@ -131,7 +131,7 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-2te9j9",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-8vduoq",
"lastRunId": "no-run-id-provided"
}
}

View File

@ -2,7 +2,7 @@
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryNodeSnapshot": {
"urn": "urn:li:glossaryNode:Multiple Owners",
"urn": "urn:li:glossaryNode:Multiple-Owners",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryNodeInfo": {
@ -47,21 +47,21 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-0l66l7",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-iuvo6j",
"lastRunId": "no-run-id-provided"
}
},
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryTermSnapshot": {
"urn": "urn:li:glossaryTerm:Multiple Owners.Multiple Dev Owners",
"urn": "urn:li:glossaryTerm:Multiple-Owners.Multiple-Dev-Owners",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryTermInfo": {
"customProperties": {},
"name": "Multiple Dev Owners",
"definition": "Term owned by multiple developers",
"parentNode": "urn:li:glossaryNode:Multiple Owners",
"parentNode": "urn:li:glossaryNode:Multiple-Owners",
"termSource": "INTERNAL",
"sourceRef": "DataHub",
"sourceUrl": "https://github.com/datahub-project/datahub/"
@ -103,13 +103,13 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-0l66l7",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-iuvo6j",
"lastRunId": "no-run-id-provided"
}
},
{
"entityType": "glossaryNode",
"entityUrn": "urn:li:glossaryNode:Multiple Owners",
"entityUrn": "urn:li:glossaryNode:Multiple-Owners",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
@ -119,13 +119,13 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-0l66l7",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-iuvo6j",
"lastRunId": "no-run-id-provided"
}
},
{
"entityType": "glossaryTerm",
"entityUrn": "urn:li:glossaryTerm:Multiple Owners.Multiple Dev Owners",
"entityUrn": "urn:li:glossaryTerm:Multiple-Owners.Multiple-Dev-Owners",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
@ -135,7 +135,7 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-0l66l7",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-iuvo6j",
"lastRunId": "no-run-id-provided"
}
}

View File

@ -2,7 +2,7 @@
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryNodeSnapshot": {
"urn": "urn:li:glossaryNode:Single Owner Types",
"urn": "urn:li:glossaryNode:Single-Owner-Types",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryNodeInfo": {
@ -31,21 +31,21 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-ruwyic",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-bx72oe",
"lastRunId": "no-run-id-provided"
}
},
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryTermSnapshot": {
"urn": "urn:li:glossaryTerm:Single Owner Types.Developer Owned",
"urn": "urn:li:glossaryTerm:Single-Owner-Types.Developer-Owned",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryTermInfo": {
"customProperties": {},
"name": "Developer Owned",
"definition": "Term owned by developer",
"parentNode": "urn:li:glossaryNode:Single Owner Types",
"parentNode": "urn:li:glossaryNode:Single-Owner-Types",
"termSource": "INTERNAL",
"sourceRef": "DataHub",
"sourceUrl": "https://github.com/datahub-project/datahub/"
@ -71,21 +71,21 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-ruwyic",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-bx72oe",
"lastRunId": "no-run-id-provided"
}
},
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryTermSnapshot": {
"urn": "urn:li:glossaryTerm:Single Owner Types.Data Owner Owned",
"urn": "urn:li:glossaryTerm:Single-Owner-Types.Data-Owner-Owned",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryTermInfo": {
"customProperties": {},
"name": "Data Owner Owned",
"definition": "Term owned by data owner",
"parentNode": "urn:li:glossaryNode:Single Owner Types",
"parentNode": "urn:li:glossaryNode:Single-Owner-Types",
"termSource": "INTERNAL",
"sourceRef": "DataHub",
"sourceUrl": "https://github.com/datahub-project/datahub/"
@ -111,21 +111,21 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-ruwyic",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-bx72oe",
"lastRunId": "no-run-id-provided"
}
},
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryTermSnapshot": {
"urn": "urn:li:glossaryTerm:Single Owner Types.Producer Owned",
"urn": "urn:li:glossaryTerm:Single-Owner-Types.Producer-Owned",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryTermInfo": {
"customProperties": {},
"name": "Producer Owned",
"definition": "Term owned by producer",
"parentNode": "urn:li:glossaryNode:Single Owner Types",
"parentNode": "urn:li:glossaryNode:Single-Owner-Types",
"termSource": "INTERNAL",
"sourceRef": "DataHub",
"sourceUrl": "https://github.com/datahub-project/datahub/"
@ -151,21 +151,21 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-ruwyic",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-bx72oe",
"lastRunId": "no-run-id-provided"
}
},
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryTermSnapshot": {
"urn": "urn:li:glossaryTerm:Single Owner Types.Stakeholder Owned",
"urn": "urn:li:glossaryTerm:Single-Owner-Types.Stakeholder-Owned",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryTermInfo": {
"customProperties": {},
"name": "Stakeholder Owned",
"definition": "Term owned by stakeholder",
"parentNode": "urn:li:glossaryNode:Single Owner Types",
"parentNode": "urn:li:glossaryNode:Single-Owner-Types",
"termSource": "INTERNAL",
"sourceRef": "DataHub",
"sourceUrl": "https://github.com/datahub-project/datahub/"
@ -191,13 +191,13 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-ruwyic",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-bx72oe",
"lastRunId": "no-run-id-provided"
}
},
{
"entityType": "glossaryNode",
"entityUrn": "urn:li:glossaryNode:Single Owner Types",
"entityUrn": "urn:li:glossaryNode:Single-Owner-Types",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
@ -207,13 +207,13 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-ruwyic",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-bx72oe",
"lastRunId": "no-run-id-provided"
}
},
{
"entityType": "glossaryTerm",
"entityUrn": "urn:li:glossaryTerm:Single Owner Types.Data Owner Owned",
"entityUrn": "urn:li:glossaryTerm:Single-Owner-Types.Data-Owner-Owned",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
@ -223,13 +223,13 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-ruwyic",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-bx72oe",
"lastRunId": "no-run-id-provided"
}
},
{
"entityType": "glossaryTerm",
"entityUrn": "urn:li:glossaryTerm:Single Owner Types.Developer Owned",
"entityUrn": "urn:li:glossaryTerm:Single-Owner-Types.Developer-Owned",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
@ -239,13 +239,13 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-ruwyic",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-bx72oe",
"lastRunId": "no-run-id-provided"
}
},
{
"entityType": "glossaryTerm",
"entityUrn": "urn:li:glossaryTerm:Single Owner Types.Producer Owned",
"entityUrn": "urn:li:glossaryTerm:Single-Owner-Types.Producer-Owned",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
@ -255,13 +255,13 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-ruwyic",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-bx72oe",
"lastRunId": "no-run-id-provided"
}
},
{
"entityType": "glossaryTerm",
"entityUrn": "urn:li:glossaryTerm:Single Owner Types.Stakeholder Owned",
"entityUrn": "urn:li:glossaryTerm:Single-Owner-Types.Stakeholder-Owned",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
@ -271,7 +271,7 @@
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-ruwyic",
"runId": "datahub-business-glossary-2020_04_14-07_00_00-bx72oe",
"lastRunId": "no-run-id-provided"
}
}

View File

@ -4,7 +4,6 @@ import pytest
from freezegun import freeze_time
from datahub.ingestion.run.pipeline import Pipeline
from datahub.ingestion.source.metadata import business_glossary
from tests.test_helpers import mce_helpers
FROZEN_TIME = "2020-04-14 07:00:00"
@ -200,6 +199,31 @@ def test_custom_ownership_urns(
@freeze_time(FROZEN_TIME)
def test_auto_id_creation_on_reserved_char():
    id_: str = business_glossary.create_id(["pii", "secure % password"], None, False)
    assert id_ == "24baf9389cc05c162c7148c96314d733"


@pytest.mark.integration
def test_url_cleaning(
    mock_datahub_graph_instance,
    pytestconfig,
    tmp_path,
    mock_time,
):
    """Test URL cleaning functionality when auto_id is disabled"""
    test_resources_dir = pytestconfig.rootpath / "tests/integration/business-glossary"
    output_mces_path: str = f"{tmp_path}/url_cleaning_events.json"
    golden_mces_path: str = f"{test_resources_dir}/url_cleaning_events_golden.json"

    pipeline = Pipeline.create(
        get_default_recipe(
            glossary_yml_file_path=f"{test_resources_dir}/url_cleaning_glossary.yml",
            event_output_file_path=output_mces_path,
            enable_auto_id=False,
        )
    )
    pipeline.ctx.graph = mock_datahub_graph_instance
    pipeline.run()
    pipeline.raise_from_status()

    mce_helpers.check_golden_file(
        pytestconfig,
        output_path=output_mces_path,
        golden_path=golden_mces_path,
    )

View File

@ -0,0 +1,446 @@
[
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryNodeSnapshot": {
"urn": "urn:li:glossaryNode:URL-Testing",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryNodeInfo": {
"customProperties": {},
"definition": "Testing URL cleaning functionality",
"name": "URL Testing"
}
},
{
"com.linkedin.pegasus2avro.common.Ownership": {
"owners": [
{
"owner": "urn:li:corpuser:mjames",
"type": "DEVELOPER"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
}
}
}
]
}
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-4alqef",
"lastRunId": "no-run-id-provided"
}
},
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryTermSnapshot": {
"urn": "urn:li:glossaryTerm:URL-Testing.Basic-Term-With-Spaces",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryTermInfo": {
"customProperties": {},
"name": "Basic Term With Spaces",
"definition": "Testing basic space replacement",
"parentNode": "urn:li:glossaryNode:URL-Testing",
"termSource": "INTERNAL",
"sourceRef": "DataHub",
"sourceUrl": "https://github.com/datahub-project/datahub/"
}
},
{
"com.linkedin.pegasus2avro.common.Ownership": {
"owners": [
{
"owner": "urn:li:corpuser:mjames",
"type": "DEVELOPER"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
}
}
}
]
}
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-4alqef",
"lastRunId": "no-run-id-provided"
}
},
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryTermSnapshot": {
"urn": "urn:li:glossaryTerm:URL-Testing.SpecialCharacters",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryTermInfo": {
"customProperties": {},
"name": "Special@#$Characters!",
"definition": "Testing special character removal",
"parentNode": "urn:li:glossaryNode:URL-Testing",
"termSource": "INTERNAL",
"sourceRef": "DataHub",
"sourceUrl": "https://github.com/datahub-project/datahub/"
}
},
{
"com.linkedin.pegasus2avro.common.Ownership": {
"owners": [
{
"owner": "urn:li:corpuser:mjames",
"type": "DEVELOPER"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
}
}
}
]
}
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-4alqef",
"lastRunId": "no-run-id-provided"
}
},
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryTermSnapshot": {
"urn": "urn:li:glossaryTerm:URL-Testing.MixedCase-Term",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryTermInfo": {
"customProperties": {},
"name": "MixedCase Term",
"definition": "Testing case preservation",
"parentNode": "urn:li:glossaryNode:URL-Testing",
"termSource": "INTERNAL",
"sourceRef": "DataHub",
"sourceUrl": "https://github.com/datahub-project/datahub/"
}
},
{
"com.linkedin.pegasus2avro.common.Ownership": {
"owners": [
{
"owner": "urn:li:corpuser:mjames",
"type": "DEVELOPER"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
}
}
}
]
}
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-4alqef",
"lastRunId": "no-run-id-provided"
}
},
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryTermSnapshot": {
"urn": "urn:li:glossaryTerm:URL-Testing.Multiple-Spaces",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryTermInfo": {
"customProperties": {},
"name": "Multiple Spaces",
"definition": "Testing multiple space handling",
"parentNode": "urn:li:glossaryNode:URL-Testing",
"termSource": "INTERNAL",
"sourceRef": "DataHub",
"sourceUrl": "https://github.com/datahub-project/datahub/"
}
},
{
"com.linkedin.pegasus2avro.common.Ownership": {
"owners": [
{
"owner": "urn:li:corpuser:mjames",
"type": "DEVELOPER"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
}
}
}
]
}
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-4alqef",
"lastRunId": "no-run-id-provided"
}
},
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryTermSnapshot": {
"urn": "urn:li:glossaryTerm:URL-Testing.Term.With.Special-Chars",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryTermInfo": {
"customProperties": {},
"name": "Term.With.Special-Chars",
"definition": "Testing mixed special characters",
"parentNode": "urn:li:glossaryNode:URL-Testing",
"termSource": "INTERNAL",
"sourceRef": "DataHub",
"sourceUrl": "https://github.com/datahub-project/datahub/"
}
},
{
"com.linkedin.pegasus2avro.common.Ownership": {
"owners": [
{
"owner": "urn:li:corpuser:mjames",
"type": "DEVELOPER"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
}
}
}
]
}
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-4alqef",
"lastRunId": "no-run-id-provided"
}
},
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryTermSnapshot": {
"urn": "urn:li:glossaryTerm:URL-Testing.Special-At-Start",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryTermInfo": {
"customProperties": {},
"name": "@#$Special At Start",
"definition": "Testing leading special characters",
"parentNode": "urn:li:glossaryNode:URL-Testing",
"termSource": "INTERNAL",
"sourceRef": "DataHub",
"sourceUrl": "https://github.com/datahub-project/datahub/"
}
},
{
"com.linkedin.pegasus2avro.common.Ownership": {
"owners": [
{
"owner": "urn:li:corpuser:mjames",
"type": "DEVELOPER"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
}
}
}
]
}
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-4alqef",
"lastRunId": "no-run-id-provided"
}
},
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryTermSnapshot": {
"urn": "urn:li:glossaryTerm:URL-Testing.Numbers-123",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryTermInfo": {
"customProperties": {},
"name": "Numbers 123",
"definition": "Testing numbers in term names",
"parentNode": "urn:li:glossaryNode:URL-Testing",
"termSource": "INTERNAL",
"sourceRef": "DataHub",
"sourceUrl": "https://github.com/datahub-project/datahub/"
}
},
{
"com.linkedin.pegasus2avro.common.Ownership": {
"owners": [
{
"owner": "urn:li:corpuser:mjames",
"type": "DEVELOPER"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
}
}
}
]
}
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-4alqef",
"lastRunId": "no-run-id-provided"
}
},
{
"entityType": "glossaryNode",
"entityUrn": "urn:li:glossaryNode:URL-Testing",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
"json": {
"removed": false
}
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-4alqef",
"lastRunId": "no-run-id-provided"
}
},
{
"entityType": "glossaryTerm",
"entityUrn": "urn:li:glossaryTerm:URL-Testing.Basic-Term-With-Spaces",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
"json": {
"removed": false
}
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-4alqef",
"lastRunId": "no-run-id-provided"
}
},
{
"entityType": "glossaryTerm",
"entityUrn": "urn:li:glossaryTerm:URL-Testing.MixedCase-Term",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
"json": {
"removed": false
}
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-4alqef",
"lastRunId": "no-run-id-provided"
}
},
{
"entityType": "glossaryTerm",
"entityUrn": "urn:li:glossaryTerm:URL-Testing.Multiple-Spaces",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
"json": {
"removed": false
}
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-4alqef",
"lastRunId": "no-run-id-provided"
}
},
{
"entityType": "glossaryTerm",
"entityUrn": "urn:li:glossaryTerm:URL-Testing.Numbers-123",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
"json": {
"removed": false
}
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-4alqef",
"lastRunId": "no-run-id-provided"
}
},
{
"entityType": "glossaryTerm",
"entityUrn": "urn:li:glossaryTerm:URL-Testing.Special-At-Start",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
"json": {
"removed": false
}
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-4alqef",
"lastRunId": "no-run-id-provided"
}
},
{
"entityType": "glossaryTerm",
"entityUrn": "urn:li:glossaryTerm:URL-Testing.SpecialCharacters",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
"json": {
"removed": false
}
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-4alqef",
"lastRunId": "no-run-id-provided"
}
},
{
"entityType": "glossaryTerm",
"entityUrn": "urn:li:glossaryTerm:URL-Testing.Term.With.Special-Chars",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
"json": {
"removed": false
}
},
"systemMetadata": {
"lastObserved": 1586847600000,
"runId": "datahub-business-glossary-2020_04_14-07_00_00-4alqef",
"lastRunId": "no-run-id-provided"
}
}
]

View File

@ -0,0 +1,31 @@
# tests/integration/business-glossary/url_cleaning_glossary.yml
version: "1"
source: DataHub
owners:
  users:
    - mjames
url: "https://github.com/datahub-project/datahub/"
nodes:
  - name: "URL Testing"
    description: "Testing URL cleaning functionality"
    terms:
      - name: "Basic Term With Spaces"
        description: "Testing basic space replacement"
      - name: "Special@#$Characters!"
        description: "Testing special character removal"
      - name: "MixedCase Term"
        description: "Testing case preservation"
      - name: "Multiple Spaces"
        description: "Testing multiple space handling"
      - name: "Term.With.Special-Chars"
        description: "Testing mixed special characters"
      - name: "@#$Special At Start"
        description: "Testing leading special characters"
      - name: "Numbers 123"
        description: "Testing numbers in term names"
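
The golden file for this fixture pairs each term name with a cleaned URN (for example `Special@#$Characters!` becomes `URL-Testing.SpecialCharacters`). The following is a minimal sketch of a cleaning rule that reproduces those pairs. It is not the code from this commit: `clean_name_part` and `build_glossary_id` are hypothetical names, and the handling of whitespace runs, underscores, and non-ASCII characters is an assumption that the real `business_glossary` module may treat differently.

```python
import re
from typing import List


def clean_name_part(name: str) -> str:
    """Hypothetical helper: make one node or term name URL-safe, preserving case."""
    # Assumption: any run of whitespace becomes a single hyphen.
    text = re.sub(r"\s+", "-", name.strip())
    # Drop everything except ASCII letters, digits, hyphens, and periods.
    text = re.sub(r"[^A-Za-z0-9.\-]", "", text)
    # Collapse hyphen runs that removed characters may have left behind.
    text = re.sub(r"-{2,}", "-", text)
    # Trim hyphens left at either end.
    return text.strip("-")


def build_glossary_id(path: List[str]) -> str:
    """Hypothetical helper: join cleaned path parts with '.' as the separator."""
    return ".".join(clean_name_part(part) for part in path)


if __name__ == "__main__":
    for term in ["Basic Term With Spaces", "Special@#$Characters!", "@#$Special At Start"]:
        print(build_glossary_id(["URL Testing", term]))
```

Because periods are kept, a name such as `Term.With.Special-Chars` passes through unchanged and reads as extra path levels in the resulting ID, which matches the URN recorded in the golden file.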

View File

@ -21,6 +21,7 @@
"type": "DEVELOPER"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
@ -88,6 +89,7 @@
"type": "DEVELOPER"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
@ -155,6 +157,7 @@
"type": "DEVELOPER"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
@ -172,7 +175,7 @@
},
{
"entityType": "glossaryTerm",
"entityUrn": "urn:li:glossaryTerm:Classification.Highly Confidential",
"entityUrn": "urn:li:glossaryTerm:Classification.Highly-Confidential",
"changeType": "UPSERT",
"aspectName": "domains",
"aspect": {
@ -191,7 +194,7 @@
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryTermSnapshot": {
"urn": "urn:li:glossaryTerm:Classification.Highly Confidential",
"urn": "urn:li:glossaryTerm:Classification.Highly-Confidential",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryTermInfo": {
@ -214,6 +217,7 @@
"type": "DEVELOPER"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
@ -232,7 +236,7 @@
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryNodeSnapshot": {
"urn": "urn:li:glossaryNode:Personal Information",
"urn": "urn:li:glossaryNode:Personal-Information",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryNodeInfo": {
@ -249,6 +253,7 @@
"type": "DEVELOPER"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
@ -267,14 +272,14 @@
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryTermSnapshot": {
"urn": "urn:li:glossaryTerm:Personal Information.Email",
"urn": "urn:li:glossaryTerm:Personal-Information.Email",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryTermInfo": {
"customProperties": {},
"name": "Email",
"definition": "An individual's email address",
"parentNode": "urn:li:glossaryNode:Personal Information",
"parentNode": "urn:li:glossaryNode:Personal-Information",
"termSource": "INTERNAL",
"sourceRef": "DataHub",
"sourceUrl": "https://github.com/datahub-project/datahub/"
@ -295,6 +300,7 @@
"type": "DEVELOPER"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
@ -313,14 +319,14 @@
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryTermSnapshot": {
"urn": "urn:li:glossaryTerm:Personal Information.Address",
"urn": "urn:li:glossaryTerm:Personal-Information.Address",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryTermInfo": {
"customProperties": {},
"name": "Address",
"definition": "A physical address",
"parentNode": "urn:li:glossaryNode:Personal Information",
"parentNode": "urn:li:glossaryNode:Personal-Information",
"termSource": "INTERNAL",
"sourceRef": "DataHub",
"sourceUrl": "https://github.com/datahub-project/datahub/"
@ -334,6 +340,7 @@
"type": "DEVELOPER"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
@ -352,14 +359,14 @@
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryTermSnapshot": {
"urn": "urn:li:glossaryTerm:Personal Information.Gender",
"urn": "urn:li:glossaryTerm:Personal-Information.Gender",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryTermInfo": {
"customProperties": {},
"name": "Gender",
"definition": "The gender identity of the individual",
"parentNode": "urn:li:glossaryNode:Personal Information",
"parentNode": "urn:li:glossaryNode:Personal-Information",
"termSource": "INTERNAL",
"sourceRef": "DataHub",
"sourceUrl": "https://github.com/datahub-project/datahub/"
@ -380,6 +387,7 @@
"type": "DEVELOPER"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
@ -398,7 +406,7 @@
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryNodeSnapshot": {
"urn": "urn:li:glossaryNode:Clients And Accounts",
"urn": "urn:li:glossaryNode:Clients-And-Accounts",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryNodeInfo": {
@ -415,6 +423,7 @@
"type": "DEVELOPER"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
@ -433,14 +442,14 @@
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryTermSnapshot": {
"urn": "urn:li:glossaryTerm:Clients And Accounts.Account",
"urn": "urn:li:glossaryTerm:Clients-And-Accounts.Account",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryTermInfo": {
"customProperties": {},
"name": "Account",
"definition": "Container for records associated with a business arrangement for regular transactions and services",
"parentNode": "urn:li:glossaryNode:Clients And Accounts",
"parentNode": "urn:li:glossaryNode:Clients-And-Accounts",
"termSource": "EXTERNAL",
"sourceRef": "FIBO",
"sourceUrl": "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Account"
@ -449,10 +458,10 @@
{
"com.linkedin.pegasus2avro.glossary.GlossaryRelatedTerms": {
"isRelatedTerms": [
"urn:li:glossaryTerm:Classification.Highly Confidential"
"urn:li:glossaryTerm:Classification.Highly-Confidential"
],
"hasRelatedTerms": [
"urn:li:glossaryTerm:Clients And Accounts.Balance"
"urn:li:glossaryTerm:Clients-And-Accounts.Balance"
]
}
},
@ -464,6 +473,7 @@
"type": "DEVELOPER"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
@ -482,14 +492,14 @@
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryTermSnapshot": {
"urn": "urn:li:glossaryTerm:Clients And Accounts.Balance",
"urn": "urn:li:glossaryTerm:Clients-And-Accounts.Balance",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryTermInfo": {
"customProperties": {},
"name": "Balance",
"definition": "Amount of money available or owed",
"parentNode": "urn:li:glossaryNode:Clients And Accounts",
"parentNode": "urn:li:glossaryNode:Clients-And-Accounts",
"termSource": "EXTERNAL",
"sourceRef": "FIBO",
"sourceUrl": "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Balance"
@ -503,6 +513,7 @@
"type": "DEVELOPER"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
@ -538,6 +549,7 @@
"type": "DEVELOPER"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
@ -556,7 +568,7 @@
{
"proposedSnapshot": {
"com.linkedin.pegasus2avro.metadata.snapshot.GlossaryTermSnapshot": {
"urn": "urn:li:glossaryTerm:4faf1eed790370f65942f2998a7993d6",
"urn": "urn:li:glossaryTerm:KPIs.CSAT",
"aspects": [
{
"com.linkedin.pegasus2avro.glossary.GlossaryTermInfo": {
@ -577,6 +589,7 @@
"type": "DEVELOPER"
}
],
"ownerTypes": {},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
@ -610,7 +623,7 @@
},
{
"entityType": "glossaryNode",
"entityUrn": "urn:li:glossaryNode:Clients And Accounts",
"entityUrn": "urn:li:glossaryNode:Clients-And-Accounts",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
@ -642,23 +655,7 @@
},
{
"entityType": "glossaryNode",
"entityUrn": "urn:li:glossaryNode:Personal Information",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
"json": {
"removed": false
}
},
"systemMetadata": {
"lastObserved": 1629795600000,
"runId": "remote-4",
"lastRunId": "no-run-id-provided"
}
},
{
"entityType": "glossaryTerm",
"entityUrn": "urn:li:glossaryTerm:4faf1eed790370f65942f2998a7993d6",
"entityUrn": "urn:li:glossaryNode:Personal-Information",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
@ -690,7 +687,7 @@
},
{
"entityType": "glossaryTerm",
"entityUrn": "urn:li:glossaryTerm:Classification.Highly Confidential",
"entityUrn": "urn:li:glossaryTerm:Classification.Highly-Confidential",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
@ -722,7 +719,7 @@
},
{
"entityType": "glossaryTerm",
"entityUrn": "urn:li:glossaryTerm:Clients And Accounts.Account",
"entityUrn": "urn:li:glossaryTerm:Clients-And-Accounts.Account",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
@ -738,7 +735,7 @@
},
{
"entityType": "glossaryTerm",
"entityUrn": "urn:li:glossaryTerm:Clients And Accounts.Balance",
"entityUrn": "urn:li:glossaryTerm:Clients-And-Accounts.Balance",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
@ -754,7 +751,7 @@
},
{
"entityType": "glossaryTerm",
"entityUrn": "urn:li:glossaryTerm:Personal Information.Address",
"entityUrn": "urn:li:glossaryTerm:KPIs.CSAT",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
@ -770,7 +767,7 @@
},
{
"entityType": "glossaryTerm",
"entityUrn": "urn:li:glossaryTerm:Personal Information.Email",
"entityUrn": "urn:li:glossaryTerm:Personal-Information.Address",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
@ -786,7 +783,23 @@
},
{
"entityType": "glossaryTerm",
"entityUrn": "urn:li:glossaryTerm:Personal Information.Gender",
"entityUrn": "urn:li:glossaryTerm:Personal-Information.Email",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {
"json": {
"removed": false
}
},
"systemMetadata": {
"lastObserved": 1629795600000,
"runId": "remote-4",
"lastRunId": "no-run-id-provided"
}
},
{
"entityType": "glossaryTerm",
"entityUrn": "urn:li:glossaryTerm:Personal-Information.Gender",
"changeType": "UPSERT",
"aspectName": "status",
"aspect": {

View File

@ -0,0 +1,85 @@
from datahub.ingestion.source.metadata.business_glossary import clean_url, create_id


def test_clean_url():
    """Test the clean_url function with various input cases"""
    test_cases = [
        ("Basic Term", "Basic-Term"),
        ("Term With Spaces", "Term-With-Spaces"),
        ("Special@#$Characters!", "SpecialCharacters"),
        ("MixedCase Term", "MixedCase-Term"),
        ("Multiple   Spaces", "Multiple-Spaces"),
        ("Term-With-Hyphens", "Term-With-Hyphens"),
        ("Term.With.Dots", "Term.With.Dots"),  # Preserve periods
        ("Term_With_Underscores", "TermWithUnderscores"),
        ("123 Numeric Term", "123-Numeric-Term"),
        ("@#$Special At Start", "Special-At-Start"),
        ("-Leading-Trailing-", "Leading-Trailing"),
        ("Multiple...Periods", "Multiple.Periods"),  # Test multiple periods
        ("Mixed-Hyphens.Periods", "Mixed-Hyphens.Periods"),  # Test mixed separators
    ]

    for input_str, expected in test_cases:
        result = clean_url(input_str)
        assert result == expected, (
            f"Expected '{expected}' for input '{input_str}', got '{result}'"
        )


def test_clean_url_edge_cases():
    """Test clean_url function with edge cases"""
    test_cases = [
        ("", ""),  # Empty string
        (" ", ""),  # Single space
        ("   ", ""),  # Multiple spaces
        ("@#$%", ""),  # Only special characters
        ("A", "A"),  # Single character
        ("A B", "A-B"),  # Two characters with space
        ("A.B", "A.B"),  # Period separator
        ("...", ""),  # Only periods
        (".Leading.Trailing.", "Leading.Trailing"),  # Leading/trailing periods
    ]

    for input_str, expected in test_cases:
        result = clean_url(input_str)
        assert result == expected, (
            f"Expected '{expected}' for input '{input_str}', got '{result}'"
        )


def test_create_id_url_cleaning():
    """Test create_id function's URL cleaning behavior"""
    # Test basic URL cleaning
    id_ = create_id(["pii", "secure % password"], None, False)
    assert id_ == "pii.secure-password"

    # Test with multiple path components
    id_ = create_id(["Term One", "Term Two", "Term Three"], None, False)
    assert id_ == "Term-One.Term-Two.Term-Three"

    # Test with path components containing periods
    id_ = create_id(["Term.One", "Term.Two"], None, False)
    assert id_ == "Term.One.Term.Two"


def test_create_id_with_special_chars():
    """Test create_id function's handling of special characters"""
    # Test with non-ASCII characters (should trigger auto_id)
    id_ = create_id(["pii", "secure パスワード"], None, False)
    assert len(id_) == 32  # GUID length
    assert id_.isalnum()  # Should only contain alphanumeric characters

    # Test with characters that aren't periods or hyphens
    id_ = create_id(["test", "special@#$chars"], None, False)
    assert id_ == "test.specialchars"


def test_create_id_with_default():
    """Test create_id function with default_id parameter"""
    # Test that default_id is respected
    id_ = create_id(["any", "path"], "custom-id", False)
    assert id_ == "custom-id"

    # Test with URN as default_id
    id_ = create_id(["any", "path"], "urn:li:glossaryTerm:custom-id", False)
    assert id_ == "urn:li:glossaryTerm:custom-id"
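
The behavior these tests pin down can be illustrated with a minimal sketch. This is not the code from `business_glossary.py`; it is a reconstruction inferred only from the test cases above. The names `clean_url_sketch` and `create_id_sketch` are hypothetical, and the MD5 hex digest is an assumed stand-in for whatever GUID helper DataHub actually uses (it is chosen only because it yields 32 alphanumeric characters, which is all the tests check).

```python
import hashlib
import re
from typing import List, Optional


def clean_url_sketch(text: str) -> str:
    """Hypothetical stand-in for clean_url, inferred from the tests above."""
    # Spaces become hyphens.
    text = text.replace(" ", "-")
    # Drop everything except ASCII letters, digits, periods, and hyphens.
    text = re.sub(r"[^a-zA-Z0-9.-]", "", text)
    # Collapse runs of hyphens and runs of periods into a single character.
    text = re.sub(r"-+", "-", text)
    text = re.sub(r"\.+", ".", text)
    # Trim any separators left at either end.
    return text.strip("-.")


def create_id_sketch(
    path: List[str], default_id: Optional[str], enable_auto_id: bool
) -> str:
    """Hypothetical stand-in for create_id, inferred from the tests above."""
    # An explicit id (including a full URN) is returned unchanged.
    if default_id is not None:
        return default_id

    joined = ".".join(path)

    # Assumption: non-ASCII names fall back to a hashed id even when
    # enable_auto_id is False. MD5 is a placeholder for the real GUID helper.
    if enable_auto_id or any(ord(ch) > 127 for ch in joined):
        return hashlib.md5(joined.encode("utf-8")).hexdigest()

    # Otherwise the joined path is cleaned into a readable id.
    return clean_url_sketch(joined)
```

Under these assumptions, `create_id_sketch(["Clients And Accounts", "Balance"], None, False)` yields `Clients-And-Accounts.Balance`, which matches the hyphenated URNs in the updated golden file above.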