Currently, this project only supports adding aspects defined in PDL to existing or newly defined entities. You cannot yet add new aspects to the metadata model directly through YAML configuration.
Before proceeding further, make sure you understand the DataHub Metadata Model concepts defined [here](/docs/modeling/metadata-model.md) and extending the model defined [here](/docs/modeling/extending-the-metadata-model.md).
Follow the regular process for creating a new aspect by adding it to the [`src/main/pegasus`](./src/main/pegasus) folder. For example, this repository has an aspect called `customDataQualityRules`, defined in the [`DataQualityRules.pdl`](./src/main/pegasus/com/mycompany/dq/DataQualityRules.pdl) file, that you can use as a template.
Once you've gone through this exercise, feel free to delete the sample aspects that are stored in this module.
**_Tip_**: PDL requires that the file name match the name of the class defined in it, and that the package path match the directory path, so keep that in mind when you create your aspect PDL file.
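
The naming rule in the tip can be sketched as a small path computation. This is only an illustration: `PdlPathSketch` and `expectedPath` are hypothetical names, not part of DataHub or Pegasus, and the namespace `com.mycompany.dq` is inferred from the sample file's directory path.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of DataHub or Pegasus): derives where a PDL
// file is expected to live, given its namespace and record name.
final class PdlPathSketch {
    private PdlPathSketch() {}

    static Path expectedPath(String sourceRoot, String namespace, String recordName) {
        // The package path mirrors the namespace: each dot becomes a directory.
        Path dir = Paths.get(sourceRoot, namespace.split("\\."));
        // The file name must match the name of the record (class) it defines.
        return dir.resolve(recordName + ".pdl");
    }

    public static void main(String[] args) {
        // Namespace assumed from the sample aspect's location in this repository.
        System.out.println(
            expectedPath("src/main/pegasus", "com.mycompany.dq", "DataQualityRules"));
    }
}
```

If the path computed for your own namespace and record name differs from where the file actually sits, the PDL file is likely to be rejected at build time.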
- id: The name of your registry. This drives naming and artifact generation, so make sure you pick a unique name that will not conflict with names you might create for other registries.
- entities: A list of existing entities to which you are attaching additional aspects, as well as any new entities you wish to define. In this example, we are adding the aspect `customDataQualityRules` to the `dataset` entity.
This will install the zip file as a DataHub plugin. It is installed under `~/.datahub/plugins/models/`; if you list that directory while following the `customDataQualityRules` example, you should see the path `~/.datahub/plugins/models/mycompany-dq-model/0.0.0-dev/`.
This will unpack the artifact and deposit it under `~/.datahub/plugins/models/<registry-name>/<registry-version>/`.
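
As a quick sanity check after installing, the expected plugin directory can be computed from the registry name and version. The sketch below is illustrative only: `PluginPathSketch` and `installDir` are hypothetical names, and the name and version are the sample values used in this README.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of DataHub): builds the directory where a
// model plugin is expected to be unpacked for a given registry name/version.
final class PluginPathSketch {
    private PluginPathSketch() {}

    static Path installDir(String userHome, String registryName, String registryVersion) {
        // Layout described above: ~/.datahub/plugins/models/<registry-name>/<registry-version>/
        return Paths.get(userHome, ".datahub", "plugins", "models", registryName, registryVersion);
    }

    public static void main(String[] args) {
        // Sample registry name and version taken from this README's example.
        Path dir = installDir(System.getProperty("user.home"), "mycompany-dq-model", "0.0.0-dev");
        System.out.println(dir + " present: " + Files.isDirectory(dir));
    }
}
```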
#### Deploying to a remote Kubernetes server
Deploying your customized jar to a remote Kubernetes server requires that you take the output zip
(generated by `../gradlew modelArtifact` under `build/dist`) and place the unzipped contents in the volume mount for the GMS pod on the remote server.
First you will need to push the files into a configmap using kubectl:
The `scripts/insert_custom_aspect.py` script shows you how to accomplish the same using the Python SDK. Note that we are just using a raw dictionary here to represent the `dq_rule` aspect and not a strongly-typed class.
For example, `datahub delete by-registry --registry-id=mycompany-dq-model:0.0.1 --hard` will delete all data written using this registry name and version pair.
As you evolve the metadata model, you can publish new versions of the repository and deploy them into DataHub using the same steps outlined above. DataHub will check whether your new models are backwards compatible with the previous version of the model and will decline to load models that are backwards incompatible.
Custom aspects might require that instances of those aspects adhere to specific conditions or rules. These conditions can vary widely depending on the use case; however, they could be as simple
as a null or range check for one or more fields within the custom aspect. Additionally, a lookup can be done on other aspects in order to validate the current aspect using the `AspectRetriever`.
There are two integration points for validation. The first integration point is `on request` via the `validateProposedAspect` method where the aspect is validated independent of the previous value. This validation is performed
outside of a database transaction and can perform more intensive checks without introducing added latency within a transaction. Note that added latency from the validation check is still introduced into the request itself.
The second integration point for validation occurs within the database transaction, via the `validatePreCommitAspect` method, which has access to both the new aspect and the old aspect. See the included
example in [`CustomDataQualityRulesValidator.java`](src/main/java/com/linkedin/metadata/aspect/plugins/validation/CustomDataQualityRulesValidator.java).
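
To make the "null or range check" idea concrete, here is a minimal, self-contained sketch of the kind of logic a validator might run on a proposed aspect. It deliberately does not use the DataHub plugin API: `RuleCheckSketch`, `Rule`, and the `field` and `maxNullRatio` names are all hypothetical and are not taken from the sample aspect. In a real plugin, equivalent checks would live inside the validator's `validateProposedAspect` implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, independent of the DataHub plugin API.
final class RuleCheckSketch {
    private RuleCheckSketch() {}

    // Stand-in for one entry of a custom aspect; field names are illustrative.
    static final class Rule {
        final String field;
        final Double maxNullRatio; // nullable on purpose, to exercise the null check

        Rule(String field, Double maxNullRatio) {
            this.field = field;
            this.maxNullRatio = maxNullRatio;
        }
    }

    // Returns a list of violations; an empty list means the rule is valid.
    static List<String> check(Rule rule) {
        List<String> errors = new ArrayList<>();
        // Null check on a required field.
        if (rule.field == null || rule.field.isEmpty()) {
            errors.add("field must be set");
        }
        // Null check followed by a range check on a numeric field.
        if (rule.maxNullRatio == null) {
            errors.add("maxNullRatio must be set");
        } else if (rule.maxNullRatio < 0.0 || rule.maxNullRatio > 1.0) {
            errors.add("maxNullRatio must be between 0 and 1");
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(check(new Rule("email", 0.1)));
        System.out.println(check(new Rule(null, 1.5)));
    }
}
```

Collecting all violations rather than failing on the first one is a design choice made for this sketch; a validator that must reject quickly could just as well stop at the first failed check.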
Shown below is the interface to be implemented for a custom validator.
```java
public class CustomDataQualityRulesValidator extends AspectPayloadValidator {
```
In this example, we want to make sure that the field type is always lowercase regardless of the string being provided
by ingestion. The full example can be found in [`CustomDataQualityRulesMutator.java`](src/main/java/com/linkedin/metadata/aspect/plugins/hooks/CustomDataQualityRulesMutator.java).
```java
public class CustomDataQualityRulesMutator extends MutationHook {
```
**Warning: This hook is for advanced users only. It is possible to corrupt data and render your system inoperable.**
MCP Side Effects allow for the creation of new aspects based on an input aspect.
Notes:
* MCPs will write aspects to the primary data store (SQL for example) as well as the search indices.
* Side effects in general must include a dependency on the `metadata-io` module since it deals with lower level storage primitives.
The full example can be found in [`CustomDataQualityRulesMCPSideEffect.java`](src/main/java/com/linkedin/metadata/aspect/plugins/hooks/CustomDataQualityRulesMCPSideEffect.java).
```java
public class CustomDataQualityRulesMCPSideEffect extends MCPSideEffect {
```
**Warning: This hook is for advanced users only. It is possible to corrupt data and render your system inoperable.**
MCL Side Effects allow for the creation of new aspects based on an input aspect. In this example, we are generating a timeseries aspect to represent an event. When a DataQualityRule is created
or modified we'll record the actor, event type, and timestamp in a timeseries aspect index.
Notes:
* MCLs are persisted only to the search indices, so MCL side effects can only add to the search documents.
* MCL side effects must include a dependency on the `metadata-io` module since it deals with lower level storage primitives.
The full example can be found in [`CustomDataQualityRulesMCLSideEffect.java`](src/main/java/com/linkedin/metadata/aspect/plugins/hooks/CustomDataQualityRulesMCLSideEffect.java).
```java
public class CustomDataQualityRulesMCLSideEffect extends MCLSideEffect {