# DataHub Actions Quickstart

## Prerequisites

The DataHub Actions CLI commands are an extension of the base `datahub` CLI commands. We recommend
first installing the `datahub` CLI:

```shell
python3 -m pip install --upgrade pip wheel setuptools
python3 -m pip install --upgrade acryl-datahub
datahub --version
```

> Note that the Actions Framework requires a version of `acryl-datahub` >= v0.8.34

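
The version requirement in the note can also be checked from a script. The following is a minimal sketch in Python, assuming the package was installed with `pip` under the distribution name `acryl-datahub`. The helper names `parse_version`, `meets_minimum`, and `installed_datahub_version` are hypothetical, and the parser only reads the leading numeric components, so pre-release suffixes are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Minimum acryl-datahub version stated in the note above.
MIN_VERSION = (0, 8, 34)


def parse_version(text: str) -> Tuple[int, ...]:
    """Convert '0.8.34' or 'v0.8.34' into a tuple of ints for comparison."""
    numbers = []
    for piece in text.lstrip("v").split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        numbers.append(int(match.group()))
    return tuple(numbers)


def meets_minimum(installed: str, minimum: Tuple[int, ...] = MIN_VERSION) -> bool:
    """Return True when the installed version is at least the minimum."""
    return parse_version(installed) >= minimum


def installed_datahub_version() -> Optional[str]:
    """Look up the installed acryl-datahub version, or None if it is missing."""
    try:
        return version("acryl-datahub")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_datahub_version()
    if found is None:
        print("acryl-datahub is not installed")
    else:
        print(found, "OK" if meets_minimum(found) else "too old for Actions")
```
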
## Installation

To install DataHub Actions, install the `acryl-datahub-actions` package from PyPI:

```shell
python3 -m pip install --upgrade pip wheel setuptools
python3 -m pip install --upgrade acryl-datahub-actions

# Verify the installation by checking the version.
datahub actions version
```

### Hello World

DataHub ships with a "Hello World" Action which logs all events it receives to the console.
To run this action, simply create a new Action configuration file:

```yaml
# hello_world.yaml
name: "hello_world"
source:
  type: "kafka"
  config:
    connection:
      bootstrap: ${KAFKA_BOOTSTRAP_SERVER:-localhost:9092}
      schema_registry_url: ${SCHEMA_REGISTRY_URL:-http://localhost:8081}
action:
  type: "hello_world"
```

and then run it using the `datahub actions` command:

```shell
datahub actions -c hello_world.yaml
```

You should then see the following output if the Action has been started successfully:

```shell
Action Pipeline with name 'hello_world' is now running.
```

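
The `${NAME:-default}` placeholders in `hello_world.yaml` let the same file target a different Kafka broker or schema registry without edits. The sketch below illustrates the usual shell convention for that syntax: use the variable when it is set and non-empty, otherwise fall back to the default. It is an assumption that the config loader resolves placeholders exactly this way, and `expand` is a hypothetical helper, not a DataHub API.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME} and ${NAME:-default}.
_PLACEHOLDER = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::-(?P<default>[^}]*))?\}"
)


def expand(text: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace shell-style placeholders using env (defaults to os.environ)."""
    variables = os.environ if env is None else env

    def replace(match: "re.Match[str]") -> str:
        value = variables.get(match.group("name"))
        if value:  # set and non-empty: the variable wins
            return value
        default = match.group("default")
        return default if default is not None else ""

    return _PLACEHOLDER.sub(replace, text)


# With no override the default is used; with one, the override is used.
print(expand("${KAFKA_BOOTSTRAP_SERVER:-localhost:9092}", {}))
print(expand("${KAFKA_BOOTSTRAP_SERVER:-localhost:9092}",
             {"KAFKA_BOOTSTRAP_SERVER": "broker:29092"}))
```
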
Now, navigate to the instance of DataHub that you've connected to and perform an action such as:

- Adding / removing a Tag
- Adding / removing a Glossary Term
- Adding / removing a Domain

If all is well, you should see some events being logged to the console:

```shell
Hello world! Received event:
{
    "event_type": "EntityChangeEvent_v1",
    "event": {
        "entityType": "dataset",
        "entityUrn": "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)",
        "category": "TAG",
        "operation": "ADD",
        "modifier": "urn:li:tag:pii",
        "parameters": {},
        "auditStamp": {
            "time": 1651082697703,
            "actor": "urn:li:corpuser:datahub",
            "impersonator": null
        },
        "version": 0,
        "source": null
    },
    "meta": {
        "kafka": {
            "topic": "PlatformEvent_v1",
            "offset": 1262,
            "partition": 0
        }
    }
}
```

_An example of an event emitted when a 'pii' tag has been added to a Dataset._

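
Because each event is logged as a JSON document, its fields can be read with ordinary dictionary access. The following is a minimal sketch in plain Python, not part of the Actions framework: `summarize_event` is a hypothetical helper, and the payload is a trimmed copy of the example event.

```python
import json

# Trimmed copy of the example event from the console output.
RAW_EVENT = """
{
    "event_type": "EntityChangeEvent_v1",
    "event": {
        "entityType": "dataset",
        "entityUrn": "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)",
        "category": "TAG",
        "operation": "ADD",
        "modifier": "urn:li:tag:pii"
    },
    "meta": {"kafka": {"topic": "PlatformEvent_v1", "offset": 1262, "partition": 0}}
}
"""


def summarize_event(envelope: dict) -> str:
    """Hypothetical helper: describe an entity change event in one line."""
    event = envelope["event"]
    return (
        f'{event["operation"]} {event["category"]} {event["modifier"]} '
        f'on {event["entityUrn"]}'
    )


envelope = json.loads(RAW_EVENT)
print(summarize_event(envelope))
```
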
Woohoo! You've successfully started using the Actions framework. Now, let's see how we can get fancy.

#### Filtering events

If we know which Event types we'd like to consume, we can optionally add a `filter` configuration, which
will prevent events that do not match the filter from being forwarded to the action.

```yaml
# hello_world.yaml
name: "hello_world"
source:
  type: "kafka"
  config:
    connection:
      bootstrap: ${KAFKA_BOOTSTRAP_SERVER:-localhost:9092}
      schema_registry_url: ${SCHEMA_REGISTRY_URL:-http://localhost:8081}
filter:
  event_type: "EntityChangeEvent_v1"
action:
  type: "hello_world"
```

_Filtering for events of type EntityChangeEvent_v1 only_

#### Advanced Filtering

Beyond simply filtering by event type, we can also filter events by matching against the values of their fields. To do so,
use the `event` block. Each field provided will be compared against the real event's value. An event that matches
**all** of the fields will be forwarded to the action.

```yaml
# hello_world.yaml
name: "hello_world"
source:
  type: "kafka"
  config:
    connection:
      bootstrap: ${KAFKA_BOOTSTRAP_SERVER:-localhost:9092}
      schema_registry_url: ${SCHEMA_REGISTRY_URL:-http://localhost:8081}
filter:
  event_type: "EntityChangeEvent_v1"
  event:
    category: "TAG"
    operation: "ADD"
    modifier: "urn:li:tag:pii"
action:
  type: "hello_world"
```

_This filter only matches events representing "PII" tag additions to an entity._

Moreover, we can achieve "OR" semantics on a particular field by providing an array of values:

```yaml
# hello_world.yaml
name: "hello_world"
source:
  type: "kafka"
  config:
    connection:
      bootstrap: ${KAFKA_BOOTSTRAP_SERVER:-localhost:9092}
      schema_registry_url: ${SCHEMA_REGISTRY_URL:-http://localhost:8081}
filter:
  event_type: "EntityChangeEvent_v1"
  event:
    category: "TAG"
    operation: ["ADD", "REMOVE"]
    modifier: "urn:li:tag:pii"
action:
  type: "hello_world"
```

_This filter only matches events representing "PII" tag additions to OR removals from an entity. How fancy!_

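
The matching rules described in this section (every listed field must match, and an array of values matches if any one of them does) can be sketched in a few lines of Python. This is an illustration of the semantics only, not the framework's implementation. `value_matches` and `event_matches` are hypothetical names, and the sketch handles only flat fields.

```python
from typing import Any, Mapping


def value_matches(expected: Any, actual: Any) -> bool:
    """A list of expected values means OR; anything else is an equality check."""
    if isinstance(expected, list):
        return any(option == actual for option in expected)
    return expected == actual


def event_matches(filter_config: Mapping[str, Any], envelope: Mapping[str, Any]) -> bool:
    """Return True when the envelope passes both event_type and event filters."""
    expected_type = filter_config.get("event_type")
    if expected_type is not None and not value_matches(
        expected_type, envelope.get("event_type")
    ):
        return False
    body = envelope.get("event", {})
    # AND across fields: every configured field must be present and match.
    return all(
        field in body and value_matches(expected, body[field])
        for field, expected in filter_config.get("event", {}).items()
    )


# Mirrors the last filter above: PII tag additions OR removals.
PII_FILTER = {
    "event_type": "EntityChangeEvent_v1",
    "event": {
        "category": "TAG",
        "operation": ["ADD", "REMOVE"],
        "modifier": "urn:li:tag:pii",
    },
}

sample = {
    "event_type": "EntityChangeEvent_v1",
    "event": {"category": "TAG", "operation": "REMOVE", "modifier": "urn:li:tag:pii"},
}
print(event_matches(PII_FILTER, sample))
```

Keeping the OR check inside `value_matches` means a single-value field and an array field go through the same code path, so adding more alternatives to the YAML needs no change to the matching logic.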