datahub/docs/actions/quickstart.md

170 lines
4.4 KiB
Markdown
Raw Permalink Normal View History

# DataHub Actions Quickstart
## Prerequisites
The DataHub Actions CLI commands are an extension of the base `datahub` CLI commands. We recommend
first installing the `datahub` CLI:
```shell
python3 -m pip install --upgrade pip wheel setuptools
python3 -m pip install --upgrade acryl-datahub
datahub --version
```
> Note that the Actions Framework requires a version of `acryl-datahub` >= v0.8.34
## Installation
To install DataHub Actions, you need to install the `acryl-datahub-actions` package from PyPi
```shell
python3 -m pip install --upgrade pip wheel setuptools
python3 -m pip install --upgrade acryl-datahub-actions
# Verify the installation by checking the version.
datahub actions version
```
### Hello World
DataHub ships with a "Hello World" Action which logs all events it receives to the console.
To run this action, simply create a new Action configuration file:
```yaml
# hello_world.yaml
name: "hello_world"
source:
type: "kafka"
config:
connection:
bootstrap: ${KAFKA_BOOTSTRAP_SERVER:-localhost:9092}
schema_registry_url: ${SCHEMA_REGISTRY_URL:-http://localhost:8081}
action:
type: "hello_world"
```
and then run it using the `datahub actions` command:
```shell
datahub actions -c hello_world.yaml
```
You should the see the following output if the Action has been started successfully:
```shell
Action Pipeline with name 'hello_world' is now running.
```
Now, navigate to the instance of DataHub that you've connected to and perform an Action such as
- Adding / removing a Tag
- Adding / removing a Glossary Term
- Adding / removing a Domain
If all is well, you should see some events being logged to the console
```shell
Hello world! Received event:
{
"event_type": "EntityChangeEvent_v1",
"event": {
"entityType": "dataset",
"entityUrn": "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)",
"category": "TAG",
"operation": "ADD",
"modifier": "urn:li:tag:pii",
"parameters": {},
"auditStamp": {
"time": 1651082697703,
"actor": "urn:li:corpuser:datahub",
"impersonator": null
},
"version": 0,
"source": null
},
"meta": {
"kafka": {
"topic": "PlatformEvent_v1",
"offset": 1262,
"partition": 0
}
}
}
```
_An example of an event emitted when a 'pii' tag has been added to a Dataset._
Woohoo! You've successfully started using the Actions framework. Now, let's see how we can get fancy.
#### Filtering events
If we know which Event types we'd like to consume, we can optionally add a `filter` configuration, which
will prevent events that do not match the filter from being forwarded to the action.
```yaml
# hello_world.yaml
name: "hello_world"
source:
type: "kafka"
config:
connection:
bootstrap: ${KAFKA_BOOTSTRAP_SERVER:-localhost:9092}
schema_registry_url: ${SCHEMA_REGISTRY_URL:-http://localhost:8081}
filter:
event_type: "EntityChangeEvent_v1"
action:
type: "hello_world"
```
_Filtering for events of type EntityChangeEvent_v1 only_
#### Advanced Filtering
Beyond simply filtering by event type, we can also filter events by matching against the values of their fields. To do so,
use the `event` block. Each field provided will be compared against the real event's value. An event that matches
**all** of the fields will be forwarded to the action.
```yaml
# hello_world.yaml
name: "hello_world"
source:
type: "kafka"
config:
connection:
bootstrap: ${KAFKA_BOOTSTRAP_SERVER:-localhost:9092}
schema_registry_url: ${SCHEMA_REGISTRY_URL:-http://localhost:8081}
filter:
event_type: "EntityChangeEvent_v1"
event:
category: "TAG"
operation: "ADD"
modifier: "urn:li:tag:pii"
action:
type: "hello_world"
```
_This filter only matches events representing "PII" tag additions to an entity._
And more, we can achieve "OR" semantics on a particular field by providing an array of values.
```yaml
# hello_world.yaml
name: "hello_world"
source:
type: "kafka"
config:
connection:
bootstrap: ${KAFKA_BOOTSTRAP_SERVER:-localhost:9092}
schema_registry_url: ${SCHEMA_REGISTRY_URL:-http://localhost:8081}
filter:
event_type: "EntityChangeEvent_v1"
event:
category: "TAG"
operation: ["ADD", "REMOVE"]
modifier: "urn:li:tag:pii"
action:
type: "hello_world"
```
_This filter only matches events representing "PII" tag additions to OR removals from an entity. How fancy!_