mirror of
				https://github.com/datahub-project/datahub.git
				synced 2025-11-04 12:51:23 +00:00 
			
		
		
		
	
		
			
				
	
	
		
			170 lines
		
	
	
		
			4.4 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
			
		
		
	
	
			170 lines
		
	
	
		
			4.4 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
# DataHub Actions Quickstart
 | 
						|
 | 
						|
## Prerequisites
 | 
						|
 | 
						|
The DataHub Actions CLI commands are an extension of the base `datahub` CLI commands. We recommend
 | 
						|
first installing the `datahub` CLI:
 | 
						|
 | 
						|
```shell
 | 
						|
python3 -m pip install --upgrade pip wheel setuptools
 | 
						|
python3 -m pip install --upgrade acryl-datahub
 | 
						|
datahub --version
 | 
						|
```
 | 
						|
 | 
						|
> Note that the Actions Framework requires a version of `acryl-datahub` >= v0.8.34
 | 
						|
 | 
						|
## Installation
 | 
						|
 | 
						|
To install DataHub Actions, you need to install the `acryl-datahub-actions` package from PyPi
 | 
						|
 | 
						|
```shell
 | 
						|
python3 -m pip install --upgrade pip wheel setuptools
 | 
						|
python3 -m pip install --upgrade acryl-datahub-actions
 | 
						|
 | 
						|
# Verify the installation by checking the version.
 | 
						|
datahub actions version
 | 
						|
```
 | 
						|
 | 
						|
### Hello World
 | 
						|
 | 
						|
DataHub ships with a "Hello World" Action which logs all events it receives to the console.
 | 
						|
To run this action, simply create a new Action configuration file:
 | 
						|
 | 
						|
```yaml
 | 
						|
# hello_world.yaml
 | 
						|
name: "hello_world"
 | 
						|
source:
 | 
						|
  type: "kafka"
 | 
						|
  config:
 | 
						|
    connection:
 | 
						|
      bootstrap: ${KAFKA_BOOTSTRAP_SERVER:-localhost:9092}
 | 
						|
      schema_registry_url: ${SCHEMA_REGISTRY_URL:-http://localhost:8081}
 | 
						|
action:
 | 
						|
  type: "hello_world"
 | 
						|
```
 | 
						|
 | 
						|
and then run it using the `datahub actions` command:
 | 
						|
 | 
						|
```shell
 | 
						|
datahub actions -c hello_world.yaml
 | 
						|
```
 | 
						|
 | 
						|
You should the see the following output if the Action has been started successfully:
 | 
						|
 | 
						|
```shell
 | 
						|
Action Pipeline with name 'hello_world' is now running.
 | 
						|
```
 | 
						|
 | 
						|
Now, navigate to the instance of DataHub that you've connected to and perform an Action such as
 | 
						|
 | 
						|
- Adding / removing a Tag
 | 
						|
- Adding / removing a Glossary Term
 | 
						|
- Adding / removing a Domain
 | 
						|
 | 
						|
If all is well, you should see some events being logged to the console
 | 
						|
 | 
						|
```shell
 | 
						|
Hello world! Received event:
 | 
						|
{
 | 
						|
    "event_type": "EntityChangeEvent_v1",
 | 
						|
    "event": {
 | 
						|
        "entityType": "dataset",
 | 
						|
        "entityUrn": "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)",
 | 
						|
        "category": "TAG",
 | 
						|
        "operation": "ADD",
 | 
						|
        "modifier": "urn:li:tag:pii",
 | 
						|
        "parameters": {},
 | 
						|
        "auditStamp": {
 | 
						|
            "time": 1651082697703,
 | 
						|
            "actor": "urn:li:corpuser:datahub",
 | 
						|
            "impersonator": null
 | 
						|
        },
 | 
						|
        "version": 0,
 | 
						|
        "source": null
 | 
						|
    },
 | 
						|
    "meta": {
 | 
						|
        "kafka": {
 | 
						|
            "topic": "PlatformEvent_v1",
 | 
						|
            "offset": 1262,
 | 
						|
            "partition": 0
 | 
						|
        }
 | 
						|
    }
 | 
						|
}
 | 
						|
```
 | 
						|
 | 
						|
_An example of an event emitted when a 'pii' tag has been added to a Dataset._
 | 
						|
 | 
						|
Woohoo! You've successfully started using the Actions framework. Now, let's see how we can get fancy.
 | 
						|
 | 
						|
#### Filtering events
 | 
						|
 | 
						|
If we know which Event types we'd like to consume, we can optionally add a `filter` configuration, which
 | 
						|
will prevent events that do not match the filter from being forwarded to the action.
 | 
						|
 | 
						|
```yaml
 | 
						|
# hello_world.yaml
 | 
						|
name: "hello_world"
 | 
						|
source:
 | 
						|
  type: "kafka"
 | 
						|
  config:
 | 
						|
    connection:
 | 
						|
      bootstrap: ${KAFKA_BOOTSTRAP_SERVER:-localhost:9092}
 | 
						|
      schema_registry_url: ${SCHEMA_REGISTRY_URL:-http://localhost:8081}
 | 
						|
filter:
 | 
						|
  event_type: "EntityChangeEvent_v1"
 | 
						|
action:
 | 
						|
  type: "hello_world"
 | 
						|
```
 | 
						|
 | 
						|
_Filtering for events of type EntityChangeEvent_v1 only_
 | 
						|
 | 
						|
#### Advanced Filtering
 | 
						|
 | 
						|
Beyond simply filtering by event type, we can also filter events by matching against the values of their fields. To do so,
 | 
						|
use the `event` block. Each field provided will be compared against the real event's value. An event that matches
 | 
						|
**all** of the fields will be forwarded to the action.
 | 
						|
 | 
						|
```yaml
 | 
						|
# hello_world.yaml
 | 
						|
name: "hello_world"
 | 
						|
source:
 | 
						|
  type: "kafka"
 | 
						|
  config:
 | 
						|
    connection:
 | 
						|
      bootstrap: ${KAFKA_BOOTSTRAP_SERVER:-localhost:9092}
 | 
						|
      schema_registry_url: ${SCHEMA_REGISTRY_URL:-http://localhost:8081}
 | 
						|
filter:
 | 
						|
  event_type: "EntityChangeEvent_v1"
 | 
						|
  event:
 | 
						|
    category: "TAG"
 | 
						|
    operation: "ADD"
 | 
						|
    modifier: "urn:li:tag:pii"
 | 
						|
action:
 | 
						|
  type: "hello_world"
 | 
						|
```
 | 
						|
 | 
						|
_This filter only matches events representing "PII" tag additions to an entity._
 | 
						|
 | 
						|
And more, we can achieve "OR" semantics on a particular field by providing an array of values.
 | 
						|
 | 
						|
```yaml
 | 
						|
# hello_world.yaml
 | 
						|
name: "hello_world"
 | 
						|
source:
 | 
						|
  type: "kafka"
 | 
						|
  config:
 | 
						|
    connection:
 | 
						|
      bootstrap: ${KAFKA_BOOTSTRAP_SERVER:-localhost:9092}
 | 
						|
      schema_registry_url: ${SCHEMA_REGISTRY_URL:-http://localhost:8081}
 | 
						|
filter:
 | 
						|
  event_type: "EntityChangeEvent_v1"
 | 
						|
  event:
 | 
						|
    category: "TAG"
 | 
						|
    operation: ["ADD", "REMOVE"]
 | 
						|
    modifier: "urn:li:tag:pii"
 | 
						|
action:
 | 
						|
  type: "hello_world"
 | 
						|
```
 | 
						|
 | 
						|
_This filter only matches events representing "PII" tag additions to OR removals from an entity. How fancy!_
 |