Plugins Guide
Plugins are a way to extend the basic DataHub functionality in a custom manner.
Currently, DataHub formally supports 2 types of plugins: Authentication and Authorization.
Authentication
Note: This feature is in BETA.
It is recommended that you do not use it unless you really know what you are doing.
A custom authentication plugin makes it possible to authenticate DataHub users against any Identity Management System. Choose your Identity Management System and write a custom authentication plugin as described in this section.
Currently, custom authenticators cannot be used to authenticate users of DataHub's web UI. This is because the DataHub web app expects the presence of two special cookies, PLAY_SESSION and actor, which are explicitly set by the server when a login action is performed. Instead, custom authenticators are useful for authenticating API requests to DataHub's backend (GMS), and can be used in addition to the default authentication performed by DataHub, which is based on DataHub-minted access tokens.
The sample authenticator implementation can be found at Authenticator Sample
Implementing an Authentication Plugin
- Add datahub-auth-api as compileOnly dependency: Maven coordinates of datahub-auth-api can be found at Maven.

  An example Gradle dependency is given below.

      dependencies {
        def auth_api = 'io.acryl:datahub-auth-api:0.9.3-3rc3'
        compileOnly "${auth_api}"
        testImplementation "${auth_api}"
      }

- Implement the Authenticator interface: Refer to the Authenticator Sample.

  A sample class that implements the Authenticator interface:

      public class GoogleAuthenticator implements Authenticator {

        @Override
        public void init(@Nonnull Map<String, Object> authenticatorConfig, @Nullable AuthenticatorContext context) {
          // Plugin initialization code goes here
          // DataHub calls this method at boot time
        }

        @Nullable
        @Override
        public Authentication authenticate(@Nonnull AuthenticationRequest authenticationRequest)
            throws AuthenticationException {
          // DataHub calls this method whenever an authentication decision needs to be made
          // Authenticate the request and return an Authentication
          return null; // placeholder so the sample compiles; replace with a real Authentication
        }
      }

- Use getResourceAsStream to read files: If your plugin reads any configuration file (properties, YAML, JSON, or XML), use this.getClass().getClassLoader().getResourceAsStream("<file-name>") to read that file from the DataHub GMS plugin's class-path. For the DataHub GMS resource look-up behavior, refer to the Plugin Installation section. Sample code using getResourceAsStream is available in the sample Authenticator plugin TestAuthenticator.java.

- Bundle your Jar: Use the com.gradleup.shadow Gradle plugin to create an uber jar.

  To see an example of building an uber jar, check out the build.gradle file of the Apache Ranger Plugin (apache-ranger-plugin) for reference.

  Exclude signature files as shown in the shadowJar task below.

      apply plugin: 'com.gradleup.shadow'

      shadowJar {
        // Exclude files related to the jar signature
        exclude "META-INF/*.RSA", "META-INF/*.SF", "META-INF/*.DSA"
      }

- Install the Plugin: Refer to the Plugin Installation section to install the plugin in a DataHub environment.
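
As a rough illustration of the getResourceAsStream step, the sketch below loads a properties file through the class loader. The class name PluginConfigReader and the file name my-plugin.properties are hypothetical, and treating a missing file as empty configuration is an assumption made for this example, not documented DataHub behavior.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical helper for illustration; not part of datahub-auth-api.
final class PluginConfigReader {

  // Loads a properties file via the class loader that loaded this class.
  // (Inside an instance method you would write this.getClass().getClassLoader().)
  // Returns empty Properties when getResourceAsStream returns null.
  static Properties load(String fileName) throws IOException {
    Properties props = new Properties();
    try (InputStream in = PluginConfigReader.class.getClassLoader().getResourceAsStream(fileName)) {
      if (in != null) {
        props.load(in);
      }
    }
    return props;
  }

  public static void main(String[] args) throws IOException {
    Properties props = load("my-plugin.properties"); // hypothetical file name
    System.out.println("Loaded " + props.size() + " properties");
  }
}
```

A plugin could call such a helper from its init method and fail fast there if a required key is absent.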
Enable GMS Authentication
By default, authentication is disabled in DataHub GMS.
Follow the steps below to enable GMS authentication:
- Download docker-compose.quickstart.yml: Download the Docker Compose file docker-compose.quickstart.yml.
- Set environment variable: Set the METADATA_SERVICE_AUTH_ENABLED environment variable to true.
- Redeploy DataHub GMS: Use the quickstart command below to redeploy DataHub GMS.

      datahub docker quickstart -f docker-compose.quickstart.yml
Authorization
Note: This feature is in BETA.
It is recommended that you do not use it unless you really know what you are doing.
A custom authorization plugin makes it possible to authorize DataHub users against any Access Management System. Choose your Access Management System and write a custom authorization plugin as described in this section.
The sample authorizer implementation can be found at Authorizer Sample
Implementing an Authorization Plugin
- Add datahub-auth-api as compileOnly dependency: Maven coordinates of datahub-auth-api can be found at Maven.

  An example Gradle dependency is given below.

      dependencies {
        def auth_api = 'io.acryl:datahub-auth-api:0.9.3-3rc3'
        compileOnly "${auth_api}"
        testImplementation "${auth_api}"
      }

- Implement the Authorizer interface: Refer to the Authorizer Sample.

  A sample class that implements the Authorizer interface:

      public class ApacheRangerAuthorizer implements Authorizer {

        @Override
        public void init(@Nonnull Map<String, Object> authorizerConfig, @Nonnull AuthorizerContext ctx) {
          // Plugin initialization code goes here
          // DataHub calls this method at boot time
        }

        @Override
        public AuthorizationResult authorize(@Nonnull AuthorizationRequest request) {
          // DataHub calls this method whenever an authorization decision needs to be made
          // Authorize the request and return an AuthorizationResult
          return null; // placeholder so the sample compiles; replace with a real AuthorizationResult
        }

        @Override
        public AuthorizedActors authorizedActors(String privilege, Optional<ResourceSpec> resourceSpec) {
          // Documentation for this method is still to be added
          return null; // placeholder so the sample compiles
        }
      }

- Use getResourceAsStream to read files: If your plugin reads any configuration file (properties, YAML, JSON, or XML), use this.getClass().getClassLoader().getResourceAsStream("<file-name>") to read that file from the DataHub GMS plugin's class-path. For the DataHub GMS resource look-up behavior, refer to the Plugin Installation section. Sample code using getResourceAsStream is available in the sample Authenticator plugin TestAuthenticator.java.

- Bundle your Jar: Use the com.gradleup.shadow Gradle plugin to create an uber jar.

  To see an example of building an uber jar, check out the build.gradle file of the Apache Ranger Plugin (apache-ranger-plugin) for reference.

  Exclude signature files as shown in the shadowJar task below.

      apply plugin: 'com.gradleup.shadow'

      shadowJar {
        // Exclude files related to the jar signature
        exclude "META-INF/*.RSA", "META-INF/*.SF", "META-INF/*.DSA"
      }

- Install the Plugin: Refer to the Plugin Installation section to install the plugin in a DataHub environment.
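
The decision logic behind authorize is left to the plugin. As a minimal, self-contained sketch, the class below keeps an in-memory table of which actors hold which privileges; a real plugin would query the Access Management System instead. Every name here (PrivilegeTable, grant, isAllowed) and the sample URN and privilege strings are hypothetical stand-ins, not datahub-auth-api types.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the lookup an Authorizer implementation might delegate to.
final class PrivilegeTable {
  private final Map<String, Set<String>> actorsByPrivilege = new HashMap<>();

  // Records that the actor holds the privilege.
  void grant(String actorUrn, String privilege) {
    actorsByPrivilege.computeIfAbsent(privilege, p -> new HashSet<>()).add(actorUrn);
  }

  // The yes/no decision an authorize(...) implementation could be based on.
  boolean isAllowed(String actorUrn, String privilege) {
    return authorizedActors(privilege).contains(actorUrn);
  }

  // All actors holding the privilege; an empty set when nobody does.
  Set<String> authorizedActors(String privilege) {
    return Collections.unmodifiableSet(
        actorsByPrivilege.getOrDefault(privilege, Collections.<String>emptySet()));
  }

  public static void main(String[] args) {
    PrivilegeTable table = new PrivilegeTable();
    table.grant("urn:li:corpuser:alice", "EDIT_ENTITY_TAGS"); // example values only
    System.out.println(table.isAllowed("urn:li:corpuser:alice", "EDIT_ENTITY_TAGS"));
    System.out.println(table.isAllowed("urn:li:corpuser:bob", "EDIT_ENTITY_TAGS"));
  }
}
```

Keeping the lookup separate from the Authorizer class makes it possible to unit test the decision logic without the datahub-auth-api types on the class path.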
Plugin Installation
DataHub's GMS service searches for plugins in the container's local directory /etc/datahub/plugins/auth/. This location will be referred to as plugin-base-directory hereafter.
For Docker, docker-compose is set up to mount the ${HOME}/.datahub directory to the /etc/datahub directory within the GMS containers.
Docker
Follow the steps below to install plugins:
Let's assume you have created an uber jar for an authorizer plugin, the jar name is apache-ranger-authorizer.jar, and the class com.abc.RangerAuthorizer implements the Authorizer interface.
- Create a plugin configuration file: Create a config.yml file at ${HOME}/.datahub/plugins/auth/. For more detail on configuration, refer to the Config Detail section.

- Create a plugin directory: Create the plugin directory apache-ranger-authorizer. This directory will be referred to as plugin-home hereafter.

      mkdir -p ${HOME}/.datahub/plugins/auth/apache-ranger-authorizer

- Copy plugin jar to plugin-home: Copy apache-ranger-authorizer.jar to plugin-home.

      cp apache-ranger-authorizer.jar ${HOME}/.datahub/plugins/auth/apache-ranger-authorizer

- Update plugin configuration file: Add the entry below to the config.yml file. The plugin can take any arbitrary configuration under the "configs" block; in our example, there is a username and a password.

      plugins:
        - name: "apache-ranger-authorizer"
          type: "authorizer"
          enabled: "true"
          params:
            className: "com.abc.RangerAuthorizer"
            configs:
              username: "foo"
              password: "fake"

- Restart datahub-gms container:

  On startup, the DataHub GMS service performs the steps below:
  - Load config.yml.
  - Prepare the list of plugins where enabled is set to true.
  - Look for a directory matching the plugin name in plugin-base-directory. In this case it is /etc/datahub/plugins/auth/apache-ranger-authorizer/; this directory becomes plugin-home.
  - Look for the params.jarFileName attribute; otherwise look for a jar named <plugin-name>.jar. In this case it is /etc/datahub/plugins/auth/apache-ranger-authorizer/apache-ranger-authorizer.jar.
  - Load the class given in the plugin's params.className attribute from the jar; here, load class com.abc.RangerAuthorizer from apache-ranger-authorizer.jar.
  - Call the init method of the plugin.

  When getResourceAsStream is called, the DataHub GMS service looks for the resource in the order below:
  - Look for the requested resource in the plugin jar file. If found, return the resource as an InputStream.
  - Look for the requested resource in the plugin-home directory. If found, return the resource as an InputStream.
  - Look for the requested resource in the application class-loader. If found, return the resource as an InputStream.
  - Return null, as the requested resource was not found.
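
The startup and resource look-up steps can be sketched roughly as follows. This is an illustrative approximation under stated assumptions, not DataHub's actual loader code: the class PluginLocator and its methods are hypothetical, and the resource sources are modeled as plain functions so the ordering logic stands on its own.

```java
import java.io.InputStream;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of the startup steps; not DataHub's actual loader.
final class PluginLocator {

  // plugin-home is the directory named after the plugin inside plugin-base-directory.
  static Path pluginHome(Path pluginBaseDirectory, String pluginName) {
    return pluginBaseDirectory.resolve(pluginName);
  }

  // Uses params.jarFileName when set, otherwise falls back to <plugin-name>.jar.
  static Path pluginJar(Path pluginBaseDirectory, String pluginName, String jarFileName) {
    String fileName = (jarFileName == null || jarFileName.isEmpty())
        ? pluginName + ".jar"
        : jarFileName;
    return pluginHome(pluginBaseDirectory, pluginName).resolve(fileName);
  }

  // Tries each source in order (plugin jar, plugin-home, application class loader)
  // and returns the first non-null stream, or null when no source has the resource.
  static InputStream firstFound(String resource, List<Function<String, InputStream>> sources) {
    for (Function<String, InputStream> source : sources) {
      InputStream in = source.apply(resource);
      if (in != null) {
        return in;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    Path base = Paths.get("/etc/datahub/plugins/auth");
    System.out.println(pluginJar(base, "apache-ranger-authorizer", null));
  }
}
```

Because a resource inside the plugin jar wins over a file of the same name in plugin-home, a plugin that wants operators to override its defaults would need to keep those defaults out of the jar or use distinct file names.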
By default, authentication is disabled in DataHub GMS. Follow the Enable GMS Authentication section to enable it.
Kubernetes
Helm support is coming soon.
Config Detail
A sample config.yml can be found at config.yml.
config.yml structure:
| Field | Required | Type | Default | Description |
|---|---|---|---|---|
| plugins[].name | ✅ | string | | name of the plugin |
| plugins[].type | ✅ | enum[authenticator, authorizer] | | type of plugin, possible values are authenticator or authorizer |
| plugins[].enabled | ✅ | boolean | | whether this plugin is enabled or disabled. DataHub GMS wouldn't process disabled plugin |
| plugins[].params.className | ✅ | string | | Authenticator or Authorizer implementation class' fully qualified class name |
| plugins[].params.jarFileName | | string | default to plugins[].name.jar | jar file name in plugin-home |
| plugins[].params.configs | | map<string,object> | default to empty map | Runtime configuration required for plugin |
plugins[] is an array of plugins, where you can define multiple authenticator and authorizer plugins. Each plugin name must be unique within the plugins array.
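
To make the optional-field defaults and the uniqueness rule concrete, here is a small sketch of an in-memory model of one plugins[] entry. The PluginEntry class is hypothetical and only mirrors the fields described above; DataHub's own configuration classes may look different.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of one plugins[] entry; not a DataHub class.
final class PluginEntry {
  final String name;
  final String type;
  final boolean enabled;
  final String className;
  final String jarFileName;
  final Map<String, Object> configs;

  PluginEntry(String name, String type, boolean enabled, String className,
      String jarFileName, Map<String, Object> configs) {
    if (!"authenticator".equals(type) && !"authorizer".equals(type)) {
      throw new IllegalArgumentException("type must be authenticator or authorizer: " + type);
    }
    this.name = name;
    this.type = type;
    this.enabled = enabled;
    this.className = className;
    // Optional fields fall back to their documented defaults.
    this.jarFileName = jarFileName != null ? jarFileName : name + ".jar";
    this.configs = configs != null ? configs : Collections.<String, Object>emptyMap();
  }

  // Returns true when every plugin name in the list is unique.
  static boolean hasUniqueNames(List<PluginEntry> plugins) {
    Set<String> seen = new HashSet<>();
    for (PluginEntry plugin : plugins) {
      if (!seen.add(plugin.name)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    PluginEntry ranger = new PluginEntry(
        "apache-ranger-authorizer", "authorizer", true, "com.abc.RangerAuthorizer", null, null);
    System.out.println(ranger.jarFileName);
    System.out.println(hasUniqueNames(Arrays.asList(ranger, ranger)));
  }
}
```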
Plugin Permissions
Adhere to the plugin access controls below to keep your plugin forward compatible.
- A plugin should read and write files only within the plugin-home directory. Refer to step 2 of Plugin Installation for the plugin-home definition.
- A plugin should access only port 80, port 443, or ports higher than 1024.

All other access is forbidden for the plugin.
Disclaimer: In the BETA version, your plugin can access any port and can read/write to any location on the file system. However, you should implement the plugin according to the access permissions above to keep it compatible with upcoming releases of DataHub.
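
Since these rules are not enforced in BETA, a plugin author may want to check them in the plugin's own code. The sketch below shows one possible self-check; PluginAccessRules is a hypothetical helper, DataHub does not ship it, and the upper port bound of 65535 is simply the maximum valid TCP port rather than something stated in this guide.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical self-check helpers; DataHub does not provide these.
final class PluginAccessRules {

  // True when the target path stays inside plugin-home after normalization,
  // so "../" segments and absolute paths elsewhere are rejected.
  static boolean isInsidePluginHome(Path pluginHome, Path target) {
    Path home = pluginHome.toAbsolutePath().normalize();
    Path resolved = home.resolve(target).normalize();
    return resolved.startsWith(home);
  }

  // True for port 80, port 443, or any port above 1024 (up to the maximum 65535).
  static boolean isAllowedPort(int port) {
    return port == 80 || port == 443 || (port > 1024 && port <= 65535);
  }

  public static void main(String[] args) {
    Path home = Paths.get("/etc/datahub/plugins/auth/apache-ranger-authorizer");
    System.out.println(isInsidePluginHome(home, Paths.get("conf/ranger.properties")));
    System.out.println(isInsidePluginHome(home, Paths.get("../other-plugin/secret")));
    System.out.println(isAllowedPort(8080));
    System.out.println(isAllowedPort(22));
  }
}
```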
Migration Of Plugins From application.yaml
If you have any custom Authentication or Authorization plugin defined in the authorization or authentication section of application.yaml, migrate it using the steps below.
- Implement Plugin: For an Authentication Plugin, follow the steps of Implementing an Authentication Plugin; for an Authorization Plugin, follow the steps of Implementing an Authorization Plugin.

- Install Plugin: Install the plugins as per the steps in Plugin Installation. Here you need to map the configuration from application.yaml to the configuration in config.yml. This mapping from application.yaml to config.yml is described below.

  Mapping for Authenticators

  a. In config.yml, set plugins[].type to authenticator.

  b. authentication.authenticators[].type is mapped to plugins[].params.className.

  c. authentication.authenticators[].configs is mapped to plugins[].params.configs.

  Example Authenticator Plugin configuration in config.yml:

      plugins:
        - name: "apache-ranger-authenticator"
          type: "authenticator"
          enabled: "true"
          params:
            className: "com.abc.RangerAuthenticator"
            configs:
              username: "foo"
              password: "fake"

  Mapping for Authorizer

  a. In config.yml, set plugins[].type to authorizer.

  b. authorization.authorizers[].type is mapped to plugins[].params.className.

  c. authorization.authorizers[].configs is mapped to plugins[].params.configs.

  Example Authorizer Plugin configuration in config.yml:

      plugins:
        - name: "apache-ranger-authorizer"
          type: "authorizer"
          enabled: "true"
          params:
            className: "com.abc.RangerAuthorizer"
            configs:
              username: "foo"
              password: "fake"

- Move any other configuration files of your plugin to the plugin-home directory. Details about plugin-home are given in the Plugin Installation section.
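
The field mapping can also be expressed as a small conversion routine. The sketch below is only an illustration of the mapping: PluginConfigMigration is a hypothetical helper, not a DataHub utility, and it builds the config.yml entry as nested maps that a YAML library of your choice could then serialize.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical converter illustrating the field mapping; not a DataHub utility.
final class PluginConfigMigration {

  // Builds one config.yml plugins[] entry from the "type" and "configs" fields of one
  // application.yaml authenticators[] or authorizers[] entry.
  static Map<String, Object> toPluginEntry(
      String pluginName, String pluginType, String legacyType, Map<String, Object> legacyConfigs) {
    Map<String, Object> params = new LinkedHashMap<>();
    params.put("className", legacyType); // ...[].type maps to params.className
    params.put("configs",                // ...[].configs maps to params.configs
        legacyConfigs != null ? legacyConfigs : Collections.<String, Object>emptyMap());

    Map<String, Object> entry = new LinkedHashMap<>();
    entry.put("name", pluginName); // must be unique within plugins[]
    entry.put("type", pluginType); // "authenticator" or "authorizer"
    entry.put("enabled", true);
    entry.put("params", params);
    return entry;
  }

  public static void main(String[] args) {
    Map<String, Object> legacyConfigs = new LinkedHashMap<>();
    legacyConfigs.put("username", "foo");
    legacyConfigs.put("password", "fake");
    System.out.println(toPluginEntry(
        "apache-ranger-authenticator", "authenticator", "com.abc.RangerAuthenticator", legacyConfigs));
  }
}
```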