# Policies Guide
## Introduction
DataHub provides the ability to declare fine-grained access control Policies via the UI & GraphQL API.
Access policies in DataHub define *who* can *do what* to *which resources*. A few example policies, in plain English:
- Dataset Owners should be allowed to edit documentation, but not Tags.
- Jenny, our Data Steward, should be allowed to edit Tags for any Dashboard, but no other metadata.
- James, a Data Analyst, should be allowed to edit the Links for a specific Data Pipeline he is a downstream consumer of.
- The Data Platform team should be allowed to manage users & groups, view platform analytics, & manage policies themselves.

In this document, we'll take a deeper look at DataHub Policies & how to use them effectively.
## What is a Policy?
There are two types of Policy within DataHub:
1. Platform Policies
2. Metadata Policies

We'll briefly describe each.
### Platform Policies
**Platform** policies determine who has platform-level privileges on DataHub. These privileges include:
- Managing Users & Groups
- Viewing the DataHub Analytics Page
- Managing Policies themselves

Platform policies can be broken down into 2 parts:
1. **Actors**: Who the policy applies to (Users or Groups)
2. **Privileges**: Which privileges should be assigned to the Actors (e.g. "View Analytics")

Note that platform policies do not include a specific "target resource" against which the Policies apply. Instead,
they simply assign specific privileges to DataHub users and groups.
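
Because a platform policy has no target resource, checking it reduces to two questions: does the policy apply to this user (directly or through a group), and does it grant the requested privilege? The sketch below models that check in Python. It is a hypothetical illustration, not DataHub's implementation: the `PlatformPolicy` class, the `has_platform_privilege` helper, and the privilege names other than `MANAGE_POLICIES` are assumptions chosen for readability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformPolicy:
    """Hypothetical model: a platform policy is just actors plus privileges (no resource)."""
    users: frozenset = frozenset()       # individual user names the policy applies to
    groups: frozenset = frozenset()      # group names the policy applies to
    privileges: frozenset = frozenset()  # platform privilege names granted (illustrative)


def has_platform_privilege(policies, user, user_groups, privilege):
    """Return True if any policy grants `privilege` to `user` directly or via one of `user_groups`."""
    for policy in policies:
        applies = user in policy.users or bool(policy.groups & set(user_groups))
        if applies and privilege in policy.privileges:
            return True
    return False


# Illustrative policy: the Data Platform group gets all three platform privileges.
policies = [
    PlatformPolicy(
        groups=frozenset({"data-platform"}),
        privileges=frozenset({"MANAGE_USERS_AND_GROUPS", "VIEW_ANALYTICS", "MANAGE_POLICIES"}),
    ),
]

print(has_platform_privilege(policies, "alice", {"data-platform"}, "MANAGE_POLICIES"))  # True
print(has_platform_privilege(policies, "james", {"analysts"}, "VIEW_ANALYTICS"))  # False
```

Note that there is no resource argument anywhere in the check, which is exactly what distinguishes platform policies from metadata policies.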
### Metadata Policies
**Metadata** policies determine who can do what to which Metadata Entities. For example, they answer questions such as:
- Who can edit Dataset Documentation & Links?
- Who can add Owners to a Chart?
- Who can add Tags to a Dashboard?

A Metadata Policy can be broken down into 3 parts:
1. **Actors**: The 'who'. Specific users or groups that the policy applies to.
2. **Privileges**: The 'what'. What actions are being permitted by a policy, e.g. "Add Tags".
3. **Resources**: The 'which'. Resources that the policy applies to, e.g. "All Datasets".

> Today, the set of privileges supported includes only *write* privileges. That is, there are no read restrictions implemented yet.
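
To make the three parts concrete, here is a minimal Python sketch that encodes two of the plain-English policies from the introduction (Jenny and James) and checks a write against them. Everything here is hypothetical: the `MetadataPolicy` class, the `is_allowed` helper, the privilege names such as `EDIT_ENTITY_TAGS`, and the example URNs are illustrative assumptions, not DataHub's actual API. Group membership resolution is omitted, and, matching the note above, only write checks are modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetadataPolicy:
    """Hypothetical model of a metadata policy: actors (who), privileges (what), resources (which)."""
    actors: frozenset                       # user or group names the policy applies to
    privileges: frozenset                   # write privileges granted, e.g. "EDIT_ENTITY_TAGS"
    resource_type: Optional[str] = None     # e.g. "dashboard"; None matches every entity type
    resource_urns: frozenset = frozenset()  # specific resources; empty matches all of resource_type

    def allows(self, actor, privilege, resource_type, resource_urn):
        """Check whether this policy permits `actor` to exercise `privilege` on one resource."""
        if actor not in self.actors or privilege not in self.privileges:
            return False
        if self.resource_type is not None and self.resource_type != resource_type:
            return False
        return not self.resource_urns or resource_urn in self.resource_urns


def is_allowed(policies, actor, privilege, resource_type, resource_urn):
    """A write is permitted if at least one policy allows it (reads are not modeled here)."""
    return any(p.allows(actor, privilege, resource_type, resource_urn) for p in policies)


PIPELINE = "urn:li:dataFlow:(airflow,daily_etl,prod)"  # illustrative URN
DASHBOARD = "urn:li:dashboard:(looker,sales)"          # illustrative URN

policies = [
    # Jenny may edit Tags on any Dashboard, but nothing else.
    MetadataPolicy(frozenset({"jenny"}), frozenset({"EDIT_ENTITY_TAGS"}), "dashboard"),
    # James may edit Links on one specific Data Pipeline.
    MetadataPolicy(frozenset({"james"}), frozenset({"EDIT_ENTITY_LINKS"}), "dataFlow", frozenset({PIPELINE})),
]

print(is_allowed(policies, "jenny", "EDIT_ENTITY_TAGS", "dashboard", DASHBOARD))  # True
print(is_allowed(policies, "jenny", "EDIT_ENTITY_DOCS", "dashboard", DASHBOARD))  # False
print(is_allowed(policies, "james", "EDIT_ENTITY_LINKS", "dataFlow", PIPELINE))   # True
```

In this sketch an empty `resource_urns` set widens a policy to every resource of the given type ("any Dashboard"), while a non-empty set narrows it to specific resources ("a specific Data Pipeline").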
## Managing Policies
Policies can be managed on the `/policies` page or from the Control Center, a slide-out menu
on the left side of the DataHub UI. The `Policies` tab is only visible to users who have the `MANAGE_POLICIES` privilege.
Out of the box, DataHub is deployed with a set of pre-baked Policies. The default policies are created at deploy
time and can be found inside the `policies.json` file within `metadata-service/war/src/main/resources/boot`. This set of policies serves the
following purposes:
1. Assigns super-user privileges to the root `datahub` user account (immutable)
2. Assigns all Platform privileges to all users by default (editable)

The reason for #1 is to prevent people from accidentally deleting all policies and getting locked out; the `datahub` super-user account serves as a backup.
The reason for #2 is to permit administrators to log in via OIDC or another means outside of the `datahub` root account
when they are bootstrapping DataHub. This way, those setting up DataHub can start managing policies without friction.
Note that these privileges *can*, and likely *should*, be altered inside the **Policies** page of the UI.
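
The lockout protection in #1 comes from the difference between editable and immutable policies. The Python sketch below illustrates that idea with a hypothetical in-memory store that refuses to delete an immutable policy. The `StoredPolicy` and `PolicyStore` classes and both policy names are made up for illustration; they do not reflect the actual contents or schema of `policies.json`.

```python
from dataclasses import dataclass


@dataclass
class StoredPolicy:
    name: str
    editable: bool  # False marks a policy that cannot be changed or deleted


class PolicyStore:
    """Hypothetical in-memory store showing why an immutable root policy prevents lockout."""

    def __init__(self, policies):
        self._policies = {p.name: p for p in policies}

    def delete(self, name):
        if not self._policies[name].editable:
            raise PermissionError(f"policy {name!r} is immutable and cannot be deleted")
        del self._policies[name]

    def names(self):
        return sorted(self._policies)


# Illustrative stand-ins for the two kinds of default policy (names are made up).
store = PolicyStore([
    StoredPolicy("root-user-all-privileges", editable=False),
    StoredPolicy("all-users-platform-privileges", editable=True),
])

store.delete("all-users-platform-privileges")  # allowed: this default is editable
try:
    store.delete("root-user-all-privileges")   # rejected: the root user's policy is immutable
except PermissionError as err:
    print(err)

print(store.names())  # ['root-user-all-privileges']
```

Even after every editable policy is removed, the immutable one survives, so the `datahub` account can always log in and repair the policy set.
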
> Pro-Tip: To log in using the `datahub` account, navigate to `<your-datahub-domain>/login` and enter `datahub` as both the username and the password. Note that the password can be customized for your
> deployment by changing the `user.props` file within the `datahub-frontend` module. Note also that JaaS authentication must be enabled.
## Configuration
By default, the Policies feature is *enabled*. This means that the deployment will support creating, editing, removing, and
most importantly enforcing fine-grained access policies.
In some cases, these capabilities are not desirable. For example, if your company's users are already used to having free rein, you
may want to keep it that way. Or perhaps only your Data Platform team actively uses DataHub, in which case Policies may be overkill.
For these scenarios, we've provided a way to disable Policies in your deployment of DataHub. This will completely hide
the policies management UI and allow all actions on the platform. It will be as though
each user has *all* privileges, of both the **Platform** & **Metadata** flavors.
To disable Policies, set the `AUTH_POLICIES_ENABLED` environment variable for the `datahub-gms` service container
to `false`. For example, in your `docker/datahub-gms/docker.env`, you'd add:
```
AUTH_POLICIES_ENABLED=false
```
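
Conceptually, the flag acts as a short-circuit in front of every authorization check: when Policies are disabled, every check passes. The Python sketch below shows that behavior under stated assumptions. The `policies_enabled` and `authorize` helpers are hypothetical, and the parsing rule (anything other than `false` counts as enabled, with enabled as the default) is an assumption that mirrors the default described above, not a copy of how `datahub-gms` reads the variable.

```python
import os


def policies_enabled(env=None):
    """Hypothetical reading of AUTH_POLICIES_ENABLED: enabled unless explicitly set to "false"."""
    env = os.environ if env is None else env
    return env.get("AUTH_POLICIES_ENABLED", "true").strip().lower() != "false"


def authorize(privilege, granted_privileges, env=None):
    """With Policies disabled every check passes; otherwise the privilege must have been granted."""
    if not policies_enabled(env):
        return True
    return privilege in granted_privileges


disabled = {"AUTH_POLICIES_ENABLED": "false"}
print(authorize("MANAGE_POLICIES", set(), env=disabled))  # True: policies are off
print(authorize("MANAGE_POLICIES", set(), env={}))        # False: enabled by default, nothing granted
```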
## Coming Soon
The DataHub team is hard at work improving the Policies feature. We are planning to build out the following:
- Hide edit action buttons on Entity pages to reflect user privileges

Under consideration:
- Ability to define Metadata Policies against multiple resources scoped to a particular "Domain"
- Ability to define Metadata Policies against multiple resources scoped to particular "Containers" (e.g. a "schema", "database", or "collection")
## Feedback / Questions / Concerns
We want to hear from you! For any inquiries, including Feedback, Questions, or Concerns, reach out on Slack!