This document provides a technical overview of how authentication works in DataHub's backend, aimed at developers evaluating or operating DataHub.
It includes a characterization of the motivations for the feature, the key components in its design, the new capabilities it provides, and configuration instructions.
c. If the cookie was invalid, redirect to either a) the DataHub login screen (for [JAAS authentication](guides/jaas.md)) or b) a [configured OIDC Identity Provider](guides/sso/configure-oidc-react.md) to perform authentication.
Once authentication had succeeded at the frontend proxy layer, a stateless (token-based) session cookie (PLAY_SESSION) would be set in the user's browser.
All subsequent requests, including the GraphQL requests issued by the React UI, would be authenticated using this session cookie. Once a request had made it beyond
the frontend service layer, it was assumed to have been already authenticated. Hence, there was **no native authentication inside of the Metadata Service**.
### Problems with this approach
The major challenge with this approach was that requests to the backend Metadata Service were completely unauthenticated. There were two options for folks who required authentication at the Metadata Service layer:
1. Set up a proxy in front of Metadata Service that performed authentication
2. [A more recent possibility] Route requests to Metadata Service through DataHub Frontend Proxy, including the PLAY_SESSION cookie with every request.
Neither option is ideal. Setting up a proxy to do authentication takes time and expertise. Extracting a session cookie from the browser and setting it for programmatic use is
clunky and unscalable. On top of that, extending the authentication system was difficult, requiring the implementation of a new [Play module](https://www.playframework.com/documentation/2.8.8/api/java/play/mvc/Security.Authenticator.html) within DataHub Frontend.
## Introducing Authentication in DataHub Metadata Service
To address these problems, we introduced configurable Authentication inside the **Metadata Service** itself,
meaning that requests are no longer considered trusted until they are authenticated by the Metadata Service.
Why push Authentication down? In addition to the problems described above, we wanted to plan for a future
Metadata Service Authentication is currently **opt-in**. This means that you may continue to use DataHub without Metadata Service Authentication without interruption.
To enable Metadata Service Authentication:
- set the `METADATA_SERVICE_AUTH_ENABLED` environment variable to "true" for the `datahub-gms` AND `datahub-frontend` containers / pods.
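
Because the flag has to be set on both containers, a deployment where only one of them carries it is an easy mistake to make. The following is a minimal pre-flight sketch, not part of DataHub: the helper name `containers_missing_flag` and the shape of the `container_envs` mapping are hypothetical, and the strict comparison against the string "true" is an assumption about how the value is interpreted.

```python
from typing import Dict, List

FLAG = "METADATA_SERVICE_AUTH_ENABLED"
# The two containers / pods that must both carry the flag.
REQUIRED_CONTAINERS = ("datahub-gms", "datahub-frontend")


def containers_missing_flag(container_envs: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the required containers whose environment does not set the flag to "true"."""
    return [
        name
        for name in REQUIRED_CONTAINERS
        if container_envs.get(name, {}).get(FLAG) != "true"
    ]


# Hypothetical deployment in which only one of the two containers has the flag.
envs = {
    "datahub-gms": {FLAG: "true"},
    "datahub-frontend": {},
}
print(containers_missing_flag(envs))  # ['datahub-frontend']
```

An empty result means both containers are configured consistently.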
a **Personal Access Token** (described above) from the root "datahub" user account, and using this token when configuring your [Ingestion Recipes](../../metadata-ingestion/README.md#recipes).
To configure the token for use in ingestion, simply populate the "token" configuration for the `datahub-rest` sink:
```yml
source:
  # source configs
sink:
  type: "datahub-rest"
  config:
    ...
    token: <your-personal-access-token-here!>
```
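
For callers that talk to the Metadata Service outside of an ingestion recipe, the same token can be attached to each request by hand. The sketch below only builds the request and does not send it. It assumes the token is presented in an `Authorization: Bearer` header, and both the URL and the helper name `build_authenticated_request` are hypothetical placeholders.

```python
import urllib.request


def build_authenticated_request(url: str, token: str) -> urllib.request.Request:
    """Return a request that carries the access token as a Bearer credential."""
    request = urllib.request.Request(url)
    # Assumption: the Metadata Service reads the token from the Authorization header.
    request.add_header("Authorization", f"Bearer {token}")
    return request


# Hypothetical Metadata Service address; substitute your own deployment's URL.
req = build_authenticated_request(
    "http://localhost:8080/entities",
    "<your-personal-access-token-here!>",
)
print(req.get_header("Authorization"))
```

Passing the resulting object to `urllib.request.urlopen` would issue the call against a running service.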
> Note that ingestion occurring via the `datahub-kafka` sink will continue to be unauthenticated *for now*. Soon, we will be introducing
> support for providing an access token in the event payload itself to authenticate ingestion requests over Kafka.
### The Role of DataHub Frontend Proxy Going Forward
With these changes, DataHub Frontend Proxy will continue to play a vital part in the complex dance of Authentication. It will serve as the place
where UI-based session authentication originates and will continue to support 3rd Party SSO configuration (OIDC)
and JAAS configuration as it does today.
The major improvement is that the Frontend Service will validate credentials provided at UI login time
and generate a DataHub **Access Token**, embedding it into the traditional session cookie (which will continue to work).
In summary, DataHub Frontend Service will continue to play a vital role in Authentication. Its scope, however, will likely
remain limited to concerns specific to the React UI.
## Where to go from here
These changes represent the first milestone in Metadata Service Authentication. They will serve as a foundation upon which we can build new features, prioritized based on Community demand:
2. **Service Accounts**: Create service accounts and generate Access tokens on their behalf.
3. **Kafka Ingestion Authentication**: Authenticate ingestion requests coming from the Kafka ingestion sink inside the Metadata Service.
4. **Access Token Management**: Ability to view, manage, and revoke access tokens that have been generated. (Currently, access tokens include no server-side state, and thus cannot be revoked once granted.)
...and more! To advocate for these features or others, reach out on [Slack](https://datahubspace.slack.com/join/shared_invite/zt-nx7i0dj7-I3IJYC551vpnvvjIaNRRGw#/shared-invite/email).
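
The remark about access tokens carrying no server-side state can be illustrated with a generic signed token. This is a sketch of the general technique only, not DataHub's actual token format: the signing key, the claim names, the expiry claim, and the `issue_token` / `verify_token` helpers are all hypothetical. Verification needs nothing but the key and the token itself, so there is no stored record a server could delete to revoke it.

```python
import base64
import hashlib
import hmac
import json

SIGNING_KEY = b"hypothetical-signing-key"  # assumption: a secret known only to the server


def issue_token(actor: str, ttl_seconds: int, now: float) -> str:
    """Sign the claims and return them as a token; nothing is stored server-side."""
    payload = json.dumps({"actor": actor, "exp": now + ttl_seconds}).encode()
    body = base64.urlsafe_b64encode(payload).decode()
    signature = hmac.new(SIGNING_KEY, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{signature}"


def verify_token(token: str, now: float) -> bool:
    """Accept the token if its signature matches and it has not expired."""
    body, _, signature = token.rpartition(".")
    expected = hmac.new(SIGNING_KEY, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return False
    claims = json.loads(base64.urlsafe_b64decode(body))
    return claims["exp"] > now


token = issue_token("urn:li:corpuser:datahub", ttl_seconds=3600, now=0)
print(verify_token(token, now=10))    # True: valid without any server-side lookup
print(verify_token(token, now=7200))  # False: only the embedded expiry ends its life
```

Supporting revocation in a scheme like this would require the server to start tracking issued tokens, which is the kind of state the Access Token Management item describes.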
## Q&As
### What if I don't want to use Metadata Service Authentication?
That's perfectly fine, for now. Metadata Service Authentication is disabled by default and is only enabled if you provide the
environment variable `METADATA_SERVICE_AUTH_ENABLED` to the `datahub-gms` container or change `authentication.enabled` to "true"
You can configure DataHub to add your custom **Authenticator** to the **Authentication Chain** by changing the `application.yaml` configuration file for the Metadata Service:
In practice, this will require migrating Metadata [Ingestion Recipes](../../metadata-ingestion/README.md#recipes) that use the `datahub-rest` sink to point at a slightly different
Example recipe that proxies through DataHub Frontend
```yml
source:
  # source configs
sink:
  type: "datahub-rest"
  config:
    ...
    token: <your-personal-access-token-here!>
```
## Feedback / Questions / Concerns
We want to hear from you! For any inquiries, including Feedback, Questions, or Concerns, reach out on [Slack](https://datahubspace.slack.com/join/shared_invite/zt-nx7i0dj7-I3IJYC551vpnvvjIaNRRGw#/shared-invite/email)!