# Overview

Authentication is the process of verifying the identity of a user or service. There are two
places where authentication occurs inside DataHub:

1. The DataHub frontend service, when a user attempts to log in to the DataHub application.
2. The DataHub backend service, when API requests are made to DataHub.

In this document, we'll take a closer look at both.

### Authentication in the Frontend

Authentication of normal users of DataHub takes place in two phases.

At login time, authentication is performed by either DataHub itself (via username / password entry) or a third-party Identity Provider. Once the identity
of the user has been established and the credentials validated, a persistent session token is generated for the user and stored
in a browser-side session cookie.

DataHub provides 3 mechanisms for authentication at login time:

- **Native Authentication**, which uses username and password combinations natively stored and managed by DataHub, with users invited via an invite link.
- [Single Sign-On with OpenID Connect](guides/sso/configure-oidc-react.md), which delegates authentication responsibility to third-party systems like Okta or Google/Azure Authentication. This is the recommended approach for production systems.
- [JaaS Authentication](guides/jaas.md) for simple deployments where authenticated users are part of some known list or invited as a [Native DataHub User](guides/add-users.md).

In subsequent requests, the session token is used to represent the authenticated identity of the user, and is validated by DataHub's backend service (discussed below).
Eventually, the session token expires (after 24 hours by default), at which point the end user is required to log in again.

DataHub also supports guest access, which lets users access the system without an explicit login. Guest authentication is disabled by default.
When guest access is enabled, accessing DataHub through a configurable URL path logs the user in as an existing user that is designated as the guest. The privileges of the guest
are controlled by adjusting the privileges of that designated user.

### Authentication in the Backend (Metadata Service)

When a user makes a request for data within DataHub, the request is authenticated by DataHub's backend (Metadata Service) via a JSON Web Token (JWT). This applies both to requests originating from the DataHub application
and to programmatic calls to DataHub APIs. There are two types of tokens that are important:

1. **Session Tokens**: Generated for users of the DataHub web application. By default, they are valid for 24 hours.
   These tokens are encoded and stored inside browser-side session cookies. The duration a session token is valid for is configurable via the `MAX_SESSION_TOKEN_AGE` environment variable
   on the datahub-frontend deployment. Additionally, the `AUTH_SESSION_TTL_HOURS` environment variable configures the expiration time of the actor cookie in the user's browser; when that cookie expires, the user is also prompted to log in again. The difference between the two is that the actor cookie expiration only affects the browser session, so the session token can still be used programmatically until it expires.
   Once the session token itself expires, it can no longer be used programmatically either, because it is created as a JWT with an expiration claim.
2. **Personal Access Tokens**: Tokens generated via the DataHub settings panel that are useful for interacting
   with DataHub APIs. They can be used to automate processes like enriching documentation, ownership, tags, and more on DataHub. Learn
   more about Personal Access Tokens [here](personal-access-tokens.md).

To learn more about DataHub's backend authentication, check out [Introducing Metadata Service Authentication](introducing-metadata-service-authentication.md).

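
Because a session token is a JWT with an expiration claim, the expiry behavior described for session tokens can be illustrated with a minimal sketch that reads the standard `exp` claim from a token's payload. This is only an illustration under stated assumptions: the helper names (`b64url_decode`, `jwt_is_expired`, `make_demo_token`) are hypothetical, the demo token is unsigned and built locally, and the sketch does not verify signatures or reflect how DataHub itself validates tokens.

```python
import base64
import json
import time
from typing import Optional


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding that JWTs strip."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def jwt_is_expired(token: str, now: Optional[float] = None) -> bool:
    """Return True if the token's `exp` claim is at or before `now`.

    The signature is NOT verified; this only inspects the payload.
    """
    _header_b64, payload_b64, _signature = token.split(".")
    payload = json.loads(b64url_decode(payload_b64))
    current = time.time() if now is None else now
    return payload["exp"] <= current


def make_demo_token(exp: int) -> str:
    """Build an unsigned, JWT-shaped demo token (hypothetical, for illustration only)."""

    def encode(obj: dict) -> str:
        raw = json.dumps(obj).encode("utf-8")
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

    return f"{encode({'alg': 'none', 'typ': 'JWT'})}.{encode({'exp': exp})}.demo-signature"


demo_token = make_demo_token(exp=1000)
print(jwt_is_expired(demo_token, now=500))   # before the expiration time
print(jwt_is_expired(demo_token, now=2000))  # after the expiration time
```
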
Credentials must be provided as Bearer Tokens inside of the **Authorization** header in any request made to DataHub's API layer.

```shell
Authorization: Bearer <your-token>
```

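
As a rough sketch of how a client might attach this header, the following builds (but does not send) an HTTP request using Python's standard library. The URL `http://localhost:8080/api/graphql`, the function name `build_authenticated_request`, and the sample GraphQL query are assumptions made for illustration; substitute the endpoint and query that apply to your deployment.

```python
import json
import urllib.request

# Assumption: a locally running Metadata Service and its GraphQL path.
GMS_GRAPHQL_URL = "http://localhost:8080/api/graphql"


def build_authenticated_request(
    token: str, query: str, url: str = GMS_GRAPHQL_URL
) -> urllib.request.Request:
    """Build a POST request that carries the token as a Bearer credential."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# The query below is a placeholder; replace it with one valid for your instance.
request = build_authenticated_request("<your-token>", "{ me { corpUser { urn } } }")
print(request.get_header("Authorization"))

# To actually send the request against a running instance:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```
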
As with the frontend, the backend can optionally enable guest authentication. If guest authentication is enabled, all API calls made to the backend
without an Authorization header are treated as coming from the guest user, and the privileges associated with the designated guest user apply to those requests.

Note that in DataHub local quickstarts, authentication at the backend layer is disabled for convenience. This leaves the backend
vulnerable to unauthenticated requests, so this configuration should not be used in production. To enable
backend (token-based) authentication, set the `METADATA_SERVICE_AUTH_ENABLED=true` environment variable
for the datahub-gms container or pod.

### References

For a quick video on the topic of users and groups within DataHub, have a look at [DataHub Basics — Users, Groups, & Authentication 101](https://youtu.be/8Osw6p9vDYY).