title
Deployment Environment Variables

Environment Variables

This page summarizes the key environment variables that control how DataHub behaves.


DataHub Java Components

This section covers GMS, System Update, and the MAE/MCE Consumers.

Authentication & Authorization

Authentication Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| METADATA_SERVICE_AUTH_ENABLED | true | Require authentication for all requests to the Metadata Service | GMS, MAE Consumer, MCE Consumer, PE Consumer, Frontend |
| DATAHUB_SYSTEM_CLIENT_SECRET | | System client secret used by AuthServiceController | GMS, MAE Consumer, MCE Consumer, PE Consumer, Actions, Frontend |
| METADATA_SERVICE_AUTHENTICATOR_EXCEPTIONS_ENABLED | false | Authenticator failures are normally only warnings; enable this to throw them as exceptions | GMS |
| DATAHUB_TOKEN_SERVICE_SIGNING_KEY | | Key used to validate incoming tokens and sign new tokens | GMS |
| DATAHUB_TOKEN_SERVICE_SALT | | Salt used for token validation and signing | GMS |
| DATAHUB_TOKEN_SERVICE_SIGNING_ALGORITHM | HS256 | Signing algorithm for DataHub tokens | GMS |
| SESSION_TOKEN_DURATION_MS | 86400000 | Maximum duration of a UI session in milliseconds (defaults to 1 day) | GMS |
| GUEST_AUTHENTICATION_USER | guest | Guest user for unauthenticated access | GMS |
| GUEST_AUTHENTICATION_ENABLED | false | Enable guest authentication | GMS |
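
When checking a deployment, it can help to read these settings the same way the table describes them: use the documented default when a variable is unset. The sketch below is illustrative only. The helper names `env_bool` and `session_duration_hours` are hypothetical, and the boolean parsing rule (only a case-insensitive `true` counts as enabled) is an assumption, not DataHub's own parsing logic.

```python
import os


def env_bool(name, default, env=os.environ):
    """Read a boolean flag, falling back to the documented default.

    Assumption: only a case-insensitive "true" enables the flag.
    """
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    return raw.strip().lower() == "true"


def session_duration_hours(env=os.environ):
    """Convert SESSION_TOKEN_DURATION_MS (default 86400000) to hours."""
    ms = int(env.get("SESSION_TOKEN_DURATION_MS", "86400000"))
    return ms / 3_600_000
```

With nothing set, `session_duration_hours({})` gives 24 hours, which matches the "1 day" default in the table.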

Authorization Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| AUTH_POLICIES_ENABLED | true | Enable the default DataHub policies-based authorizer | GMS |
| POLICY_CACHE_REFRESH_INTERVAL_SECONDS | 120 | Cache refresh interval for policies in seconds | GMS |
| POLICY_CACHE_FETCH_SIZE | 1000 | Cache policy fetch size | GMS |
| REST_API_AUTHORIZATION_ENABLED | true | Enable authorization of reads, writes, and deletes on REST APIs | GMS |
| VIEW_AUTHORIZATION_ENABLED | false | Controls whether entity pages can limit access based on policies | GMS |
| VIEW_AUTHORIZATION_RECOMMENDATIONS_PEER_GROUP_ENABLED | true | Enable peer group recommendations for view authorization | GMS |

Ingestion Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| UI_INGESTION_ENABLED | true | Enable UI-based ingestion | GMS, MAE Consumer |
| INGESTION_BATCH_REFRESH_COUNT | 100 | Number of entities to refresh in a single batch when refreshing entities after ingestion | GMS |
| INGESTION_SOURCE_REFRESH_INTERVAL_SECONDS | 43200 | Interval at which the ingestion source scheduler will check for new or updated ingestion sources | GMS |

Telemetry & Analytics

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| INGESTION_REPORTING_ENABLED | false | Enable ingestion reporting | GMS |
| ENABLE_THIRD_PARTY_LOGGING | false | Whether Mixpanel tracking is enabled | GMS |

DataHub Core Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| DATAHUB_SERVER_TYPE | prod | DataHub server type | GMS |
| DATAHUB_GMS_ASYNC_REQUEST_TIMEOUT_MS | 55000 | Async request timeout for GMS | GMS |
| DATAHUB_GMS_HOST | localhost | GMS host | Frontend |
| DATAHUB_GMS_PORT | 8080 | GMS port | Frontend |
| DATAHUB_GMS_USE_SSL | false | Use SSL for GMS connections | Frontend |
| DATAHUB_GMS_URI | null | URI instead of separate host/port/ssl parameters (takes priority) | Frontend |
| DATAHUB_GMS_SSL_PROTOCOL | null | SSL protocol for GMS | Frontend |
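
The table notes that `DATAHUB_GMS_URI` takes priority over the separate host, port, and SSL settings. A minimal sketch of that precedence is below. `resolve_gms_base_url` is a hypothetical helper, and the details (trimming a trailing slash, treating only `true` as enabling SSL) are assumptions rather than the frontend's actual behavior.

```python
def resolve_gms_base_url(env):
    """Return the GMS base URL implied by a mapping of environment variables."""
    uri = env.get("DATAHUB_GMS_URI")
    if uri:
        # A full URI wins over the individual host/port/ssl parameters.
        return uri.rstrip("/")

    host = env.get("DATAHUB_GMS_HOST", "localhost")
    port = env.get("DATAHUB_GMS_PORT", "8080")
    # Assumption: only a case-insensitive "true" switches to https.
    use_ssl = env.get("DATAHUB_GMS_USE_SSL", "false").lower() == "true"
    scheme = "https" if use_ssl else "http"
    return f"{scheme}://{host}:{port}"
```

Passing `os.environ` as `env` evaluates the current process environment; passing a plain dict makes it easy to check a planned configuration before deploying it.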

Plugin Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| PLUGIN_SECURITY_MODE | RESTRICTED | Plugin security mode (RESTRICTED or LENIENT) | GMS |
| ENTITY_REGISTRY_PLUGIN_PATH | /etc/datahub/plugins/models | Path for entity registry plugins | GMS |
| ENTITY_REGISTRY_PLUGIN_LOAD_DELAY_SECONDS | 60 | Rate at which plugin runnable executes | GMS |
| RETENTION_PLUGIN_PATH | /etc/datahub/plugins/retention | Path for retention plugins | GMS |
| AUTH_PLUGIN_PATH | /etc/datahub/plugins/auth | Path for auth plugins | GMS |

Metrics Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| DATAHUB_METRICS_HOOK_LATENCY_PERCENTILES | 0.5,0.95,0.99,0.999 | Hook latency percentiles | GMS, MAE Consumer |
| DATAHUB_METRICS_HOOK_LATENCY_SERVICE_LEVEL_OBJECTIVES | 300,1800,3000,10800,21600,43200 | Hook latency SLOs in seconds | GMS, MAE Consumer |
| DATAHUB_METRICS_HOOK_LATENCY_MAX_EXPECTED_VALUE | 86000 | Maximum expected hook latency value in seconds | GMS, MAE Consumer |

Entity Service Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| ENTITY_SERVICE_IMPL | ebean | Entity service implementation | GMS, MCE Consumer |
| ENTITY_SERVICE_ENABLE_RETENTION | true | Enable entity retention | GMS, MCE Consumer |
| ENTITY_SERVICE_APPLY_RETENTION_BOOTSTRAP | false | Apply retention on bootstrap | GMS, MCE Consumer |

Graph Service Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| GRAPH_SERVICE_IMPL | elasticsearch | Graph service implementation | GMS, MAE Consumer |
| GRAPH_SERVICE_LIMIT_RESULTS_MAX | 10000 | Maximum allowed result count for queries | GMS |
| GRAPH_SERVICE_LIMIT_RESULTS_API_DEFAULT | 5000 | Default API result limit | GMS |
| GRAPH_SERVICE_LIMIT_RESULTS_STRICT | false | Throw exception if strict is true, otherwise override with default and warn | GMS |
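
The three `LIMIT_RESULTS_*` settings recur for the graph, search, timeseries, and system metadata services. One way to read their interaction is sketched below. `apply_result_limit` and `ResultLimitExceeded` are hypothetical names. The choice to substitute the API default (rather than the maximum) when a request is too large is an assumption based on the wording "override with default".

```python
import logging

logger = logging.getLogger("result_limit_sketch")


class ResultLimitExceeded(ValueError):
    """Raised in strict mode when a request asks for too many results."""


def apply_result_limit(requested, *, max_results=10000, api_default=5000, strict=False):
    """Return the result count to use for a query.

    requested: the count asked for by the caller, or None if unspecified.
    """
    if requested is None:
        return api_default
    if requested <= max_results:
        return requested
    if strict:
        raise ResultLimitExceeded(
            f"requested {requested} results, maximum is {max_results}"
        )
    # Non-strict mode: warn and fall back to the API default.
    logger.warning(
        "requested %d results exceeds maximum %d; using default %d",
        requested, max_results, api_default,
    )
    return api_default
```

Under these assumptions, a request for 50000 results returns 5000 with a warning by default, and raises when `strict=True`.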

Search Service Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| SEARCH_SERVICE_BATCH_SIZE | 100 | Search service batch size | GMS |
| SEARCH_SERVICE_ENABLE_CACHE | false | Enable search service cache | GMS |
| SEARCH_SERVICE_ENABLE_CACHE_EVICTION | false | Enable search service cache eviction | GMS |
| SEARCH_SERVICE_CACHE_IMPLEMENTATION | caffeine | Search service cache implementation | GMS |
| SEARCH_SERVICE_HAZELCAST_SERVICE_NAME | hazelcast-service | Hazelcast service name for search cache | GMS |
| SEARCH_SERVICE_FILTER_CONTAINER_EXPANSION_ENABLED | true | Enable container expansion in search filters | GMS |
| SEARCH_SERVICE_FILTER_CONTAINER_EXPANSION_PAGE_SIZE | 100 | Page size for container expansion | GMS |
| SEARCH_SERVICE_FILTER_CONTAINER_EXPANSION_LIMIT | 100 | Limit for container expansion | GMS |
| SEARCH_SERVICE_FILTER_DOMAIN_EXPANSION_ENABLED | true | Enable domain expansion in search filters | GMS |
| SEARCH_SERVICE_FILTER_DOMAIN_EXPANSION_PAGE_SIZE | 100 | Page size for domain expansion | GMS |
| SEARCH_SERVICE_FILTER_DOMAIN_EXPANSION_LIMIT | 100 | Limit for domain expansion | GMS |
| SEARCH_SERVICE_LIMIT_RESULTS_MAX | 10000 | Maximum allowed result count for queries | GMS |
| SEARCH_SERVICE_LIMIT_RESULTS_API_DEFAULT | 5000 | Default API result limit | GMS |
| SEARCH_SERVICE_LIMIT_RESULTS_STRICT | false | Throw exception if strict is true, otherwise override with default and warn | GMS |

Timeseries Aspect Service

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| TIMESERIES_ASPECT_SERVICE_QUERY_CONCURRENCY | 10 | Parallel threads for timeseries queries | GMS |
| TIMESERIES_ASPECT_SERVICE_QUERY_QUEUE_SIZE | 500 | Queue size for timeseries queries | GMS |
| TIMESERIES_ASPECT_SERVICE_QUERY_THREAD_KEEP_ALIVE | 60 | Thread keep alive time for timeseries queries | GMS |
| TIMESERIES_ASPECT_SERVICE_LIMIT_RESULTS_MAX | 10000 | Maximum allowed result count for queries | GMS |
| TIMESERIES_ASPECT_SERVICE_LIMIT_RESULTS_API_DEFAULT | 5000 | Default API result limit | GMS |
| TIMESERIES_ASPECT_SERVICE_LIMIT_RESULTS_STRICT | false | Throw exception if strict is true, otherwise override with default and warn | GMS |

System Metadata Service

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| SYSTEM_METADATA_SERVICE_LIMIT_RESULTS_MAX | 10000 | Maximum allowed result count for queries | GMS |
| SYSTEM_METADATA_SERVICE_LIMIT_RESULTS_API_DEFAULT | 5000 | Default API result limit | GMS |
| SYSTEM_METADATA_SERVICE_LIMIT_RESULTS_STRICT | false | Throw exception if strict is true, otherwise override with default and warn | GMS |

Platform Analytics

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| DATAHUB_ANALYTICS_ENABLED | true | Enable platform analytics | GMS, MAE Consumer, Frontend |
| DATAHUB_ANALYTICS_TRACING_ENABLED | true | Enable backend usage tracing | GMS |
| ANALYTICS_DATAHUB_USAGE_EVENT_TYPES | CreateAccessTokenEvent,CreatePolicyEvent,UpdatePolicyEvent,CreateIngestionSourceEvent,UpdateIngestionSourceEvent,RevokeAccessTokenEvent,CreateUserEvent,UpdateUserEvent,DeletePolicyEvent | Comma-separated list of usage event types to listen to | GMS |
| ANALYTICS_GENERIC_ASPECT_TYPES | `` | Filter list for generic aspect events | GMS |
| ANALYTICS_USER_FILTERS | `` | Filter out specific users' events from being published | GMS |

Visual Configuration

Queries Tab

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| REACT_APP_QUERIES_TAB_RESULT_SIZE | 5 | Queries tab result size (experimental) | Frontend |

Theme Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| REACT_APP_CUSTOM_THEME_ID | `` | Custom theme ID for rendering specific theme file | Frontend |

Assets Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| REACT_APP_LOGO_URL | /assets/platforms/datahublogo.png | Logo URL for the application | Frontend |
| REACT_APP_FAVICON_URL | /assets/icons/favicon.ico | Favicon URL for the application | Frontend |
| REACT_APP_TITLE | `` | Application title | Frontend |

UI Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| REACT_APP_HIDE_GLOSSARY | false | Hide glossary in the UI | Frontend |
| REACT_APP_SHOW_FULL_TITLE_IN_LINEAGE | false | Show full title in lineage | Frontend |
| DOMAIN_DEFAULT_TAB | `` | Default tab for domains (set to DOCUMENTATION_TAB to show documentation tab first) | Frontend |
| APPLICATION_SHOW_SIDEBAR_SECTION_WHEN_EMPTY | false | Show sidebar section when empty (deprecated) | Frontend |
| SEARCH_RESULT_NAME_HIGHLIGHT_ENABLED | true | Enable visual highlighting on search result names/descriptions | Frontend |

Storage Layer Configuration

EBean Configuration (MySQL/PostgreSQL)

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| EBEAN_DATASOURCE_USERNAME | datahub | Database username | GMS, MCE Consumer, System Update |
| EBEAN_DATASOURCE_PASSWORD | datahub | Database password | GMS, MCE Consumer, System Update |
| EBEAN_DATASOURCE_URL | jdbc:mysql://localhost:3306/datahub | JDBC URL | GMS, MCE Consumer, System Update |
| EBEAN_DATASOURCE_DRIVER | com.mysql.jdbc.Driver | JDBC Driver | GMS, MCE Consumer, System Update |
| EBEAN_MIN_CONNECTIONS | 2 | Minimum database connections | GMS, MCE Consumer, System Update |
| EBEAN_MAX_CONNECTIONS | 50 | Maximum database connections | GMS, MCE Consumer, System Update |
| EBEAN_MAX_INACTIVE_TIME_IN_SECS | 120 | Maximum inactive time in seconds | GMS, MCE Consumer, System Update |
| EBEAN_MAX_AGE_MINUTES | 120 | Maximum age in minutes | GMS, MCE Consumer, System Update |
| EBEAN_LEAK_TIME_MINUTES | 15 | Leak time in minutes | GMS, MCE Consumer, System Update |
| EBEAN_WAIT_TIMEOUT_MILLIS | 1000 | Wait timeout in milliseconds | GMS, MCE Consumer, System Update |
| EBEAN_AUTOCREATE | false | Auto-create DDL | GMS, MCE Consumer, System Update |
| EBEAN_POSTGRES_USE_AWS_IAM_AUTH | false | Use AWS IAM authentication for PostgreSQL | GMS, MCE Consumer, System Update |
| EBEAN_BATCH_GET_METHOD | IN | Batch get method (IN or UNION) | GMS, MCE Consumer, System Update |

Cassandra Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| CASSANDRA_DATASOURCE_USERNAME | cassandra | Cassandra username | GMS, MCE Consumer, System Update |
| CASSANDRA_DATASOURCE_PASSWORD | cassandra | Cassandra password | GMS, MCE Consumer, System Update |
| CASSANDRA_HOSTS | cassandra | Cassandra hosts | GMS, MCE Consumer, System Update |
| CASSANDRA_PORT | 9042 | Cassandra port | GMS, MCE Consumer, System Update |
| CASSANDRA_DATACENTER | datacenter1 | Cassandra datacenter | GMS, MCE Consumer, System Update |
| CASSANDRA_KEYSPACE | datahub | Cassandra keyspace | GMS, MCE Consumer, System Update |
| CASSANDRA_USE_SSL | false | Use SSL for Cassandra | GMS, MCE Consumer, System Update |

Elasticsearch Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| ELASTICSEARCH_HOST | localhost | Elasticsearch host | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_PORT | 9200 | Elasticsearch port | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_THREAD_COUNT | 2 | Elasticsearch thread count | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_CONNECTION_REQUEST_TIMEOUT | 5000 | Connection request timeout | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_USERNAME | null | Elasticsearch username | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_PASSWORD | null | Elasticsearch password | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_PATH_PREFIX | null | Elasticsearch path prefix | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_USE_SSL | false | Use SSL for Elasticsearch | GMS, MAE Consumer, MCE Consumer, System Update |
| OPENSEARCH_USE_AWS_IAM_AUTH | false | Use AWS IAM authentication for OpenSearch | GMS, MAE Consumer, MCE Consumer, System Update |
| AWS_REGION | null | AWS region | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_IMPLEMENTATION | elasticsearch | Implementation (elasticsearch or opensearch) | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTIC_ID_HASH_ALGO | MD5 | ID hash algorithm | GMS, MAE Consumer, MCE Consumer, System Update |

SSL Context Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| ELASTICSEARCH_SSL_PROTOCOL | null | SSL protocol | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_SSL_SECURE_RANDOM_IMPL | null | SSL secure random implementation | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_SSL_TRUSTSTORE_FILE | null | SSL truststore file | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_SSL_TRUSTSTORE_TYPE | null | SSL truststore type | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_SSL_TRUSTSTORE_PASSWORD | null | SSL truststore password | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_SSL_KEYSTORE_FILE | null | SSL keystore file | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_SSL_KEYSTORE_TYPE | null | SSL keystore type | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_SSL_KEYSTORE_PASSWORD | null | SSL keystore password | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_SSL_KEY_PASSWORD | null | SSL key password | GMS, MAE Consumer, MCE Consumer, System Update |

Bulk Operations Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| ES_BULK_DELETE_BATCH_SIZE | 5000 | Bulk delete batch size | GMS, MAE Consumer |
| ES_BULK_DELETE_SLICES | auto | Bulk delete slices | GMS, MAE Consumer |
| ES_BULK_DELETE_POLL_INTERVAL | 30 | Bulk delete poll interval | GMS, MAE Consumer |
| ES_BULK_DELETE_POLL_UNIT | SECONDS | Bulk delete poll unit | GMS, MAE Consumer |
| ES_BULK_DELETE_TIMEOUT | 30 | Bulk delete timeout | GMS, MAE Consumer |
| ES_BULK_DELETE_TIMEOUT_UNIT | MINUTES | Bulk delete timeout unit | GMS, MAE Consumer |
| ES_BULK_DELETE_NUM_RETRIES | 3 | Bulk delete number of retries | GMS, MAE Consumer |
| ES_BULK_ASYNC | true | Enable async bulk operations | GMS, MAE Consumer |
| ES_BULK_REQUESTS_LIMIT | 1000 | Bulk requests limit | GMS, MAE Consumer |
| ES_BULK_FLUSH_PERIOD | 1 | Bulk flush period | GMS, MAE Consumer |
| ES_BULK_NUM_RETRIES | 3 | Bulk number of retries | GMS, MAE Consumer |
| ES_BULK_RETRY_INTERVAL | 1 | Bulk retry interval | GMS, MAE Consumer |
| ES_BULK_REFRESH_POLICY | NONE | Bulk refresh policy | GMS, MAE Consumer |
| ES_BULK_ENABLE_BATCH_DELETE | false | Enable batch delete | GMS, MAE Consumer |
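
The bulk delete poll interval and timeout are each split across a value variable and a unit variable. The sketch below shows one way to combine such a pair into a single duration. `duration_from_env` is a hypothetical helper, and the set of accepted unit names is an assumption modeled on Java's `TimeUnit` constants, which the defaults `SECONDS` and `MINUTES` resemble.

```python
from datetime import timedelta

# Assumed unit names, mapped to the matching timedelta keyword argument.
_UNIT_TO_KWARG = {
    "MILLISECONDS": "milliseconds",
    "SECONDS": "seconds",
    "MINUTES": "minutes",
    "HOURS": "hours",
    "DAYS": "days",
}


def duration_from_env(env, value_key, unit_key, default_value, default_unit):
    """Combine a value variable and a unit variable into a timedelta."""
    value = int(env.get(value_key, default_value))
    unit = str(env.get(unit_key, default_unit)).upper()
    if unit not in _UNIT_TO_KWARG:
        raise ValueError(f"unsupported unit {unit!r} in {unit_key}")
    return timedelta(**{_UNIT_TO_KWARG[unit]: value})
```

With the defaults in the table, the poll interval works out to 30 seconds and the timeout to 30 minutes, so a single bulk delete would be polled at most about 60 times before timing out, if polling runs at exactly that interval.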

Index Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| INDEX_PREFIX | `` | Index prefix | GMS, MAE Consumer, MCE Consumer, System Update |
| ELASTICSEARCH_INDEX_DOC_IDS_SCHEMA_FIELD_HASH_ID_ENABLED | false | Enable hash ID for schema field doc IDs | GMS, MAE Consumer, MCE Consumer, System Update |

Build Indices Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| ELASTICSEARCH_BUILD_INDICES_ALLOW_DOC_COUNT_MISMATCH | false | Allow document count mismatch when clone indices is enabled | System Update |
| ELASTICSEARCH_BUILD_INDICES_CLONE_INDICES | true | Clone indices | System Update |
| ELASTICSEARCH_BUILD_INDICES_RETENTION_UNIT | DAYS | Retention unit for indices | System Update |
| ELASTICSEARCH_BUILD_INDICES_RETENTION_VALUE | 60 | Retention value for indices | System Update |
| ELASTICSEARCH_BUILD_INDICES_REINDEX_OPTIMIZATION_ENABLED | true | Enable reindex optimization | System Update |
| ELASTICSEARCH_NUM_SHARDS_PER_INDEX | 1 | Number of shards per index | System Update |
| ELASTICSEARCH_NUM_REPLICAS_PER_INDEX | 1 | Number of replicas per index | System Update |
| ELASTICSEARCH_INDEX_BUILDER_NUM_RETRIES | 3 | Index builder number of retries | System Update |
| ELASTICSEARCH_INDEX_BUILDER_REFRESH_INTERVAL_SECONDS | 3 | Index builder refresh interval | System Update |
| SEARCH_DOCUMENT_MAX_ARRAY_LENGTH | 1000 | Maximum array length in search documents | System Update |
| SEARCH_DOCUMENT_MAX_OBJECT_KEYS | 1000 | Maximum object keys in search documents | System Update |
| SEARCH_DOCUMENT_MAX_VALUE_LENGTH | 4096 | Maximum value length in search documents | System Update |
| ELASTICSEARCH_MAIN_TOKENIZER | null | Main tokenizer | System Update |
| ELASTICSEARCH_INDEX_BUILDER_MAPPINGS_REINDEX | false | Enable mappings reindex | System Update |
| ELASTICSEARCH_INDEX_BUILDER_SETTINGS_REINDEX | false | Enable settings reindex | System Update |
| ELASTICSEARCH_INDEX_BUILDER_MAX_REINDEX_HOURS | 0 | Maximum reindex hours (0 = no timeout) | System Update |
| ELASTICSEARCH_INDEX_BUILDER_SETTINGS_OVERRIDES | null | Index builder settings overrides | System Update |
| ELASTICSEARCH_MIN_SEARCH_FILTER_LENGTH | 3 | Minimum search filter length | System Update |
| ELASTICSEARCH_INDEX_BUILDER_ENTITY_SETTINGS_OVERRIDES | null | Entity settings overrides | System Update |

Search Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| ELASTICSEARCH_QUERY_MAX_TERM_BUCKET_SIZE | 60 | Maximum term bucket size | GMS |
| ELASTICSEARCH_QUERY_EXACT_MATCH_EXCLUSIVE | false | Only return exact matches when using quotes | GMS |
| ELASTICSEARCH_QUERY_EXACT_MATCH_WITH_PREFIX | true | Include prefix match in exact match results | GMS |
| ELASTICSEARCH_QUERY_EXACT_MATCH_FACTOR | 16.0 | Boost multiplier on a true exact match | GMS |
| ELASTICSEARCH_QUERY_EXACT_MATCH_PREFIX_FACTOR | 1.1 | Boost multiplier on a prefix match | GMS |
| ELASTICSEARCH_QUERY_EXACT_MATCH_CASE_FACTOR | 0.0 | Stacked boost multiplier when case mismatch | GMS |
| ELASTICSEARCH_QUERY_EXACT_MATCH_ENABLE_STRUCTURED | true | Enable exact match on structured search | GMS |
| ELASTICSEARCH_QUERY_TWO_GRAM_FACTOR | 1.2 | Boost multiplier when match on 2-gram tokens | GMS |
| ELASTICSEARCH_QUERY_THREE_GRAM_FACTOR | 1.5 | Boost multiplier when match on 3-gram tokens | GMS |
| ELASTICSEARCH_QUERY_FOUR_GRAM_FACTOR | 1.8 | Boost multiplier when match on 4-gram tokens | GMS |
| ELASTICSEARCH_QUERY_PARTIAL_URN_FACTOR | 0.5 | Multiplier on URN token match | GMS |
| ELASTICSEARCH_QUERY_PARTIAL_FACTOR | 0.4 | Multiplier on possible non-URN token match | GMS |
| ELASTICSEARCH_QUERY_CUSTOM_CONFIG_ENABLED | true | Enable search query and ranking customization | GMS |
| ELASTICSEARCH_QUERY_CUSTOM_CONFIG_FILE | search_config.yaml | Location of search customization configuration | GMS |
| ELASTICSEARCH_QUERY_SEARCH_FIELD_CONFIG_DEFAULT | legacy | Default field configuration for search | GMS |
| ELASTICSEARCH_QUERY_AUTOCOMPLETE_FIELD_CONFIG_DEFAULT | legacy | Default field configuration for autocomplete | GMS |

Graph Search Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| ELASTICSEARCH_SEARCH_GRAPH_TIMEOUT_SECONDS | 50 | Graph DAO timeout seconds | GMS |
| ELASTICSEARCH_SEARCH_GRAPH_BATCH_SIZE | 1000 | Graph DAO batch size | GMS |
| ELASTICSEARCH_SEARCH_GRAPH_MULTI_PATH_SEARCH | false | Allow path retraversal for all paths | GMS |
| ELASTICSEARCH_SEARCH_GRAPH_BOOST_VIA_NODES | true | Boost graph edges with via nodes | GMS |
| ELASTICSEARCH_SEARCH_GRAPH_STATUS_ENABLED | false | Enable soft delete tracking of URNs on edges | GMS |
| ELASTICSEARCH_SEARCH_GRAPH_LINEAGE_MAX_HOPS | 20 | Maximum hops to traverse lineage graph | GMS |
| ELASTICSEARCH_SEARCH_GRAPH_IMPACT_MAX_HOPS | 1000 | Maximum hops to traverse for impact analysis | GMS |
| ELASTICSEARCH_SEARCH_GRAPH_IMPACT_MAX_THREADS | 32 | Maximum parallel lineage graph queries | GMS |
| ELASTICSEARCH_SEARCH_GRAPH_QUERY_OPTIMIZATION | true | Reduce query nesting if possible | GMS |

Neo4j Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| NEO4J_USERNAME | neo4j | Neo4j username | GMS, MAE Consumer, System Update |
| NEO4J_PASSWORD | datahub | Neo4j password | GMS, MAE Consumer, System Update |
| NEO4J_URI | bolt://localhost | Neo4j URI | GMS, MAE Consumer, System Update |
| NEO4J_DATABASE | graph.db | Neo4j database | GMS, MAE Consumer, System Update |
| NEO4J_MAX_CONNECTION_POOL_SIZE | 100 | Maximum connection pool size | GMS, MAE Consumer, System Update |
| NEO4J_MAX_CONNECTION_ACQUISITION_TIMEOUT_IN_SECONDS | 60 | Maximum connection acquisition timeout | GMS, MAE Consumer, System Update |
| NEO4j_MAX_CONNECTION_LIFETIME_IN_SECONDS | 3600 | Maximum connection lifetime | GMS, MAE Consumer, System Update |
| NEO4J_MAX_TRANSACTION_RETRY_TIME_IN_SECONDS | 30 | Maximum transaction retry time | GMS, MAE Consumer, System Update |
| NEO4J_CONNECTION_LIVENESS_CHECK_TIMEOUT_IN_SECONDS | -1 | Connection liveness check timeout | GMS, MAE Consumer, System Update |

Kafka Configuration

Topic Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| DATAHUB_USAGE_EVENT_NAME | DataHubUsageEvent_v1 | DataHub usage event topic name | GMS, MAE Consumer, MCE Consumer, Actions, Frontend |

Bootstrap Servers

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| KAFKA_BOOTSTRAP_SERVER | http://localhost:9092 | Kafka bootstrap servers | GMS, MAE Consumer, MCE Consumer, PE Consumer, Actions, Frontend |

Producer Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| KAFKA_PRODUCER_RETRY_COUNT | 3 | Producer retry count | GMS, MCE Consumer, System Update |
| KAFKA_PRODUCER_DELIVERY_TIMEOUT | 30000 | Producer delivery timeout | GMS, MCE Consumer, System Update |
| KAFKA_PRODUCER_REQUEST_TIMEOUT | 3000 | Producer request timeout | GMS, MCE Consumer, System Update |
| KAFKA_PRODUCER_BACKOFF_TIMEOUT | 500 | Producer backoff timeout | GMS, MCE Consumer, System Update |
| KAFKA_PRODUCER_COMPRESSION_TYPE | snappy | Producer compression algorithm | GMS, MCE Consumer, System Update |
| KAFKA_PRODUCER_MAX_REQUEST_SIZE | 5242880 | Maximum bytes sent by producer | GMS, MCE Consumer, System Update |

Consumer Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| KAFKA_LISTENER_CONCURRENCY | 1 | Number of Kafka consumer threads | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| KAFKA_CONSUMER_MAX_PARTITION_FETCH_BYTES | 5242880 | Maximum data per partition | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| KAFKA_CONSUMER_STOP_ON_DESERIALIZATION_ERROR | true | Stop on deserialization error | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| KAFKA_CONSUMER_HEALTH_CHECK_ENABLED | true | Enable health check for consumers | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| KAFKA_CONSUMER_MCP_AUTO_OFFSET_RESET | earliest | MCP consumer auto offset reset | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| KAFKA_CONSUMER_MCL_AUTO_OFFSET_RESET | earliest | MCL consumer auto offset reset | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| KAFKA_CONSUMER_MCL_FINE_GRAINED_LOGGING_ENABLED | false | Enable fine-grained logging for MCL | GMS, MAE Consumer |
| KAFKA_CONSUMER_MCL_ASPECTS_TO_DROP | `` | Aspects to drop for MCL | GMS, MAE Consumer |
| KAFKA_CONSUMER_PE_AUTO_OFFSET_RESET | latest | PE consumer auto offset reset | GMS, PE Consumer |
| KAFKA_CONSUMER_PERCENTILES | 0.5,0.95,0.99,0.999 | Consumer percentiles | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| KAFKA_CONSUMER_SERVICE_LEVEL_OBJECTIVES | 300,1800,3000,10800,21600,43200 | Consumer SLOs in seconds | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| KAFKA_CONSUMER_MAX_EXPECTED_VALUE | 86000 | Maximum expected consumer value in seconds | GMS, MAE Consumer, MCE Consumer, PE Consumer |

Consumer Pool Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| KAFKA_CONSUMER_POOL_INITIAL_SIZE | 1 | Consumer pool initial size | GMS |
| KAFKA_CONSUMER_POOL_MAX_SIZE | 5 | Consumer pool maximum size | GMS |

Schema Registry Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| SCHEMA_REGISTRY_TYPE | KAFKA | Schema registry type (INTERNAL, KAFKA, or AWS_GLUE) | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| KAFKA_SCHEMAREGISTRY_URL | http://localhost:8081 | Schema registry URL | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| SCHEMA_REGISTRY_URL | http://localhost:8081 | Schema registry URL (Actions) | Actions |
| AWS_GLUE_SCHEMA_REGISTRY_REGION | us-east-1 | AWS Glue schema registry region | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| AWS_GLUE_SCHEMA_REGISTRY_NAME | null | AWS Glue schema registry name | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| KAFKA_PROPERTIES_SECURITY_PROTOCOL | PLAINTEXT | Kafka security protocol | GMS, MAE Consumer, MCE Consumer, PE Consumer, Actions |

Spring Configuration

Kafka Security

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| spring.kafka.security.protocol | PLAINTEXT | Kafka security protocol | GMS, MAE Consumer, MCE Consumer, PE Consumer |

Management & Monitoring

JMX Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| spring.jmx.enabled | true | Enable JMX | GMS, MAE Consumer, MCE Consumer, PE Consumer |

Endpoints Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| management.endpoints.web.exposure.include | prometheus,info,healthcheck,metrics | Exposed web endpoints | GMS |
| management.endpoints.jmx.enabled | true | Enable JMX endpoints | GMS |

Metrics Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| management.metrics.cache.enabled | false | Enable cache metrics | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| management.metrics.export.jmx.enabled | true | Enable JMX metrics export | GMS, MAE Consumer, MCE Consumer, PE Consumer |
| management.metrics.export.prometheus.enabled | true | Enable Prometheus metrics export | GMS, MAE Consumer, MCE Consumer, PE Consumer |

Server Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| server.server-header | false | Server header | GMS |

Feature Flags

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| SHOW_SIMPLIFIED_HOMEPAGE_BY_DEFAULT | false | Show simplified homepage with just datasets, charts and dashboards | GMS |
| LINEAGE_SEARCH_CACHE_ENABLED | true | Enable in-memory cache for searchAcrossLineage query | GMS |
| GRAPH_SERVICE_DIFF_MODE_ENABLED | true | Enable diff mode for graph writes | GMS |
| POINT_IN_TIME_CREATION_ENABLED | false | Enable creation of point in time snapshots for scroll API | GMS |
| ALWAYS_EMIT_CHANGE_LOG | false | Always emit MCL even when no changes detected | GMS |
| SEARCH_SERVICE_DIFF_MODE_ENABLED | true | Enable diff mode for search document writes | GMS |
| READ_ONLY_MODE_ENABLED | false | Enable read only mode for instance | GMS |
| SHOW_ACCESS_MANAGEMENT | false | Show AccessManagement tab in UI | GMS |
| SHOW_SEARCH_FILTERS_V2 | true | Show search filters V2 experience | GMS |
| SHOW_BROWSE_V2 | true | Show browse v2 sidebar experience | GMS |
| PLATFORM_BROWSE_V2 | true | Enable platform browse experience | GMS |
| LINEAGE_GRAPH_V2 | true | Enable new lineage visualization | GMS |
| PRE_PROCESS_HOOKS_UI_ENABLED | true | Circumvent Kafka for UI changes | GMS |
| PRE_PROCESS_HOOKS_REPROCESS_ENABLED | false | Reprocess UI sourced events asynchronously | GMS |
| SHOW_ACRYL_INFO | false | Show CTAs around moving to DataHub Cloud | GMS |
| ER_MODEL_RELATIONSHIP_FEATURE_ENABLED | false | Enable Join Tables Feature | GMS |
| NESTED_DOMAINS_ENABLED | true | Enable nested Domains feature | GMS |
| SCHEMA_FIELD_ENTITY_FETCH_ENABLED | true | Enable fetching schema field entities | GMS |
| BUSINESS_ATTRIBUTE_ENTITY_ENABLED | false | Enable business attribute entity | GMS |
| DATA_CONTRACTS_ENABLED | true | Enable Data Contracts feature | GMS |
| ALTERNATE_MCP_VALIDATION | false | Enable alternate MCP validation flow | GMS |
| THEME_V2_ENABLED | true | Allow theme v2 to be turned on | GMS |
| THEME_V2_DEFAULT | true | Set default theme for users | GMS |
| THEME_V2_TOGGLEABLE | true | Allow theme v2 to be toggled (Acryl only) | GMS |
| SCHEMA_FIELD_CLL_ENABLED | false | Enable schema field-level lineage links | GMS |
| SCHEMA_FIELD_LINEAGE_IGNORE_STATUS | true | Ignore schema field status in lineage | GMS |
| SHOW_SEPARATE_SIBLINGS | false | Separate siblings with no combined view | GMS |
| EDITABLE_DATASET_NAME_ENABLED | false | Enable editing dataset name in UI | GMS |
| SHOW_MANAGE_STRUCTURED_PROPERTIES | true | Show manage structured properties button | GMS |
| HIDE_DBT_SOURCE_IN_LINEAGE | false | Hide dbt sources in lineage | GMS |
| SHOW_NAV_BAR_REDESIGN | true | Show newly designed nav bar | GMS |
| SHOW_AUTO_COMPLETE_RESULTS | true | Show auto complete results in search bar | GMS |
| ENTITY_VERSIONING_ENABLED | false | Enable entity versioning APIs | GMS |
| SHOW_HAS_SIBLINGS_FILTER | false | Show "has siblings" filter in search | GMS |
| SHOW_SEARCH_BAR_AUTOCOMPLETE_REDESIGN | false | Show redesigned search bar autocomplete | GMS |
| SHOW_MANAGE_TAGS | true | Allow users to manage tags in UI | GMS |
| SHOW_INTRODUCE_PAGE | true | Show introduce page in V2 UI | GMS |
| SHOW_INGESTION_PAGE_REDESIGN | false | Show re-designed Ingestion page | GMS |
| SHOW_LINEAGE_EXPAND_MORE | true | Show expand more button in lineage graph | GMS |
| SHOW_HOME_PAGE_REDESIGN | false | Show re-designed home page | GMS |
| LINEAGE_GRAPH_V3 | false | Enable redesign of lineage v2 graph | GMS |
| SHOW_PRODUCT_UPDATES | true | Show in-product update popover | GMS |
| LOGICAL_MODELS_ENABLED | false | Enable logical models feature | GMS |
| SHOW_HOMEPAGE_USER_ROLE | false | Display homepage user role underneath name | GMS |
| VIEWS_ENABLED | true | Enable views feature | GMS |

System Updates

Bootstrap Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| BOOTSTRAP_POLICIES_FILE | classpath:boot/policies.json | Bootstrap policies file | GMS |
| BOOTSTRAP_SERVLETS_WAITTIMEOUT | 60 | Total waiting time for servlets to initialize | GMS |

System Update Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| BOOTSTRAP_SYSTEM_UPDATE_INITIAL_BACK_OFF_MILLIS | 5000 | Initial back off for system updates | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_MAX_BACK_OFFS | 50 | Maximum back offs for system updates | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_BACK_OFF_FACTOR | 2 | Multiplicative factor for back off | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_WAIT_FOR_SYSTEM_UPDATE | true | Wait for system update to complete | System Update |
| SYSTEM_UPDATE_BOOTSTRAP_MCP_CONFIG | bootstrap_mcps.yaml | Bootstrap MCP configuration | System Update |
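
The initial back off, back off factor, and maximum back offs together describe a multiplicative retry schedule. The sketch below generates such a schedule from the documented defaults. `backoff_delays_ms` is a hypothetical helper, and the schedule is a plain geometric progression. Whether DataHub caps or jitters the delay is not stated here, so neither is modeled.

```python
from itertools import islice


def backoff_delays_ms(initial_ms=5000, factor=2, max_back_offs=50):
    """Yield successive wait times in milliseconds.

    Each delay is the previous one multiplied by `factor`, for at most
    `max_back_offs` attempts.
    """
    delay = initial_ms
    for _ in range(max_back_offs):
        yield delay
        delay *= factor


# First four waits with the documented defaults.
first_waits = list(islice(backoff_delays_ms(), 4))
```

With the defaults, the first waits are 5, 10, 20, and 40 seconds. An uncapped factor of 2 grows quickly, so in practice only the first few steps of a schedule like this are likely to matter.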

Data Job Node CLL Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| BOOTSTRAP_SYSTEM_UPDATE_DATA_JOB_NODE_CLL_ENABLED | false | Enable data job node CLL | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_DATA_JOB_NODE_CLL_BATCH_SIZE | 1000 | Data job node CLL batch size | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_DATA_JOB_NODE_CLL_DELAY_MS | 30000 | Data job node CLL delay in milliseconds | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_DATA_JOB_NODE_CLL_LIMIT | 0 | Data job node CLL limit | System Update |

Domain Description Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| BOOTSTRAP_SYSTEM_UPDATE_DOMAIN_DESCRIPTION_ENABLED | true | Enable domain description updates | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_DOMAIN_DESCRIPTION_BATCH_SIZE | 1000 | Domain description batch size | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_DOMAIN_DESCRIPTION_DELAY_MS | 30000 | Domain description delay in milliseconds | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_DOMAIN_DESCRIPTION_CLL_LIMIT | 0 | Domain description CLL limit | System Update |

Dashboard Info Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| BOOTSTRAP_SYSTEM_UPDATE_DASHBOARD_INFO_ENABLED | true | Enable dashboard info updates | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_DASHBOARD_INFO_BATCH_SIZE | 1000 | Dashboard info batch size | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_DASHBOARD_INFO_DELAY_MS | 30000 | Dashboard info delay in milliseconds | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_DASHBOARD_INFO_CLL_LIMIT | 0 | Dashboard info CLL limit | System Update |

Browse Paths V2 Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| BOOTSTRAP_SYSTEM_UPDATE_BROWSE_PATHS_V2_ENABLED | true | Enable browse paths V2 updates | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_BROWSE_PATHS_V2_BATCH_SIZE | 5000 | Browse paths V2 batch size | System Update |
| REPROCESS_DEFAULT_BROWSE_PATHS_V2 | false | Reprocess default browse paths V2 | System Update |

Ingestion Indices Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| BOOTSTRAP_SYSTEM_UPDATE_INGESTION_INDICES_ENABLED | true | Enable ingestion indices updates | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_INGESTION_INDICES_BATCH_SIZE | 5000 | Ingestion indices batch size | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_INGESTION_INDICES_DELAY_MS | 1000 | Ingestion indices delay in milliseconds | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_INGESTION_INDICES_CLL_LIMIT | 0 | Ingestion indices CLL limit | System Update |

Policy Fields Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| BOOTSTRAP_SYSTEM_UPDATE_POLICY_FIELDS_ENABLED | true | Enable policy fields updates | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_POLICY_FIELDS_BATCH_SIZE | 5000 | Policy fields batch size | System Update |
| REPROCESS_DEFAULT_POLICY_FIELDS | false | Reprocess default policy fields | System Update |

Ownership Types Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| BOOTSTRAP_SYSTEM_UPDATE_OWNERSHIP_TYPES_ENABLED | true | Enable ownership types updates | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_OWNERSHIP_TYPES_BATCH_SIZE | 1000 | Ownership types batch size | System Update |
| BOOTSTRAP_SYSTEM_UPDATE_OWNERSHIP_TYPES_REPROCESS | false | Reprocess ownership types | System Update |

Schema Fields Configuration

Environment Variable Default Description Components
SYSTEM_UPDATE_SCHEMA_FIELDS_FROM_SCHEMA_METADATA_ENABLED false Enable schema fields from schema metadata System Update
SYSTEM_UPDATE_SCHEMA_FIELDS_FROM_SCHEMA_METADATA_BATCH_SIZE 500 Schema fields from schema metadata batch size System Update
SYSTEM_UPDATE_SCHEMA_FIELDS_FROM_SCHEMA_METADATA_DELAY_MS 1000 Schema fields from schema metadata delay System Update
SYSTEM_UPDATE_SCHEMA_FIELDS_FROM_SCHEMA_METADATA_LIMIT 0 Schema fields from schema metadata limit System Update
SYSTEM_UPDATE_SCHEMA_FIELDS_DOC_IDS_ENABLED false Enable schema fields doc IDs System Update
SYSTEM_UPDATE_SCHEMA_FIELDS_DOC_IDS_BATCH_SIZE 500 Schema fields doc IDs batch size System Update
SYSTEM_UPDATE_SCHEMA_FIELDS_DOC_IDS_DELAY_MS 5000 Schema fields doc IDs delay System Update
SYSTEM_UPDATE_SCHEMA_FIELDS_DOC_IDS_LIMIT 0 Schema fields doc IDs limit System Update

Process Instance Configuration

Environment Variable Default Description Components
SYSTEM_UPDATE_PROCESS_INSTANCE_HAS_RUN_EVENTS_ENABLED true Enable process instance has run events System Update
SYSTEM_UPDATE_PROCESS_INSTANCE_HAS_RUN_EVENTS_BATCH_SIZE 100 Process instance has run events batch size System Update
SYSTEM_UPDATE_PROCESS_INSTANCE_HAS_RUN_EVENTS_DELAY_MS 1000 Process instance has run events delay System Update
SYSTEM_UPDATE_PROCESS_INSTANCE_HAS_RUN_EVENTS_TOTAL_DAYS 90 Process instance has run events total days System Update
SYSTEM_UPDATE_PROCESS_INSTANCE_HAS_RUN_EVENTS_WINDOW_DAYS 1 Process instance has run events window days System Update
SYSTEM_UPDATE_PROCESS_INSTANCE_HAS_RUN_EVENTS_REPROCESS false Reprocess process instance has run events System Update

Edge Status Configuration

Environment Variable Default Description Components
BOOTSTRAP_SYSTEM_UPDATE_EDGE_STATUS_ENABLED false Enable edge status updates System Update
BOOTSTRAP_SYSTEM_UPDATE_EDGE_STATUS_BATCH_SIZE 1000 Edge status batch size System Update
BOOTSTRAP_SYSTEM_UPDATE_EDGE_STATUS_DELAY_MS 5000 Edge status delay in milliseconds System Update
BOOTSTRAP_SYSTEM_UPDATE_EDGE_STATUS_LIMIT 0 Edge status limit System Update

Property Definitions Configuration

Environment Variable Default Description Components
BOOTSTRAP_SYSTEM_UPDATE_PROPERTY_DEFINITIONS_ENABLED true Enable property definitions updates System Update
BOOTSTRAP_SYSTEM_UPDATE_PROPERTY_DEFINITIONS_BATCH_SIZE 500 Property definitions batch size System Update
BOOTSTRAP_SYSTEM_UPDATE_PROPERTY_DEFINITIONS_DELAY_MS 1000 Property definitions delay in milliseconds System Update
BOOTSTRAP_SYSTEM_UPDATE_PROPERTY_DEFINITIONS_CLL_LIMIT 0 Property definitions CLL limit System Update

Remove Query Edges Configuration

Environment Variable Default Description Components
BOOTSTRAP_SYSTEM_UPDATE_REMOVE_QUERY_EDGES_ENABLED true Enable remove query edges System Update
BOOTSTRAP_SYSTEM_UPDATE_REMOVE_QUERY_EDGES_RETRIES 20 Remove query edges retries System Update

Additional Environment Variables

The following environment variables are used in the codebase but may not be explicitly defined in the application.yaml file:

Ingestion and Processing

Environment Variable Default Description Components
ASYNC_INGEST_DEFAULT false Asynchronously process ingestProposals by writing to Kafka GMS
STRICT_URN_VALIDATION_ENABLED false Enable stricter URN validation logic GMS
DATAHUB_DATASET_URN_TO_LOWER null Convert dataset URN names to lowercase GMS
BUSINESS_ATTRIBUTE_ENTITY_ENABLED false Enable business attribute entity feature GMS

REST and Servlet Configuration

Environment Variable Default Description Components
RESTLI_SERVLET_THREADS null Number of threads for REST servlet GMS, MCE Consumer
RESTLI_TIMEOUT_SECONDS 60 REST timeout in seconds GMS, MCE Consumer

System and Version Information

Environment Variable Default Description Components
DATAHUB_GMS_PROTOCOL http GMS protocol (http/https) GMS

Upgrade and Migration

Environment Variable Default Description Components
SKIP_REINDEX_EDGE_STATUS false Skip reindexing edge status System Update
SKIP_REINDEX_DATA_JOB_INPUT_OUTPUT false Skip reindexing data job input/output System Update
SKIP_GENERATE_SCHEMA_FIELDS_FROM_SCHEMA_METADATA false Skip generating schema fields from schema metadata System Update
SKIP_MIGRATE_SCHEMA_FIELDS_DOC_ID false Skip migrating schema fields doc IDs System Update
BACKFILL_BROWSE_PATHS_V2 false Enable backfilling browse paths V2 System Update
READER_POOL_SIZE null Reader pool size for restore operations System Update
WRITER_POOL_SIZE null Writer pool size for restore operations System Update

OpenTelemetry Configuration

Environment Variable Default Description Components
OTEL_METRICS_EXPORTER none OpenTelemetry metrics exporter GMS, MAE Consumer, MCE Consumer, PE Consumer
OTEL_TRACES_EXPORTER none OpenTelemetry traces exporter GMS, MAE Consumer, MCE Consumer, PE Consumer
OTEL_LOGS_EXPORTER none OpenTelemetry logs exporter GMS, MAE Consumer, MCE Consumer, PE Consumer
OTEL_PROPAGATORS null OpenTelemetry propagators GMS, MAE Consumer, MCE Consumer, PE Consumer

Secret Service Configuration

Environment Variable Default Description Components
SECRET_SERVICE_ENCRYPTION_KEY ENCRYPTION_KEY Secret service encryption key GMS
SECRET_SERVICE_V1_ALGORITHM_ENABLED true Enable v1 algorithm for secret service GMS

Health Check Configuration

Environment Variable Default Description Components
HEALTH_CHECK_CACHE_DURATION_SECONDS 5 Health check cache duration GMS

Metadata Tests Configuration

Environment Variable Default Description Components
METADATA_TESTS_ENABLED false Enable metadata tests GMS

Hooks Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| `ENABLE_SIBLING_HOOK` | `true` | Enable automatic sibling associations | GMS, MAE Consumer |
| `SIBLINGS_HOOK_CONSUMER_GROUP_SUFFIX` | (empty) | Siblings hook consumer group suffix | GMS, MAE Consumer |
| `ENABLE_UPDATE_INDICES_HOOK` | `true` | Enable update indices hook | GMS, MAE Consumer |
| `UPDATE_INDICES_CONSUMER_GROUP_SUFFIX` | (empty) | Update indices consumer group suffix | GMS, MAE Consumer |
| `ENABLE_INGESTION_SCHEDULER_HOOK` | `true` | Enable ingestion scheduling | GMS, MAE Consumer |
| `INGESTION_SCHEDULER_HOOK_CONSUMER_GROUP_SUFFIX` | (empty) | Ingestion scheduler hook consumer group suffix | GMS, MAE Consumer |
| `ENABLE_INCIDENTS_HOOK` | `true` | Enable incidents hook | GMS, MAE Consumer |
| `MAX_INCIDENT_HISTORY` | `100` | Maximum incident history | GMS, MAE Consumer |
| `INCIDENTS_HOOK_CONSUMER_GROUP_SUFFIX` | (empty) | Incidents hook consumer group suffix | GMS, MAE Consumer |
| `ENABLE_STRUCTURED_PROPERTIES_HOOK` | `true` | Enable structured properties mappings | GMS, MAE Consumer |
| `ENABLE_STRUCTURED_PROPERTIES_WRITE` | `true` | Enable writing structured property values | GMS, MAE Consumer |
| `ENABLE_STRUCTURED_PROPERTIES_SYSTEM_UPDATE` | `false` | Enable structured property mappings in system update | GMS, MAE Consumer |
| `ENABLE_ENTITY_CHANGE_EVENTS_HOOK` | `true` | Enable entity change events hook | GMS, MAE Consumer |
| `ECE_CONSUMER_GROUP_SUFFIX` | (empty) | Entity change events consumer group suffix | GMS, MAE Consumer |
| `ECE_ENTITY_EXCLUSIONS` | `schemaField` | Entities to exclude from ECE hook | GMS, MAE Consumer |
| `FORMS_HOOK_ENABLED` | `true` | Enable forms hook | GMS, MAE Consumer |
| `FORMS_HOOK_CONSUMER_GROUP_SUFFIX` | (empty) | Forms hook consumer group suffix | GMS, MAE Consumer |

Search and API Configuration

Environment Variable Default Description Components
SEARCH_BAR_API_VARIANT AUTOCOMPLETE_FOR_MULTIPLE Search bar API variant Frontend
FIRST_IN_PERSONAL_SIDEBAR YOUR_ASSETS First item in personal sidebar Frontend

Client Configuration

Environment Variable Default Description Components
ENTITY_CLIENT_RETRY_INTERVAL 2 Entity client retry interval GMS
ENTITY_CLIENT_NUM_RETRIES 3 Entity client number of retries GMS
ENTITY_CLIENT_JAVA_GET_BATCH_SIZE 375 Entity client Java get batch size GMS
ENTITY_CLIENT_JAVA_INGEST_BATCH_SIZE 375 Entity client Java ingest batch size GMS
ENTITY_CLIENT_RESTLI_GET_BATCH_SIZE 100 Entity client RESTli get batch size GMS, MAE Consumer, PE Consumer
ENTITY_CLIENT_RESTLI_GET_BATCH_CONCURRENCY 2 Entity client RESTli get batch concurrency GMS, MAE Consumer, PE Consumer
ENTITY_CLIENT_RESTLI_GET_BATCH_QUEUE_SIZE 500 Entity client RESTli get batch queue size GMS, MAE Consumer, PE Consumer
ENTITY_CLIENT_RESTLI_GET_BATCH_THREAD_KEEP_ALIVE 60 Entity client RESTli get batch thread keep alive GMS, MAE Consumer, PE Consumer
ENTITY_CLIENT_RESTLI_INGEST_BATCH_SIZE 50 Entity client RESTli ingest batch size GMS, MAE Consumer, PE Consumer
ENTITY_CLIENT_RESTLI_INGEST_BATCH_CONCURRENCY 2 Entity client RESTli ingest batch concurrency GMS, MAE Consumer, PE Consumer
ENTITY_CLIENT_RESTLI_INGEST_BATCH_QUEUE_SIZE 500 Entity client RESTli ingest batch queue size GMS, MAE Consumer, PE Consumer
ENTITY_CLIENT_RESTLI_INGEST_BATCH_THREAD_KEEP_ALIVE 60 Entity client RESTli ingest batch thread keep alive GMS, MAE Consumer, PE Consumer
USAGE_CLIENT_RETRY_INTERVAL 2 Usage client retry interval GMS, MAE Consumer, PE Consumer
USAGE_CLIENT_NUM_RETRIES 0 Usage client number of retries GMS, MAE Consumer, PE Consumer
USAGE_CLIENT_TIMEOUT_MS 3000 Usage client timeout in milliseconds GMS, MAE Consumer, PE Consumer

Cache Configuration

Environment Variable Default Description Components
CACHE_TTL_SECONDS 600 Default cache time to live GMS
CACHE_MAX_SIZE 10000 Maximum number of items to cache GMS
CACHE_ENTITY_COUNTS_TTL_SECONDS 600 Homepage entity count time to live GMS
CACHE_SEARCH_LINEAGE_TTL_SECONDS 86400 Search lineage cache time to live GMS
CACHE_SEARCH_LINEAGE_LIGHTNING_THRESHOLD 300 Lineage graphs exceeding this limit will use local cache GMS
CACHE_CLIENT_USAGE_CLIENT_ENABLED true Enable usage client cache GMS, MAE Consumer, PE Consumer
CACHE_CLIENT_USAGE_CLIENT_STATS_ENABLED true Enable usage client cache stats GMS, MAE Consumer, PE Consumer
CACHE_CLIENT_USAGE_CLIENT_STATS_INTERVAL_SECONDS 120 Usage client cache stats interval GMS, MAE Consumer, PE Consumer
CACHE_CLIENT_USAGE_CLIENT_TTL_SECONDS 86400 Usage client cache TTL GMS, MAE Consumer, PE Consumer
CACHE_CLIENT_USAGE_CLIENT_MAX_BYTES 52428800 Usage client cache max bytes (50MB) GMS, MAE Consumer, PE Consumer
CACHE_CLIENT_ENTITY_CLIENT_ENABLED true Enable entity client cache GMS, MAE Consumer, PE Consumer
CACHE_CLIENT_ENTITY_CLIENT_STATS_ENABLED true Enable entity client cache stats GMS, MAE Consumer, PE Consumer
CACHE_CLIENT_ENTITY_CLIENT_STATS_INTERVAL_SECONDS 120 Entity client cache stats interval GMS, MAE Consumer, PE Consumer
CACHE_CLIENT_ENTITY_CLIENT_TTL_SECONDS 0 Entity client cache TTL (0 = no cache) GMS, MAE Consumer, PE Consumer
CACHE_CLIENT_ENTITY_CLIENT_MAX_BYTES 104857600 Entity client cache max bytes (100MB) GMS, MAE Consumer, PE Consumer
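
The byte and second values above are easier to sanity-check in human-readable units. The following is a minimal Python sketch; the helper names are hypothetical and not part of DataHub, and it assumes "MB" in the descriptions means mebibytes (1024 * 1024 bytes), which is consistent with the listed defaults.

```python
# Hypothetical helpers for sanity-checking cache sizing values; not part of DataHub.

def mib_to_bytes(mib: int) -> int:
    """Convert mebibytes to bytes (1 MiB = 1024 * 1024 bytes)."""
    return mib * 1024 * 1024


def seconds_to_hours(seconds: int) -> float:
    """Convert a TTL given in seconds to hours."""
    return seconds / 3600


# Defaults taken from the table above.
usage_cache_max_bytes = mib_to_bytes(50)      # CACHE_CLIENT_USAGE_CLIENT_MAX_BYTES
entity_cache_max_bytes = mib_to_bytes(100)    # CACHE_CLIENT_ENTITY_CLIENT_MAX_BYTES
lineage_ttl_hours = seconds_to_hours(86400)   # CACHE_SEARCH_LINEAGE_TTL_SECONDS
```

The same helpers can be used to derive a value for a larger cache, for example `mib_to_bytes(256)` for a 256 MiB limit.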

GraphQL Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| `GRAPHQL_CONCURRENCY_SEPARATE_THREAD_POOL` | `false` | Enable separate thread pool for GraphQL | GMS |
| `GRAPHQL_CONCURRENCY_STACK_SIZE` | `256000` | GraphQL thread pool stack size | GMS |
| `GRAPHQL_CONCURRENCY_CORE_POOL_SIZE` | `-1` | GraphQL core pool size (default 5 \* cores) | GMS |
| `GRAPHQL_CONCURRENCY_MAX_POOL_SIZE` | `-1` | GraphQL max pool size (default 100 \* cores) | GMS |
| `GRAPHQL_CONCURRENCY_KEEP_ALIVE` | `60` | GraphQL thread keep alive time | GMS |
| `GRAPHQL_QUERY_COMPLEXITY_LIMIT` | `2000` | GraphQL query complexity limit | GMS |
| `GRAPHQL_QUERY_DEPTH_LIMIT` | `50` | GraphQL query depth limit | GMS |
| `GRAPHQL_QUERY_INTROSPECTION_ENABLED` | `true` | Enable GraphQL introspection | GMS |
| `GRAPHQL_METRICS_ENABLED` | `true` | Enable GraphQL metrics collection | GMS |
| `GRAPHQL_PERCENTILES` | `0.5,0.75,0.95,0.98,0.99,0.999` | GraphQL percentiles | GMS |
| `GRAPHQL_METRICS_FIELD_LEVEL_ENABLED` | `false` | Enable field-level GraphQL metrics | GMS |
| `GRAPHQL_METRICS_FIELD_LEVEL_OPERATIONS` | `getSearchResultsForMultiple,searchAcrossLineageStructure` | GraphQL field-level operations | GMS |
| `GRAPHQL_METRICS_FIELD_LEVEL_PATH_ENABLED` | `false` | Include field path in GraphQL metrics | GMS |
| `GRAPHQL_METRICS_FIELD_LEVEL_PATHS` | (empty) | GraphQL field-level paths | GMS |
| `GRAPHQL_METRICS_TRIVIAL_DATA_FETCHERS_ENABLED` | `false` | Include trivial data fetchers in GraphQL metrics | GMS |
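
The `-1` defaults for the pool sizes act as a sentinel: per the descriptions above, the effective size is then derived from the number of cores. The Python sketch below illustrates that reading and shows how a comma-separated value such as `GRAPHQL_PERCENTILES` could be parsed. Both function names are hypothetical, and the way GMS actually resolves these values internally is an assumption here.

```python
from typing import List


def effective_pool_size(configured: int, per_core_factor: int, cores: int) -> int:
    """Return the configured size, or per_core_factor * cores when it is -1."""
    if configured == -1:
        return per_core_factor * cores
    return configured


def parse_percentiles(raw: str) -> List[float]:
    """Parse a comma-separated list of percentiles into floats."""
    return [float(part) for part in raw.split(",") if part.strip()]


# On an 8-core host with the defaults from the table above:
core_pool = effective_pool_size(-1, 5, 8)    # GRAPHQL_CONCURRENCY_CORE_POOL_SIZE
max_pool = effective_pool_size(-1, 100, 8)   # GRAPHQL_CONCURRENCY_MAX_POOL_SIZE
percentiles = parse_percentiles("0.5,0.75,0.95,0.98,0.99,0.999")
```

An explicit positive value bypasses the per-core calculation, so `effective_pool_size(16, 5, 8)` simply yields 16.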

Chrome Extension Configuration

Environment Variable Default Description Components
CHROME_EXTENSION_ENABLED true Enable Chrome extension Frontend
CHROME_EXTENSION_LINEAGE_ENABLED true Enable Chrome extension lineage Frontend

Business Attribute Configuration

Environment Variable Default Description Components
BUSINESS_ATTRIBUTE_RELATED_ENTITIES_COUNT 20000 Business attribute related entities count GMS
BUSINESS_ATTRIBUTE_RELATED_ENTITIES_BATCH_SIZE 1000 Business attribute related entities batch size GMS
BUSINESS_ATTRIBUTE_PROPAGATION_CONCURRENCY_THREAD_COUNT -1 Business attribute propagation thread count (default 2 * cores) GMS
BUSINESS_ATTRIBUTE_PROPAGATION_CONCURRENCY_KEEP_ALIVE 60 Business attribute propagation keep alive time GMS

Metadata Change Proposal Configuration

Environment Variable Default Description Components
MCP_CONSUMER_BATCH_ENABLED false Enable MCP consumer batch processing GMS, MCE Consumer
MCP_CONSUMER_BATCH_SIZE 15744000 MCP consumer batch size GMS, MCE Consumer
MCP_VALIDATION_IGNORE_UNKNOWN true Ignore unknown fields in MCP validation GMS, MCE Consumer
MCP_VALIDATION_PRIVILEGE_CONSTRAINTS true Enable privilege constraints in MCP validation GMS, MCE Consumer
MCP_VALIDATION_EXTENSIONS_ENABLED false Enable extensions in MCP validation GMS, MCE Consumer
MCP_SIDE_EFFECTS_SCHEMA_FIELD_ENABLED false Enable schema field side effects GMS, MCE Consumer
MCP_SIDE_EFFECTS_DATA_PRODUCT_UNSET_ENABLED true Enable data product unset side effects GMS, MCE Consumer
MCP_THROTTLE_UPDATE_INTERVAL_MS 60000 MCP throttle update interval GMS, MCE Consumer
MCP_MCE_CONSUMER_THROTTLE_ENABLED false Enable MCE consumer throttling GMS, MCE Consumer
MCP_API_REQUESTS_THROTTLE_ENABLED false Enable API requests throttling GMS, MCE Consumer
MCP_VERSIONED_THROTTLE_ENABLED false Enable versioned MCL topic throttling GMS, MCE Consumer
MCP_VERSIONED_THRESHOLD 4000 Versioned throttle threshold GMS, MCE Consumer
MCP_VERSIONED_MAX_ATTEMPTS 1000 Versioned max attempts GMS, MCE Consumer
MCP_VERSIONED_INITIAL_INTERVAL_MS 100 Versioned initial interval GMS, MCE Consumer
MCP_VERSIONED_MULTIPLIER 10 Versioned multiplier GMS, MCE Consumer
MCP_VERSIONED_MAX_INTERVAL_MS 30000 Versioned max interval GMS, MCE Consumer
MCP_TIMESERIES_THROTTLE_ENABLED false Enable timeseries MCL topic throttling GMS, MCE Consumer
MCP_TIMESERIES_THRESHOLD 4000 Timeseries throttle threshold GMS, MCE Consumer
MCP_TIMESERIES_MAX_ATTEMPTS 1000 Timeseries max attempts GMS, MCE Consumer
MCP_TIMESERIES_INITIAL_INTERVAL_MS 100 Timeseries initial interval GMS, MCE Consumer
MCP_TIMESERIES_MULTIPLIER 10 Timeseries multiplier GMS, MCE Consumer
MCP_TIMESERIES_MAX_INTERVAL_MS 30000 Timeseries max interval GMS, MCE Consumer
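
The `*_INITIAL_INTERVAL_MS`, `*_MULTIPLIER`, `*_MAX_INTERVAL_MS`, and `*_MAX_ATTEMPTS` settings read like the parameters of a capped exponential backoff. The sketch below shows what wait intervals that interpretation would produce with the defaults above. This is an assumption about the semantics, not a description of the actual throttle implementation, and the function name is hypothetical.

```python
from itertools import islice
from typing import Iterator


def backoff_intervals_ms(
    initial_ms: int, multiplier: int, max_interval_ms: int, max_attempts: int
) -> Iterator[int]:
    """Yield one wait interval per attempt, growing geometrically up to a cap."""
    interval = initial_ms
    for _ in range(max_attempts):
        yield min(interval, max_interval_ms)
        # Grow the interval, but never beyond the configured maximum.
        interval = min(interval * multiplier, max_interval_ms)


# Defaults: initial 100 ms, multiplier 10, cap 30000 ms, up to 1000 attempts.
first_five = list(islice(backoff_intervals_ms(100, 10, 30000, 1000), 5))
```

Under this reading the cap is reached by the fourth attempt, so most of the 1000 allowed attempts would wait the full 30 seconds.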

Events API Configuration

Environment Variable Default Description Components
EVENTS_API_ENABLED true Enable events API GMS

Iceberg Catalog Configuration

Environment Variable Default Description Components
ENABLE_PUBLIC_READ false Enable public read for Iceberg catalog GMS
PUBLICLY_READABLE_TAG PUBLICLY_READABLE Publicly readable tag for Iceberg catalog GMS

Component Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| `MCP_CONSUMER_ENABLED` | `true` | When running in standalone mode, disable on GMS and enable on the separate MCE Consumer. | GMS, MCE Consumer |
| `MCL_CONSUMER_ENABLED` | `true` | When running in standalone mode, disable on GMS and enable on the separate MAE Consumer. | GMS, MAE Consumer |
| `PE_CONSUMER_ENABLED` | `true` | When running in standalone mode, disable on GMS and enable on the separate PE Consumer. | GMS, PE Consumer |
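
As an illustration of the standalone-mode guidance above, the Python sketch below lays out which flags each component would set and renders them as `KEY=value` lines. The mapping and the `render_env_file` helper are hypothetical and only restate the table; the assignment of `PE_CONSUMER_ENABLED` to the PE Consumer follows the Components column.

```python
# Hypothetical per-component settings for a deployment with standalone consumers.
STANDALONE_ENV = {
    "GMS": {
        "MCP_CONSUMER_ENABLED": "false",
        "MCL_CONSUMER_ENABLED": "false",
        "PE_CONSUMER_ENABLED": "false",
    },
    "MCE Consumer": {"MCP_CONSUMER_ENABLED": "true"},
    "MAE Consumer": {"MCL_CONSUMER_ENABLED": "true"},
    "PE Consumer": {"PE_CONSUMER_ENABLED": "true"},
}


def render_env_file(component: str) -> str:
    """Render one component's settings as KEY=value lines."""
    return "\n".join(f"{key}={value}" for key, value in STANDALONE_ENV[component].items())


gms_env = render_env_file("GMS")
mce_env = render_env_file("MCE Consumer")
```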

DataHub Frontend

Play Framework Configuration

Secret Key Configuration

Environment Variable Default Description Components
DATAHUB_SECRET null Secret key used to secure cryptographic functions Frontend

HTTP Parser Configuration

Environment Variable Default Description Components
DATAHUB_PLAY_MEM_BUFFER_SIZE 10MB Maximum memory buffer size for HTTP parser Frontend

Server Configuration

Environment Variable Default Description Components
DATAHUB_AKKA_MAX_HEADER_COUNT 64 Maximum number of headers allowed Frontend
DATAHUB_AKKA_MAX_HEADER_VALUE_LENGTH 32k Maximum header value length Frontend

Session Configuration

Environment Variable Default Description Components
AUTH_COOKIE_SAME_SITE LAX SameSite attribute for authentication cookies Frontend
AUTH_COOKIE_SECURE false Whether authentication cookies should be secure Frontend

Authentication Configuration

OIDC Configuration

Reference Links:

Required OIDC Configuration

Environment Variable Default Description Components
AUTH_OIDC_ENABLED false Enable OIDC authentication Frontend
AUTH_OIDC_CLIENT_ID null Unique client ID issued by the identity provider Frontend
AUTH_OIDC_CLIENT_SECRET null Unique client secret issued by the identity provider Frontend
AUTH_OIDC_DISCOVERY_URI null The IdP OIDC discovery URL Frontend
AUTH_OIDC_BASE_URL null The base URL associated with your DataHub deployment Frontend

Optional OIDC Configuration

| Environment Variable | Default | Description | Components |
| --- | --- | --- | --- |
| `AUTH_OIDC_USER_NAME_CLAIM` | `preferred_username` | The attribute/claim used to derive the DataHub username | Frontend |
| `AUTH_OIDC_USER_NAME_CLAIM_REGEX` | `(.*)` | The regex used to parse the DataHub username from the user name claim | Frontend |
| `AUTH_OIDC_SCOPE` | `oidc email profile` | String representing the requested scope from the IdP | Frontend |
| `AUTH_OIDC_CLIENT_AUTHENTICATION_METHOD` | `client_secret_basic` | Authentication method to pass credentials to token endpoint | Frontend |
| `AUTH_OIDC_JIT_PROVISIONING_ENABLED` | `true` | Whether DataHub users should be provisioned on login if they don't exist | Frontend |
| `AUTH_OIDC_PRE_PROVISIONING_REQUIRED` | `false` | Whether the user should already exist in DataHub on login | Frontend |
| `AUTH_OIDC_EXTRACT_GROUPS_ENABLED` | `true` | Whether groups should be extracted from a claim in the OIDC profile | Frontend |
| `AUTH_OIDC_GROUPS_CLAIM` | `groups` | The OIDC claim to extract groups information from | Frontend |
| `AUTH_OIDC_RESPONSE_TYPE` | `null` | OIDC response type | Frontend |
| `AUTH_OIDC_RESPONSE_MODE` | `null` | OIDC response mode | Frontend |
| `AUTH_OIDC_USE_NONCE` | `null` | Whether to use nonce in OIDC flow | Frontend |
| `AUTH_OIDC_CUSTOM_PARAM_RESOURCE` | `null` | Custom resource parameter for OIDC | Frontend |
| `AUTH_OIDC_READ_TIMEOUT` | `null` | OIDC read timeout | Frontend |
| `AUTH_OIDC_CONNECT_TIMEOUT` | `null` | OIDC connect timeout | Frontend |
| `AUTH_OIDC_EXTRACT_JWT_ACCESS_TOKEN_CLAIMS` | `false` | Whether to extract claims from JWT access token | Frontend |
| `AUTH_OIDC_PREFERRED_JWS_ALGORITHM` | `null` | Which JWS algorithm to use | Frontend |
| `AUTH_OIDC_ACR_VALUES` | `null` | OIDC ACR values | Frontend |
| `AUTH_OIDC_GRANT_TYPE` | `null` | OIDC grant type | Frontend |
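
To show how `AUTH_OIDC_USER_NAME_CLAIM_REGEX` could shape the resulting username, here is a minimal Python sketch. It assumes the username is taken from the first capture group of the pattern, which is an assumption rather than a confirmed detail of the frontend. The frontend itself is a Java application, so Java regex syntax applies there; the patterns used below behave the same way in both languages. The function name is hypothetical.

```python
import re
from typing import Optional


def extract_username(claim_value: str, pattern: str = "(.*)") -> Optional[str]:
    """Return the first capture group of pattern applied to the claim value."""
    match = re.search(pattern, claim_value)
    if match is None:
        return None
    # Fall back to the whole match when the pattern defines no capture group.
    return match.group(1) if match.groups() else match.group(0)


# Default pattern keeps the claim value unchanged.
unchanged = extract_username("jdoe@example.com")
# A narrower pattern keeps only the part before the "@".
local_part = extract_username("jdoe@example.com", "([^@]+)")
```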

Authentication Methods Configuration

Environment Variable Default Description Components
AUTH_JAAS_ENABLED true Enable JAAS authentication Frontend
AUTH_NATIVE_ENABLED true Enable native authentication Frontend
GUEST_AUTHENTICATION_ENABLED false Enable guest authentication Frontend
GUEST_AUTHENTICATION_USER guest The name of the guest user ID Frontend
GUEST_AUTHENTICATION_PATH null The path to bypass login page and get logged in as guest Frontend
ENFORCE_VALID_EMAIL true Enforce the usage of a valid email for user sign up Frontend

Authentication Logging

Environment Variable Default Description Components
AUTH_VERBOSE_LOGGING false Enable verbose authentication logging Frontend

Session Configuration

Environment Variable Default Description Components
AUTH_SESSION_TTL_HOURS 24 Login session expiration time in hours Frontend
MAX_SESSION_TOKEN_AGE 24h Maximum age of session token Frontend

Metadata Service Configuration

Connection Configuration

Environment Variable Default Description Components
DATAHUB_GMS_HOST localhost Metadata service host Frontend
DATAHUB_GMS_PORT 8080 Metadata service port Frontend
DATAHUB_GMS_USE_SSL false Whether to use SSL for metadata service connection Frontend
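
The three connection settings together determine the address at which the frontend reaches the metadata service. The Python sketch below combines them into a base URL using the defaults from the table. The function name and the exact way the frontend composes the URL are assumptions made for illustration.

```python
import os
from typing import Mapping


def gms_base_url(env: Mapping[str, str] = os.environ) -> str:
    """Build a metadata service base URL from the connection variables."""
    host = env.get("DATAHUB_GMS_HOST", "localhost")
    port = env.get("DATAHUB_GMS_PORT", "8080")
    # Treat only the literal string "true" (case-insensitive) as enabling SSL.
    use_ssl = env.get("DATAHUB_GMS_USE_SSL", "false").strip().lower() == "true"
    scheme = "https" if use_ssl else "http"
    return f"{scheme}://{host}:{port}"


default_url = gms_base_url({})
secure_url = gms_base_url(
    {
        "DATAHUB_GMS_HOST": "gms.internal",
        "DATAHUB_GMS_PORT": "8443",
        "DATAHUB_GMS_USE_SSL": "true",
    }
)
```

Here `gms.internal` and port `8443` are placeholder values, not defaults.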

Authentication Configuration

Environment Variable Default Description Components
METADATA_SERVICE_AUTH_ENABLED false Enable metadata service authentication Frontend
DATAHUB_SYSTEM_CLIENT_SECRET JohnSnowKnowsNothing System client secret for metadata service Frontend

Entity Client Configuration

Environment Variable Default Description Components
ENTITY_CLIENT_RETRY_INTERVAL 2 Entity client retry interval Frontend
ENTITY_CLIENT_NUM_RETRIES 3 Entity client number of retries Frontend
ENTITY_CLIENT_RESTLI_GET_BATCH_SIZE 50 Entity client RESTli get batch size Frontend
ENTITY_CLIENT_RESTLI_GET_BATCH_CONCURRENCY 2 Entity client RESTli get batch concurrency Frontend

Notes

  • Environment variable names generally follow the YAML property path, converted to uppercase with segments joined by underscores
  • Default values are shown in the tables above
  • For Kafka configuration, refer to the official Spring Kafka documentation for additional properties
  • Feature flags control experimental or optional functionality
  • System update configurations control various background maintenance tasks
  • Cache configurations help optimize performance for different use cases
  • GraphQL configurations control query complexity and performance monitoring
  • OpenTelemetry variables control observability and tracing behavior
  • Play Framework properties are converted to environment variables by:
    • Converting dots (.) to underscores (_)
    • Converting to uppercase
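
The two conversion steps for Play Framework properties can be sketched in Python as follows. The function name and the sample property are hypothetical; the variable names listed in the tables above remain the authoritative reference, since not every variable is derived mechanically.

```python
def property_to_env_var(property_path: str) -> str:
    """Convert a dotted property path to an environment variable name."""
    # Step 1: replace dots with underscores. Step 2: convert to uppercase.
    return property_path.replace(".", "_").upper()


# "example.feature.enabled" is a made-up property used only for illustration.
env_name = property_to_env_var("example.feature.enabled")
```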