mirror of
				https://github.com/datahub-project/datahub.git
				synced 2025-10-31 02:37:05 +00:00 
			
		
		
		
	
		
			
				
	
	
		
			63 lines
		
	
	
		
			4.4 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
			
		
		
	
	
			63 lines
		
	
	
		
			4.4 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
| ---
 | |
| title: "Components"
 | |
| ---
 | |
| 
 | |
| # DataHub Components Overview
 | |
| 
 | |
| The DataHub platform consists of the components shown in the following diagram.
 | |
| 
 | |
| <p align="center">
 | |
|   <img width="70%"  src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/datahub-components.png"/>
 | |
| </p>
 | |
| 
 | |
| ## Metadata Store
 | |
| 
 | |
| The Metadata Store is responsible for storing the [Entities & Aspects](https://docs.datahub.com/docs/metadata-modeling/metadata-model/) comprising the Metadata Graph. This includes
 | |
| exposing an API for [ingesting metadata](https://docs.datahub.com/docs/metadata-service#ingesting-entities), [fetching Metadata by primary key](https://docs.datahub.com/docs/metadata-service#retrieving-entities), [searching entities](https://docs.datahub.com/docs/metadata-service#search-an-entity), and [fetching Relationships](https://docs.datahub.com/docs/metadata-service#get-relationships-edges) between
 | |
| entities. It consists of a Spring Java Service hosting a set of [Rest.li](https://linkedin.github.io/rest.li/) API endpoints, along with
 | |
| MySQL, Elasticsearch, & Kafka for primary storage & indexing.
 | |
| 
 | |
| Get started with the Metadata Store by following the [Quickstart Guide](https://docs.datahub.com/docs/quickstart/).
 | |
| 
 | |
| ## Metadata Models
 | |
| 
 | |
| Metadata Models are schemas defining the shape of the Entities & Aspects comprising the Metadata Graph, along with the relationships between them. They are defined
 | |
| using [PDL](https://linkedin.github.io/rest.li/pdl_schema), a modeling language quite similar in form to Protobuf while serializes to JSON. Entities represent a specific class of Metadata
 | |
| Asset such as a Dataset, a Dashboard, a Data Pipeline, and beyond. Each _instance_ of an Entity is identified by a unique identifier called an `urn`. Aspects represent related bundles of data attached
 | |
| to an instance of an Entity such as its descriptions, tags, and more. View the current set of Entities supported [here](https://docs.datahub.com/docs/metadata-modeling/metadata-model#exploring-datahubs-metadata-model).
 | |
| 
 | |
| Learn more about DataHub models Metadata [here](https://docs.datahub.com/docs/metadata-modeling/metadata-model/).
 | |
| 
 | |
| ## Ingestion Framework
 | |
| 
 | |
| The Ingestion Framework is a modular, extensible Python library for extracting Metadata from external source systems (e.g.
 | |
| Snowflake, Looker, MySQL, Kafka), transforming it into DataHub's [Metadata Model](https://docs.datahub.com/docs/metadata-modeling/metadata-model/), and writing it into DataHub via
 | |
| either Kafka or using the Metadata Store Rest APIs directly. DataHub supports an [extensive list of Source connectors](https://docs.datahub.com/docs/metadata-ingestion/#installing-plugins) to choose from, along with
 | |
| a host of capabilities including schema extraction, table & column profiling, usage information extraction, and more.
 | |
| 
 | |
| Getting started with the Ingestion Framework is as simple: just define a YAML file and execute the `datahub ingest` command.
 | |
| Learn more by heading over the [Metadata Ingestion](https://docs.datahub.com/docs/metadata-ingestion/) guide.
 | |
| 
 | |
| ## GraphQL API
 | |
| 
 | |
| The [GraphQL](https://graphql.org/) API provides a strongly-typed, entity-oriented API that makes interacting with the Entities comprising the Metadata
 | |
| Graph simple, including APIs for adding and removing tags, owners, links & more to Metadata Entities! Most notably, this API is consumed by the User Interface (discussed below) for enabling Search & Discovery, Governance, Observability
 | |
| and more.
 | |
| 
 | |
| To get started using the GraphQL API, check out the [Getting Started with GraphQL](https://docs.datahub.com/docs/api/graphql/getting-started) guide.
 | |
| 
 | |
| ## User Interface
 | |
| 
 | |
| DataHub comes with a React UI including an ever-evolving set of features to make Discovering, Governing, & Debugging your Data Assets easy & delightful.
 | |
| For a full overview of the capabilities currently supported, take a look at the [Features](features.md) overview. For a look at what's coming next,
 | |
| head over to the [Roadmap](https://docs.datahub.com/docs/roadmap/).
 | |
| 
 | |
| ## Learn More
 | |
| 
 | |
| Learn more about the specifics of the [DataHub Architecture](./architecture/architecture.md) in the Architecture Overview. Learn about using & developing the components
 | |
| of the Platform by visiting the Module READMEs.
 | |
| 
 | |
| ## Feedback / Questions / Concerns
 | |
| 
 | |
| We want to hear from you! For any inquiries, including Feedback, Questions, or Concerns, reach out on [Slack](https://datahubspace.slack.com/join/shared_invite/zt-nx7i0dj7-I3IJYC551vpnvvjIaNRRGw#/shared-invite/email)!
 | 
