# DataHub Roadmap

Here is DataHub's roadmap for the next six months (through the end of 2021).

We publish only a short six-month roadmap because we are evolving quickly and want to adapt to the community's needs. We will check items off this roadmap as we make progress over the next few months.

**Caveat**: ETAs are subject to change. Please let us know before you commit to your stakeholders about deploying these capabilities at your company.

If you would like to suggest new items or request timeline changes to existing items, please submit your request through this [form](https://docs.google.com/forms/d/1znDv7_CXXvUDcUsqzq92PgGqPSh_1yeYC3cl2xgizSE/) or submit a GitHub [feature request](https://github.com/linkedin/datahub/issues/new?assignees=&labels=feature-request&template=--feature-request.md&title=A+short+description+of+the+feature+request).

You can also reach our community through [Slack](https://slack.datahubproject.io) or our [town halls](townhalls.md) to chat with us live!

## Current Roadmap
### Q3 2021 [Jul - Sept 2021]
#### Data Profiling and Dataset Previews
Use Case: See sample data for a dataset and statistics on the shape of the data (column distribution, nullability etc.)
- [x] Support for data profiling and preview extraction through ingestion pipeline (column samples, not rows)
#### Data Quality
- [x] Support for data profiling and time-series views
- [ ] Support for data quality visualization
- [ ] Support for data health score based on data quality results and pipeline observability
- [ ] Integration with systems like Great Expectations, AWS deequ, dbt test etc.
#### Fine-grained Access Control for Metadata
- [x] Support for role-based access control to edit metadata
- Scope: Access control on entity-level, aspect-level and within aspects as well.
#### Column-level lineage
- [ ] Metadata Model
- [ ] SQL Parsing
#### Operational Metadata
- [ ] Partitioned Datasets
- [ ] Support for operational signals like completeness, freshness etc.
### Q4 2021 [Oct - Dec 2021]
#### Data Lake Ecosystem Integration
- [ ] Spark Delta Lake
- [ ] Apache Iceberg
- [ ] Apache Hudi
#### Metadata Trigger Framework
- [ ] Stateful sensors for Airflow
- [ ] Receive events that you can use to send alerts and emails
- [ ] Slack integration
#### ML Ecosystem
- [x] Features (Feast)
- [x] Models (Sagemaker)
- [ ] Notebooks
#### Metrics Ecosystem
- [ ] Measures, Dimensions
- [ ] Relationships to Datasets and Dashboards
#### Data Mesh oriented features
- [ ] Data Product modeling
- [ ] Analytics to enable Data Meshification
#### Collaboration
- [ ] Conversations on the platform
- [ ] Knowledge Posts (Gdocs, Gslides, Gsheets)
## Beyond the horizon
### Let us know what you want!
- Submit requests [here](https://docs.google.com/forms/d/1znDv7_CXXvUDcUsqzq92PgGqPSh_1yeYC3cl2xgizSE/) or
- Submit a GitHub [feature request](https://github.com/linkedin/datahub/issues/new?assignees=&labels=feature-request&template=--feature-request.md&title=A+short+description+of+the+feature+request).
## Historical Roadmap
### Q1 2021 [Jan - Mar 2021]
#### React UI
- [x] Build a new UI based on React
- [x] Deprecate open-source support for Ember UI
#### Python-based Metadata Integration
- [x] Build a Python-based Ingestion Framework
- [x] Support common people repositories (LDAP)
- [x] Support common data repositories (Kafka, SQL databases, AWS Glue, Hive)
- [x] Support common transformation sources (dbt, Looker)
- [x] Support for push-based metadata emission from Python (e.g. Airflow DAGs)
#### Dashboards and Charts
- [x] Support for dashboard and chart entity page
- [x] Support browse, search and discovery
#### SSO for Authentication
- [x] Support for authentication (login) using OIDC providers (Okta, Google, etc.)
#### Tags
Use Case: Support for free-form global tags for social collaboration and aiding discovery
- [x] Edit / Create new tags
- [x] Attach tags to relevant constructs (e.g. datasets, dashboards, users, schema\_fields)
- [x] Search using tags (e.g. find all datasets with this tag, find all entities with this tag)
#### Business Glossary
- [x] Support for business glossary model (definition + storage)
- [ ] Browse taxonomy
- [x] UI support for attaching business terms to entities and fields
#### Jobs, Flows / Pipelines
Use Case: Search and discover your pipelines (e.g. Airflow DAGs) and understand their lineage with datasets
- [x] Support for Metadata Models + Backend Implementation
- [x] Metadata Integrations with systems like Airflow.
#### Data Profiling and Dataset Previews
Use Case: See sample data for a dataset and statistics on the shape of the data (column distribution, nullability etc.)
- [ ] Support for data profiling and preview extraction through ingestion pipeline
- Out of scope for Q1: Access control of data profiles and sample data
### Q2 2021 [Apr - Jun 2021]
#### Cloud Deployment
- [x] Production-grade Helm charts for Kubernetes-based deployment
- [ ] How-to guides for deploying DataHub to all the major cloud providers
  - [x] AWS
  - [ ] Azure
  - [x] GCP
#### Data Quality
- [ ] Support for data quality visualization
- [ ] Support for data health score based on data quality results and pipeline observability
- [ ] Integration with systems like Great Expectations, AWS deequ etc.
#### Product Analytics for DataHub
- [x] Helping you understand how your users are interacting with DataHub
- [x] Integration with common systems like Google Analytics etc.
#### Usage-Based Insights
- [x] Display frequently used datasets, etc.
- [ ] Improved search relevance through usage data
#### Role-based Access Control
- Support for fine-grained access control for metadata operations (read, write, modify)
- Scope: Access control on entity-level, aspect-level and within aspects as well.
- This provides the foundation for Tag Governance, Dataset Preview access control etc.
#### No-code Metadata Model Additions
Use Case: Developers should be able to add new entities and aspects to the metadata model easily
- [x] No need to write any code (in Java or Python) to store, retrieve, search and query metadata
- [ ] No need to write any code (in GraphQL or UI) to visualize metadata