# DataHub Roadmap
## [The DataHub Roadmap has a new home!](https://feature-requests.datahubproject.io/roadmap)
Please refer to the [new DataHub Roadmap](https://feature-requests.datahubproject.io/roadmap) for the most up-to-date details of what we are working on!
_If you have suggestions about what we should consider in future cycles, feel free to submit a [feature request](https://feature-requests.datahubproject.io/) and/or upvote existing feature requests so we can gauge their importance!_
## Historical Roadmap
_The following represents the progress made on historical roadmap items as of January 2022. For incomplete roadmap items, we have created Feature Requests to gauge current community interest and impact, to be considered in future cycles. If you see something that is still of high interest to you, please upvote it via the Feature Request Portal link and subscribe to the post for updates as we progress through the work in future cycles._
### Q4 2021 [Oct - Dec 2021]
#### Data Lake Ecosystem Integration
- [ ] Spark Delta Lake - [View in Feature Request Portal](https://feature-requests.datahubproject.io/b/feedback/p/spark-delta-lake)
- [ ] Apache Iceberg - [Included in Q1 2022 Roadmap - Community-Driven Metadata Ingestion Sources](https://feature-requests.datahubproject.io/roadmap/540)
- [ ] Apache Hudi - [View in Feature Request Portal](https://feature-requests.datahubproject.io/b/feedback/p/apachi-hudi-ingestion-support)
#### Metadata Trigger Framework
[View in Feature Request Portal](https://feature-requests.datahubproject.io/b/User-Experience/p/ability-to-subscribe-to-an-entity-to-receive-notifications-when-something-changes)
- [ ] Stateful sensors for Airflow
- [ ] Receive events that you can use to send alerts or email
- [ ] Slack integration
#### ML Ecosystem
- [x] Features (Feast)
- [x] Models (Sagemaker)
- [ ] Notebooks - [View in Feature Request Portal](https://feature-requests.datahubproject.io/admin/p/jupyter-integration)
#### Metrics Ecosystem
[View in Feature Request Portal](https://feature-requests.datahubproject.io/b/User-Experience/p/ability-to-define-metrics-and-attach-them-to-entities)
- [ ] Measures, Dimensions
- [ ] Relationships to Datasets and Dashboards
#### Data Mesh oriented features
- [ ] Data Product modeling
- [ ] Analytics to enable Data Meshification
#### Collaboration
[View in Feature Request Portal](https://feature-requests.datahubproject.io/b/User-Experience/p/collaboration-within-datahub-ui)
- [ ] Conversations on the platform
- [ ] Knowledge Posts (Gdocs, Gslides, Gsheets)
### Q3 2021 [Jul - Sept 2021]
#### Data Profiling and Dataset Previews
Use Case: See sample data for a dataset and statistics on the shape of the data (column distribution, nullability etc.)
- [x] Support for data profiling and preview extraction through ingestion pipeline (column samples, not rows)
#### Data Quality
Included in Q1 2022 Roadmap - [Display Data Quality Checks in the UI](https://feature-requests.datahubproject.io/roadmap/544)
- [x] Support for data profiling and time-series views
- [ ] Support for data quality visualization
- [ ] Support for data health score based on data quality results and pipeline observability
- [ ] Integration with systems like Great Expectations, AWS Deequ, dbt test etc.
#### Fine-grained Access Control for Metadata
- [x] Support for role-based access control to edit metadata
- Scope: Access control on entity-level, aspect-level and within aspects as well.
#### Column-level lineage
Included in Q1 2022 Roadmap - [Column Level Lineage](https://feature-requests.datahubproject.io/roadmap/541)
- [ ] Metadata Model
- [ ] SQL Parsing
#### Operational Metadata
- [ ] Partitioned Datasets - [View in Feature Request Portal](https://feature-requests.datahubproject.io/b/User-Experience/p/advanced-dataset-schema-properties-partition-support)
- [x] Support for operational signals like completeness, freshness etc.
### Q2 2021 [Apr - Jun 2021]
#### Cloud Deployment
- [x] Production-grade Helm charts for Kubernetes-based deployment
- [ ] How-to guides for deploying DataHub to all the major cloud providers
  - [x] AWS
  - [ ] Azure
  - [x] GCP
#### Product Analytics for DataHub
- [x] Helping you understand how your users are interacting with DataHub
- [x] Integration with common systems like Google Analytics etc.
#### Usage-Based Insights
- [x] Display frequently used datasets, etc.
- [ ] Improved search relevance through usage data
#### Role-based Access Control
- Support for fine-grained access control for metadata operations (read, write, modify)
- Scope: Access control on entity-level, aspect-level and within aspects as well.
- This provides the foundation for Tag Governance, Dataset Preview access control etc.
#### No-code Metadata Model Additions
Use Case: Developers should be able to add new entities and aspects to the metadata model easily
- [x] No need to write any code (in Java or Python) to store, retrieve, search and query metadata
- [ ] No need to write any code (in GraphQL or UI) to visualize metadata
### Q1 2021 [Jan - Mar 2021]
#### React UI
- [x] Build a new UI based on React
- [x] Deprecate open-source support for Ember UI
#### Python-based Metadata Integration
- [x] Build a Python-based Ingestion Framework
- [x] Support common people repositories (LDAP)
- [x] Support common data repositories (Kafka, SQL databases, AWS Glue, Hive)
- [x] Support common transformation sources (dbt, Looker)
- [x] Support for push-based metadata emission from Python (e.g. Airflow DAGs)
#### Dashboards and Charts
- [x] Support for dashboard and chart entity page
- [x] Support browse, search and discovery
#### SSO for Authentication
- [x] Support for authentication (login) using OIDC providers (Okta, Google etc.)
#### Tags
Use Case: Support for free-form global tags for social collaboration and aiding discovery
- [x] Edit / Create new tags
- [x] Attach tags to relevant constructs (e.g. datasets, dashboards, users, schema_fields)
- [x] Search using tags (e.g. find all datasets with this tag, find all entities with this tag)
#### Business Glossary
- [x] Support for business glossary model (definition + storage)
- [ ] Browse taxonomy
- [x] UI support for attaching business terms to entities and fields
#### Jobs, Flows / Pipelines
Use Case: Search and discover your pipelines (e.g. Airflow DAGs) and understand their data lineage with datasets
- [x] Support for Metadata Models + Backend Implementation
- [x] Metadata Integrations with systems like Airflow
#### Data Profiling and Dataset Previews
Use Case: See sample data for a dataset and statistics on the shape of the data (column distribution, nullability etc.)
- [ ] Support for data profiling and preview extraction through ingestion pipeline
- Out of scope for Q1: Access control of data profiles and sample data