
A History of DataHub Frontend/UI

Where we came from, and how we got here

By providing historical context for the DataHub UI, we hope to offer more insight into the decisions and architecture behind our UI client.



Before GDPR

DataHub was originally an open source project, known as WhereHows, focused on just the search and discovery of datasets. In those days, our UI application was best described as a jQuery application written inside an Ember wrapper. Eventually, we adopted better code practices and aligned with Ember 2.x. However, there was no clear unified thought process behind how we did things. Every feature or task was done ad hoc, and for the most part the primary goal was to get things working.

Why Ember?

We chose Ember as a framework because of its support internally at LinkedIn. A lot of libraries that we could take advantage of were made specifically to be compatible with our internal wrapper around the Ember framework, known as "Pemberly."

As time went on, though, we did take note that Ember is a very opinionated framework with a steep learning curve early on. But it's for those very same reasons that the framework allows quick and clean code iteration for those who are intimately familiar with it. Additionally, as members join and leave the team internally, people generally come in with some baseline knowledge of Ember, which means a faster ramp-up on any new Ember application than on an unfamiliar React application.

Ultimately, the decision came down to support: we wanted to take advantage of as much internal support as possible in order to devote more time to developing new features and maintaining the existing application.


GDPR Times - The Trenches

Before we could devote much time to coming up with a proper frontend architecture within the Ember ecosystem in which we found ourselves, a rush order came. Companies around the world sought ways to meet GDPR compliance and user privacy standards, and ours was no different. Internally, people realized that our application, originally intended to be only a data warehouse and search tool, had aggregated metadata from across the company, and that this aggregation was the key to creating the backbone on which compliance efforts could be built.

And so we got to work, with the backend being built to take on this new compliance pipeline and the frontend filling in features related to compliance tasks. Unfortunately, this rush also meant that our open source story became neglected, and tech debt accumulated. Our UI code's maintainability was tested to the extreme... and it failed. Quick iterations are great, but the maintainability cost in our case was too high.

But We Got TypeScript

One long-term benefit did arise from this period, though: seeing the need to increase the maintainability of what was becoming a large-scale application internally led us to adopt TypeScript. In our opinion, TypeScript provides a great avenue for maintainability. It's not just the compile-time safety of values and variables; there was the added benefit of editor features like accurate autocomplete and protection against null values and undefined objects/properties. Plus, we were able to start using classes much earlier than the point where they became widely used.
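As a minimal sketch of the kind of safety TypeScript gave us (the interface and values below are hypothetical, for illustration only):

```typescript
// A hypothetical dataset reference, for illustration only
interface DatasetRef {
  urn: string;
  owner?: { name: string };
}

function ownerName(dataset: DatasetRef): string {
  // The compiler forces us to handle the possibly-undefined owner,
  // rather than letting a runtime "cannot read property of undefined" slip through
  return dataset.owner?.name ?? 'unknown';
}

ownerName({ urn: 'urn:li:dataset:abc' }); // => 'unknown'
// ownerName({ urn: 'urn:li:dataset:abc', owner: 42 }); // compile-time error
```

The editor benefits follow from the same types: autocomplete on `dataset.` offers exactly `urn` and `owner`, and nothing else.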


Post GDPR - A New Mission

After GDPR, our Metadata team at LinkedIn found itself in a new position: the team with the most horizontal view of all the data at LinkedIn, and with the tools to potentially solve many of the big data problems in our organization. With this new true north in mind, we began a project called Starfleet, effectively meant to be a revamp of WhereHows. Not only were we keen to solve the internal challenges, but we also wanted to revisit our open source story and bring life back into a project that had been effectively neglected during the GDPR rush.

We found ourselves asking two questions about the UI code:

  • How did we plan to work with multiple teams and allow contributions from different sources?
  • How did we plan to accommodate code that should exist only in our internal repository while exposing only what is necessary to open source?

To solve both issues, we began looking at options for doing the same thing: splitting our single Ember application into multiple modules, or packages, known as Ember addons. A quick introduction to what addons are can be found on the Ember documentation website.

Each module (addon) would ultimately be consumed by our actual application. The role of the application at that point would be to aggregate addons and ensure that the whole of the project worked properly. Each team that wanted to collaborate with us could write an addon instead of having to learn the entire DataHub codebase, and our role on the team would be to guide them on how to integrate with our full application at the end.

The benefit of this also extends to the open source story. How we laid out the addons was simple. There would be a subset of modules designated as "open source" modules. These would be the components that make up DataHub UI in the open source. Internally, we would have additional modules for two reasons:

  • One, the module is an "extension" of an existing open source module, containing additional logic specific only to our internal code. Basically, if the open source module were our parent class, then the internal one would be a more specialized child class.
  • Two, the module is an "internal only" feature or entity that should not be exposed to open source.

With the above layout and some minor tooling, we could effectively expose only the externally relevant logic to open source and ensure that we weren't polluting our pushes with internal business logic or features not relevant to the open source community. A diagram that outlines this is shown below:

[Diagram: open source vs. internal modules]
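As a rough sketch of the parent/child extension pattern described above (all module paths and class names here are hypothetical, for illustration only):

```typescript
// In an open source addon, e.g. addons/datahub-shared (hypothetical path):
// addon/components/entity-header.ts
import Component from '@ember/component';

export default class EntityHeader extends Component {
  // Baseline behavior shipped to the open source community
  get title(): string {
    return 'Dataset';
  }
}
```

```typescript
// In an internal-only addon that extends the open source module
// (hypothetical import path):
import EntityHeader from 'datahub-shared/components/entity-header';

export default class InternalEntityHeader extends EntityHeader {
  // Internal specialization layered on top of the open source base class
  get title(): string {
    return `${super.title} (internal)`;
  }
}
```

The open source application consumes only the first addon; our internal application swaps in the second, and the tooling keeps the internal module out of our open source pushes.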


We're a monorepo now

Ultimately, this development led to our application becoming laid out as a mono-repository, or monorepo, using yarn workspaces. For more information about the specifics of this, check out our documentation about our monorepo here.
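As a minimal sketch of what a yarn workspaces setup looks like at the repository root (the package name and layout here are illustrative, not necessarily our actual configuration):

```json
{
  "name": "datahub-web",
  "private": true,
  "workspaces": [
    "packages/*"
  ]
}
```

Each addon then lives under packages/ with its own package.json, and yarn hoists shared dependencies to the root, so the application and all of its addons can be developed and versioned together.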


Achieving More Stability

As the development of Starfleet kicked off and we experimented for a quarter or two, there were some very valuable lessons learned. One of the most valuable of these was that a quickly iterating team leads to a very dynamic backend. And a very dynamic backend leads to lots of things changing and breaking on the frontend.

While developing the UI for an internal entity we wanted to onboard, metrics, we learned the hard way that models, APIs, and contracts constantly change (who knew!). What we found ourselves doing, though, and what seemed very unnecessary, was constantly making massive file changes to accommodate constantly evolving interfaces. Because we called our APIs and directly used the returned objects in our components, we made heavy assumptions about how to retrieve properties and pass them from parent to child components or into the various methods that dealt with those objects. The result was that a single API change could snowball unnecessarily and was also prone to errors.

Data Models: A framework for indirection

Our solution to the above problem was to channel the API responses through a secondary object: a JavaScript class that defines a consistent interface. This class has knowledge of how the API behaves and contains a series of computations and getters that expose the relevant API information to the UI in a stable way. Thus, we divided the app into three layers: an API definition layer, a data fetching and translation layer, and a component (view) layer. If this sounds like simply the literal definition of an MVC framework, it is. If our solution also sounds like a poor person's alternative to GraphQL, that's also pretty accurate.
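A minimal sketch of the idea (the API shape, class, and endpoint below are hypothetical illustrations, not our actual data models):

```typescript
// Hypothetical raw API response shape; this is the part that tends to churn
interface DatasetApiResponse {
  nativeName: string;
  ownerInfo?: { corpUser?: string };
}

// The data model: a stable interface the view layer can rely on, with all
// knowledge of the API's quirks contained in one place
class DatasetModel {
  constructor(private readonly response: DatasetApiResponse) {}

  get name(): string {
    return this.response.nativeName;
  }

  get owner(): string {
    // If the API renames or restructures ownerInfo, only this getter changes;
    // components keep reading `model.owner`
    return this.response.ownerInfo?.corpUser ?? 'unassigned';
  }
}

// Data fetching / translation layer (hypothetical endpoint)
async function readDataset(urn: string): Promise<DatasetModel> {
  const response = await fetch(`/api/v2/datasets/${urn}`);
  return new DatasetModel((await response.json()) as DatasetApiResponse);
}
```

A change to the API contract is now absorbed by one class instead of snowballing through every component that touches the entity.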

However, Ember makes it very easy to forget these disciplines and mix concerns, and in the trenches it can be difficult to see the greater picture. What the data models ultimately did was bring us back on track, helping us define and move toward a promised land for our UI.

For more information on our data models, read that section of our introduction here. For more information on where we are going with this and how we plan to move forward, check out the forward section of the documentation, particularly our thoughts on adopting GraphQL and a more framework-agnostic approach.


Rapid Growth and Quicker Onboarding

Since the beginning of Project Starfleet, our team has experienced rapid growth. WhereHows was rebranded as DataHub, and its use internally at LinkedIn skyrocketed. We began to realize that our UI code would have difficulty scaling to accommodate every team, feature, or entity ask that exists internally and in the open source community. One of the goals for the UI has become allowing a quicker onboarding process for new entities, aspects, or features.

Of course, not everything can always be simply plug and play, as much as we would love that. There are many specialized components that need to be built by hand to accommodate very specific or unique use cases. However, for a large number of cases, we can often break the asks down into basic components that share similar behaviors with other features. If we can create a way to simply configure how those components appear on a page, and even include configurations to plug in the more specialized components, then we can significantly improve how new asks are onboarded for the UI.

And so, render props came along

Render props are our in-progress solution to the generalized UI problem, where we want to templatize as much as possible and allow people with little frontend knowledge to modify what is essentially a JSON object in order to make major modifications to the frontend UI. The idea is that, for each entity, we want to be able to hook up behavior for how certain pages are defined and laid out, and even how they behave, based on a configuration. For more information about how render props work in our application, check out the in-depth look here.
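As a rough sketch of the idea (the shape of this configuration object is illustrative, not our actual render props interface):

```typescript
// Hypothetical configuration describing how an entity page is laid out.
// Someone with little frontend knowledge edits this object, not components.
interface EntityPageRenderProps {
  entityName: string;
  tabs: Array<{
    title: string;
    // Name of a generic component to render, plus its options
    component: { name: string; options?: Record<string, unknown> };
  }>;
}

const datasetRenderProps: EntityPageRenderProps = {
  entityName: 'dataset',
  tabs: [
    {
      title: 'Schema',
      component: { name: 'entity-table', options: { columns: ['field', 'type'] } }
    },
    {
      title: 'Ownership',
      // A specialized, hand-built component can still be plugged in by name
      component: { name: 'dataset-ownership' }
    }
  ]
};
```

Onboarding a new entity then becomes, in the common case, a matter of writing a configuration like this rather than writing new components.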


Conclusion

In this document, we've outlined but a few of the many challenges and learnings we've come across on our journey so far. While many of these specific challenges are not necessarily new to the frontend world, they become more interesting when the solutions have to work for a big data-centric application, which often has far more complex and specialized use cases than a typical application.

While GMA aims to solve the generalized metadata challenge, DataHub UI aims to solve the generalized data UI challenge associated with big metadata. For more information about our overall UI architecture, please continue reading the rest of the introduction, starting here. For more information about where we plan to take our learnings next, visit the forward section of the documentation, including our thoughts on supporting React.