
WhereHows

WhereHows is a data discovery and lineage tool built at LinkedIn. It integrates with all the major data processing systems and collects both catalog and operational metadata from them.

Within the central metadata repository, WhereHows curates, associates, and surfaces the metadata information through two interfaces:

  • a web application that enables data and lineage discovery, and community collaboration
  • an API endpoint that empowers automation of data processes/applications

WhereHows serves as the single platform that:

  • links data objects with people and processes
  • enables crowdsourcing for data knowledge
  • provides data governance and provenance based on ownership and lineage

Documentation

The detailed information can be found in the Wiki

Examples in VM

A pre-built VMware image (about 11GB) is available to quickly demonstrate the functionality of WhereHows. Check out the VM Guide.

Getting Started

New to WhereHows? Check out the Getting Started Guide

Preparation

First, please get Play Framework (Activator) in place.

# Download Activator
wget https://downloads.typesafe.com/typesafe-activator/1.3.11/typesafe-activator-1.3.11-minimal.zip

# Unzip, remove the zip file, and move the Activator folder to $HOME
unzip -q typesafe-activator-1.3.11-minimal.zip && rm typesafe-activator-1.3.11-minimal.zip && mv activator-1.3.11-minimal $HOME/

# Add ACTIVATOR_HOME to ~/.bashrc and reload it
echo 'export ACTIVATOR_HOME="$HOME/activator-1.3.11-minimal"' >> ~/.bashrc
source ~/.bashrc

You also need to increase the maximum heap size of the SBT build tool to build the web module:

echo 'export SBT_OPTS="-Xms1G -Xmx1G -Xss2M"' >> ~/.bashrc
source ~/.bashrc
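
Before building, it can help to confirm that both variables are visible in the current shell. The following is a minimal sketch, not part of WhereHows; the function name `check_wherehows_env` is hypothetical.

```shell
# Hypothetical helper (not part of WhereHows): report whether the
# environment variables set above are visible in the current shell.
check_wherehows_env() {
  rc=0
  for name in ACTIVATOR_HOME SBT_OPTS; do
    # Indirect lookup of the variable named by $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      rc=1
    else
      echo "ok: $name=$value"
    fi
  done
  return $rc
}
```

Calling `check_wherehows_env` prints one line per variable and returns a non-zero status if either is unset, in which case re-running `source ~/.bashrc` is the likely fix.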

Second, please set up the metadata repository in MySQL.

CREATE DATABASE wherehows
  DEFAULT CHARACTER SET utf8
  DEFAULT COLLATE utf8_general_ci;

CREATE USER 'wherehows';
SET PASSWORD FOR 'wherehows' = PASSWORD('wherehows');
GRANT ALL ON wherehows.* TO 'wherehows';

Execute the DDL files to create the required repository tables in the wherehows database.
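
One way to run the DDL files is to pipe each one into the `mysql` client. The sketch below only prints the commands so they can be reviewed first. It assumes the `wherehows` user and database created above; the helper name `print_ddl_commands` is hypothetical, and which files to pass (and in what order) depends on your checkout.

```shell
# Hypothetical helper: print one mysql invocation per DDL file given as an
# argument. It prints rather than executes, so the list can be checked first.
# Assumes the 'wherehows' user and 'wherehows' database created above.
print_ddl_commands() {
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      # Report missing files on stderr and keep going.
      echo "skip (not found): $f" >&2
      continue
    fi
    # '-p' with no value makes mysql prompt for the password;
    # the trailing 'wherehows' is the database name.
    printf 'mysql -u wherehows -p wherehows < %s\n' "$f"
  done
}
```

If a DDL file includes other files by relative path, run the printed command from the directory that contains it.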

Build

  1. Get the source code: git clone https://github.com/linkedin/WhereHows.git
  2. Put a few third-party jar files in the metadata-etl/extralibs directory: cd WhereHows/metadata-etl/extralibs Some of these jar files may not be available in Maven Central or Artifactory. See the download instructions for more detail.
  3. Go back to the WhereHows root directory and build all the modules: ./gradlew build
  4. Go back to the WhereHows root directory and start the metadata ETL and API service: cd backend-service ; $ACTIVATOR_HOME/bin/activator run
  5. Go back to the WhereHows root directory and start the web front-end: cd web ; $ACTIVATOR_HOME/bin/activator run The WhereHows UI is then available at http://localhost:9000 by default. To serve the UI on a different port, pass -Dhttp.port. For example, $ACTIVATOR_HOME/bin/activator run -Dhttp.port=19001 will use port 19001.
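
Steps 4 and 5 differ only in the module directory and the optional port override, which can be sketched as a small helper that prints the command to run. The function name `activator_run_cmd` is hypothetical and not part of WhereHows; the module names and the `-Dhttp.port` flag are taken from the steps above.

```shell
# Hypothetical helper: print the activator command for a module,
# optionally overriding the HTTP port with -Dhttp.port (as in step 5).
activator_run_cmd() {
  module=$1
  port=${2:-}
  # \$ACTIVATOR_HOME is left unexpanded so the printed command stays portable.
  cmd="cd $module && \$ACTIVATOR_HOME/bin/activator run"
  if [ -n "$port" ]; then
    cmd="$cmd -Dhttp.port=$port"
  fi
  printf '%s\n' "$cmd"
}
```

For example, `activator_run_cmd backend-service` prints the command for step 4, and `activator_run_cmd web 19001` prints the command for step 5 with the UI on port 19001. Run the printed commands from the WhereHows root directory.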

Contribute

Want to contribute? Check out the Contributors Guide

Community

Want help? Check out the Gitter chat room and Google Groups
