81 Commits

Author SHA1 Message Date
Mars Lan
b4fec37f61 Fix Kerberos authentication so that HIVE_DATASET_METADATA_ETL jobs can be run from non-grid cluster. (#482) 2017-07-10 13:42:56 -07:00
Mars Lan
cf4e157813 Read master key from environmental variable instead of from local fil… (#417)
* Read master key from environment variable instead of from local file. This would allow us to pass it in via cfg2 ultimately.

* Move the env var name to Constant.java
2017-07-10 09:55:16 -07:00
na zhang
5f6fffde57 Restli Client for populating espresso/oracle datasets and schema metadata (#349)
* add dali view owner etl

* add idpc ui

* add the internal flag to switch linkedin internal features

* add idpc ui

* add the internal flag to switch linkedin internal features

* DSS-3495, implement the UI for IDPC JIRA part

* DSS-4076, update the metric view since data model changed

* DSS-4092, add metric into search and advanced search

* update metric database table name and fix the refId and refIdType issue

* remove duplicated idpc entry and javascript log

* Add fetch_owner hive script

* support Appworx flow and job definition and execution

* implement the Appworx log parser

* bring the script finder back

* update the script finder source table name

* add the flow_path into lineage and extract the script info

* fix the appworx flow job and lineage extract issues

* bring the git location back to lineage script node

* sort the script finder lineage info by type

* bring the script info back for lineage job tab

* fix the master branch merge issue

* fix the oracle unixtime calculating issue

* shorten the flow&job extract interval time to 2 hours instead of 1 day

* shorten the appworx refresh time

* add license header; include RUNNING chains from SO_JOB_QUEUE for Appworx

* implement the list view for metrics

* Modify /dataset POST method to perform INSERT or UPDATE of the DatasetRecord

* apply the list view css change to metric

* upgrade idpc and script finder to ember 2.6.2

* metadata dashboard confidential field data collecting

* implement the confidential fields of metadata dashboard

* metadata dashboard dataset description collecting

* update the final table name

* update the final table name for other load function

* swap the source and target of cfg_object_name_map

* implement the description tab for metadata dashboard

* add the load dataset and field comments function

* implemented the bar and pie chart for description

* implement the ownership section for metadata dashboard

* fix the issue that appworx lineage job running too long

* add the table job attempt source code

* implemented the idpc compliance section

* Security Compliance Tab UI (#246)

* Add back WhereHows internal tracking (#251)

* DSS-5178 DSS-5277: Implements Compliance and Confidential Spec
Adds 'logs/' to ignored files

Updates EmberSelectorComponent to handle a list of string options or list of options with value and label, flags the currently selected option, and bubbles change actions with 'selectionDidChange' action

DSS-5178: Removes previous updates to search.js: moving jQuery + DOM heavy imperative implementation to Ember component

DSS-5178: Adds templates and components DropRegion and DraggableItem

DSS-5178: Adds getSecuritySpec action and compliance types to Dataset controller, cleans up Datasets route and removes inline securitySpec fetch from route

DSS-5178: Updates templates for compliance spec

DSS-5178: Adds compliance component and updates template

Adds .DS_Store to gitignore

DSS-5277: Adds dataset-confidential component to DOM, Creates DatasetConfidential component, refactors out data handling from component

DSS-5277: Moves data fetching to Dataset Route model and set model data on controller, Adds template for confidential spec component

DSS-5178: Moves view related complianceTypes to component

DSS-5277 DSS-5178: Adds styling for tab content

* DSS-5277 DSS-5178: Adds support for modifying compliancePurgeEntities that don't currently have identifierFields persisted on the remote, PurgeableEntityFieldIdentifierType enum is sourced in client

* DSS-5178 DSS-5277: Adds dataType field to UI for schema field name search result. Refactors processSchema into parseSchema to get fields and types

* DSS-5277 Fixes bug with missing params property on controller depending on route entry point

* DSS-5543: Fixes rendering of datasets in detailview navigating from sidebar/treeview (#259)

* DSS-5677: Changes component from block syntax to inline. Adds property for creating a new PrivacyCompliancePolicy and SecuritySpecification for datasets without either

* DSS-5677: Adds ability to create a new PrivacyCompliancePolicy and SecuritySpecification from the client UI. Also fixes issue with matching fields and data type properties on schema with inconsistent shapes

* DSS-5677: Add create banner for datasets without Privacy policy or Security specification

* DSS-5677: Updates UI to more closely match spec, changes search input behaviour to filter from search

* ADD ESPRESSO_DATASET_METADATA_ETL job to fetch Espresso metadata from Nuage

* Update Nuage load process, fix owner subtype and source

* Add VOLDEMORT ETL job to fetch datasets from Nuage

* Add KAFKA ETL job to fetch topics from Nuage

* skip KAFKA topics starting with 'test' when fetching from Nuage

* Merges front-end changes from master -> DSS-5178 DSS-5577 DSS-5677 DSS-5277 DSS-5677

* DSS-5784: Fixes issue with AdvancedSearch and ScriptFinder URL queries not being RFC 3986 compliant

* ScriptFinder Controller add URL decoding for Json fields (#290)

* DSS-5888 Adds configuration support for Piwik environment tracking. Setting the 'tracking.piwik.siteid' to a value will get rendered in the template and consumed by the tracking initializer

* DSS-5888 DSS-5875 Adds tracking for users. Adds client side tracking for keyword and init for Piwik script module

* Fixes mismatch with compliance api property name: privacyCompliancePolicy != privacyCompliance

* DSS-5888 Fixes tracking userId for noscript tag

* DSS-5865 Removes spinner on metadata/dashboard/idpc-compliance fail

* DSS-6177 Removed unused links in Metric Detail page

* Update Appworx Execution and Lineage jobs (#321)

* DSS-6197: Adds default value for classification property on security specification if not defined

* DSS-6198: Fixes issue with nested fields not getting rendered in the schema for compliance and confidential tabs

* DSS-6018 Adds UI feature to track feedback on user search results relevance using an up/down voting mechanism

* Make unit tests buildable again for backend and web (#325)

* Make unit tests buildable again for backend and web.

* Add back fest dependency so the tests can stay more or less the same as before.

* Generate code coverage reports (#334)

* Add playCoverage task to run code coverage using JaCoCo for backend and web.

* Add jacocoTestReport task to run code coverage for TestNG-based tests in wherehows-common & metadata-etl.

* Add data platform filter for dashboard APIs (#322)

* Add data platform filter for dashboard APIs

* Add exception handling for Espresso and Kafka ETL job

* restli client to populate espresso and oracle metadata
2017-07-10 09:54:08 -07:00
Mars Lan
e36a40cd65 Generate code coverage reports (#334)
* Add playCoverage task to run code coverage using JaCoCo for backend and web.

* Add jacocoTestReport task to run code coverage for TestNG-based tests in wherehows-common & metadata-etl.
2017-07-10 09:53:28 -07:00
Yi Wang
7d6bb9fac9 Add KAFKA ETL job to fetch topics from Nuage 2017-07-10 09:53:27 -07:00
Yi Wang
a9335bc49a Add VOLDEMORT ETL job to fetch datasets from Nuage 2017-07-10 09:53:27 -07:00
Yi Wang
cf49ae375c ADD ESPRESSO_DATASET_METADATA_ETL job to fetch Espresso metadata from Nuage 2017-07-10 09:53:27 -07:00
jbai
e8b21a17df implement the Appworx log parser 2017-07-10 09:53:24 -07:00
jbai
eb93d67b64 support Appworx flow and job definition and execution 2017-07-10 09:53:24 -07:00
Zhen Chen
b36f774feb add dali view owner etl 2017-07-10 09:53:23 -07:00
Yi (Alan) Wang
488929ad93 Modify Confidential info schema to add identifierField and logicalType (#385) 2017-03-30 21:29:20 -07:00
Yi Wang
adba532474 Modify compliance purge entity record to support logical type and is_subject 2017-03-23 22:08:54 -07:00
Yi Wang
6c57e30240 Fix bugs found by AppCheck in issue #328 2017-02-24 11:08:18 -08:00
camelliazhang
724f754f03 clean and refactor Elasticsearch ETL job (#300) 2016-12-14 21:22:30 -08:00
Eric Sun
a3504fa57f Fix jsonpath after upgrading com.jayway.jsonpath to 2.2 (#299)
* use schema_url_helper to fetch avro schema from hdfs or http location

* trim space

* add dfs.namenode.kerberos.principal.pattern; include htrace for SchemaUrlHelper

* fix jsonpath for job history log parser; do not throw exception if kerberos config files are missing for job history http connection

* avoid null return value for sepCommaString(); fix a typo
2016-12-13 21:14:52 -08:00
Yi (Alan) Wang
e07306b51e Update MetadataChangeEvent, separate privacy compliance from security (#275) 2016-11-11 17:25:41 -08:00
Yi Wang
b4f5e438e2 Add JobExecutionLineageEvent and kafka processor 2016-11-08 19:11:37 -08:00
Eric Sun
7b36d09b58 Add get_schema_literal_from_url() to fetch schema literal based on schema url (#268)
* use schema_url_helper to fetch avro schema from hdfs or http location

* trim space

* add dfs.namenode.kerberos.principal.pattern; include htrace for SchemaUrlHelper
2016-11-07 08:14:45 -08:00
Yi Wang
664e4072bb Upgrade to play 2.4.8 2016-10-19 17:42:28 -07:00
Na Zhang
043dc25e89 Get owners for espresso and oracle, and fix a bug for teradata 2016-10-19 11:13:32 -07:00
Yi Wang
5049c847fa Update Kafka consumer actors to reduce memory usage 2016-10-10 14:49:14 -07:00
Yi Wang
c9f4f18d9c Update Azkaban_Execution job to fetch cronExpression in flow scheduling 2016-10-06 13:43:10 -07:00
Yi (Alan) Wang
c9dfb637af Update MetadataChangeEvent APIs according to schema change (#243)
* Update MetadataChangeEvent APIs according to schema change

* Update MultiproductLoad to reflect new Owner types

* Add comments for Owner_type precedence (priority) and compliance
2016-10-06 13:33:45 -07:00
Yi Wang
0356497124 Add comments for Owner_type precedence (priority) and compliance 2016-10-06 13:24:29 -07:00
Yi Wang
b74d58a33f Update MetadataChangeEvent APIs according to schema change 2016-10-03 10:56:23 -07:00
jbai
a11e4908dc tracking the GobblinTrackingEvent_audit to get owner information 2016-09-29 15:01:32 -07:00
Yi Wang
ac34eb683f Update Kafka processor casting Object to String, also add debug info if it can't fetch schema from Registry 2016-09-26 15:06:33 -07:00
Yi (Alan) Wang
753de7de7c Merge pull request #233 from alyiwang/master
Update backend APIs to cast SQL results back to Java record then to Json
2016-09-21 08:59:04 -07:00
Eric Sun
89ff794ddf Add api to get dependents of a dataset (#232)
* Use ProcessBuilder and redirected log file for HDFS Extract

* relax urn validation rule

* continue process if hive sql parser encounters error

* reformat etl job log message

* add API to find dataset dependents, such as which hive tables are based on an hdfs path
2016-09-21 08:55:44 -07:00
Yi Wang
be65efb0cc Update backend APIs to cast SQL results back to Java record then serialize to Json reply 2016-09-20 18:56:49 -07:00
Yi Wang(Data Infrastructure)
1171e00097 Add REST proxy for Security API from backend to web 2016-09-19 18:14:10 -07:00
Yi Wang
ee01d7c6c7 rename DatasetPropertiesRecord to DatasetInventoryPropertiesRecord 2016-09-15 11:46:26 -07:00
Yi Wang
b136fc6c37 Add MetadataInventoryEvent processor and API 2016-09-15 09:22:42 -07:00
Eric Sun
86bf71499f Reformat the ETL job info message in log. (#222)
* Use ProcessBuilder and redirected log file for HDFS Extract

* relax urn validation rule

* continue process if hive sql parser encounters error

* reformat etl job log message
2016-09-13 14:01:14 -07:00
Yi Wang
5515cbdde9 Add MetadataChangeEvent processor to call separate APIs 2016-09-06 16:41:50 -07:00
Yi Wang
81f891bfab Map scm repo owner to dataset owner table 2016-08-30 15:35:28 -07:00
Yi Wang
e2b42d2ccb Update DatasetOwnerRecord to be compatible with linkedin branch 2016-08-25 09:12:31 -07:00
Yi Wang
7cbda15b5a Add confidential and recursive column to dict_dataset_field 2016-08-23 15:50:30 -07:00
Yi Wang
d46a9d8b8e merge API tables to existing dataset owner and schema field table 2016-08-22 17:06:20 -07:00
Yi Wang
46871face6 Add metadataChangeEvent APIs to backend-service 2016-08-16 18:47:53 -07:00
Yi (Alan) Wang
078e90e8bd Add multiproduct and git repo metadata etl job (#202)
* Add multiproduct and git repo metadata etl job

* implement the dataset availability section

* Extract commit hash and use it when querying acl

* Use FileWriter to write records into CSV file

* Remove unnecessary log entries from kafka processor

* Fix the incompatibility between integer repo_id in db and string field in record
2016-08-12 12:26:55 -07:00
Yi Wang
44807f5f7e Fix the incompatibility between integer repo_id in db and string field in record 2016-08-10 17:24:03 -07:00
Yi Wang
bc276274ff Use FileWriter to write records into CSV file 2016-08-10 11:20:31 -07:00
Yi Wang
83834e4e88 Add fetching acl owner info from svn, also change some property names. 2016-08-10 09:11:37 -07:00
Yi Wang
830413e122 Add multiproduct and git repo metadata etl job 2016-08-08 21:28:37 -07:00
Yi Wang
3d3b2a8075 Get kafka job id from application.conf and then get ref_id and configs from DB 2016-08-03 18:55:07 -07:00
Eric Sun
9d2c803f0c Merge pull request #187 from ericsun2/master
Add datacenter, deploymenttier, cluster info to better describe dataset instance
2016-07-28 17:22:32 -07:00
Eric Sun
f745642212 add datacenter, deploymenttier, cluster to describe dataset instance 2016-07-28 16:38:03 -07:00
Yi Wang
74ed769bab add Oracle dataset metadata ETL job 2016-07-28 14:07:07 -07:00
Yi Wang
6d4706bc62 Ingest Gobblin tracking events into wherehows using Kafka consumer client 2016-07-25 15:03:29 -07:00