117 Commits

Author SHA1 Message Date
Yi (Alan) Wang
b1f237393e Modify list datasets API to return pagination info (#936) 2018-01-18 08:31:31 -08:00
Mars Lan
65cf843236 Fix a few low-hanging findbugs issues (#862) 2017-11-15 17:50:56 -08:00
Shridhar Sattur
f1b61d6773 Migrating to apache commons configuration 1.10 for parsing ETL jobs configurations. (#851) 2017-11-13 16:59:12 -08:00
Mars Lan
bf16a411a8 Make the ETL java command configurable (#845) 2017-11-07 15:54:05 -08:00
na zhang
54237ff5ca support elasticsearch auto re-index with zero downtime via alias switch 2017-10-25 15:50:03 -07:00
Mars Lan
32d565a3ff Allow configuring the number of workers for each kafka consumer job. (#810) 2017-10-20 11:18:35 -07:00
Na Zhang
eb76971c54 modify elastic search ETL to adapt to newly set up es server and versions 2017-10-16 16:05:39 -07:00
Shridhar Sattur
2a16298639 Removed all manual logger instantiations with lombok's @Slf4j. (#791) 2017-10-11 15:25:47 -07:00
Mars Lan
98a3a0ea4f Move various constants into Constant class (#780) 2017-10-03 14:52:31 -07:00
Yi (Alan) Wang
fce70ebe25 Update MCE processor (#764) 2017-09-25 17:22:04 -07:00
Mars Lan
9f7341f542 Clean up the log4j vs slf4j mess and consolidate the dependency resolution into a gradle script (#745) 2017-09-13 15:37:29 -07:00
Yi (Alan) Wang
f015d66fd2 Remove obsolete kafka code in wherehows-common (#739) 2017-09-11 18:13:31 -07:00
Yi (Alan) Wang
d2a3fe58db Add schema and generated java to data model, refactor Gobblin audit processor (#732) 2017-09-11 15:26:06 -07:00
Yi (Alan) Wang
d1e53c765f Rewrite Kafka client master and worker (#731) 2017-09-07 17:48:39 -07:00
Yi (Alan) Wang
e0e2acf6bf Move JobsUtil to WH common, fix some tests (#730) 2017-09-06 11:45:52 -07:00
Mars Lan
862b893d6e Write PID to the path specified by pidfile.path system property. (#727) 2017-09-05 14:05:13 -07:00
Yi (Alan) Wang
8084e35303 Modify HIVE extract, disable schema fetching from HDFS, add DB reconnect, refactor code (#718) 2017-08-30 17:34:31 -07:00
Mars Lan
bf5448d561 Replace db.id & app.id property keys with the existing job.ref.id key (#695) 2017-08-22 17:39:10 -07:00
wenhuaOpenx
f7ec09e19a Add etl job to extract/load druid metadata (#680)
* test

* test

* add druid etl code

* remove comments

* remove comments

* remove test doc

* add job template for DRUID_METADATA_ETL

* add druid metadata etl configs to local_test.properties.template

* refactor logger

* remove comments

* fix typos

* add unit test for druid metadata etl job

* refactor unit test code

* import testng package

* import new package

* reformat the druid etl code based on LinkedIn code style

* add README for druid metadata etl

* add README for druid metadata etl

* add README for druid metadata etl
2017-08-21 16:34:49 -07:00
Yi (Alan) Wang
28b83b8e7b Add BaseJob in wherehows-common, make ETLjob extends from it (#681) 2017-08-16 21:24:38 -07:00
Na Zhang
ca34cd920d ump metric ETL 2017-08-16 16:43:46 -07:00
Mars Lan
4b4cae2148 Add base class for all pure jython ETLs (#663) 2017-08-10 17:40:19 -07:00
hzhang2
d22b975755 fix a DB insert issue, reporting missing a column (#662) 2017-08-10 17:36:10 -07:00
Mars Lan
ac25412cb7 Move java plugin back to applicable subprojects, instead of applying it broadly (#646) 2017-08-08 17:12:57 -07:00
Yi (Alan) Wang
70ed8c6d1a Fix DBstring Util to handle Java Boolean type (#623) 2017-07-26 21:14:08 -07:00
Yi (Alan) Wang
6a8fe66ef9 Add active, deprecated flag to dataset, modify ETL to update is_active (#604) 2017-07-19 17:07:28 -07:00
Mars Lan
16a95e6ed5 Clean up LI-specific constants and code. (#588) 2017-07-10 13:44:35 -07:00
Mars Lan
c7b6fd1688 Move TMS restli related code out of open source. (#587)
Add skeleton for a generalized DAO framework.
2017-07-10 13:44:35 -07:00
Mars Lan
bb5f483be9 Clean up comments and DDL for obsolete tables. (#586) 2017-07-10 13:44:35 -07:00
Mars Lan
a950cdbc1a Fix the bug where MySQL credential isn't properly passed to DatabaseWriter. (#585) 2017-07-10 13:44:35 -07:00
Mars Lan
deb98480a3 Completely retire wh_property table and associated codes. (#583) 2017-07-10 13:44:35 -07:00
Mars Lan
33f49c5f55 Fix more places where connection URLs are manually constructed. (#579) 2017-07-10 13:44:35 -07:00
Mars Lan
770ac152e9 Refactor tree jobs to use job files. (#576) 2017-07-10 13:44:35 -07:00
Mars Lan
9dca733d76 Move elastic-search related props from wh_property table to job files. (#574)
This will break FlowTreeBuilder & DatasetTreeBuilder, neither of which is being used anyway.
2017-07-10 13:44:34 -07:00
Mars Lan
53a30d5a77 Major refactoring of ETL scheduling & configuration (#542)
* Major refactoring
- Move job-specific properties from wh_etl_job_property table to .job files
- Use the job file name instead of numeric IDs to identify ETL jobs
- Use reflection to create ETL job class at run time instead of relying on hard-coded enums
- Drop ETL job-related APIs as they're no longer needed
- Drop wh_etl_job, wh_etl_job_property, wh_etl_job_execution tables
- Add wh_etl_schedule & wh_etl_history tables
2017-07-10 13:44:33 -07:00
Yi (Alan) Wang
675dadd374 Kafka ETL to fetch queuing pipeline, also add topic blacklist (#509) 2017-07-10 13:42:57 -07:00
Mars Lan
b4fec37f61 Fix Kerberos authentication so that HIVE_DATASET_METADATA_ETL jobs can be run from non-grid cluster. (#482) 2017-07-10 13:42:56 -07:00
Mars Lan
cf4e157813 Read master key from environmental variable instead of from local fil… (#417)
* Read master key from environmental variable instead of from local file. This would allow us to pass it in via cfg2 ultimately.

* Move the env var name to Constant.java
2017-07-10 09:55:16 -07:00
na zhang
5f6fffde57 Restli Client for populating espresso/oracle datasets and schema metadata (#349)
* add dali view owner etl

* add idpc ui

* add the internal flag to switch linkedin internal features

* add idpc ui

* add the internal flag to switch linkedin internal features

* DSS-3495, implement the UI for IDPC JIRA part

* DSS-4076, update the metric view since data model changed

* DSS-4092, add metric into search and advanced search

* update metric database table name and fix the refId and refIdType issue

* remove duplicated idpc entry and javascript log

* Add fetch_owner hive script

* support Appworx flow and job definition and execution

* implement the Appworx log parser

* bring the script finder back

* update the script finder source table name

* add the flow_path into lineage and extract the script info

* fix the appworx flow job and lineage extract issues

* bring the git location back to lineage script node

* sort the script finder lineage info by type

* bring the script info back for lineage job tab

* fix the master branch merge issue

* fix the oracle unixtime calculating issue

* shorten the flow&job extract interval time to 2 hours instead of 1 day

* shorten the appworx refresh time

* add license header; include RUNNING chains from SO_JOB_QUEUE for Appworx

* implement the list view for metrics

* Modify /dataset POST method to perform INSERT or UPDATE of the DatasetRecord

* apply the list view css change to metric

* upgrade idpc and script finder to ember 2.6.2

* metadata dashboard confidential field data collecting

* implement the confidential fields of metadata dashboard

* metadata dashboard dataset description collecting

* update the final table name

* update the final table name for other load function

* exchange the source target of cfg_object_name_map

* implement the description tab for metadata dashboard

* add the load dataset and field comments function

* implemented the bar and pie chart for description

* implement the ownership section for metadata dashboard

* fix the issue that appworx lineage job running too long

* add the table job attempt source code

* implemented the idpc compliance section

* Security Compliance Tab UI (#246)

* Add back WhereHows internal tracking (#251)

* DSS-5178 DSS-5277: Implements Compliance and Confidential Spec
Adds 'logs/' to ignored files

Updates EmberSelectorComponent to handle a list of string options or list of options with value and label, flags the currently selected option, and bubble change actions with 'selectionDidChange' action

DSS-5178: Removes previous updates to search.js: moving jQuery + DOM heavy imperative implementation to Ember component

DSS-5178: Adds templates and components DropRegion and DraggableItem

DSS-5178: Adds getSecuritySpec action and compliance types to Dataset controller, cleans up Datasets route and removes inline securitySpec fetch from route

DSS-5178: Updates templates for compliance spec

DSS-5178: Adds compliance component and updates template

Adds .DS_Store to gitignore

DSS-5277: Adds dataset-confidential component to DOM, Creates DatasetConfidential component, refactors out data handling from component

DSS-5277: Moves data fetching to Dataset Route model and set model data on controller, Adds template for confidential spec component

DSS-5178: Moves view related complianceTypes to component

DSS-5277 DSS-5178: Adds styling for tab content

* DSS-5277 DSS-5178: Adds support for modifying compliancePurgeEntities that don't currently have identifierFields persisted on the remote, PurgeableEntityFieldIdentifierType enum is sourced in client

* DSS-5178 DSS-5277: Adds dataType field to UI for schema field name search result. Refactors processSchema into parseSchema to get fields and types

* DSS-5277 Fixes bug with missing params property on controller depending on route entry point

* DSS-5543: Fixes rendering of datasets in detailview navigating from sidebar/ treeview (#259)

* DSS-5677: Changes component from block syntax to inline. Add property for creating a new PrivacyCompliancePolicy and SecuritySpecification for datasets without either

* DSS-5677: Adds ability to create a new PrivacyCompliancePolicy and SecuritySpecification from the client UI. Also fixes issue with matching fields and data type properties on schema with inconsistent shapes

* DSS-5677: Add create banner for datasets without Privacy policy or Security specification

* DSS-5677: Updates UI to more closely match spec, changes search input behaviour to filter from search

* ADD ESPRESSO_DATASET_METADATA_ETL job to fetch Espresso metadata from Nuage

* Update Nuage load process, fix owner subtype and source

* Add VOLDEMORT ETL job to fetch datasets from Nuage

* Add KAFKA ETL job to fetch topics from Nuage

* skip KAFKA topics starting with 'test' when fetching from Nuage

* Merges front-end changes from master -> DSS-5178 DSS-5577 DSS-5677 DSS-5277 DSS-5677

* DSS-5784: Fixes issue with AdvancedSearch and ScriptFinder URL queries being RFC-3986 incompliant

* ScriptFinder Controller add URL decoding for Json fields (#290)

* DSS-5888 Adds configuration support for Piwik environment tracking. Setting the 'tracking.piwik.siteid' to a value will get rendered in the template and consumed by the tracking initializer

* DSS-5888 DSS-5875 Adds tracking for users. Adds client side tracking for keyword and init for Piwik script module

* Fixes mismatch with compliance api property name: privacyCompliancePolicy != privacyCompliance

* DSS-5888 Fixes tracking userId for noscript tag

* DSS-5865 Removes spinner on metadata/dashboard/idpc-compliance fail

* DSS-6177 Removed unused links in Metric Detail page

* Update Appworx Execution and Lineage jobs (#321)

* DSS-6197: Adds default value for classification property on security specification if not defined

* DSS-6198: Fixes issue with nested fields not getting rendered in the schema for compliance and confidential tabs

* DSS-6018 Adds ui feature to track feedback on user search results relevance using a up/down voting mechanism

* Make unit tests buildable again for backend and web (#325)

* Make unit tests buildable again for backend and web.

* Add back fest dependency so the tests can stay more or less the same as before.

* Generate code coverage reports (#334)

* Add playCoverage task to run code coverage using JaCoCo for backend and web.

* Add jacocoTestReport task to run code coverage for testNG-based tests in wherehows-common & metadata-etl.

* Add data platform filter for dashboard APIs (#322)

* Add data platform filter for dashboard APIs

* Add exception handling for Espresso and Kafka ETL job

* restli client to populate espresso and oracle metadata
2017-07-10 09:54:08 -07:00
Mars Lan
e36a40cd65 Generate code coverage reports (#334)
* Add playCoverage task to run code coverage using JaCoCo for backend and web.

* Add jacocoTestReport task to run code coverage for testNG-based tests in wherehows-common & metadata-etl.
2017-07-10 09:53:28 -07:00
Yi Wang
7d6bb9fac9 Add KAFKA ETL job to fetch topics from Nuage 2017-07-10 09:53:27 -07:00
Yi Wang
a9335bc49a Add VOLDEMORT ETL job to fetch datasets from Nuage 2017-07-10 09:53:27 -07:00
Yi Wang
cf49ae375c ADD ESPRESSO_DATASET_METADATA_ETL job to fetch Espresso metadata from Nuage 2017-07-10 09:53:27 -07:00
jbai
e8b21a17df implement the Appworx log parser 2017-07-10 09:53:24 -07:00
jbai
eb93d67b64 support Appworx flow and job definition and execution 2017-07-10 09:53:24 -07:00
Zhen Chen
b36f774feb add dali view owner etl 2017-07-10 09:53:23 -07:00
Yi (Alan) Wang
488929ad93 Modify Confidential info schema to add identifierField and logicalType (#385) 2017-03-30 21:29:20 -07:00
Yi Wang
adba532474 Modify compliance purge entity record to support logical type and is_subject 2017-03-23 22:08:54 -07:00
Yi Wang
6c57e30240 Fix bugs found by AppCheck in issue #328 2017-02-24 11:08:18 -08:00
camelliazhang
724f754f03 clean and refactor elastic search ETL job (#300) 2016-12-14 21:22:30 -08:00