117 Commits

Author SHA1 Message Date
camelliazhang
2aaafed98c Merge pull request #274 from camelliazhang/master
mark SCM users confirmed by system automatically
2016-11-11 11:53:22 -08:00
Yi (Alan) Wang
06ada42bb9 Merge pull request #272 from alyiwang/master
Add JobExecutionLineageEvent and kafka processor
2016-11-11 11:39:09 -08:00
Na Zhang
1962f0a477 mark SCM users confirmed by system automatically 2016-11-11 11:12:28 -08:00
Na Zhang
2facf409b2 update the score table during elastic search dataset update 2016-11-11 10:09:31 -08:00
Yi Wang
b4f5e438e2 Add JobExecutionLineageEvent and kafka processor 2016-11-08 19:11:37 -08:00
Na Zhang
725e689326 add exception handling for DATABASE_SCM_METADATA_ETL and collect info 2016-11-08 17:37:36 -08:00
Na Zhang
217b7d9d09 search ranking improvement with static boosting 2016-11-08 15:18:51 -08:00
Eric Sun
7b36d09b58 Add get_schema_literal_from_url() to fetch schema literal based on schema url (#268)
* use schema_url_helper to fetch avro schema from hdfs or http location

* trim space

* add dfs.namenode.kerberos.principal.pattern; include htrace for SchemaUrlHelper
2016-11-07 08:14:45 -08:00
Yi Wang
664e4072bb Upgrade to play 2.4.8 2016-10-19 17:42:28 -07:00
Na Zhang
dbaf053e76 Add local test properties template for teradata and scm owners ETL 2016-10-19 14:10:29 -07:00
Na Zhang
043dc25e89 Get owners for espresso and oracle, and fix a bug for teradata 2016-10-19 11:13:32 -07:00
Yi Wang
5049c847fa Update Kafka consumer actors to reduce memory usage 2016-10-10 14:49:14 -07:00
Yi Wang
c9f4f18d9c Update Azkaban_Execution job to fetch cronExpression in flow scheduling 2016-10-06 13:43:10 -07:00
Yi (Alan) Wang
c9dfb637af Update MetadataChangeEvent APIs according to schema change (#243)
* Update MetadataChangeEvent APIs according to schema change

* Update MultiproductLoad to reflect new Owner types

* Add comments for Owner_type precedence (priority) and compliance
2016-10-06 13:33:45 -07:00
Yi Wang
0356497124 Add comments for Owner_type precedence (priority) and compliance 2016-10-06 13:24:29 -07:00
Yi Wang
8ab5c824b0 Update MultiproductLoad to reflect new Owner types 2016-10-03 18:39:21 -07:00
camelliazhang
fe1e698b8a remove hardcoded cluster name from hive instance (#236) 2016-09-30 17:15:43 -07:00
Na Zhang
10339690a9 Update HiveTransform and HiveLoad, remove hardcoded cluster name 2016-09-30 16:59:59 -07:00
Eric Sun
fd3b4baef8 avoid loop in LDAP org hierarchy (#242) 2016-09-30 16:45:38 -07:00
jerrybai2009
5f0426ea6b using the dynamic cursor to reduce the memory usage (#241) 2016-09-30 16:45:17 -07:00
jbai
a11e4908dc tracking the GobblinTrackingEvent_audit to get owner information 2016-09-29 15:01:32 -07:00
Yi Wang
ac34eb683f Update Kafka processor casting Object to String, also add debug info if can't fetch schema from Registry 2016-09-26 15:06:33 -07:00
Na Zhang
5c76f47313 remove hardcoded cluster name from hive instance 2016-09-26 15:06:30 -07:00
Yi Wang
1ad2b1528e logback redirect ETL job logs into corresponding files 2016-09-23 16:54:52 -07:00
jerrybai2009
f7878cdfe4 fix the elastic search index out of gc issue (#223) 2016-09-13 16:43:48 -07:00
Eric Sun
86bf71499f Reformat the ETL job info message in log. (#222)
* Use ProcessBuilder and redirected log file for HDFS Extract

* relax urn validation rule

* continue processing if hive sql parser encounters an error

* reformat etl job log message
2016-09-13 14:01:14 -07:00
Yi Wang
33e592da14 Modify HdfsLoad to improve speed 2016-09-09 17:41:13 -07:00
Yi Wang
4c500402fe Map repo owner fix, change 'main' to 'Producer' and reset sort id 2016-09-02 13:52:00 -07:00
Yi Wang
a809b0ac47 Map repo owner fix to use dataset group mapping 2016-09-01 18:19:41 -07:00
Yi Wang
81f891bfab Map scm repo owner to dataset owner table 2016-08-30 15:35:28 -07:00
Yi (Alan) Wang
579b8fc9d7 Add metadataChangeEvent APIs to backend-service (#205)
* Add multiproduct and git repo metadata etl job

* Extract commit hash and use it when querying acl

* Use FileWriter to write records into CSV file

* Remove unnecessary log entries from kafka processor

* Fix the incompatibility between integer repo_id in db and string field in record

* merge API tables to existing dataset owner and schema field table

* Add confidential and recursive column to dict_dataset_field
2016-08-24 09:10:35 -07:00
Yi (Alan) Wang
078e90e8bd Add multiproduct and git repo metadata etl job (#202)
* Add multiproduct and git repo metadata etl job

* implement the dataset availability section

* Extract commit hash and use it when querying acl

* Use FileWriter to write records into CSV file

* Remove unnecessary log entries from kafka processor

* Fix the incompatibility between integer repo_id in db and string field in record
2016-08-12 12:26:55 -07:00
Eric Sun
cd4853d0a5 Use ProcessBuilder and redirected log file for HDFS Extract (#198)
* Use ProcessBuilder and redirected log file for HDFS Extract

* relax urn validation rule
2016-08-08 14:02:34 -07:00
Yi Wang
3d3b2a8075 Get kafka job id from application.conf and then get ref_id and configs from DB 2016-08-03 18:55:07 -07:00
Yi Wang
dbbdb6e2fb Modify Oracle metadata ETL job, use Json dumps and remove unnecessary quotes 2016-08-03 18:49:00 -07:00
jerrybai2009
b4a718efd0 Merge pull request #195 from ericsun2/master
temp fix for hdfs_schema_crawler getRuntime().exec() hangs problem
2016-08-03 18:15:43 -07:00
jerrybai2009
e7c7175cba Merge pull request #188 from jerrybai2009/master
load the teradata and hadoop data into table dict_dataset_instance
2016-08-03 18:13:06 -07:00
Eric Sun
1cd5872369 temp fix for hdfs_schema_crawler getRuntime().exec() hangs problem; exclude log4j 2016-08-03 15:50:00 -07:00
Eric Sun
6355ccc039 add python module [requests] for simple REST client 2016-07-29 23:10:33 -07:00
jbai
ea1ac0da9f load the teradata and hadoop data into table dict_dataset_instance 2016-07-29 10:59:33 -07:00
Yi Wang
74ed769bab add Oracle dataset metadata ETL job 2016-07-28 14:07:07 -07:00
jbai
85bc2db85c add try catch to catch the exception when reading the config properties 2016-07-26 16:53:30 -07:00
Yi Wang
7edacc9a9f get kafka config from wh_etl_job_property 2016-07-26 12:16:34 -07:00
Yi Wang
6d4706bc62 Ingest Gobblin tracking events into wherehows using Kafka consumer client 2016-07-25 15:03:29 -07:00
jbai
9fb5b09bd2 update dependency property name and fix the duplicated key issue when updating the cfg_object_name_map table 2016-07-20 19:07:16 -07:00
jbai
f3c299480f update the column names from schema to schema_text and view_expanded_text to ddl_text 2016-07-20 18:01:25 -07:00
jbai
33b05cde4b tracking the dalids schema and expanded text by versions 2016-07-20 15:59:11 -07:00
jbai
9166db7563 update the dict_dataset_instance data loading sql since table key changed 2016-06-29 18:00:10 -07:00
Eric Sun
1573fdb212 rename hive dependency to hive_exec; reuse metadata-etl/extralibs; test travis ci; 2016-06-28 18:03:02 -07:00
Eric Sun
5348d44a77 Force object (db.table) names extracted by the getViewDependency() API to lowercase
Object (db.table) names extracted by the getViewDependency() API can contain camel-case strings, which can cause mismatches in the underlying RDBMS.
2016-06-27 16:43:04 -07:00