52 Commits

Author SHA1 Message Date
Yi (Alan) Wang
e07306b51e Update MetadataChangeEvent, separate privacy compliance from security (#275) 2016-11-11 17:25:41 -08:00
Yi Wang
b4f5e438e2 Add JobExecutionLineageEvent and kafka processor 2016-11-08 19:11:37 -08:00
Yi Wang
664e4072bb Upgrade to play 2.4.8 2016-10-19 17:42:28 -07:00
Yi Wang
3227412339 Login authentication support multiple LDAP servers, add login history 2016-10-13 14:30:43 -07:00
Yi Wang
fcd6cf149e Update MetastoreAuditProcessor to reduce storage, also refactor some code 2016-10-11 11:26:36 -07:00
Yi Wang
5049c847fa Update Kafka consumer actors to reduce memory usage 2016-10-10 14:49:14 -07:00
Yi (Alan) Wang
c9dfb637af Update MetadataChangeEvent APIs according to schema change (#243)
* Update MetadataChangeEvent APIs according to schema change

* Update MultiproductLoad to reflect new Owner types

* Add comments for Owner_type precedence (priority) and compliance
2016-10-06 13:33:45 -07:00
jbai
a11e4908dc tracking the GobblinTrackingEvent_audit to get owner information 2016-09-29 15:01:32 -07:00
Yi Wang
ac34eb683f Update Kafka processor casting Object to String, also add debug info if can't fetch schema from Registry 2016-09-26 15:06:33 -07:00
Yi Wang
1ad2b1528e logback redirect ETL job logs into corresponding files 2016-09-23 16:54:52 -07:00
Yi (Alan) Wang
753de7de7c Merge pull request #233 from alyiwang/master
Update backend APIs to cast SQL results back to Java record then to Json
2016-09-21 08:59:04 -07:00
Eric Sun
89ff794ddf Add api to get dependents of a dataset (#232)
* Use ProcessBuilder and redirected log file for HDFS Extract

* relax urn validation rule

* continue processing if hive sql parser encounters error

* reformat etl job log message

* add API to find dataset dependents, such as which hive tables are based on an hdfs path
2016-09-21 08:55:44 -07:00
Yi Wang
be65efb0cc Update backend APIs to cast SQL results back to Java record then serialize to Json reply 2016-09-20 18:56:49 -07:00
Yi Wang(Data Infrastructure)
1171e00097 Add REST proxy for Security API from backend to web 2016-09-19 18:14:10 -07:00
Yi Wang
b136fc6c37 Add MetadataInventoryEvent processor and API 2016-09-15 09:22:42 -07:00
Eric Sun
86bf71499f Reformat the ETL job info message in log. (#222)
* Use ProcessBuilder and redirected log file for HDFS Extract

* relax urn validation rule

* continue processing if hive sql parser encounters error

* reformat etl job log message
2016-09-13 14:01:14 -07:00
Yi Wang
5ce5a1425e Add hostname and process_id to wh_etl_job_execution 2016-09-12 16:09:33 -07:00
Yi Wang
5515cbdde9 Add MetadataChangeEvent processor to call separate APIs 2016-09-06 16:41:50 -07:00
Yi (Alan) Wang
579b8fc9d7 Add metadataChangeEvent APIs to backend-service (#205)
* Add multiproduct and git repo metadata etl job

* Extract commit hash and use it when querying acl

* Use FileWriter to write records into CSV file

* Remove unnecessary log entries from kafka processor

* Fix the incompatibility between integer repo_id in db and string field in record

* merge API tables to existing dataset owner and schema field table

* Add confidential and recursive column to dict_dataset_field
2016-08-24 09:10:35 -07:00
Eric Sun
cd4853d0a5 Use ProcessBuilder and redirected log file for HDFS Extract (#198)
* Use ProcessBuilder and redirected log file for HDFS Extract

* relax urn validation rule
2016-08-08 14:02:34 -07:00
Yi Wang
c0cfe1f5ca Modify KafkaConsumerMaster to handle more than one kafka config, add error handling 2016-08-04 13:07:19 -07:00
Yi Wang
3d3b2a8075 Get kafka job id from application.conf and then get ref_id and configs from DB 2016-08-03 18:55:07 -07:00
Eric Sun
8c9cb99ba4 primary_dataset_type for cfg_database 2016-08-01 13:20:04 -07:00
Eric Sun
9d2c803f0c Merge pull request #187 from ericsun2/master
Add datacenter, deploymenttier, cluster info to better describe dataset instance
2016-07-28 17:22:32 -07:00
Eric Sun
f745642212 add datacenter, deploymenttier, cluster to describe dataset instance 2016-07-28 16:38:03 -07:00
Yi Wang
74ed769bab add Oracle dataset metadata ETL job 2016-07-28 14:07:07 -07:00
Yi Wang
7edacc9a9f get kafka config from wh_etl_job_property 2016-07-26 12:16:34 -07:00
Yi Wang
6d4706bc62 Ingest Gobblin tracking events into wherehows using Kafka consumer client 2016-07-25 15:03:29 -07:00
jbai
7a77aba4b7 merge the pull request 165 to master branch 2016-07-21 10:38:36 -07:00
Naga Srinivas Vemuri
97370ed2e1 Query Dataset properties to retrieve datasetUrns 2016-07-21 11:54:47 +05:30
jbai
6af54658d6 merge Fetching dataset watchers via get /dataset/watchers to main branch 2016-06-30 10:20:54 -07:00
jbai
6974ae26ae fix the gradlew check failed issue and make the cluster name input mandatory 2016-06-15 11:12:20 -07:00
jbai
9705a07ad8 provide the dataset dependency api 2016-06-14 16:17:24 -07:00
jbai
e7885f28db comment out the flow tree builder since UI does not use it anymore 2016-06-13 10:15:32 -07:00
Rafal Kluszczynski
d06bfbfbd0 fix: correctly use directory path from properties when executing etl job 2016-05-30 08:59:05 +02:00
SunZhaonan
0b5c421311 Fix Hive column parser parent path bug 2016-05-19 16:36:30 -07:00
SunZhaonan
9d6a1b2649 Add optional config of ETL job white list 2016-05-12 16:28:23 -07:00
SunZhaonan
31de21ddcf pass parameter through file. 2016-05-03 16:25:56 -07:00
SunZhaonan
c66b00e2f6 Fix dataset insert API bug. Fix load sql bug. 2016-03-28 16:27:43 -07:00
SunZhaonan
a0b7cb9d57 Fix process hanging bug. Add hive field ETL process. 2016-03-16 19:12:21 -07:00
SunZhaonan
aff8f323e4 Scheduler check previous job is finished. Redirect remote outputstream into log. Fix avro parser bugs 2016-03-16 19:09:53 -07:00
SunZhaonan
c4671d2579 Add field comments ETL
Fix API bug of tech_matrix_id

Add key in comment table
2016-03-14 14:23:33 -07:00
SunZhaonan
5e9ae37952 Change to multi processing instead of multi thread. Fix hive ETL bug 2016-02-29 16:37:03 -08:00
SunZhaonan
033a28faee Backend job property API. 'update' change to 'insert on duplicate update' 2016-02-17 16:42:54 -08:00
SunZhaonan
c3da00003e Fix bugs. Reinforce logging. Format jython scripts. Add missing table DDL. 2016-02-03 19:22:18 -08:00
SunZhaonan
05d54b3070 Add Hive metadata ETL process 2015-12-21 12:07:08 -08:00
SunZhaonan
cd44daba5d Merge with master 2015-12-16 15:54:50 -08:00
SunZhaonan
07c46304b5 Fix bug of duplicate field loading. Fix bug of subflow process in azkaban lineage ETL. 2015-12-11 19:46:35 -08:00
Zhen Chen
e21a6b8e75 remove unused variable, change load script and add etl type in backend service 2015-12-11 13:52:23 -08:00
Zhen Chen
ebbf9ec629 add ldap user and group metadata etl 2015-12-10 16:26:57 -08:00