camelliazhang
2aaafed98c
Merge pull request #274 from camelliazhang/master
mark SCM users confirmed by system automatically
2016-11-11 11:53:22 -08:00
Yi (Alan) Wang
06ada42bb9
Merge pull request #272 from alyiwang/master
Add JobExecutionLineageEvent and kafka processor
2016-11-11 11:39:09 -08:00
Na Zhang
1962f0a477
mark SCM users confirmed by system automatically
2016-11-11 11:12:28 -08:00
Na Zhang
2facf409b2
update the score table during elastic search dataset update
2016-11-11 10:09:31 -08:00
Yi Wang
b4f5e438e2
Add JobExecutionLineageEvent and kafka processor
2016-11-08 19:11:37 -08:00
Na Zhang
725e689326
add exception handling for DATABASE_SCM_METADATA_ETL and collect info
2016-11-08 17:37:36 -08:00
Na Zhang
217b7d9d09
search ranking improvement with static boosting
2016-11-08 15:18:51 -08:00
Eric Sun
7b36d09b58
Add get_schema_literal_from_url() to fetch schema literal based on schema url ( #268 )
* use schema_url_helper to fetch avro schema from hdfs or http location
* trim space
* add dfs.namenode.kerberos.principal.pattern; include htrace for SchemaUrlHelper
2016-11-07 08:14:45 -08:00
Yi Wang
664e4072bb
Upgrade to play 2.4.8
2016-10-19 17:42:28 -07:00
Na Zhang
dbaf053e76
Add local test properties template for teradata and scm owners ETL
2016-10-19 14:10:29 -07:00
Na Zhang
043dc25e89
Get owners for espresso and oracle, and fix a bug for teradata
2016-10-19 11:13:32 -07:00
Yi Wang
5049c847fa
Update Kafka consumer actors to reduce memory usage
2016-10-10 14:49:14 -07:00
Yi Wang
c9f4f18d9c
Update Azkaban_Execution job to fetch cronExpression in flow scheduling
2016-10-06 13:43:10 -07:00
Yi (Alan) Wang
c9dfb637af
Update MetadataChangeEvent APIs according to schema change ( #243 )
* Update MetadataChangeEvent APIs according to schema change
* Update MultiproductLoad to reflect new Owner types
* Add comments for Owner_type precedence (priority) and compliance
2016-10-06 13:33:45 -07:00
Yi Wang
0356497124
Add comments for Owner_type precedence (priority) and compliance
2016-10-06 13:24:29 -07:00
Yi Wang
8ab5c824b0
Update MultiproductLoad to reflect new Owner types
2016-10-03 18:39:21 -07:00
camelliazhang
fe1e698b8a
remove hardcoded cluster name from hive instance ( #236 )
2016-09-30 17:15:43 -07:00
Na Zhang
10339690a9
Update HiveTransform and HiveLoad, remove hardcoded cluster name
2016-09-30 16:59:59 -07:00
Eric Sun
fd3b4baef8
avoid loop in LDAP org hierarchy ( #242 )
2016-09-30 16:45:38 -07:00
jerrybai2009
5f0426ea6b
using the dynamic cursor to reduce the memory usage ( #241 )
2016-09-30 16:45:17 -07:00
jbai
a11e4908dc
tracking the GobblinTrackingEvent_audit to get owner information
2016-09-29 15:01:32 -07:00
Yi Wang
ac34eb683f
Update Kafka processor casting Object to String, also add debug info if it can't fetch schema from Registry
2016-09-26 15:06:33 -07:00
Na Zhang
5c76f47313
remove hardcoded cluster name from hive instance
2016-09-26 15:06:30 -07:00
Yi Wang
1ad2b1528e
logback redirect ETL job logs into corresponding files
2016-09-23 16:54:52 -07:00
jerrybai2009
f7878cdfe4
fix the elastic search index out of gc issue ( #223 )
2016-09-13 16:43:48 -07:00
Eric Sun
86bf71499f
Reformat the ETL job info message in log. ( #222 )
* Use ProcessBuilder and redirected log file for HDFS Extract
* relax urn validation rule
* continue processing if hive sql parser encounters an error
* reformat etl job log message
2016-09-13 14:01:14 -07:00
Yi Wang
33e592da14
Modify HdfsLoad to improve speed
2016-09-09 17:41:13 -07:00
Yi Wang
4c500402fe
Map repo owner fix, change 'main' to 'Producer' and reset sort id
2016-09-02 13:52:00 -07:00
Yi Wang
a809b0ac47
Map repo owner fix to use dataset group mapping
2016-09-01 18:19:41 -07:00
Yi Wang
81f891bfab
Map scm repo owner to dataset owner table
2016-08-30 15:35:28 -07:00
Yi (Alan) Wang
579b8fc9d7
Add metadataChangeEvent APIs to backend-service ( #205 )
* Add multiproduct and git repo metadata etl job
* Extract commit hash and use it when querying acl
* Use FileWriter to write records into CSV file
* Remove unnecessary log entries from kafka processor
* Fix the incompatibility between integer repo_id in db and string field in record
* merge API tables to existing dataset owner and schema field table
* Add confidential and recursive column to dict_dataset_field
2016-08-24 09:10:35 -07:00
Yi (Alan) Wang
078e90e8bd
Add multiproduct and git repo metadata etl job ( #202 )
* Add multiproduct and git repo metadata etl job
* implement the dataset availability section
* Extract commit hash and use it when querying acl
* Use FileWriter to write records into CSV file
* Remove unnecessary log entries from kafka processor
* Fix the incompatibility between integer repo_id in db and string field in record
2016-08-12 12:26:55 -07:00
Eric Sun
cd4853d0a5
Use ProcessBuilder and redirected log file for HDFS Extract ( #198 )
* Use ProcessBuilder and redirected log file for HDFS Extract
* relax urn validation rule
2016-08-08 14:02:34 -07:00
Yi Wang
3d3b2a8075
Get kafka job id from application.conf and then get ref_id and configs from DB
2016-08-03 18:55:07 -07:00
Yi Wang
dbbdb6e2fb
Modify Oracle metadata ETL job, use Json dumps and remove unnecessary quotes
2016-08-03 18:49:00 -07:00
jerrybai2009
b4a718efd0
Merge pull request #195 from ericsun2/master
temp fix for hdfs_schema_crawler getRuntime().exec() hangs problem
2016-08-03 18:15:43 -07:00
jerrybai2009
e7c7175cba
Merge pull request #188 from jerrybai2009/master
load the teradata and hadoop data into table dict_dataset_instance
2016-08-03 18:13:06 -07:00
Eric Sun
1cd5872369
temp fix for hdfs_schema_crawler getRuntime().exec() hangs problem; exclude log4j
2016-08-03 15:50:00 -07:00
Eric Sun
6355ccc039
add python module [requests] for simple REST client
2016-07-29 23:10:33 -07:00
jbai
ea1ac0da9f
load the teradata and hadoop data into table dict_dataset_instance
2016-07-29 10:59:33 -07:00
Yi Wang
74ed769bab
add Oracle dataset metadata ETL job
2016-07-28 14:07:07 -07:00
jbai
85bc2db85c
add try catch to catch the exception when reading the config properties
2016-07-26 16:53:30 -07:00
Yi Wang
7edacc9a9f
get kafka config from wh_etl_job_property
2016-07-26 12:16:34 -07:00
Yi Wang
6d4706bc62
Ingest Gobblin tracking events into wherehows using Kafka consumer client
2016-07-25 15:03:29 -07:00
jbai
9fb5b09bd2
update dependency property name and fix the duplicated key issue when updating the cfg_object_name_map table
2016-07-20 19:07:16 -07:00
jbai
f3c299480f
update the column names from schema to schema_text and view_expanded_text to ddl_text
2016-07-20 18:01:25 -07:00
jbai
33b05cde4b
tracking the dalids schema and expanded text by versions
2016-07-20 15:59:11 -07:00
jbai
9166db7563
update the dict_dataset_instance data loading sql since table key changed
2016-06-29 18:00:10 -07:00
Eric Sun
1573fdb212
rename hive dependency to hive_exec; reuse metadata-etl/extralibs; test travis ci;
2016-06-28 18:03:02 -07:00
Eric Sun
5348d44a77
Force object (db.table) names extracted by the getViewDependency() API to lowercase
Object (db.table) names extracted by the getViewDependency() API can contain camel-case strings, which can potentially cause mismatches in the underlying RDBMS.
2016-06-27 16:43:04 -07:00