Yi Wang
51f911b400
Map git repo and owners to Oracle/espresso/dali datasets
2016-11-22 10:51:00 -08:00
camelliazhang
2aaafed98c
Merge pull request #274 from camelliazhang/master
mark SCM users confirmed by system automatically
2016-11-11 11:53:22 -08:00
Yi (Alan) Wang
06ada42bb9
Merge pull request #272 from alyiwang/master
Add JobExecutionLineageEvent and kafka processor
2016-11-11 11:39:09 -08:00
Na Zhang
1962f0a477
mark SCM users confirmed by system automatically
2016-11-11 11:12:28 -08:00
Na Zhang
2facf409b2
update the score table during elastic search dataset update
2016-11-11 10:09:31 -08:00
Yi Wang
b4f5e438e2
Add JobExecutionLineageEvent and kafka processor
2016-11-08 19:11:37 -08:00
Na Zhang
725e689326
add exception handling for DATABASE_SCM_METADATA_ETL and collect info
2016-11-08 17:37:36 -08:00
Na Zhang
217b7d9d09
search ranking improvement with static boosting
2016-11-08 15:18:51 -08:00
Eric Sun
7b36d09b58
Add get_schema_literal_from_url() to fetch schema literal based on schema url (#268)
* use schema_url_helper to fetch avro schema from hdfs or http location
* trim space
* add dfs.namenode.kerberos.principal.pattern; include htrace for SchemaUrlHelper
2016-11-07 08:14:45 -08:00
Yi Wang
664e4072bb
Upgrade to play 2.4.8
2016-10-19 17:42:28 -07:00
Na Zhang
dbaf053e76
Add local test properties template for teradata and scm owners ETL
2016-10-19 14:10:29 -07:00
Na Zhang
043dc25e89
Get owners for espresso and oracle, and fix a bug for teradata
2016-10-19 11:13:32 -07:00
Yi Wang
5049c847fa
Update Kafka consumer actors to reduce memory usage
2016-10-10 14:49:14 -07:00
Yi Wang
c9f4f18d9c
Update Azkaban_Execution job to fetch cronExpression in flow scheduling
2016-10-06 13:43:10 -07:00
Yi (Alan) Wang
c9dfb637af
Update MetadataChangeEvent APIs according to schema change (#243)
* Update MetadataChangeEvent APIs according to schema change
* Update MultiproductLoad to reflect new Owner types
* Add comments for Owner_type precedence (priority) and compliance
2016-10-06 13:33:45 -07:00
Yi Wang
0356497124
Add comments for Owner_type precedence (priority) and compliance
2016-10-06 13:24:29 -07:00
Yi Wang
8ab5c824b0
Update MultiproductLoad to reflect new Owner types
2016-10-03 18:39:21 -07:00
camelliazhang
fe1e698b8a
remove hive instance hardcode cluster name (#236)
2016-09-30 17:15:43 -07:00
Na Zhang
10339690a9
Update HiveTransform and HiveLoad, remove hardcoded cluster name
2016-09-30 16:59:59 -07:00
Eric Sun
fd3b4baef8
avoid loop in LDAP org hierarchy (#242)
2016-09-30 16:45:38 -07:00
jerrybai2009
5f0426ea6b
using the dynamic cursor to reduce the memory usage (#241)
2016-09-30 16:45:17 -07:00
jbai
a11e4908dc
tracking the GobblinTrackingEvent_audit to get owner information
2016-09-29 15:01:32 -07:00
Yi Wang
ac34eb683f
Update Kafka processor to cast Object to String; also add debug info if schema can't be fetched from Registry
2016-09-26 15:06:33 -07:00
Na Zhang
5c76f47313
remove hardcoded hive instance cluster name
2016-09-26 15:06:30 -07:00
Yi Wang
1ad2b1528e
logback redirect ETL job logs into corresponding files
2016-09-23 16:54:52 -07:00
jerrybai2009
f7878cdfe4
fix the elastic search index out of gc issue (#223)
2016-09-13 16:43:48 -07:00
Eric Sun
86bf71499f
Reformat the ETL job info message in log. (#222)
* Use ProcessBuilder and redirected log file for HDFS Extract
* relax urn validation rule
* continue process if hive sql parser encounters error
* reformat etl job log message
2016-09-13 14:01:14 -07:00
Yi Wang
33e592da14
Modify HdfsLoad to improve speed
2016-09-09 17:41:13 -07:00
Yi Wang
4c500402fe
Map repo owner fix, change 'main' to 'Producer' and reset sort id
2016-09-02 13:52:00 -07:00
Yi Wang
a809b0ac47
Map repo owner fix to use dataset group mapping
2016-09-01 18:19:41 -07:00
Yi Wang
81f891bfab
Map scm repo owner to dataset owner table
2016-08-30 15:35:28 -07:00
Yi (Alan) Wang
579b8fc9d7
Add metadataChangeEvent APIs to backend-service (#205)
* Add multiproduct and git repo metadata etl job
* Extract commit hash and use it when querying acl
* Use FileWriter to write records into CSV file
* Remove unnecessary log entries from kafka processor
* Fix the incompatibility between integer repo_id in db and string field in record
* merge API tables to existing dataset owner and schema field table
* Add confidential and recursive column to dict_dataset_field
2016-08-24 09:10:35 -07:00
Yi (Alan) Wang
078e90e8bd
Add multiproduct and git repo metadata etl job (#202)
* Add multiproduct and git repo metadata etl job
* implement the dataset availability section
* Extract commit hash and use it when querying acl
* Use FileWriter to write records into CSV file
* Remove unnecessary log entries from kafka processor
* Fix the incompatibility between integer repo_id in db and string field in record
2016-08-12 12:26:55 -07:00
Eric Sun
cd4853d0a5
Use ProcessBuilder and redirected log file for HDFS Extract (#198)
* Use ProcessBuilder and redirected log file for HDFS Extract
* relax urn validation rule
2016-08-08 14:02:34 -07:00
Yi Wang
3d3b2a8075
Get kafka job id from application.conf and then get ref_id and configs from DB
2016-08-03 18:55:07 -07:00
Yi Wang
dbbdb6e2fb
Modify Oracle metadata ETL job, use Json dumps and remove unnecessary quotes
2016-08-03 18:49:00 -07:00
jerrybai2009
b4a718efd0
Merge pull request #195 from ericsun2/master
temp fix for hdfs_schema_crawler getRuntime().exec() hangs problem
2016-08-03 18:15:43 -07:00
jerrybai2009
e7c7175cba
Merge pull request #188 from jerrybai2009/master
load the teradata and hadoop data into table dict_dataset_instance
2016-08-03 18:13:06 -07:00
Eric Sun
1cd5872369
temp fix for hdfs_schema_crawler getRuntime().exec() hangs problem; exclude log4j
2016-08-03 15:50:00 -07:00
Eric Sun
6355ccc039
add python module [requests] for simple REST client
2016-07-29 23:10:33 -07:00
jbai
ea1ac0da9f
load the teradata and hadoop data into table dict_dataset_instance
2016-07-29 10:59:33 -07:00
Yi Wang
74ed769bab
add Oracle dataset metadata ETL job
2016-07-28 14:07:07 -07:00
jbai
85bc2db85c
add try catch to catch the exception when reading the config properties
2016-07-26 16:53:30 -07:00
Yi Wang
7edacc9a9f
get kafka config from wh_etl_job_property
2016-07-26 12:16:34 -07:00
Yi Wang
6d4706bc62
Ingest Gobblin tracking events into wherehows using Kafka consumer client
2016-07-25 15:03:29 -07:00
jbai
9fb5b09bd2
update dependency property name and fix the duplicated key issue when updating cfg_object_name_map table
2016-07-20 19:07:16 -07:00
jbai
f3c299480f
update the column names from schema to schema_text and view_expanded_text to ddl_text
2016-07-20 18:01:25 -07:00
jbai
33b05cde4b
tracking the dalids schema and expanded text by versions
2016-07-20 15:59:11 -07:00
jbai
9166db7563
update the dict_dataset_instance data loading sql since table key changed
2016-06-29 18:00:10 -07:00
Eric Sun
1573fdb212
rename hive dependency to hive_exec; reuse metadata-etl/extralibs; test travis ci;
2016-06-28 18:03:02 -07:00