Yi Wang
7d6bb9fac9
Add KAFKA ETL job to fetch topics from Nuage
2017-07-10 09:53:27 -07:00
Yi Wang
a9335bc49a
Add VOLDEMORT ETL job to fetch datasets from Nuage
2017-07-10 09:53:27 -07:00
Yi Wang
cf49ae375c
Add ESPRESSO_DATASET_METADATA_ETL job to fetch Espresso metadata from Nuage
2017-07-10 09:53:27 -07:00
jbai
e8b21a17df
implement the Appworx log parser
2017-07-10 09:53:24 -07:00
jbai
eb93d67b64
support Appworx flow and job definition and execution
2017-07-10 09:53:24 -07:00
Zhen Chen
b36f774feb
add dali view owner etl
2017-07-10 09:53:23 -07:00
camelliazhang
724f754f03
clean and refactor elastic search ETL job ( #300 )
2016-12-14 21:22:30 -08:00
Eric Sun
7b36d09b58
Add get_schema_literal_from_url() to fetch schema literal based on schema url ( #268 )
* use schema_url_helper to fetch avro schema from hdfs or http location
* trim space
* add dfs.namenode.kerberos.principal.pattern; include htrace for SchemaUrlHelper
2016-11-07 08:14:45 -08:00
Na Zhang
043dc25e89
Get owners for espresso and oracle, and fix a bug for teradata
2016-10-19 11:13:32 -07:00
Eric Sun
89ff794ddf
Add api to get dependents of a dataset ( #232 )
* Use ProcessBuilder and redirected log file for HDFS Extract
* relax urn validation rule
* continue processing if the hive sql parser encounters an error
* reformat etl job log message
* add API to find dataset dependents, such as which hive tables are based on an hdfs path
2016-09-21 08:55:44 -07:00
Eric Sun
86bf71499f
Reformat the ETL job info message in log. ( #222 )
* Use ProcessBuilder and redirected log file for HDFS Extract
* relax urn validation rule
* continue processing if the hive sql parser encounters an error
* reformat etl job log message
2016-09-13 14:01:14 -07:00
Yi (Alan) Wang
078e90e8bd
Add multiproduct and git repo metadata etl job ( #202 )
* Add multiproduct and git repo metadata etl job
* implement the dataset availability section
* Extract commit hash and use it when querying acl
* Use FileWriter to write records into CSV file
* Remove unnecessary log entries from kafka processor
* Fix the incompatibility between integer repo_id in db and string field in record
2016-08-12 12:26:55 -07:00
Yi Wang
74ed769bab
add Oracle dataset metadata ETL job
2016-07-28 14:07:07 -07:00
jbai
e5880cf81a
fix the merge conflict
2016-06-07 11:31:00 -07:00
jbai
af976c3d5e
Dali Metadata integration - combine dali versions into one node
2016-06-02 18:29:44 -07:00
SunZhaonan
bec1c5cee0
Add local mode for hdfs extract
2016-05-31 14:44:32 -07:00
jbai
a2e42d60f3
add the elasticsearch index build and update file
2016-05-23 17:58:37 -07:00
Zhaonan Sun
a7187a42bf
Merge pull request #108 from SunZhaonan/master
Innodb engine DDL. Add config for timeout and load sample.
2016-04-06 15:07:44 -07:00
Arkadiusz Osinski
4d9f1681f0
missing letter in property name hive.metastore.username
2016-04-06 08:46:33 +02:00
SunZhaonan
b202832741
Innodb engine DDL. Add config for timeout and load sample.
2016-04-05 12:43:02 -07:00
SunZhaonan
4a4894a192
Use Kerberos login
2016-03-17 12:31:58 -07:00
SunZhaonan
6b024196cd
Fix Hive extract disorder bug. Add Hive database optional whitelist params
2016-03-16 19:09:52 -07:00
SunZhaonan
4574e89de9
Fix bug of inconsistent sample data schema. Add cleanup. Parameterize number of Actors
2016-02-22 17:40:07 -08:00
SunZhaonan
dfeefba213
Parameterize dataset source derived process.
2016-02-17 16:06:24 -08:00
SunZhaonan
de4d4cd0c1
Add documentation on important Constants and Classes
2016-02-12 16:57:12 -08:00
SunZhaonan
cd44daba5d
Merge with master
2015-12-16 15:54:50 -08:00
SunZhaonan
07c46304b5
Fix bug of duplicate field loading. Fix bug of subflow process in azkaban lineage ETL.
2015-12-11 19:46:35 -08:00
Zhen Chen
af04ff6efc
add git file commit history etl
2015-12-11 11:02:29 -08:00
Zhen Chen
ebbf9ec629
add ldap user and group metadata etl
2015-12-10 16:26:57 -08:00
Zhen Chen
5a08134b8d
add dataset owner metadata etl
2015-12-07 15:17:01 -08:00
SunZhaonan
d5c3d87d00
Initial commit
2015-11-19 14:39:21 -08:00