28 Commits

Author SHA1 Message Date
richardxin
53708482d1 Hive table level white/black listing (#1046) 2018-03-23 17:11:40 -07:00
richardxin
978cbab4fa [Issue 1017] fix bugs to handle dataset renaming and deletion properly - in Hive/Oracle/Teradata/Hdfs (#1040) 2018-03-15 13:31:30 -07:00
Viv
90488419cc Changing from int to str (#904) 2017-12-07 21:44:58 -08:00
Na Zhang
1479e253be ETL enhancement: switch alias only when new index is successfully created and built 2017-11-09 16:43:01 -08:00
na zhang
99d06aacca need to use new index name in reindex 2017-10-29 22:34:40 -07:00
na zhang
54237ff5ca support elasticsearch auto re-index with zero downtime via alias switch 2017-10-25 15:50:03 -07:00
Na Zhang
f3eb5d2afc fix a bug in ETL 2017-10-19 11:52:47 -07:00
Na Zhang
eb76971c54 modify elastic search ETL to adapt to newly set up es server and versions 2017-10-16 16:05:39 -07:00
Yi (Alan) Wang
f31664d5cf Modify HIVE ETL output file dir (#728) 2017-09-05 17:01:21 -07:00
Yi (Alan) Wang
8084e35303 Modify HIVE extract, disable schema fetching from HDFS, add DB reconnect, refactor code (#718) 2017-08-30 17:34:31 -07:00
Mars Lan
bf5448d561 Replace db.id & app.id property keys with the existing job.ref.id key (#695) 2017-08-22 17:39:10 -07:00
wenhuaOpenx
f7ec09e19a Add etl job to extract/load druid metadata (#680)
* test

* test

* add druid etl code

* remove comments

* remove comments

* remove test doc

* add job template for DRUID_METADATA_ETL

* add druid metadata etl configs to local_test.properties.template

* refactor logger

* remove comments

* fix typos

* add unit test for druid metadata etl job

* refactor unit test code

* import testng package

* import new package

* reformat the druid etl code based on LinkedIn code style

* add README for druid metadata etl

* add README for druid metadata etl

* add README for druid metadata etl
2017-08-21 16:34:49 -07:00
Mars Lan
ed5d662111 Remove outdated/internal ETLs (#669) 2017-08-14 16:00:45 -07:00
Yi (Alan) Wang
761210b645 Add Util function to parse boolean parameter for ETL jobs (#620) 2017-07-26 13:25:14 -07:00
Yi (Alan) Wang
5e0d2c01cf Fix Oracle ETL job, rewrite data sampling (#612) 2017-07-24 11:11:03 -07:00
Yi (Alan) Wang
6a8fe66ef9 Add active, deprecated flag to dataset, modify ETL to update is_active (#604) 2017-07-19 17:07:28 -07:00
Yi (Alan) Wang
3138185068 Remove some LinkedIn specific ETL jobs (#601) 2017-07-18 16:39:45 -07:00
Mars Lan
af37b3c39f Rely on job file specifying kerberos.keytab.file using absolute path. (#578) 2017-07-10 13:44:35 -07:00
Mars Lan
9dca733d76 Move elastic-search related props from wh_property table to job files. (#574)
This will break FlowTreeBuilder & DatasetTreeBuilder, neither of which is being used anyway.
2017-07-10 13:44:34 -07:00
Mars Lan
d2d92367f9 Also include TD table with O kind (no primary key) during ETL. (#571) 2017-07-10 13:44:34 -07:00
Mars Lan
e411aecf2b Missed another TD ut_ table filter. (#570) 2017-07-10 13:44:34 -07:00
Mars Lan
9f39f1f380 Remove the filter for TD ut_* tables as we do need the schema for them. (#569) 2017-07-10 13:44:34 -07:00
Mars Lan
68bb73f4a7 Remove the custom access count filter for DM_BIZ etc. (#568) 2017-07-10 13:44:34 -07:00
Mars Lan
75f57c6ac0 Remove some of the LI-specific ETLs. (#559) 2017-07-10 13:44:33 -07:00
Yi (Alan) Wang
fcbde02b37 Stop fetching EI/DEV-only Espresso DB (#522) 2017-07-10 13:42:59 -07:00
Yi (Alan) Wang
675dadd374 Kafka ETL to fetch queuing pipeline, also add topic blacklist (#509) 2017-07-10 13:42:57 -07:00
Yi (Alan) Wang
225d1fc6ec Modify HIVE ETL to commit often (#505) 2017-07-10 13:42:57 -07:00
Mars Lan
5f5c0937d1 Rename web, backend-service (#490)
* Rename web to wherehows-api and update README.

* Rename backend-service to wherehows-backend

* Rename metadata-etl to wherehows-etl

* Rename hadoop-dataset-extractor-standalone to wherehows-hadoop
2017-07-10 13:42:56 -07:00