20 Commits

Author SHA1 Message Date
Mars Lan
2499ee0116 Unify the open-source application.conf with internal ones so we don't need to maintain both. 2017-07-10 13:42:50 -07:00
Mars Lan
fda572dd8a Allow the logback directory for ETL jobs to be overridden using system property (#448)
* Allow the logback directory for ETL jobs to be overridden using system property.

See https://logback.qos.ch/manual/configuration.html#variableSubstitution for more details.

* Add WHZ_ETL_TEMP_DIR env var and play config to control where the ETL job logs & temp files are saved.

This enables us to move away from the default /var/tmp/wherehows directory.
2017-07-10 13:42:16 -07:00
Mars Lan
c75fa5e6dc Use environmental variables to set ETL & Kafka job IDs. (#418)
This will allow us to set different job IDs in staging & production via cfg2.
2017-07-10 09:57:51 -07:00
Mars Lan
11d6186fe6 Add healthcheck endpoint for frontend & backend. (#388) 2017-07-10 09:55:11 -07:00
Yi (Alan) Wang
8ede6f3314 Move logback.xml, modify etl job command generation (#364)
- Move logback.xml in metadata-etl to etl_logback.xml under backend/conf to avoid multiple logback configs in the classpath. ETL jobs are able to write to their own log file again.
- Replace the generated single-string command with a String[] and invoke Runtime.getRuntime().exec(String[])
2017-07-10 09:54:20 -07:00
Shuya Tsukamoto
33e04a585a Make it possible to change settings via environment variables (#533) 2017-05-26 10:28:05 -07:00
Yi (Alan) Wang
b6e644fbb1 Optimize dataset load scripts, improve speed (#350)
- When loading dataset fields into the staging table, populate the dataset_id field first, then use it in later JOINs.
- When JOINing two big tables such as dict_field_detail, use a pre-select to reduce the JOIN size and DB resource usage.
- Refactor some SQL code.
- Modify logback setting to better capture log time.
- Remove unnecessary config in backend application.conf
2017-03-22 10:23:30 -07:00
Yi (Alan) Wang
e07306b51e Update MetadataChangeEvent, separate privacy compliance from security (#275) 2016-11-11 17:25:41 -08:00
Yi Wang
b4f5e438e2 Add JobExecutionLineageEvent and kafka processor 2016-11-08 19:11:37 -08:00
Yi Wang
664e4072bb Upgrade to play 2.4.8 2016-10-19 17:42:28 -07:00
Eric Sun
89ff794ddf Add api to get dependents of a dataset (#232)
* Use ProcessBuilder and redirected log file for HDFS Extract

* relax urn validation rule

* continue processing if the hive sql parser encounters an error

* reformat etl job log message

* add API to find dataset dependents, such as which hive tables are based on an hdfs path
2016-09-21 08:55:44 -07:00
Yi Wang
b136fc6c37 Add MetadataInventoryEvent processor and API 2016-09-15 09:22:42 -07:00
Yi (Alan) Wang
579b8fc9d7 Add metadataChangeEvent APIs to backend-service (#205)
* Add multiproduct and git repo metadata etl job

* Extract commit hash and use it when querying acl

* Use FileWriter to write records into CSV file

* Remove unnecessary log entries from kafka processor

* Fix the incompatibility between integer repo_id in db and string field in record

* merge API tables into existing dataset owner and schema field tables

* Add confidential and recursive columns to dict_dataset_field
2016-08-24 09:10:35 -07:00
Yi Wang
3d3b2a8075 Get kafka job id from application.conf and then get ref_id and configs from DB 2016-08-03 18:55:07 -07:00
Naga Srinivas Vemuri
97370ed2e1 Query Dataset properties to retrieve datasetUrns 2016-07-21 11:54:47 +05:30
jbai
6af54658d6 merge fetching dataset watchers via GET /dataset/watchers into main branch 2016-06-30 10:20:54 -07:00
jbai
9705a07ad8 provide the dataset dependency api 2016-06-14 16:17:24 -07:00
SunZhaonan
9d6a1b2649 Add optional config of ETL job white list 2016-05-12 16:28:23 -07:00
SunZhaonan
aff8f323e4 Scheduler check previous job is finished. Redirect remote outputstream into log. Fix avro parser bugs 2016-03-16 19:09:53 -07:00
SunZhaonan
d5c3d87d00 Initial commit 2015-11-19 14:39:21 -08:00