* Allow the logback log directory for ETL jobs to be overridden via a system property.
See https://logback.qos.ch/manual/configuration.html#variableSubstitution for more details.
* Add the WHZ_ETL_TEMP_DIR env var and a Play config entry to control where ETL job logs & temp files are saved.
This enables us to move away from the default /var/tmp/wherehows directory.
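  A minimal sketch of how the lookup could work on the Java side — the env var name and the default directory come from the notes above, but the helper itself is illustrative, not the actual WhereHows code:

  ```java
  // Illustrative helper: resolve the ETL temp/log directory from the
  // WHZ_ETL_TEMP_DIR environment variable, falling back to the old default.
  public class EtlTempDir {
    static final String DEFAULT_DIR = "/var/tmp/wherehows";

    static String resolve() {
      String fromEnv = System.getenv("WHZ_ETL_TEMP_DIR");
      return (fromEnv == null || fromEnv.isEmpty()) ? DEFAULT_DIR : fromEnv;
    }

    public static void main(String[] args) {
      System.out.println("ETL temp dir: " + resolve());
    }
  }
  ```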
* Read the master key from an environment variable instead of from a local file. This would ultimately allow us to pass it in via cfg2.
* Move the env var name to Constant.java
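  A minimal sketch of the env-var lookup, assuming a hypothetical variable name for the constant kept in Constant.java:

  ```java
  // "WHZ_MASTER_KEY" is a hypothetical placeholder for the name that
  // Constant.java actually defines.
  public class MasterKeyExample {
    static final String MASTER_KEY_ENV_VAR = "WHZ_MASTER_KEY"; // hypothetical name

    static String readMasterKey() {
      String key = System.getenv(MASTER_KEY_ENV_VAR);
      if (key == null || key.isEmpty()) {
        throw new IllegalStateException(MASTER_KEY_ENV_VAR + " is not set");
      }
      return key;
    }

    public static void main(String[] args) {
      System.out.println("Master key loaded, length: " + readMasterKey().length());
    }
  }
  ```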
- Move logback.xml in metadata-etl to etl_logback.xml under backend/conf to avoid having multiple logback configs in the classpath. ETL jobs are able to write to their own log files again.
- Replace the generated single-string command with a String[] and invoke Runtime.getRuntime().exec(String[]); see the sketch below.
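  The motivation, in a minimal sketch: the single-string form of exec naively splits the command on whitespace, so arguments containing spaces get mangled, while the String[] form passes each argument through intact. The command below is only an example:

  ```java
  import java.io.BufferedReader;
  import java.io.InputStreamReader;

  public class ExecArrayExample {
    public static void main(String[] args) throws Exception {
      // Each element is one argument; no re-tokenization happens.
      String[] cmd = {"ls", "-l", "/var/tmp/wherehows"};
      Process p = Runtime.getRuntime().exec(cmd);
      try (BufferedReader out =
               new BufferedReader(new InputStreamReader(p.getInputStream()))) {
        String line;
        while ((line = out.readLine()) != null) {
          System.out.println(line);
        }
      }
      System.out.println("exit code: " + p.waitFor());
    }
  }
  ```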
Benefits
1. Simpler setup - no need to download Activator to build & run
2. Faster build - See https://engineering.linkedin.com/play/developing-play-applications-using-gradle
3. Streamlined dependency management - Everything defined in build.gradle, instead of build.gradle + build.sbt
4. Better integration with gradle lifecycle tasks - build, test, dist, clean all work as expected
Changes
1. Location of staging & distribution files moved from target to build
2. Use ./gradlew -t runPlayBinary to run the app with hot-reload support
3. The generated start scripts are quite different from those generated by sbt
* Add a playCoverage task to run code coverage with JaCoCo for backend and web.
* Add a jacocoTestReport task to run code coverage for the TestNG-based tests in wherehows-common & metadata-etl.
- When loading dataset fields into the staging table, populate the dataset_id field first, then use it in the later JOIN.
- When JOINing two big tables such as dict_field_detail, use a pre-select to reduce the JOIN size and DB resource usage (see the sketch after this list).
- Refactor some SQL code.
- Modify the logback settings to better capture log timestamps.
- Remove unnecessary config in backend application.conf
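  A minimal sketch of the pre-select pattern via JDBC. Only the dict_field_detail table name comes from the notes above; the staging table and all column names are hypothetical. Filtering each large table into a derived table first means the database joins two small row sets instead of two full tables, and the JOIN keys off the dataset_id populated in the earlier step:

  ```java
  import java.sql.Connection;
  import java.sql.PreparedStatement;
  import java.sql.ResultSet;
  import java.sql.SQLException;

  public class PreSelectJoinSketch {
    // Pre-select both sides into derived tables before the JOIN.
    // stg_dict_field_detail and the column names are hypothetical.
    private static final String SQL =
        "SELECT s.field_id, d.field_id AS existing_field_id "
      + "FROM (SELECT field_id, dataset_id, field_name "
      + "        FROM stg_dict_field_detail WHERE db_id = ?) s "
      + "JOIN (SELECT field_id, dataset_id, field_name "
      + "        FROM dict_field_detail WHERE dataset_id IS NOT NULL) d "
      + "  ON s.dataset_id = d.dataset_id AND s.field_name = d.field_name";

    static void matchStagingFields(Connection conn, int dbId) throws SQLException {
      try (PreparedStatement ps = conn.prepareStatement(SQL)) {
        ps.setInt(1, dbId);
        try (ResultSet rs = ps.executeQuery()) {
          while (rs.next()) {
            long stagingFieldId = rs.getLong("field_id");
            long existingFieldId = rs.getLong("existing_field_id");
            // ... update the staging row with the matched field id
          }
        }
      }
    }
  }
  ```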
* Update MetadataChangeEvent APIs according to schema change
* Update MultiproductLoad to reflect new Owner types
* Add comments for Owner_type precedence (priority) and compliance
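  To illustrate what a precedence comment can encode, a hypothetical enum — the type names and their ordering here are invented for the sketch, not WhereHows's actual owner types:

  ```java
  // Smaller priority value wins when multiple owner records conflict.
  public enum OwnerType {
    OWNER(1),       // highest precedence (hypothetical ordering)
    PRODUCER(2),
    DELEGATE(3),
    STAKEHOLDER(4); // lowest precedence

    private final int priority;

    OwnerType(int priority) {
      this.priority = priority;
    }

    public int getPriority() {
      return priority;
    }
  }
  ```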
* Use ProcessBuilder and a redirected log file for the HDFS extract job; see the sketch below.
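  A minimal sketch of the ProcessBuilder approach — the command line and log path are placeholders, not the real extract invocation:

  ```java
  import java.io.File;

  public class HdfsExtractLauncher {
    public static void main(String[] args) throws Exception {
      File log = new File("/var/tmp/wherehows/hdfs_extract.log");
      // Placeholder command for the real HDFS extract job.
      ProcessBuilder pb = new ProcessBuilder("hadoop", "jar", "hdfs-extract.jar");
      pb.redirectErrorStream(true);                             // fold stderr into stdout
      pb.redirectOutput(ProcessBuilder.Redirect.appendTo(log)); // stream output to the log file
      int exitCode = pb.start().waitFor();
      System.out.println("HDFS extract exited with code " + exitCode);
    }
  }
  ```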
* Relax the URN validation rule
* Continue processing if the Hive SQL parser encounters an error; see the sketch below
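  A minimal sketch of the keep-going behavior — the parse step here is a stand-in for the real Hive SQL parser: catch the per-statement failure, log it, and move on to the rest of the batch instead of aborting the whole ETL job.

  ```java
  import java.util.Arrays;
  import java.util.List;

  public class TolerantHiveParsing {
    // Stand-in for the real parser; rejects a marker string for demo purposes.
    static void parse(String sql) {
      if (sql.contains("???")) {
        throw new IllegalArgumentException("unparseable: " + sql);
      }
    }

    public static void main(String[] args) {
      List<String> statements = Arrays.asList("SELECT 1", "??? bad ???", "SELECT 2");
      for (String sql : statements) {
        try {
          parse(sql);
          System.out.println("parsed: " + sql);
        } catch (Exception e) {
          // Log and skip the offending statement; keep processing the rest.
          System.err.println("Hive SQL parse failed, skipping: " + sql
              + " (" + e.getMessage() + ")");
        }
      }
    }
  }
  ```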
* Reformat ETL job log messages
* Add an API to find dataset dependents, such as which Hive tables are based on an HDFS path