* Read master key from an environment variable instead of from a local file. This would allow us to pass it in via cfg2 ultimately.
* Move the env var name to Constant.java
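A minimal sketch of the pattern, assuming a hypothetical variable name (the real name is the constant defined in Constant.java):

```java
// Sketch: read the master key from an environment variable instead of a local file.
// The variable name below is a placeholder; the real name lives in Constant.java.
public class MasterKeyLoader {
  static final String MASTER_KEY_ENV_VAR = "WH_MASTER_KEY"; // hypothetical name

  public static String loadMasterKey() {
    String masterKey = System.getenv(MASTER_KEY_ENV_VAR);
    if (masterKey == null || masterKey.isEmpty()) {
      throw new IllegalStateException("Environment variable " + MASTER_KEY_ENV_VAR + " is not set");
    }
    return masterKey;
  }
}
```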
- Move logback.xml in metadata-etl to etl_logback.xml under backend/conf to avoid multiple logback configs in the classpath. ETL jobs can write to their own log files again.
- Replace the generated single-string command with a String[] and invoke Runtime.getRuntime().exec(String[])
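For illustration, a sketch of the String[] form (the command itself is made up): exec(String) tokenizes the line on whitespace, while exec(String[]) keeps each argument intact.

```java
import java.io.IOException;

public class CommandRunner {
  // Sketch: pass the command as a String[] so arguments containing spaces or shell
  // metacharacters are preserved as single tokens. The command below is illustrative.
  public static int run() throws IOException, InterruptedException {
    String[] cmd = { "hadoop", "fs", "-ls", "/data/my dataset" }; // path with a space stays one argument
    Process process = Runtime.getRuntime().exec(cmd);
    return process.waitFor();
  }
}
```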
Benefits
1. Simpler setup - no need to download activator in order to build & run
2. Faster build - See https://engineering.linkedin.com/play/developing-play-applications-using-gradle
3. Streamlined dependency management - Everything defined in build.gradle, instead of build.gradle + build.sbt
4. Better integration with gradle lifecycle tasks - build, test, dist, clean all work as expected
Changes
1. Location of staging & distribution files moved from target to build
2. Use ./gradlew -t runPlayBinary to run the app with hot-reload support
3. The generated start scripts are quite different from those generated by sbt
* Add playCoverage task to run code coverage using JaCoCo for backend and web.
* Add jacocoTestReport task to run code coverage for TestNG-based tests in wherehows-common & metadata-etl.
- When loading dataset fields into the staging table, populate the dataset_id field first, then use it in the later JOIN (see the sketch after this list).
- When JOINing two big tables such as dict_field_detail, use a pre-select to reduce the JOIN size and DB resource usage.
- Refactor some SQL code.
- Modify logback settings to better capture log timestamps.
- Remove unnecessary config in backend application.conf
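As an illustration of the dataset_id and pre-select changes above, a rough JDBC sketch — apart from dict_field_detail and dataset_id, the table and column names are assumptions, not the actual schema:

```java
import java.sql.Connection;
import java.sql.Statement;

public class FieldLoadSketch {
  // Sketch: populate dataset_id in the staging table first, then join the large
  // dict_field_detail table against a pre-selected subset keyed on the integer id.
  public static void loadFields(Connection conn) throws Exception {
    try (Statement stmt = conn.createStatement()) {
      // 1. fill dataset_id up front so later joins use the integer key, not the urn
      stmt.executeUpdate(
          "UPDATE stg_dict_field_detail s "
        + "JOIN dict_dataset d ON s.urn = d.urn "
        + "SET s.dataset_id = d.id");

      // 2. pre-select only the rows/columns needed before joining the big table
      stmt.executeUpdate(
          "UPDATE dict_field_detail f "
        + "JOIN (SELECT dataset_id, field_name, data_type "
        + "      FROM stg_dict_field_detail WHERE dataset_id IS NOT NULL) s "
        + "  ON f.dataset_id = s.dataset_id AND f.field_name = s.field_name "
        + "SET f.data_type = s.data_type");
    }
  }
}
```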
* Update MetadataChangeEvent APIs according to schema change
* Update MultiproductLoad to reflect new Owner types
* Add comments for Owner_type precedence (priority) and compliance
* Use ProcessBuilder and a redirected log file for HDFS Extract
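A possible shape of the ProcessBuilder change (the command line and log path are illustrative):

```java
import java.io.File;
import java.io.IOException;

public class HdfsExtractLauncher {
  // Sketch: start the HDFS extract process via ProcessBuilder and redirect its output
  // to a dedicated log file rather than consuming the streams in the parent process.
  public static Process launch() throws IOException {
    ProcessBuilder pb = new ProcessBuilder("hadoop", "jar", "hdfs-extract.jar"); // illustrative command
    pb.redirectErrorStream(true);                                   // merge stderr into stdout
    pb.redirectOutput(ProcessBuilder.Redirect.appendTo(
        new File("/var/tmp/wherehows/hdfs_extract.log")));          // illustrative log path
    return pb.start();
  }
}
```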
* Relax URN validation rule
* Continue processing if the Hive SQL parser encounters an error
* Reformat ETL job log messages
* Add an API to find dataset dependents, e.g., which Hive tables are based on a given HDFS path
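One way such a dependents lookup could look, sketched with Spring's JdbcTemplate — the cfg_object_name_map table and its columns here are assumptions for illustration, not the actual schema:

```java
import java.util.List;
import java.util.Map;
import org.springframework.jdbc.core.JdbcTemplate;

public class DatasetDependentDao {
  // Sketch: given an HDFS path, return the objects (e.g. Hive tables) mapped onto it.
  private final JdbcTemplate jdbcTemplate;

  public DatasetDependentDao(JdbcTemplate jdbcTemplate) {
    this.jdbcTemplate = jdbcTemplate;
  }

  public List<Map<String, Object>> findDependents(String hdfsPath) {
    String sql = "SELECT object_name, object_type "
               + "FROM cfg_object_name_map WHERE mapped_object_name = ?"; // assumed table/columns
    return jdbcTemplate.queryForList(sql, hdfsPath);
  }
}
```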
* Add multiproduct and git repo metadata etl job
* Extract the commit hash and use it when querying the ACL
* Use FileWriter to write records into a CSV file (see the sketch after this list)
* Remove unnecessary log entries from kafka processor
* Fix the incompatibility between the integer repo_id in the DB and the string field in the record
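A minimal sketch of the FileWriter-based CSV writing mentioned above; the separator and the shape of the records are assumptions:

```java
import java.io.FileWriter;
import java.io.IOException;
import java.util.List;

public class CsvRecordWriter {
  // Sketch: append each record as one comma-separated line. Assumes field values
  // are already escaped; the real job may use a different separator.
  public static void writeRecords(String csvPath, List<String[]> records) throws IOException {
    try (FileWriter writer = new FileWriter(csvPath)) {
      for (String[] record : records) {
        writer.write(String.join(",", record));
        writer.write("\n");
      }
    }
  }
}
```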
* Merge API tables into the existing dataset owner and schema field tables
* Add confidential and recursive columns to dict_dataset_field