98 Commits

Author SHA1 Message Date
Mars Lan
d57bce2c0b Redirect ETL job's stderr & stdout to files to make debugging easier. (#465) 2017-07-10 13:42:54 -07:00
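The redirection described in d57bce2c0b can be sketched in Java with `ProcessBuilder`. This is a minimal sketch under stated assumptions, not the project's actual code: the class name, the helper `withLogFiles`, and the file naming scheme are hypothetical.

```java
import java.io.File;

public class EtlRedirectSketch {
    // Hypothetical helper: send the child process's stdout and stderr to
    // separate files under logDir, so a failed job leaves its output behind.
    static ProcessBuilder withLogFiles(ProcessBuilder pb, File logDir, String jobName) {
        pb.redirectOutput(new File(logDir, jobName + ".stdout"));
        pb.redirectError(new File(logDir, jobName + ".stderr"));
        return pb;
    }

    public static void main(String[] args) {
        ProcessBuilder pb = withLogFiles(
            new ProcessBuilder("java", "-version"), new File("."), "example_job");
        // The process is only configured here; pb.start() would launch it.
        System.out.println(pb.redirectOutput().file());
        System.out.println(pb.redirectError().file());
    }
}
```

Keeping the two streams in separate files makes it easier to tell normal job output from error output when debugging.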
Yi (Alan) Wang
93242768ff Update runBackend to source application.env for conf values (#458) 2017-07-10 13:42:53 -07:00
Mars Lan
f5a7e0c9ec Make sure all intermediate directories are created for ETL job property files. (#450) 2017-07-10 13:42:51 -07:00
Mars Lan
2499ee0116 Unify the open-source application.conf with internal ones so we don't need to maintain both. 2017-07-10 13:42:50 -07:00
Mars Lan
fda572dd8a Allow the logback directory for ETL jobs to be overridden using system property (#448)
* Allow the logback directory for ETL jobs to be overridden using system property.

See https://logback.qos.ch/manual/configuration.html#variableSubstitution for more details.

* Add WHZ_ETL_TEMP_DIR env var and play config to control where the ETL job logs & temp files are saved.

This enables us to move away from the default /var/tmp/wherehows directory.
2017-07-10 13:42:16 -07:00
Mars Lan
ebeda9f690 Remove unused env var from template. 2017-07-10 09:58:44 -07:00
Mars Lan
9e55d80538 Add WHZ_KRB5_DIR environmental variable to the search path for gss-jaas.conf & krb5.conf files. (#421)
Also remove the unset WH_HOME directory from the search path.
2017-07-10 09:58:43 -07:00
Mars Lan
c75fa5e6dc Use environmental variables to set ETL & Kafka job IDs. (#418)
This will allow us to set different job IDs in staging & production via cfg2.
2017-07-10 09:57:51 -07:00
Mars Lan
cf4e157813 Read master key from environmental variable instead of from local fil… (#417)
* Read master key from environmental variable instead of from local file. This would allow us to pass it in via cfg2 ultimately.

* Move the env var name to Constant.java
2017-07-10 09:55:16 -07:00
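A minimal Java sketch of reading a secret from an environment variable, as cf4e157813 describes. The variable name `WHZ_MASTER_KEY`, the class, and the method are assumptions for illustration only; the commit does not state the actual name kept in Constant.java.

```java
import java.util.Map;

public class MasterKeySketch {
    // Hypothetical variable name; the real one lives in Constant.java.
    static final String MASTER_KEY_ENV = "WHZ_MASTER_KEY";

    // Takes the environment as a map so the lookup can be exercised without
    // touching the real process environment.
    static String readMasterKey(Map<String, String> env) {
        String key = env.get(MASTER_KEY_ENV);
        if (key == null || key.isEmpty()) {
            throw new IllegalStateException(
                "Environment variable " + MASTER_KEY_ENV + " is not set");
        }
        return key;
    }

    public static void main(String[] args) {
        try {
            String key = readMasterKey(System.getenv());
            // Never print the key itself; report only that it was found.
            System.out.println("Master key loaded, length " + key.length());
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing fast when the variable is missing avoids starting a job with an empty key.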
Mars Lan
11d6186fe6 Add healthcheck endpoint for frontend & backend. (#388) 2017-07-10 09:55:11 -07:00
Yi (Alan) Wang
8ede6f3314 Move logback.xml, modify etl job command generation (#364)
- Move logback.xml in metadata-etl to etl_logback.xml under backend/conf to avoid multiple logback config in classpath. ETL jobs are able to write to their own log file again.
- Replace generated single string command with String[] and invoke Runtime.getRuntime().exec(String[])
2017-07-10 09:54:20 -07:00
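The switch from a single command string to `String[]` in 8ede6f3314 can be illustrated as follows. This is a sketch only: `buildCommand`, the `-Dconfig` property, and the example paths are hypothetical and not taken from the repository. Passing an argument array to `Runtime.getRuntime().exec(String[])` keeps each argument intact, so a classpath containing spaces needs no quoting.

```java
import java.util.ArrayList;
import java.util.List;

public class EtlCommandSketch {
    // Hypothetical helper: build the argument vector for launching a job JVM.
    // Each element is passed to the child process as exactly one argument.
    static String[] buildCommand(String classpath, String mainClass, String configFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(classpath);                 // stays one argument even with spaces
        cmd.add("-Dconfig=" + configFile);  // illustrative system property
        cmd.add(mainClass);
        return cmd.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] cmd = buildCommand("/opt/my libs/*:conf", "example.Launcher", "/tmp/job.properties");
        // Show the argument boundaries; Runtime.getRuntime().exec(cmd) would run it.
        System.out.println(String.join(" | ", cmd));
    }
}
```

With a single command string, `Runtime.exec(String)` splits on whitespace and does not interpret shell quotes, which is one way quoting around a classpath can go wrong.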
Yi (Alan) Wang
3360fe79cc Modify generated java command to solve classpath issue (#362)
Remove the single quote around classpath.
2017-07-10 09:54:20 -07:00
Mars Lan
a589abbd76 Split the root build script into multiple scripts. (#348)
Split the root build script into multiple scripts
Add coveralls support.
2017-07-10 09:54:08 -07:00
Mars Lan
6b7609918e Replace sbt build with native Gradle Play plugin and update the docs. (#352)
Benefits
1. Simpler setup - no need to download activator in order to build & run
2. Faster build - See https://engineering.linkedin.com/play/developing-play-applications-using-gradle
3. Streamlined dependency management - Everything defined in build.gradle, instead of build.gradle + build.sbt
4. Better integration with gradle lifecycle tasks - build, test, dist, clean all work as expected

Changes
1. Location of staging & distribution files moved from target to build
2. Use ./gradlew -t runPlayBinary to run app with hot reload support
3. The generated start scripts are quite different from those generated by sbt
2017-07-10 09:54:08 -07:00
Mars Lan
5a999f29b1 Revert "Split the root build script into multiple scripts."
This reverts commit 4b8a6f86577739209b09ec8cc8cb09c2808f4aa7.
2017-07-10 09:54:08 -07:00
Mars Lan
edf5c54de3 Split the root build script into multiple scripts.
Add support for coveralls.
2017-07-10 09:54:08 -07:00
Mars Lan
e36a40cd65 Generate code coverage reports (#334)
* Add playCoverage task to run code coverage using JaCoCo for backend and web.

* Add jacocoTestReport task to run code coverage for testNG-based tests in wherehows-common & metadata-etl.
2017-07-10 09:53:28 -07:00
Mars Lan
bcc3cd9f76 Make unit tests buildable again for backend and web (#325)
* Make unit tests buildable again for backend and web.

* Add back fest dependency so the tests can stay more or less the same as before.
2017-07-10 09:53:28 -07:00
Naga Srinivas Vemuri
803e3added Modify /dataset POST method to perform INSERT or UPDATE of the DatasetRecord 2017-07-10 09:53:25 -07:00
Christopher Chiche
d064e7bc47 Fix title in backend's README (#552) 2017-06-08 08:09:47 -07:00
Shuya Tsukamoto
33e04a585a Make it possible to change settings via environment variables (#533) 2017-05-26 10:28:05 -07:00
Shuya Tsukamoto
53fe63680f Add a mkdir command for the TreeBuilder output. (#499) 2017-05-18 13:54:56 -07:00
Yi (Alan) Wang
b6e644fbb1 Optimize dataset load scripts, improve speed (#350)
- When loading dataset fields in staging table, populate the dataset_id field first, then use it in later JOINs.
- When joining two big tables such as dict_field_detail, use pre-select to reduce table JOIN size and DB resource usage.
- Refactor some SQL code.
- Modify logback setting to better capture log time.
- Remove unnecessary config in backend application.conf
2017-03-22 10:23:30 -07:00
Yi (Alan) Wang
66a8eea21b Fix issues from Oracle MetadataChangeEvent integration (#336)
* Fix issues from Oracle MetadataChangeEvent integration
2017-03-14 17:19:30 -07:00
Yi (Alan) Wang
4f873a919a Fix bugs found by AppCheck in issue #328 (#335) 2017-02-24 14:20:56 -08:00
Yi Wang
14824c06bb Change sleep to 10s after etl job init error 2017-01-30 09:27:42 -08:00
Yi (Alan) Wang
665a5dbded Add retry for ETL jobs failed at initialization (#308) 2017-01-27 11:17:38 -08:00
Yi Wang
ea8f6e8551 Add retry for ETL jobs failed at initialization 2017-01-20 14:11:45 -08:00
Yi (Alan) Wang
e07306b51e Update MetadataChangeEvent, separate privacy compliance from security (#275) 2016-11-11 17:25:41 -08:00
Yi Wang
b4f5e438e2 Add JobExecutionLineageEvent and kafka processor 2016-11-08 19:11:37 -08:00
Yi (Alan) Wang
e34bbcc629 Update README.md (#264) 2016-11-02 13:48:22 -07:00
Yi (Alan) Wang
dca47a3b75 Merge pull request #254 from alyiwang/master
Upgrade to play 2.4.8
2016-10-20 13:18:58 -07:00
Douglas Moore
53f6622ed8 Update README.md (#252)
Remove backlink to my github account.
2016-10-19 18:37:40 -07:00
Yi Wang
664e4072bb Upgrade to play 2.4.8 2016-10-19 17:42:28 -07:00
Yi Wang
3227412339 Login authentication supports multiple LDAP servers, add login history 2016-10-13 14:30:43 -07:00
Yi Wang
fcd6cf149e Update MetastoreAuditProcessor to reduce storage, also refactor some code 2016-10-11 11:26:36 -07:00
Yi Wang
5049c847fa Update Kafka consumer actors to reduce memory usage 2016-10-10 14:49:14 -07:00
Yi (Alan) Wang
c9dfb637af Update MetadataChangeEvent APIs according to schema change (#243)
* Update MetadataChangeEvent APIs according to schema change

* Update MultiproductLoad to reflect new Owner types

* Add comments for Owner_type precedence (priority) and compliance
2016-10-06 13:33:45 -07:00
jbai
a11e4908dc Track the GobblinTrackingEvent_audit to get owner information 2016-09-29 15:01:32 -07:00
Yi Wang
ac34eb683f Update Kafka processor casting Object to String, also add debug info if can't fetch schema from Registry 2016-09-26 15:06:33 -07:00
Yi Wang
1ad2b1528e logback redirect ETL job logs into corresponding files 2016-09-23 16:54:52 -07:00
Yi (Alan) Wang
753de7de7c Merge pull request #233 from alyiwang/master
Update backend APIs to cast SQL results back to Java record then to Json
2016-09-21 08:59:04 -07:00
Eric Sun
89ff794ddf Add api to get dependents of a dataset (#232)
* Use ProcessBuilder and redirected log file for HDFS Extract

* relax urn validation rule

* continue process if hive sql parser encounters error

* reformat etl job log message

* add API to find dataset dependents, such as which hive tables are based on an hdfs path
2016-09-21 08:55:44 -07:00
Yi Wang
be65efb0cc Update backend APIs to cast SQL results back to Java record then serialize to Json reply 2016-09-20 18:56:49 -07:00
Yi Wang(Data Infrastructure)
1171e00097 Add REST proxy for Security API from backend to web 2016-09-19 18:14:10 -07:00
Yi Wang
b136fc6c37 Add MetadataInventoryEvent processor and API 2016-09-15 09:22:42 -07:00
Eric Sun
86bf71499f Reformat the ETL job info message in log. (#222)
* Use ProcessBuilder and redirected log file for HDFS Extract

* relax urn validation rule

* continue process if hive sql parser encounters error

* reformat etl job log message
2016-09-13 14:01:14 -07:00
Yi Wang
5ce5a1425e Add hostname and process_id to wh_etl_job_execution 2016-09-12 16:09:33 -07:00
Yi Wang
5515cbdde9 Add MetadataChangeEvent processor to call separate APIs 2016-09-06 16:41:50 -07:00
Eric Sun
0ac00e1af3 Update README.md 2016-09-02 09:35:13 -07:00