datahub

mirror of https://github.com/datahub-project/datahub.git synced 2025-10-22 14:35:17 +00:00

Author	SHA1	Message	Date
Mars Lan	2499ee0116	Unify the open-source application.conf with internal ones so we don't need to maintain both.	2017-07-10 13:42:50 -07:00
Mars Lan	fda572dd8a	Allow the logback directory for ETL jobs to be overridden using system property (#448) * Allow the logback directory for ETL jobs to be overridden using system property. See https://logback.qos.ch/manual/configuration.html#variableSubstitution for more details. * Add WHZ_ETL_TEMP_DIR env var and play config to control where the ETL job logs & temp files are saved. This enables us to move away from the default /var/tmp/wherehows directory.	2017-07-10 13:42:16 -07:00
Mars Lan	c75fa5e6dc	Use environmental variables to set ETL & Kafka job IDs. (#418) This will allow us to set different job IDs in staging & production via cfg2.	2017-07-10 09:57:51 -07:00
Mars Lan	11d6186fe6	Add healthcheck endpoint for frontend & backend. (#388)	2017-07-10 09:55:11 -07:00
Yi (Alan) Wang	8ede6f3314	Move logback.xml, modify etl job command generation (#364) - Move logback.xml in metadata-etl to etl_logback.xml under backend/conf to avoid multiple logback config in classpath. ETL jobs are able to write to their own log file again. - Replace generated single string command with String[] and invoke Runtime.getRuntime().exec(String[])	2017-07-10 09:54:20 -07:00
Shuya Tsukamoto	33e04a585a	Make it possible to change settings via environment variables (#533)	2017-05-26 10:28:05 -07:00
Yi (Alan) Wang	b6e644fbb1	Optimize dataset load scripts, improve speed (#350) - When loading dataset fields in staging table, populate the dataset_id field first then use this in later JOIN. - When JOIN two big tables such as dict_field_detail, use pre-select to reduce table JOIN size and DB resource. - Refactor some SQL code. - Modify logback setting to better capture log time. - Remove unnecessary config in backend application.conf	2017-03-22 10:23:30 -07:00
Yi (Alan) Wang	e07306b51e	Update MetadataChangeEvent, separate privacy compliance from security (#275)	2016-11-11 17:25:41 -08:00
Yi Wang	b4f5e438e2	Add JobExecutionLineageEvent and kafka processor	2016-11-08 19:11:37 -08:00
Yi Wang	664e4072bb	Upgrade to play 2.4.8	2016-10-19 17:42:28 -07:00
Eric Sun	89ff794ddf	Add api to get dependents of a dataset (#232) * Use ProcessBuilder and redirected log file for HDFS Extract * relax urn validation rule * continue process if hive sql parser encounters error * reformat etl job log message * add API to find dataset dependents, such as which hive tables are based on an hdfs path	2016-09-21 08:55:44 -07:00
Yi Wang	b136fc6c37	Add MetadataInventoryEvent processor and API	2016-09-15 09:22:42 -07:00
Yi (Alan) Wang	579b8fc9d7	Add metadataChangeEvent APIs to backend-service (#205) * Add multiproduct and git repo metadata etl job * Extract commit hash use it when querying acl * Use FileWriter to write records into CSV file * Remove unnecessary log entries from kafka processor * Fix the incompatibility between integer repo_id in db and string field in record * merge API tables to existing dataset owner and schema field table * Add confidential and recursive column to dict_dataset_field	2016-08-24 09:10:35 -07:00
Yi Wang	3d3b2a8075	Get kafka job id from application.conf and then get ref_id and configs from DB	2016-08-03 18:55:07 -07:00
Naga Srinivas Vemuri	97370ed2e1	Query Dataset properties to retrieve datasetUrns	2016-07-21 11:54:47 +05:30
jbai	6af54658d6	merge Fetching dataset watchers via get /dataset/watchers to main branch	2016-06-30 10:20:54 -07:00
jbai	9705a07ad8	provide the dataset dependency api	2016-06-14 16:17:24 -07:00
SunZhaonan	9d6a1b2649	Add optional config of ETL job white list	2016-05-12 16:28:23 -07:00
SunZhaonan	aff8f323e4	Scheduler check previous job is finished. Redirect remote outputstream into log. Fix avro parser bugs	2016-03-16 19:09:53 -07:00
SunZhaonan	d5c3d87d00	Initial commit	2015-11-19 14:39:21 -08:00

20 Commits