datahub

mirror of https://github.com/datahub-project/datahub.git synced 2025-08-12 19:27:09 +00:00

Author	SHA1	Message	Date
camelliazhang	2aaafed98c	Merge pull request #274 from camelliazhang/master mark SCM users confirmed by system automatically	2016-11-11 11:53:22 -08:00
Yi (Alan) Wang	06ada42bb9	Merge pull request #272 from alyiwang/master Add JobExecutionLineageEvent and kafka processor	2016-11-11 11:39:09 -08:00
Na Zhang	1962f0a477	mark SCM users confirmed by system automatically	2016-11-11 11:12:28 -08:00
Na Zhang	2facf409b2	update the score table during elastic search dataset update	2016-11-11 10:09:31 -08:00
Yi Wang	b4f5e438e2	Add JobExecutionLineageEvent and kafka processor	2016-11-08 19:11:37 -08:00
Na Zhang	725e689326	add exception handling for DATABASE_SCM_METADATA_ETL and collect info	2016-11-08 17:37:36 -08:00
Na Zhang	217b7d9d09	search ranking improvement with static boosting	2016-11-08 15:18:51 -08:00
Eric Sun	7b36d09b58	Add get_schema_literal_from_url() to fetch schema literal based on schema url (#268) * use schema_url_helper to fetch avro schema from hdfs or http location * trim space * add dfs.namenode.kerberos.principal.pattern; include htrace for SchemaUrlHelper	2016-11-07 08:14:45 -08:00
Yi Wang	664e4072bb	Upgrade to play 2.4.8	2016-10-19 17:42:28 -07:00
Na Zhang	dbaf053e76	Add local test properties template for teradata and scm owners ETL	2016-10-19 14:10:29 -07:00
Na Zhang	043dc25e89	Get owners for espresso and oracle, and fix a bug for teradata	2016-10-19 11:13:32 -07:00
Yi Wang	5049c847fa	Update Kafka consumer actors to reduce memory usage	2016-10-10 14:49:14 -07:00
Yi Wang	c9f4f18d9c	Update Azkaban_Execution job to fetch cronExpression in flow scheduling	2016-10-06 13:43:10 -07:00
Yi (Alan) Wang	c9dfb637af	Update MetadataChangeEvent APIs according to schema change (#243) * Update MetadataChangeEvent APIs according to schema change * Update MultiproductLoad to reflect new Owner types * Add comments for Owner_type precedence (priority) and compliance	2016-10-06 13:33:45 -07:00
Yi Wang	0356497124	Add comments for Owner_type precedence (priority) and compliance	2016-10-06 13:24:29 -07:00
Yi Wang	8ab5c824b0	Update MultiproductLoad to reflect new Owner types	2016-10-03 18:39:21 -07:00
camelliazhang	fe1e698b8a	remove hive instance hardcode cluster name (#236)	2016-09-30 17:15:43 -07:00
Na Zhang	10339690a9	Update HiveTransform and HiveLoad, remove hardcoded cluster name	2016-09-30 16:59:59 -07:00
Eric Sun	fd3b4baef8	avoid loop in LDAP org hierarchy (#242)	2016-09-30 16:45:38 -07:00
jerrybai2009	5f0426ea6b	using the dynamic cursor to reduce the memory usage (#241)	2016-09-30 16:45:17 -07:00
jbai	a11e4908dc	tracking the GobblinTrackingEvent_audit to get owner information	2016-09-29 15:01:32 -07:00
Yi Wang	ac34eb683f	Update Kafka processor casting Object to String, also add debug info if can't fetch schema from Registry	2016-09-26 15:06:33 -07:00
Na Zhang	5c76f47313	remove hive instance hardcode cluster name	2016-09-26 15:06:30 -07:00
Yi Wang	1ad2b1528e	logback redirect ETL job logs into corresponding files	2016-09-23 16:54:52 -07:00
jerrybai2009	f7878cdfe4	fix the elastic search index out of gc issue (#223)	2016-09-13 16:43:48 -07:00
Eric Sun	86bf71499f	Reformat the ETL job info message in log. (#222) * Use ProcessBuilder and redirected log file for HDFS Extract * relax urn validation rule * continue process if hive sql parser encounters error * reformat etl job log message	2016-09-13 14:01:14 -07:00
Yi Wang	33e592da14	Modify HdfsLoad to improve speed	2016-09-09 17:41:13 -07:00
Yi Wang	4c500402fe	Map repo owner fix, change 'main' to 'Producer' and reset sort id	2016-09-02 13:52:00 -07:00
Yi Wang	a809b0ac47	Map repo owner fix to use dataset group mapping	2016-09-01 18:19:41 -07:00
Yi Wang	81f891bfab	Map scm repo owner to dataset owner table	2016-08-30 15:35:28 -07:00
Yi (Alan) Wang	579b8fc9d7	Add metadataChangeEvent APIs to backend-service (#205) * Add multiproduct and git repo metadata etl job * Extract commit hash use it when querying acl * Use FileWriter to write records into CSV file * Remove unnecessary log entries from kafka processor * Fix the incompatibility between integer repo_id in db and string field in record * merge API tables to existing dataset owner and schema field table * Add confidential and recursive column to dict_dataset_field	2016-08-24 09:10:35 -07:00
Yi (Alan) Wang	078e90e8bd	Add multiproduct and git repo metadata etl job (#202) * Add multiproduct and git repo metadata etl job * implement the dataset availability section * Extract commit hash use it when querying acl * Use FileWriter to write records into CSV file * Remove unnecessary log entries from kafka processor * Fix the incompatibility between integer repo_id in db and string field in record	2016-08-12 12:26:55 -07:00
Eric Sun	cd4853d0a5	Use ProcessBuilder and redirected log file for HDFS Extract (#198) * Use ProcessBuilder and redirected log file for HDFS Extract * relax urn validation rule	2016-08-08 14:02:34 -07:00
Yi Wang	3d3b2a8075	Get kafka job id from application.conf and then get ref_id and configs from DB	2016-08-03 18:55:07 -07:00
Yi Wang	dbbdb6e2fb	Modify Oracle metadata ETL job, use Json dumps and remove unnecessary quotes	2016-08-03 18:49:00 -07:00
jerrybai2009	b4a718efd0	Merge pull request #195 from ericsun2/master temp fix for hdfs_schema_crawler getRuntime().exec() hangs problem	2016-08-03 18:15:43 -07:00
jerrybai2009	e7c7175cba	Merge pull request #188 from jerrybai2009/master load the teradata and hadoop data into table dict_dataset_instance	2016-08-03 18:13:06 -07:00
Eric Sun	1cd5872369	temp fix for hdfs_schema_crawler getRuntime().exec() hangs problem; exclude log4j	2016-08-03 15:50:00 -07:00
Eric Sun	6355ccc039	add python module [requests] for simple REST client	2016-07-29 23:10:33 -07:00
jbai	ea1ac0da9f	load the teradata and hadoop data into table dict_dataset_instance	2016-07-29 10:59:33 -07:00
Yi Wang	74ed769bab	add Oracle dataset metadata ETL job	2016-07-28 14:07:07 -07:00
jbai	85bc2db85c	add try catch to catch the exception when reading the config properties	2016-07-26 16:53:30 -07:00
Yi Wang	7edacc9a9f	get kafka config from wh_etl_job_property	2016-07-26 12:16:34 -07:00
Yi Wang	6d4706bc62	Ingest Gobblin tracking events into wherehows using Kafka consumer client	2016-07-25 15:03:29 -07:00
jbai	9fb5b09bd2	update dependency property name and fix the duplicated key issue when update cfg_object_name_map table	2016-07-20 19:07:16 -07:00
jbai	f3c299480f	update the column names from schema to schema_text and view_expanded_text to ddl_text	2016-07-20 18:01:25 -07:00
jbai	33b05cde4b	tracking the dalids schema and expanded text by versions	2016-07-20 15:59:11 -07:00
jbai	9166db7563	update the dict_dataset_instance data loading sql since table key changed	2016-06-29 18:00:10 -07:00
Eric Sun	1573fdb212	rename hive dependency to hive_exec; reuse metadata-etl/extralibs; test travis ci;	2016-06-28 18:03:02 -07:00
Eric Sun	5348d44a77	Force object (db.table) names extracted by the getViewDependency() API to lower case. Object (db.table) names extracted by the getViewDependency() API can contain camel-case strings, which can cause mismatches in the underlying RDBMS.	2016-06-27 16:43:04 -07:00

117 Commits