datahub

mirror of https://github.com/datahub-project/datahub.git synced 2025-11-01 19:25:56 +00:00

Author	SHA1	Message	Date
Mars Lan	e36a40cd65	Generate code coverage reports (#334) * Add playCoverage task to run code coverage using JaCoCo for backend and web. * Add jacocoTestReport task to run code coverage for testNG-based tests in wherehows-common & metadata-etl.	2017-07-10 09:53:28 -07:00
Yi Wang	7d6bb9fac9	Add KAFKA ETL job to fetch topics from Nuage	2017-07-10 09:53:27 -07:00
Yi Wang	a9335bc49a	Add VOLDEMORT ETL job to fetch datasets from Nuage	2017-07-10 09:53:27 -07:00
Yi Wang	cf49ae375c	ADD ESPRESSO_DATASET_METADATA_ETL job to fetch Espresso metadata from Nuage	2017-07-10 09:53:27 -07:00
jbai	e8b21a17df	implement the Appworx log parser	2017-07-10 09:53:24 -07:00
jbai	eb93d67b64	support Appworx flow and job definition and execution	2017-07-10 09:53:24 -07:00
Zhen Chen	b36f774feb	add dali view owner etl	2017-07-10 09:53:23 -07:00
Yi (Alan) Wang	488929ad93	Modify Confidential info schema to add identifierField and logicalType (#385)	2017-03-30 21:29:20 -07:00
Yi Wang	adba532474	Modify compliance purge entity record to support logical type and is_subject	2017-03-23 22:08:54 -07:00
Yi Wang	6c57e30240	Fix bugs found by AppCheck in issue #328	2017-02-24 11:08:18 -08:00
camelliazhang	724f754f03	clean and refactor elastic search ETL job (#300)	2016-12-14 21:22:30 -08:00
Eric Sun	a3504fa57f	Fix jsonpath after upgrading com.jayway.jsonpath to 2.2 (#299) * use schema_url_helper to fetch avro schema from hdfs or http location * trim space * add dfs.namenode.kerberos.principal.pattern; include htrace for SchemaUrlHelper * fix jsonpath for job history log parser; do not throw exception if kerberos config files are missing for job history http connection * avoid null return value for sepCommaString(); fix a typo	2016-12-13 21:14:52 -08:00
Yi (Alan) Wang	e07306b51e	Update MetadataChangeEvent, separate privacy compliance from security (#275)	2016-11-11 17:25:41 -08:00
Yi Wang	b4f5e438e2	Add JobExecutionLineageEvent and kafka processor	2016-11-08 19:11:37 -08:00
Eric Sun	7b36d09b58	Add get_schema_literal_from_url() to fetch schema literal based on schema url (#268) * use schema_url_helper to fetch avro schema from hdfs or http location * trim space * add dfs.namenode.kerberos.principal.pattern; include htrace for SchemaUrlHelper	2016-11-07 08:14:45 -08:00
Yi Wang	664e4072bb	Upgrade to play 2.4.8	2016-10-19 17:42:28 -07:00
Na Zhang	043dc25e89	Get owners for espresso and oracle, and fix a bug for teradata	2016-10-19 11:13:32 -07:00
Yi Wang	5049c847fa	Update Kafka consumer actors to reduce memory usage	2016-10-10 14:49:14 -07:00
Yi Wang	c9f4f18d9c	Update Azkaban_Execution job to fetch cronExpression in flow scheduling	2016-10-06 13:43:10 -07:00
Yi (Alan) Wang	c9dfb637af	Update MetadataChangeEvent APIs according to schema change (#243) * Update MetadataChangeEvent APIs according to schema change * Update MultiproductLoad to reflect new Owner types * Add comments for Owner_type precedence (priority) and compliance	2016-10-06 13:33:45 -07:00
Yi Wang	0356497124	Add comments for Owner_type precedence (priority) and compliance	2016-10-06 13:24:29 -07:00
Yi Wang	b74d58a33f	Update MetadataChangeEvent APIs according to schema change	2016-10-03 10:56:23 -07:00
jbai	a11e4908dc	tracking the GobblinTrackingEvent_autit to get owner information	2016-09-29 15:01:32 -07:00
Yi Wang	ac34eb683f	Update Kafka processor casting Object to String, also add debug info if can't fetch schema from Registry	2016-09-26 15:06:33 -07:00
Yi (Alan) Wang	753de7de7c	Merge pull request #233 from alyiwang/master Update backend APIs to cast SQL results back to Java record then to Json	2016-09-21 08:59:04 -07:00
Eric Sun	89ff794ddf	Add api to get dependents of a dataset (#232) * Use ProcessBuilder and redirected log file for HDFS Extract * relax urn validation rule * continue process if hive sql parser encounters error * reformat etl job log message * add API to find dataset dependents, such as which hive tables are based on an hdfs path	2016-09-21 08:55:44 -07:00
Yi Wang	be65efb0cc	Update backend APIs to cast SQL results back to Java record then serialize to Json reply	2016-09-20 18:56:49 -07:00
Yi Wang(Data Infrastructure)	1171e00097	Add REST proxy for Security API from backend to web	2016-09-19 18:14:10 -07:00
Yi Wang	ee01d7c6c7	rename DatasetPropertiesRecord to DatasetInventoryPropertiesRecord	2016-09-15 11:46:26 -07:00
Yi Wang	b136fc6c37	Add MetadataInventoryEvent processor and API	2016-09-15 09:22:42 -07:00
Eric Sun	86bf71499f	Reformat the ETL job info message in log. (#222) * Use ProcessBuilder and redirected log file for HDFS Extract * relax urn validation rule * continue process if hive sql parser encounters error * reformat etl job log message	2016-09-13 14:01:14 -07:00
Yi Wang	5515cbdde9	Add MetadataChangeEvent processor to call separate APIs	2016-09-06 16:41:50 -07:00
Yi Wang	81f891bfab	Map scm repo owner to dataset owner table	2016-08-30 15:35:28 -07:00
Yi Wang	e2b42d2ccb	Update DatasetOwnerRecord to be compatible with linkedin branch	2016-08-25 09:12:31 -07:00
Yi Wang	7cbda15b5a	Add confidential and recursive column to dict_dataset_field	2016-08-23 15:50:30 -07:00
Yi Wang	d46a9d8b8e	merge API tables to existing dataset owner and schema field table	2016-08-22 17:06:20 -07:00
Yi Wang	46871face6	Add metadataChangeEvent APIs to backend-service	2016-08-16 18:47:53 -07:00
Yi (Alan) Wang	078e90e8bd	Add multiproduct and git repo metadata etl job (#202) * Add multiproduct and git repo metadata etl job * implement the dataset availability section * Extract commit hash use it when querying acl * Use FileWriter to write records into CSV file * Remove unnecessary log entries from kafka processor * Fix the incompatibility between integer repo_id in db and string field in record	2016-08-12 12:26:55 -07:00
Yi Wang	44807f5f7e	Fix the incompatibility between integer repo_id in db and string field in record	2016-08-10 17:24:03 -07:00
Yi Wang	bc276274ff	Use FileWriter to write records into CSV file	2016-08-10 11:20:31 -07:00
Yi Wang	83834e4e88	Add fetching acl owner info from svn, also change some property names.	2016-08-10 09:11:37 -07:00
Yi Wang	830413e122	Add multiproduct and git repo metadata etl job	2016-08-08 21:28:37 -07:00
Yi Wang	3d3b2a8075	Get kafka job id from application.conf and then get ref_id and configs from DB	2016-08-03 18:55:07 -07:00
Eric Sun	9d2c803f0c	Merge pull request #187 from ericsun2/master Add datacenter, deploymenttier, cluster info to better describe dataset instance	2016-07-28 17:22:32 -07:00
Eric Sun	f745642212	add datacenter, deploymenttier, cluster to describe dataset instance	2016-07-28 16:38:03 -07:00
Yi Wang	74ed769bab	add Oracle dataset metadata ETL job	2016-07-28 14:07:07 -07:00
Yi Wang	6d4706bc62	Ingest Gobblin tracking events into wherehows using Kafka consumer client	2016-07-25 15:03:29 -07:00
jbai	0f5124579c	fix the issue of datasetSchemaRecord expected 11 args but got 9	2016-07-21 17:39:22 -07:00
jbai	7a77aba4b7	merge the pull request 165 to master branch	2016-07-21 10:38:36 -07:00
jbai	9fb5b09bd2	update dependency property name and fix the duplicated key issue when update cfg_object_name_map table	2016-07-20 19:07:16 -07:00

78 Commits