Mars Lan
e36a40cd65
Generate code coverage reports ( #334 )
...
* Add playCoverage task to run code coverage using JaCoCo for backend and web.
* Add jacocoTestReport task to run code coverage for TestNG-based tests in wherehows-common & metadata-etl.
2017-07-10 09:53:28 -07:00
Yi Wang
7d6bb9fac9
Add KAFKA ETL job to fetch topics from Nuage
2017-07-10 09:53:27 -07:00
Yi Wang
a9335bc49a
Add VOLDEMORT ETL job to fetch datasets from Nuage
2017-07-10 09:53:27 -07:00
Yi Wang
cf49ae375c
Add ESPRESSO_DATASET_METADATA_ETL job to fetch Espresso metadata from Nuage
2017-07-10 09:53:27 -07:00
jbai
e8b21a17df
implement the Appworx log parser
2017-07-10 09:53:24 -07:00
jbai
eb93d67b64
support Appworx flow and job definition and execution
2017-07-10 09:53:24 -07:00
Zhen Chen
b36f774feb
add dali view owner etl
2017-07-10 09:53:23 -07:00
Yi (Alan) Wang
488929ad93
Modify Confidential info schema to add identifierField and logicalType ( #385 )
2017-03-30 21:29:20 -07:00
Yi Wang
adba532474
Modify compliance purge entity record to support logical type and is_subject
2017-03-23 22:08:54 -07:00
Yi Wang
6c57e30240
Fix bugs found by AppCheck in issue #328
2017-02-24 11:08:18 -08:00
camelliazhang
724f754f03
clean and refactor Elasticsearch ETL job ( #300 )
2016-12-14 21:22:30 -08:00
Eric Sun
a3504fa57f
Fix jsonpath after upgrading com.jayway.jsonpath to 2.2 ( #299 )
...
* use schema_url_helper to fetch avro schema from hdfs or http location
* trim space
* add dfs.namenode.kerberos.principal.pattern; include htrace for SchemaUrlHelper
* fix jsonpath for job history log parser; do not throw exception if kerberos config files are missing for job history http connection
* avoid null return value for sepCommaString(); fix a typo
2016-12-13 21:14:52 -08:00
Yi (Alan) Wang
e07306b51e
Update MetadataChangeEvent, separate privacy compliance from security ( #275 )
2016-11-11 17:25:41 -08:00
Yi Wang
b4f5e438e2
Add JobExecutionLineageEvent and kafka processor
2016-11-08 19:11:37 -08:00
Eric Sun
7b36d09b58
Add get_schema_literal_from_url() to fetch schema literal based on schema url ( #268 )
...
* use schema_url_helper to fetch avro schema from hdfs or http location
* trim space
* add dfs.namenode.kerberos.principal.pattern; include htrace for SchemaUrlHelper
2016-11-07 08:14:45 -08:00
Yi Wang
664e4072bb
Upgrade to play 2.4.8
2016-10-19 17:42:28 -07:00
Na Zhang
043dc25e89
Get owners for espresso and oracle, and fix a bug for teradata
2016-10-19 11:13:32 -07:00
Yi Wang
5049c847fa
Update Kafka consumer actors to reduce memory usage
2016-10-10 14:49:14 -07:00
Yi Wang
c9f4f18d9c
Update Azkaban_Execution job to fetch cronExpression in flow scheduling
2016-10-06 13:43:10 -07:00
Yi (Alan) Wang
c9dfb637af
Update MetadataChangeEvent APIs according to schema change ( #243 )
...
* Update MetadataChangeEvent APIs according to schema change
* Update MultiproductLoad to reflect new Owner types
* Add comments for Owner_type precedence (priority) and compliance
2016-10-06 13:33:45 -07:00
Yi Wang
0356497124
Add comments for Owner_type precedence (priority) and compliance
2016-10-06 13:24:29 -07:00
Yi Wang
b74d58a33f
Update MetadataChangeEvent APIs according to schema change
2016-10-03 10:56:23 -07:00
jbai
a11e4908dc
tracking the GobblinTrackingEvent_audit to get owner information
2016-09-29 15:01:32 -07:00
Yi Wang
ac34eb683f
Update Kafka processor casting Object to String, also add debug info if can't fetch schema from Registry
2016-09-26 15:06:33 -07:00
Yi (Alan) Wang
753de7de7c
Merge pull request #233 from alyiwang/master
...
Update backend APIs to cast SQL results back to Java record then to Json
2016-09-21 08:59:04 -07:00
Eric Sun
89ff794ddf
Add api to get dependents of a dataset ( #232 )
...
* Use ProcessBuilder and redirected log file for HDFS Extract
* relax urn validation rule
* continue process if hive sql parser encounters error
* reformat etl job log message
* add API to find dataset dependents, such as which hive tables are based on an hdfs path
2016-09-21 08:55:44 -07:00
Yi Wang
be65efb0cc
Update backend APIs to cast SQL results back to Java record then serialize to Json reply
2016-09-20 18:56:49 -07:00
Yi Wang(Data Infrastructure)
1171e00097
Add REST proxy for Security API from backend to web
2016-09-19 18:14:10 -07:00
Yi Wang
ee01d7c6c7
rename DatasetPropertiesRecord to DatasetInventoryPropertiesRecord
2016-09-15 11:46:26 -07:00
Yi Wang
b136fc6c37
Add MetadataInventoryEvent processor and API
2016-09-15 09:22:42 -07:00
Eric Sun
86bf71499f
Reformat the ETL job info message in log. ( #222 )
...
* Use ProcessBuilder and redirected log file for HDFS Extract
* relax urn validation rule
* continue process if hive sql parser encounters error
* reformat etl job log message
2016-09-13 14:01:14 -07:00
Yi Wang
5515cbdde9
Add MetadataChangeEvent processor to call separate APIs
2016-09-06 16:41:50 -07:00
Yi Wang
81f891bfab
Map scm repo owner to dataset owner table
2016-08-30 15:35:28 -07:00
Yi Wang
e2b42d2ccb
Update DatasetOwnerRecord to be compatible with linkedin branch
2016-08-25 09:12:31 -07:00
Yi Wang
7cbda15b5a
Add confidential and recursive column to dict_dataset_field
2016-08-23 15:50:30 -07:00
Yi Wang
d46a9d8b8e
merge API tables to existing dataset owner and schema field table
2016-08-22 17:06:20 -07:00
Yi Wang
46871face6
Add metadataChangeEvent APIs to backend-service
2016-08-16 18:47:53 -07:00
Yi (Alan) Wang
078e90e8bd
Add multiproduct and git repo metadata etl job ( #202 )
...
* Add multiproduct and git repo metadata etl job
* implement the dataset availability section
* Extract commit hash and use it when querying acl
* Use FileWriter to write records into CSV file
* Remove unnecessary log entries from kafka processor
* Fix the incompatibility between integer repo_id in db and string field in record
2016-08-12 12:26:55 -07:00
Yi Wang
44807f5f7e
Fix the incompatibility between integer repo_id in db and string field in record
2016-08-10 17:24:03 -07:00
Yi Wang
bc276274ff
Use FileWriter to write records into CSV file
2016-08-10 11:20:31 -07:00
Yi Wang
83834e4e88
Add fetching acl owner info from svn, also change some property names.
2016-08-10 09:11:37 -07:00
Yi Wang
830413e122
Add multiproduct and git repo metadata etl job
2016-08-08 21:28:37 -07:00
Yi Wang
3d3b2a8075
Get kafka job id from application.conf and then get ref_id and configs from DB
2016-08-03 18:55:07 -07:00
Eric Sun
9d2c803f0c
Merge pull request #187 from ericsun2/master
...
Add datacenter, deploymenttier, cluster info to better describe dataset instance
2016-07-28 17:22:32 -07:00
Eric Sun
f745642212
add datacenter, deploymenttier, cluster to describe dataset instance
2016-07-28 16:38:03 -07:00
Yi Wang
74ed769bab
add Oracle dataset metadata ETL job
2016-07-28 14:07:07 -07:00
Yi Wang
6d4706bc62
Ingest Gobblin tracking events into wherehows using Kafka consumer client
2016-07-25 15:03:29 -07:00
jbai
0f5124579c
fix the issue of datasetSchemaRecord expected 11 args but got 9
2016-07-21 17:39:22 -07:00
jbai
7a77aba4b7
merge the pull request 165 to master branch
2016-07-21 10:38:36 -07:00
jbai
9fb5b09bd2
update dependency property name and fix the duplicated key issue when updating cfg_object_name_map table
2016-07-20 19:07:16 -07:00