Table of Contents
- Table of Contents
- Dataset GET API
- Dataset POST API
- Downstream/Upstream Jobs GET API
- Downstream/Upstream Datasets GET API
- Job Lineage POST API
- Dataset Dependency GET API
- Flow Owner GET API
- Flow Schedule GET API
- Filename Pattern GET API
- Filename Pattern POST API
- Dataset Partition Pattern GET API
- Dataset Partition Pattern POST API
- Log Lineage Pattern GET API
- Log Lineage Pattern POST API
- Log Job Id Pattern GET API
- Log Job Id Pattern POST API
- Application GET API 1
- Application GET API 2
- Application GET API 3
- Application POST/PUT API
- Database GET API 1
- Database GET API 2
- Database GET API 3
- Database POST/PUT API
- ETL Job GET API 1
- ETL Job GET API 2
- ETL Job New Job POST API
- ETL Job Control PUT API
- ETL Job Schedule Update PUT API
- ETL Job Property Update PUT API
Dataset GET API
Return information about a dataset by dataset ID or URN
-
URL
/dataset
-
Method:
GET
-
Data Params
Required:
Param Name | Description | Default | Required |
---|---|---|---|
datasetId | Dataset ID | | Y |
or
Param Name | Description | Default | Required |
---|---|---|---|
urn | URN of dataset | | Y |
- Success Response:
{
"return_code": "200",
"dataset": {
"id": 8011,
"name": "DUMMY4",
"schema": "{\"name\": \"DUMMY\", \"fields\": [{\"accessCount\": null, \"lastAccessTime\": null, \"nullable\": \"Y\", \"format\": null, \"type\": \"INT\", \"maxByteLength\": 4, \"name\": \"DUMMY\", \"doc\": \"\"}]}\",\"properties\" : \"{\"storage_type\": \"View\", \"accessCount\": 2670, \"lastAccessTime\": null, \"sizeInMbytes\": null, \"referenceTables\": [\"DWH_DIM.DUMMY\"], \"viewSqlText\": \"REPLACE VIEW DWH_STG.DUMMY AS\nLOCKING ROW FOR ACCESS\n SELECT * \n FROM DWH_DIM.DUMMY;\", \"createTime\": \"2015-03-06 10:43:58\", \"lastAlterTime\": \"2015-03-10 20:57:16\"}",
"properties": null,
"fields": "{\"DUMMY\": {\"type\": \"INT\", \"maxByteLength\": 4}}",
"urn": "teradata:///DWH_TMP/DUMMY4",
"source": "Teradata",
"modified": 1426003036000,
"created": null,
"location_prefix": "DWH_STG",
"parent_name": null,
"storage_type": null,
"ref_dataset_id": 1000000,
"status_id": 0,
"schema_type": "JSON",
"dataset_type": null,
"hive_serdes_class": null,
"is_partitioned": null,
"partition_layout_pattern_id": 0
}
}
- Error Response:
{
"return_code": "404",
"error_message": "dataset can not find!"
}
- Sample Call:
GET /dataset?datasetId=1
GET /dataset?urn=teradata:///DWH_TMP/DUMMY4
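As a concrete sketch, assuming the backend runs at http://localhost:19001 as in the other curl samples in this document, the same calls look like this (quoting the URL keeps the shell from interpreting special characters in the URN):
curl "http://localhost:19001/dataset?datasetId=1"
curl "http://localhost:19001/dataset?urn=teradata:///DWH_TMP/DUMMY4"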
Dataset POST API
Post one dataset record into WhereHows metadata storage
-
URL
/dataset
-
Method:
POST
-
Data Params
Required:
{
"name" : "DUMMY4",
"urn" : "teradata:///DWH_TMP/DUMMY4",
"schema" : "{\"name\": \"DUMMY\", \"fields\": [{\"accessCount\": null, \"lastAccessTime\": null, \"nullable\": \"Y\", \"format\": null, \"type\": \"INT\", \"maxByteLength\": 4, \"name\": \"DUMMY\", \"doc\": \"\"}]}",
"properties" : "{\"storage_type\": \"View\", \"accessCount\": 2670, \"lastAccessTime\": null, \"sizeInMbytes\": null, \"referenceTables\": [\"DWH_DIM.DUMMY\"], \"viewSqlText\": \"REPLACE VIEW DWH_STG.DUMMY AS\nLOCKING ROW FOR ACCESS\n SELECT * \n FROM DWH_DIM.DUMMY;\", \"createTime\": \"2015-03-06 10:43:58\", \"lastAlterTime\": \"2015-03-10 20:57:16\"}",
"schema_type" : "JSON",
"fields" : "{\"DUMMY\": {\"type\": \"INT\", \"maxByteLength\": 4}}",
"source" : "Teradata",
"source_created_time" : null,
"location_prefix" : "DWH_STG",
"ref_dataset_urn" : null,
"is_partitioned" : "Y",
"sample_partition_full_path": null,
"parent_name" : null,
"storage_type" : null,
"dataset_type" : null,
"hive_serdes_class" : null
}
curl -H "Content-Type: application/json" -X POST -d '{"name" :"DUMMY4","urn" : "teradata:///DWH_TMP/DUMMY4","schema" : "{\"name\": \"DUMMY\", \"fields\": [{\"accessCount\": null, \"lastAccessTime\": null, \"nullable\": \"Y\", \"format\": null, \"type\": \"INT\", \"maxByteLength\": 4, \"name\": \"DUMMY\", \"doc\": \"\"}]}","properties" : "{\"storage_type\": \"View\", \"accessCount\": 2670, \"lastAccessTime\": null, \"sizeInMbytes\": null, \"referenceTables\": [\"DWH_DIM.DUMMY\"], \"viewSqlText\": \"REPLACE VIEW DWH_STG.DUMMY AS\nLOCKING ROW FOR ACCESS\n SELECT * \n FROM DWH_DIM.DUMMY;\", \"createTime\": \"2015-03-06 10:43:58\", \"lastAlterTime\": \"2015-03-10 20:57:16\"}","schema_type" : "JSON","fields" :"{\"DUMMY\": {\"type\": \"INT\", \"maxByteLength\": 4}}","source" : "Teradata","source_created_time" : null,"location_prefix" : "DWH_STG","ref_dataset_urn" : null,"is_partitioned" : "Y","sample_partition_full_path": null,"parent_name" : null,"storage_type" : null,"dataset_type" : null,"hive_serdes_class" : null}' http://localhost:19001/dataset
- Success Response:
{
"return_code": 200,
"message": "Dataset inserted!"
}
- Error Response:
{
"return_code": 404,
"error_message": "Column count doesn't match value count at row 1"
}
- Sample Call:
POST /dataset
Downstream/Upstream Jobs GET API
Return impacted (downstream/upstream) jobs within a specified number of days for a specified dataset
-
URL
/lineage/dataset
-
Method:
GET
-
Data Params
Param Name | Description | Default | Required |
---|---|---|---|
urn | URN of dataset | | Y |
cluster | Cluster instance | all clusters | N |
instance | Application instance | all instances | N |
direction | Downstream or upstream | downstream | N |
period | Historic days of the job execution | 30 | N |
- Success Response:
{
"return_code": 200,
"jobs": [
{
"short_connection_string": "eat1-nertzaz01",
"flow_group": "acceptance-tests",
"flow_name": "all-acceptance-tests",
"job_name": "pig-token-expired"
},
{
"short_connection_string": "eat1-nertzaz01",
"flow_group": "all-acceptance",
"flow_name": "all-acceptance-tests",
"job_name": "pig-token-expired"
}
]
}
- Error Response:
{
"return_code": "404",
"error_message": "exceptions"
}
- Sample Call:
GET /lineage/dataset?urn=hdfs:///data/databases/GEO/COUNTRIES
GET /lineage/dataset?urn=hdfs:///data/databases/GEO/COUNTRIES&direction=downstream&cluster=cluster01&instance=instance01
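For example, assuming the same http://localhost:19001 host used elsewhere in this document, the second call above looks like this with curl (the URL is quoted because the URN and the & separators are special to the shell):
curl "http://localhost:19001/lineage/dataset?urn=hdfs:///data/databases/GEO/COUNTRIES&direction=downstream&cluster=cluster01&instance=instance01"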
Downstream/Upstream Datasets GET API
Return impacted (downstream/upstream) datasets for a specified job or job execution
-
URL
/lineage/job
-
Method:
GET
-
Data Params
Required:
Param Name | Description | Default | Required |
---|---|---|---|
flowPath | project:flow(azkaban); app path(oozie) | | Y |
jobName | Job name | | Y |
or
Param Name | Description | Default | Required |
---|---|---|---|
flowExecId | Flow execution ID | | Y |
jobName | Job name | | Y |
or
Param Name | Description | Default | Required |
---|---|---|---|
jobExecId | Job execution ID | | Y |
Optional:
Param Name | Description | Default | Required |
---|---|---|---|
instance | Application instance | all instances | N |
direction | Downstream or upstream | downstream | N |
- Success Response:
{
"return_code": 200,
"datasets": [
{
"abstracted_object_name": "/data/tracking/UnifiedActionEvent",
"database_id": 14,
"partition_start": "2015/10/15/15",
"partition_end": "2015/10/15/15",
"storage_type": "HDFS",
"record_count": null,
"insert_count": null,
"update_count": null,
"delete_count": null
}
]
}
- Error Response:
{
"return_code": "404",
"error_message": "something is wrong"
}
- Sample Call:
GET /lineage/job?flowPath=b2-backend:ProduceActionDedupRecord&jobName=ProduceActionDedupRecord&direction=upstream
GET /lineage/job/exec/flow?flowExecId=706130&jobName=prod-bs-decorate-bare-applies
GET /lineage/job/exec/job?jobExecId=706130001
Job Lineage POST API
Post one job lineage record into WhereHows metadata storage
-
URL
/lineage
-
Method:
POST
-
Data Params
Required:
{
"app_id" : 310,
"app_name" : "ltx1-tarockaz01",
"flow_exec_id" : 928242,
"job_exec_id" : 928242001,
"job_exec_uuid" : null,
"job_name" : "test_lineage",
"job_start_unixtime" : 1440000000,
"job_end_unixtime" : 1440000000,
"flow_path" : "test_post:test_lineage",
"lineages" : [
{
"db_id" : 1000,
"abstracted_object_name" : "/data/test-lineage-source",
"full_object_name" : "hdfs:///cluster:port/data/test-lineage-source/1444111922033-PT-262022594",
"storage_type" : "hdfs",
"partition_start" : "1444111922033-PT-262022594",
"partition_end" : "1444111922033-PT-262022594",
"partition_type" : "snapshot",
"layout_id" : 0,
"source_target_type" : "source",
"operation" : "read",
"record_count" : 500000,
"insert_count" : 0,
"delete_count" : 0,
"update_count" : 0
},
{
"db_id" : 490,
"abstracted_object_name" : "data/databases/abc/source1",
"full_object_name" : "hdfs:///cluster:port/data/databases/abc/source1/2015090822",
"storage_type" : "hdfs",
"partition_start" : "2015090822",
"partition_end" : "2015090822",
"partition_type" : "hourly",
"layout_id" : 0,
"source_target_type" : "source",
"operation" : "read",
"record_count" : 3000,
"insert_count" : 0,
"delete_count" : 0,
"update_count" : 0
},
{
"db_id" : 1000,
"abstracted_object_name" : "/data/test-lineage-target",
"full_object_name" : "hdfs:///cluster/data/test-lineage-target/20150909",
"storage_type" : "hdfs",
"partition_start" : "20150909",
"partition_end" : "20150909",
"partition_type" : "daily",
"layout_id" : 0,
"source_target_type" : "target",
"ref_source_object_name" : "/data/databases/abc/source1",
"ref_source_database_id" : 490,
"operation" : "write",
"record_count" : 100000,
"insert_count" : 0,
"delete_count" : 0,
"update_count" : 0
}
]
}
- Success Response:
{
"return_code": 200,
"message": "Lineage inserted!"
}
- Error Response:
{
"return_code": 404,
"error_message": "Something is wrong"
}
- Sample Call:
POST /lineage
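As a sketch, assuming the http://localhost:19001 host used elsewhere in this document and trimming the payload above to a single source entry (whether the remaining lineage entries may be omitted depends on your use case; the server accepts any list under "lineages"):
curl -H "Content-Type: application/json" -X POST -d '{"app_id": 310, "app_name": "ltx1-tarockaz01", "flow_exec_id": 928242, "job_exec_id": 928242001, "job_exec_uuid": null, "job_name": "test_lineage", "job_start_unixtime": 1440000000, "job_end_unixtime": 1440000000, "flow_path": "test_post:test_lineage", "lineages": [{"db_id": 1000, "abstracted_object_name": "/data/test-lineage-source", "full_object_name": "hdfs:///cluster:port/data/test-lineage-source/1444111922033-PT-262022594", "storage_type": "hdfs", "partition_start": "1444111922033-PT-262022594", "partition_end": "1444111922033-PT-262022594", "partition_type": "snapshot", "layout_id": 0, "source_target_type": "source", "operation": "read", "record_count": 500000, "insert_count": 0, "delete_count": 0, "update_count": 0}]}' http://localhost:19001/lineage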
Dataset Dependency GET API
API to resolve Hive view (Dalids) and Teradata view dependencies
-
URL
/dependency/dataset?query=$json
-
Method:
GET
-
Data Params
We support multiple input parameter formats; if the cluster name is omitted, the default value is ltx1-holdem.
Required:
{
"dataset_uri":"db_name/table_name",
"cluster_name": "CLUSTER_NAME"
}
or
{
"dataset_uri":"hive:///db_name/table_name",
"cluster_name": "CLUSTER_NAME"
}
or
{
"dataset_uri":"dalids:///db_name/table_name",
"cluster_name": "CLUSTER_NAME"
}
or
{
"dataset_uri":"cluster_name:db_name.table_name"
}
or
{
"dataset_uri":"hive://cluster_name/db_name/table_name"
}
or
{
"dataset_uri":"dalids://cluster_name/db_name.table_name"
}
- Success Response:
{
"return_code":200,
"deployment_tier":"grid",
"data_center":"eat1",
"cluster":"eat1-nertz",
"dataset_type":"Dalids",
"database_name":"feedimpressionevent_mp",
"table_name":"feedimpressionevent",
"urn":"dalids:///feedimpressionevent_mp/feedimpressionevent",
"dataset_id":192429,
"input_uri":"dalids://eat1-nertz/feedimpressionevent_mp/feedimpressionevent",
"dependencies": [
{
"dataset_id":164903,
"database_name":"tracking",
"type":"Table",
"table_name":"feedimpressionevent",
"level_from_root":2,
"next_level_dependency_count":0,
"topology_sort_id":"100100",
"ref_obj_type":"hive",
"ref_obj_location":"/tracking/feedimpressionevent",
"high_watermark":null
},
{
"dataset_id":192434,
"database_name":"feedimpressionevent_mp_versioned",
"type":"View",
"table_name":"feedimpressionevent_0_1_8",
"level_from_root":1,
"next_level_dependency_count":1,
"topology_sort_id":"100",
"ref_obj_type":"dalids",
"ref_obj_location":"/feedimpressionevent_mp_versioned/feedimpressionevent_0_1_8",
"high_watermark":null
}
],
"leaf_level_dependency_count":1}
- Error Response:
{
"return_code": 404,
"error_message": "Wrong input format! Missing dataset uri"
}
- Sample Call:
GET /dependency/dataset?query={"dataset_uri":"feedimpressionevent_mp.feedimpressionevent", "cluster_name": "eat1-nertz"}
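Because the query parameter is itself a JSON document, it must be URL-encoded on a real command line. One way to do this with curl (assuming the http://localhost:19001 host used elsewhere in this document) is to let -G with --data-urlencode build the query string:
curl -G "http://localhost:19001/dependency/dataset" --data-urlencode 'query={"dataset_uri":"feedimpressionevent_mp.feedimpressionevent", "cluster_name": "eat1-nertz"}'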
Flow Owner GET API
Return owner information of a flow
-
URL
/flow/owner
-
Method:
GET
-
Data Params
Param Name | Description | Default | Required |
---|---|---|---|
flowPath | project:flow(azkaban); app path(oozie) | | Y |
instance | Application instance | all instances | N |
- Success Response:
{
"return_code": 200,
"owners": [
{
"instance": "eat1-nertzaz01",
"owner_id": "pgovinda",
"permissions": "ADMIN",
"owner_type": "LDAP"
},
{
"instance": "lva1-waraz01",
"owner_id": "elephant",
"permissions": "ADMIN",
"owner_type": "LDAP"
},
{
"instance": "lva1-waraz01",
"owner_id": "pgovinda",
"permissions": "ADMIN",
"owner_type": "LDAP"
}
]
}
- Error Response:
{
"return_code": 404,
"error_message": "Something is wrong"
}
- Sample Call:
GET /flow/owner?flowPath=job-analytics:prod-bs-push-to-offline-cluster
Flow Schedule GET API
Return schedule information of a flow
-
URL
/flow/schedule
-
Method:
GET
-
Data Params
Param Name | Description | Default | Required |
---|---|---|---|
flowPath | project:flow(azkaban); app path(oozie) | | Y |
instance | Application instance | all instances | N |
- Success Response:
{
"return_code": 200,
"schedules": [
{
"instance": "lva1-waraz01",
"frequency": 1,
"unit": "DAY",
"effective_start_time": 1438416000,
"effective_end_time": 4102387200
}
]
}
- Error Response:
{
"return_code": "404",
"error_message": "Something is wrong"
}
- Sample Call:
GET /flow/schedule?flowPath=job-analytics:prod-bs-push-to-offline-cluster
Filename Pattern GET API
Get all filename patterns
-
URL
/pattern/filename
-
Method:
GET
-
Success Response:
{
"return_code": 200,
"filename_patterns": [
{
"filename_pattern_id": 1,
"regex": "(.*)/part-m-\d+\.avro"
},
{
"filename_pattern_id": 2,
"regex": "(.*)/part-r-\d+\.avro"
},
{
"filename_pattern_id": 3,
"regex": "(.*)/part-\d+\.avro"
},
{
"filename_pattern_id": 4,
"regex": "(.*)/part-\d+"
},
{
"filename_pattern_id": 5,
"regex": "(.*)/part-m-\d+"
},
{
"filename_pattern_id": 6,
"regex": "(.*)/part-r-\d+"
},
{
"filename_pattern_id": 7,
"regex": "(.*)/\d+\.avro"
}
]
}
- Error Response:
{
"return_code": "404",
"error_message": "Something is wrong"
}
- Sample Call:
GET /pattern/filename
Filename Pattern POST API
Add a new filename pattern
-
URL
/pattern/filename
-
Method:
POST
-
Data Params
Request body in json format:
Param Name | Description | Default | Required |
---|---|---|---|
regex | regular expression of filename | | Y |
Example
{
"regex": "(.*)/partition-m-\d+\.avro"
}
- Success Response:
{
"return_code": 200,
"message": "New filename pattern created!"
}
- Error Response:
{
"return_code": "404",
"error_message": "something is wrong"
}
- Sample Call:
POST /pattern/filename
with request body:
{
"regex": "(.*)/partition-m-\d+\.avro"
}
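As a curl sketch, assuming the http://localhost:19001 host used elsewhere in this document (note the doubled backslashes: \d is not a valid JSON string escape, so the backslash itself must be escaped in the request body):
curl -H "Content-Type: application/json" -X POST -d '{"regex": "(.*)/partition-m-\\d+\\.avro"}' http://localhost:19001/pattern/filename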
Dataset Partition Pattern GET API
Get all dataset partition patterns
-
URL
/pattern/partition
-
Method:
GET
-
Success Response:
{
"return_code": 200,
"dataset_partition_patterns": [
{
"layout_id": 1,
"regex": "(.*)\/daily\/(\d{4}\/\d{2}\/\d{2}).*",
"mask": "daily/yyyy/MM/dd",
"leading_path_index": 1,
"partition_index": 2,
"second_partition_index": null,
"sort_id": 1,
"comments": null,
"partition_pattern_group": "daily"
},
{
"layout_id": 2,
"regex": "(.*)\/hourly\/(\d{4}-\d{2}-\d{2}-\d{2}).*",
"mask": "hourly/yyyy-MM-dd-HH",
"leading_path_index": 1,
"partition_index": 2,
"second_partition_index": null,
"sort_id": 2,
"comments": null,
"partition_pattern_group": "hourly"
},
{
"layout_id": 3,
"regex": "(.*)\/yearly\/(\d{4}).*",
"mask": "yearly/YYYY",
"leading_path_index": 1,
"partition_index": 2,
"second_partition_index": null,
"sort_id": 3,
"comments": null,
"partition_pattern_group": "yearly"
},
{
"layout_id": 4,
"regex": "(.*)\/monthly\/(\d{4}\/\d{2}).*",
"mask": "monthly/yyyy/MM",
"leading_path_index": 1,
"partition_index": 2,
"second_partition_index": null,
"sort_id": 10,
"comments": null,
"partition_pattern_group": "monthly"
},
{
"layout_id": 5,
"regex": "(.*)\/weekly\/(\d{4}\/\d{2}).*",
"mask": "weekly/YYYY/MM",
"leading_path_index": 1,
"partition_index": 2,
"second_partition_index": null,
"sort_id": 12,
"comments": null,
"partition_pattern_group": "weekly"
},
{
"layout_id": 6,
"regex": "(.*)\/(\d+-(PT|scn)-\d+)(\/.*)?",
"mask": "epoch-PT|SCN-recordcount",
"leading_path_index": 1,
"partition_index": 2,
"second_partition_index": null,
"sort_id": 20,
"comments": null,
"partition_pattern_group": "snapshot"
}
]
}
- Error Response:
{
"return_code": "404",
"error_message": "Something is wrong"
}
- Sample Call:
GET /pattern/partition
Dataset Partition Pattern POST API
Add a new dataset partition pattern
-
URL
/pattern/partition
-
Method:
POST
-
Data Params
Request body in JSON format:
Param Name | Description | Default | Required |
---|---|---|---|
regex | regular expression of partition | | Y |
mask | mask of the partition | | N |
leading_path_index | group index of abstract path in the regex | | Y |
partition_index | group index of partition in the regex | | Y |
second_partition_index | group index of second partition in the regex | null | N |
sort_id | order in which the regex is applied | auto-increment | Y |
partition_pattern_group | pattern type | | |
comments | comments | null | N |
Example
{
"regex": "(.*)/daily/(d{4}/d{2}/d{2}).*",
"mask": "daily/yyyy/MM/dd",
"leading_path_index": 1,
"partition_index": 2,
"second_partition_index": null,
"sort_id": 1000000,
"comments": null,
"partition_pattern_group": "daily"
}
- Success Response:
{
"return_code": 200,
"message": "New dataset partition pattern created!"
}
- Error Response:
{
"return_code": "404",
"error_message": "something is wrong"
}
- Sample Call:
POST /pattern/partition
with request body:
{
"regex": "(.*)/daily/(d{4}/d{2}/d{2}).*",
"mask": "daily/yyyy/MM/dd",
"leading_path_index": 1,
"partition_index": 2,
"second_partition_index": null,
"sort_id": 1000000,
"comments": null,
"partition_pattern_group": "daily"
}
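An equivalent curl sketch, under the same http://localhost:19001 assumption and with the regex backslashes doubled so the body is valid JSON:
curl -H "Content-Type: application/json" -X POST -d '{"regex": "(.*)\\/daily\\/(\\d{4}\\/\\d{2}\\/\\d{2}).*", "mask": "daily/yyyy/MM/dd", "leading_path_index": 1, "partition_index": 2, "second_partition_index": null, "sort_id": 1000000, "comments": null, "partition_pattern_group": "daily"}' http://localhost:19001/pattern/partition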
Log Lineage Pattern GET API
Get all log lineage patterns
-
URL
/pattern/lineage
-
Method:
GET
-
Success Response:
{
"return_code": 200,
"lineage_patterns": [
{
"pattern_id": 1,
"pattern_type": "hourglass",
"regex": "Moving from staged path\[\S*\] to final resting place\[(\/(?!tmp)[\S]*)\]",
"database_type": "hdfs",
"database_name_index": null,
"dataset_index": 1,
"operation_type": "write",
"record_count_index": null,
"record_byte_index": null,
"insert_count_index": null,
"insert_byte_index": null,
"delete_count_index": null,
"delete_byte_index": null,
"update_count_index": null,
"update_byte_index": null,
"comments": "hourglass",
"source_target_type": "target"
},
{
"pattern_id": 2,
"pattern_type": "hourglass",
"regex": "Moving \S* to (\/(?!tmp)[\S]*)",
"database_type": "hdfs",
"database_name_index": null,
"dataset_index": 1,
"operation_type": "write",
"record_count_index": null,
"record_byte_index": null,
"insert_count_index": null,
"insert_byte_index": null,
"delete_count_index": null,
"delete_byte_index": null,
"update_count_index": null,
"update_byte_index": null,
"comments": "hourglass",
"source_target_type": "target"
},
{
"pattern_id": 3,
"pattern_type": "build and push",
"regex": "Pushing to clusterURl(tcp:\S*)[\s\S]*Initiating swap of (\S*) with dataDir",
"database_type": "voldermont",
"database_name_index": 1,
"dataset_index": 2,
"operation_type": "push",
"record_count_index": null,
"record_byte_index": null,
"insert_count_index": null,
"insert_byte_index": null,
"delete_count_index": null,
"delete_byte_index": null,
"update_count_index": null,
"update_byte_index": null,
"comments": "build and push voldermont",
"source_target_type": "target"
},
{
"pattern_id": 4,
"pattern_type": "pig log",
"regex": "Successfully read (\d+) records( \(\d+ bytes\))? from: \"(.*)\"",
"database_type": "hdfs",
"database_name_index": null,
"dataset_index": 3,
"operation_type": "read",
"record_count_index": 1,
"record_byte_index": 2,
"insert_count_index": null,
"insert_byte_index": null,
"delete_count_index": null,
"delete_byte_index": null,
"update_count_index": null,
"update_byte_index": null,
"comments": "pig log",
"source_target_type": "source"
},
{
"pattern_id": 5,
"pattern_type": "pig log",
"regex": "Successfully stored (\d+) records( \(\d+ bytes\))? in: \"(.*)\"",
"database_type": "hdfs",
"database_name_index": null,
"dataset_index": 3,
"operation_type": "write",
"record_count_index": 1,
"record_byte_index": 2,
"insert_count_index": null,
"insert_byte_index": null,
"delete_count_index": null,
"delete_byte_index": null,
"update_count_index": null,
"update_byte_index": null,
"comments": "pig log",
"source_target_type": "target"
}
]
}
- Error Response:
{
"return_code": "404",
"error_message": "Something is wrong"
}
- Sample Call:
GET /pattern/lineage
Log Lineage Pattern POST API
Add a new lineage pattern
-
URL
/pattern/lineage
-
Method:
POST
-
Data Params
Request body in JSON format:
Param Name | Description | Default | Required |
---|---|---|---|
regex | regular expression of lineage | | Y |
pattern_type | type of log | | N |
database_type | e.g., HDFS, Voldemort | | N |
operation_type | e.g., write, read | | |
source_target_type | source or target | | Y |
dataset_index | dataset group index in the regex | null | N |
database_name_index | database name group index in the regex | null | N |
record_count_index | all kinds of record count group index in the regex | null | N |
record_byte_index | all kinds of record byte group index in the regex | null | N |
insert_count_index | insert record count group index in the regex | null | N |
insert_byte_index | insert byte group index in the regex | null | N |
delete_count_index | delete record count group index in the regex | null | N |
delete_byte_index | delete byte group index in the regex | null | N |
update_count_index | update record count group index in the regex | null | N |
update_byte_index | update byte group index in the regex | null | N |
comments | comments | null | N |
Example
{
"pattern_type": "pig log",
"regex": "Successfully read (d+) records( (d+ bytes))? from: \"(.*)\"",
"database_type": "hdfs",
"database_name_index": null,
"dataset_index": 3,
"operation_type": "read",
"record_count_index": 1,
"record_byte_index": 2,
"insert_count_index": null,
"insert_byte_index": null,
"delete_count_index": null,
"delete_byte_index": null,
"update_count_index": null,
"update_byte_index": null,
"comments": "pig log",
"source_target_type": "source"
}
- Success Response:
{
"return_code": 200,
"message": "New lineage pattern created!"
}
- Error Response:
{
"return_code": "404",
"error_message": "something is wrong"
}
- Sample Call:
POST /pattern/lineage
with request body:
{
"pattern_type": "pig log",
"regex": "Successfully read (d+) records( (d+ bytes))? from: \"(.*)\"",
"database_type": "hdfs",
"database_name_index": null,
"dataset_index": 3,
"operation_type": "read",
"record_count_index": 1,
"record_byte_index": 2,
"insert_count_index": null,
"insert_byte_index": null,
"delete_count_index": null,
"delete_byte_index": null,
"update_count_index": null,
"update_byte_index": null,
"comments": "pig log",
"source_target_type": "source"
}
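A curl sketch of the same call, assuming the http://localhost:19001 host and doubling the regex backslashes so the payload parses as valid JSON; the *_index fields that default to null are omitted here:
curl -H "Content-Type: application/json" -X POST -d '{"pattern_type": "pig log", "regex": "Successfully read (\\d+) records( \\(\\d+ bytes\\))? from: \"(.*)\"", "database_type": "hdfs", "dataset_index": 3, "operation_type": "read", "record_count_index": 1, "record_byte_index": 2, "comments": "pig log", "source_target_type": "source"}' http://localhost:19001/pattern/lineage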
Log Job Id Pattern GET API
Get all log job ID patterns
-
URL
/pattern/jobid
-
Method:
GET
-
Success Response:
{
"return_code": 200,
"job_id_patterns": [
{
"pattern_id": 1,
"pattern_type": "pig",
"regex": "HadoopJobId: (job_\d+_\d+)",
"reference_job_id_index": 1,
"is_active": true,
"comments": "pig sample : HadoopJobId: job_1443068642861_495061"
},
{
"pattern_id": 2,
"pattern_type": "hive",
"regex": "INFO - Starting Job = (job_\d+_\d+)",
"reference_job_id_index": 1,
"is_active": true,
"comments": "hive sample : Starting Job = job_1443068642861_495047"
},
{
"pattern_id": 3,
"pattern_type": "mapreduce",
"regex": "Job (job_\d+_\d+) completed successfully",
"reference_job_id_index": 1,
"is_active": true,
"comments": "Job job_1440264275625_235896 completed successfully"
}
]
}
- Error Response:
{
"return_code": "404",
"error_message": "Something is wrong"
}
- Sample Call:
GET /pattern/jobid
Log Job Id Pattern POST API
Add a new log job ID pattern
-
URL
/pattern/jobid
-
Method:
POST
-
Data Params
Request body in json format:
Param Name | Description | Default | Required |
---|---|---|---|
regex | regular expression of job ID in log | | Y |
pattern_type | type | | N |
reference_job_id_index | group index of reference job in regex | | Y |
is_active | active flag | 1 | N |
comments | comments | null | N |
Example
{
"pattern_type": "pig",
"regex": "HadoopJobId: (job_d+_d+)",
"reference_job_id_index": 1,
"is_active": true,
"comments": "pig sample : HadoopJobId: job_1443068642861_495061"
}
- Success Response:
{
"return_code": 200,
"message": "New job id pattern created!"
}
- Error Response:
{
"return_code": "404",
"error_message": "something is wrong"
}
- Sample Call:
POST /pattern/jobid
with request body:
{
"pattern_type": "pig",
"regex": "HadoopJobId: (job_d+_d+)",
"reference_job_id_index": 1,
"is_active": true,
"comments": "pig sample : HadoopJobId: job_1443068642861_495061"
}
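The same call as a curl sketch, assuming the http://localhost:19001 host (again with \\d so the body parses as JSON):
curl -H "Content-Type: application/json" -X POST -d '{"pattern_type": "pig", "regex": "HadoopJobId: (job_\\d+_\\d+)", "reference_job_id_index": 1, "is_active": true, "comments": "pig sample : HadoopJobId: job_1443068642861_495061"}' http://localhost:19001/pattern/jobid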
Application GET API 1
Get information for all applications
-
URL
/cfg/apps
-
Method:
GET
-
Success Response:
{
"return_code": 200,
"applications": [
{
"app_id": 102,
"app_code": "OOZIE VM",
"description": "VM",
"tech_matrix_id": 19,
"doc_url": null,
"parent_app_id": 100,
"app_status": "N",
"last_modified": 1446099261000,
"is_logical": "N",
"uri_type": null,
"uri": "http://172.21.98.60:11000/",
"lifecycle_layer_id": null,
"short_connection_string": "oozie-vm"
}
]
}
- Error Response:
{
"return_code": "404",
"error_message": "something is wrong"
}
- Sample Call:
GET /cfg/apps
Application GET API 2
Get application information by application ID
-
URL
/cfg/app/id
-
Method:
GET
-
Data Params
Required:
Param Name | Description | Default | Required |
---|---|---|---|
id | application ID | | Y |
- Success Response:
{
"return_code": 200,
"application": {
"app_id": 102,
"app_code": "OOZIE VM",
"description": "VM",
"tech_matrix_id": 19,
"doc_url": null,
"parent_app_id": 100,
"app_status": "N",
"last_modified": 1446099261000,
"is_logical": "N",
"uri_type": null,
"uri": "http://172.21.98.60:11000/",
"lifecycle_layer_id": null,
"short_connection_string": "oozie-vm"
}
}
- Error Response:
{
"return_code": "404",
"error_message": "something is wrong"
}
- Sample Call:
GET /cfg/app/id?id=102
Application GET API 3
Get application information by application short connection string name
-
URL
/cfg/app/name
-
Method:
GET
-
Data Params
Required:
Param Name | Description | Default | Required |
---|---|---|---|
name | short connection string name | | Y |
- Success Response:
{
"return_code": 200,
"application": {
"app_id": 102,
"app_code": "OOZIE VM",
"description": "VM",
"tech_matrix_id": 19,
"doc_url": null,
"parent_app_id": 100,
"app_status": "N",
"last_modified": 1446099261000,
"is_logical": "N",
"uri_type": null,
"uri": "http://172.21.98.60:11000/",
"lifecycle_layer_id": null,
"short_connection_string": "oozie-vm"
}
}
- Error Response:
{
"return_code": "404",
"error_message": "something is wrong"
}
- Sample Call:
GET /cfg/app/name?name=oozie-vm
Application POST/PUT API
Add or update an application
-
URL
/cfg/app
-
Method:
POST to add a new application, or PUT to update an existing application
-
Data Params
Request body in JSON format:
Param Name | Description | Default | Required |
---|---|---|---|
app_id | User-assigned unique application ID | | Y |
app_code | Application code | | Y |
description | Description of this application | | Y |
tech_matrix_id | matrix ID it belongs to | | N |
parent_app_id | the parent application ID, for example the Hadoop cluster application ID | 0 | N |
app_status | application status | A | N |
uri | URI of the application | null | N |
short_connection_string | unique short connection string | | Y |
Example
{
"app_id": 103,
"app_code": "OOZIE VM",
"description": "VM",
"parent_app_id": 100,
"app_status": "A",
"uri": "http://172.21.98.60:11000/",
"short_connection_string": "oozie-vm"
}
- Success Response:
{
"return_code": 200,
"message": "New application created!"
}
or
{
"return_code": 200,
"message": "Application updated!"
}
- Error Response:
{
"return_code": "404",
"error_message": "something is wrong"
}
- Sample Call:
POST /cfg/app
or
PUT /cfg/app
with request body:
{
"app_id": 103,
"app_code": "OOZIE VM",
"description": "VM",
"parent_app_id": 100,
"app_status": "A",
"uri": "http://172.21.98.60:11000/",
"short_connection_string": "oozie-vm"
}
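As a curl sketch under the usual http://localhost:19001 assumption; switch -X POST to -X PUT to update an existing application instead:
curl -H "Content-Type: application/json" -X POST -d '{"app_id": 103, "app_code": "OOZIE VM", "description": "VM", "parent_app_id": 100, "app_status": "A", "uri": "http://172.21.98.60:11000/", "short_connection_string": "oozie-vm"}' http://localhost:19001/cfg/app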
Database GET API 1
Get information for all databases
-
URL
/cfg/dbs
-
Method:
GET
-
Success Response:
{
"return_code": 200,
"databases": [
{
"db_id": 10001,
"db_code": "SAMPLETD",
"db_type_id": 0,
"description": "TERADATA VM",
"is_logical": "N",
"cluster_size": 23,
"associated_data_centers": 1,
"replication_role": "MASTER",
"jdbc_url": null,
"uri": "Teradata://sample-td",
"short_connection_string": "SAMPLE-TD",
"last_modified": 1446099112000
}
]
}
- Error Response:
{
"return_code": "404",
"error_message": "something is wrong"
}
- Sample Call:
GET /cfg/dbs
Database GET API 2
Get database information by database ID
-
URL
/cfg/db/id
-
Method:
GET
-
Data Params
Required:
Param Name | Description | Default | Required |
---|---|---|---|
id | database ID | | Y |
- Success Response:
{
"return_code": 200,
"database": {
"db_id": 10001,
"db_code": "SAMPLETD",
"db_type_id": 0,
"description": "TERADATA VM",
"is_logical": "N",
"cluster_size": 23,
"associated_data_centers": 1,
"replication_role": "MASTER",
"jdbc_url": null,
"uri": "Teradata://sample-td",
"short_connection_string": "SAMPLE-TD",
"last_modified": 1446099112000
}
}
- Error Response:
{
"return_code": "404",
"error_message": "something is wrong"
}
- Sample Call:
GET /cfg/db/id?id=10001
Database GET API 3
Get database information by database short connection string name
-
URL
/cfg/db/name
-
Method:
GET
-
Data Params
Required:
Param Name | Description | Default | Required |
---|---|---|---|
name | short connection string name | | Y |
- Success Response:
{
"return_code": 200,
"database": {
"db_id": 10001,
"db_code": "SAMPLETD",
"db_type_id": 0,
"description": "TERADATA VM",
"is_logical": "N",
"cluster_size": 23,
"associated_data_centers": 1,
"replication_role": "MASTER",
"jdbc_url": null,
"uri": "Teradata://sample-td",
"short_connection_string": "SAMPLE-TD",
"last_modified": 1446099112000
}
}
- Error Response:
{
"return_code": "404",
"error_message": "something is wrong"
}
- Sample Call:
GET /cfg/db/name?name=SAMPLE-TD
Database POST/PUT API
Add or update a database
-
URL
/cfg/db
-
Method:
POST to add a new database, or PUT to update an existing database
-
Data Params
Request body in JSON format:
Param Name | Description | Default | Required |
---|---|---|---|
db_id | User-assigned unique database ID | | Y |
db_code | Database code | | Y |
primary_dataset_type | Database type | null | N |
description | Description of this database | | Y |
cluster_size | Size of the cluster | 0 | N |
short_connection_string | unique short connection string | | Y |
associated_dc_num | Number of associated data centers | 0 | N |
replication_role | MASTER, SLAVE, BOTH, or Primary | null | N |
uri | URI of the database | null | N |
Example
{
"db_id": 10001,
"db_code": "SAMPLETD",
"primary_dataset_type": null,
"description": "TERADATA VM",
"cluster_size": 22,
"associated_dc_num": 1,
"replication_role": "MASTER",
"uri": "Teradata://sample-td",
"short_connection_string": "SAMPLE-TD"
}
- Success Response:
{
"return_code": 200,
"message": "New database created!"
}
or
{
"return_code": 200,
"message": "Database updated!"
}
- Error Response:
{
"return_code": "404",
"error_message": "something is wrong"
}
- Sample Call:
POST /cfg/db
or
PUT /cfg/db
with request body:
{
"db_id": 10001,
"db_code": "SAMPLETD",
"description": "TERADATA VM",
"cluster_size": 22,
"associated_data_centers": 1,
"replication_role": "MASTER",
"uri": "Teradata://sample-td",
"short_connection_string": "SAMPLE-TD"
}
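A matching curl sketch, assuming http://localhost:19001; use -X POST to create or -X PUT to update:
curl -H "Content-Type: application/json" -X PUT -d '{"db_id": 10001, "db_code": "SAMPLETD", "description": "TERADATA VM", "cluster_size": 22, "associated_dc_num": 1, "replication_role": "MASTER", "uri": "Teradata://sample-td", "short_connection_string": "SAMPLE-TD"}' http://localhost:19001/cfg/db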
ETL Job GET API 1
Get information for all ETL jobs
-
URL
/etls
-
Method:
GET
-
Success Response:
{
"return_code": 200,
"etl_jobs": [
{
"wh_etl_job_id": 19,
"wh_etl_job_name": "OOZIE_EXECUTION_METADATA_ETL",
"wh_etl_type": "OPERATION",
"cron_expr": "0 12 * * * ?",
"ref_id": 102,
"timeout": null,
"next_run": null,
"comments": "oozie vm metadata etl",
"ref_id_type": "APP",
"is_active": "Y"
}
]
}
- Error Response:
{
"return_code": "404",
"error_message": "something is wrong"
}
- Sample Call:
GET /etls
ETL Job GET API 2
Return information about an ETL job by ETL job ID
-
URL
/etl
-
Method:
GET
-
Data Params
Required:
Param Name | Description | Default | Required |
---|---|---|---|
id | ETL job ID | Y |
- Success Response:
{
"return_code": 200,
"etl_job": {
"wh_etl_job_id": 1,
"wh_etl_job_name": "AZKABAN_EXECUTION_METADATA_ETL",
"wh_etl_type": "OPERATION",
"cron_expr": "* 0 * * * ?",
"ref_id": 31,
"timeout": null,
"next_run": null,
"comments": null,
"ref_id_type": "APP",
"is_active": "Y"
}
}
- Error Response:
{
"return_code": "404",
"error_message": "something is wrong"
}
- Sample Call:
GET /etl?id=1
ETL Job New Job POST API
Submit a new scheduled ETL job
-
URL
/etl
-
Method:
POST
-
Data Params
Request body in JSON format:
Param Name | Description | Default | Required |
---|---|---|---|
wh_etl_job_name | WhereHows ETL job name | | Y |
ref_id | Application or database ID | | Y |
cron_expr | cron expression of the job schedule | | Y |
properties | key-value pair properties of the job | | Y |
encrypted_property_keys | keys that need to be encrypted | | N |
timeout | timeout for the job | | N |
next_run | next runtime in Unix time | now | N |
comments | user comments | | Y |
Example
{
"wh_etl_job_name": "OOZIE_EXECUTION_METADATA_ETL",
"ref_id": 102,
"cron_expr": "* 11 * * * ?",
"properties": {"oz.db.username": "root",
"oz.db.password": "cloudera",
"oz.db.jdbc.url": "jdbc:mysql://host:port/oozie",
"oz.db.driver": "com.mysql.jdbc.Driver",
"oz.exec_etl.lookback_period.in.minutes": "1000"
},
"encrypted_property_keys": ["oz.db.password"],
"timeout": null,
"next_run": null,
"comments": "oozie vm metadata etl"
}
- Success Response:
{
"return_code": 200,
"message": "Etl job created!"
}
- Error Response:
{
"return_code": "404",
"error_message": "something is wrong"
}
- Sample Call:
POST /etl
with request body:
{
"wh_etl_job_name": "OOZIE_EXECUTION_METADATA_ETL",
"ref_id": 102,
"cron_expr": "* 11 * * * ?",
"properties": {"oz.db.username": "root",
"oz.db.password": "cloudera",
"oz.db.jdbc.url": "jdbc:mysql://host:port/oozie",
"oz.db.driver": "com.mysql.jdbc.Driver",
"oz.exec_etl.lookback_period.in.minutes": "1000"
},
"encrypted_property_keys": ["oz.db.password"],
"timeout": null,
"next_run": null,
"comments": "oozie vm metadata etl"
}
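As a curl sketch against the http://localhost:19001 host used elsewhere in this document:
curl -H "Content-Type: application/json" -X POST -d '{"wh_etl_job_name": "OOZIE_EXECUTION_METADATA_ETL", "ref_id": 102, "cron_expr": "* 11 * * * ?", "properties": {"oz.db.username": "root", "oz.db.password": "cloudera", "oz.db.jdbc.url": "jdbc:mysql://host:port/oozie", "oz.db.driver": "com.mysql.jdbc.Driver", "oz.exec_etl.lookback_period.in.minutes": "1000"}, "encrypted_property_keys": ["oz.db.password"], "timeout": null, "next_run": null, "comments": "oozie vm metadata etl"}' http://localhost:19001/etl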
ETL Job Control PUT API
Deactivate a previously scheduled ETL job, reactivate a stopped ETL job, or delete an ETL job
-
URL
/etl/control
-
Method:
PUT
-
Data Params
Request body in JSON format:
Param Name | Description | Default | Required |
---|---|---|---|
wh_etl_job_name | WhereHows ETL job name | | Y |
ref_id | Application or database ID | | Y |
control | deactivate, activate, or delete | | Y |
Example
{
"wh_etl_job_name": "OOZIE_EXECUTION_METADATA_ETL",
"ref_id": 102,
"control": "deactivate"
}
- Success Response:
{
"return_code": 200,
"message": "Etl job disabled!"
}
- Error Response:
{
"return_code": "404",
"error_message": "something is wrong"
}
- Sample Call:
PUT /etl/control
with request body:
{
"wh_etl_job_name": "OOZIE_EXECUTION_METADATA_ETL",
"ref_id": 102,
"control": "deactivate"
}
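The equivalent curl sketch, assuming http://localhost:19001:
curl -H "Content-Type: application/json" -X PUT -d '{"wh_etl_job_name": "OOZIE_EXECUTION_METADATA_ETL", "ref_id": 102, "control": "deactivate"}' http://localhost:19001/etl/control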
ETL Job Schedule Update PUT API
Update an ETL job schedule
-
URL
/etl/schedule
-
Method:
PUT
-
Data Params
Request body in JSON format:
Param Name | Description | Default | Required |
---|---|---|---|
wh_etl_job_name | WhereHows ETL job name | | Y |
ref_id | Application or database ID | | Y |
cron_expr | new cron expression of the job schedule | | Y |
Example
{
"wh_etl_job_name": "OOZIE_EXECUTION_METADATA_ETL",
"ref_id": 102,
"cron_expr": "0 12 * * * ?"
}
- Success Response:
{
"return_code": 200,
"message": "Etl job schedule updated!"
}
- Error Response:
{
"return_code": "404",
"error_message": "something is wrong"
}
- Sample Call:
curl -H "Content-Type: application/json" -d '{"wh_etl_job_name":"OOZIE_EXECUTION_METADATA_ETL","ref_id":102,"cron_expr":"0 12 * * * ?"}' http://localhost:19001/etl/schedule -X PUT
ETL Job Property Update PUT API
Update a job property
-
URL
/etl/property
-
Method:
PUT
-
Data Params
Request body in JSON format:
Param Name | Description | Default | Required |
---|---|---|---|
wh_etl_job_name | WhereHows ETL job name | | Y |
ref_id | Application or database ID | | Y |
property_name | name of the property | | Y |
property_value | value of the property | | Y |
need_encrypted | true or false | | N |
* If need_encrypted=true, an encryption key will be fetched from the location specified in the ETL property table by property_name=wherehows.encrypt.master.key.loc, or from the default location ~/.wherehows/master_key.
This option is usually set to true when storing passwords or connection tokens. The encryption key file is not checked in to GitHub or included in the Play distribution zip file; please maintain this file manually and keep its file system permissions at rw------- (0600).
Example:
{
"wh_etl_job_name": "OOZIE_EXECUTION_METADATA_ETL",
"ref_id": 102,
"property_name": "oz.exec_etl.lookback_period.in.minutes",
"property_value": "1001",
"need_encrypted": true
}
- Success Response:
{
"return_code": 200,
"message": "Etl job schedule updated!"
}
- Error Response:
{
"return_code": "404",
"error_message": "something is wrong"
}
- Sample Call:
curl -H "Content-Type: application/json" -X PUT -d '{"wh_etl_job_name": "OOZIE_EXECUTION_METADATA_ETL", "ref_id": 102, "property_name": "oz.exec_etl.lookback_period.in.minutes", "property_value": "1001", "need_encrypted": true}' http://localhost:19001/etl/property