mirror of
https://github.com/datahub-project/datahub.git
synced 2025-06-27 05:03:31 +00:00
Page:
Teradata Dataset
Pages
Architecture
Azkaban Execution
Backend API
Coding Guidelines
Frontend API
Frontend User Guide
Getting Started
Git Commit Metadata ETL
Gobblin Kafka Integeration
Gobblin Kafka Integration
HDFS Dataset Ownership Metadata ETL
HDFS Dataset
Hive Dataset
Home
Integration Guide
Kafka Integration Guide
LDAP Directory Metadata ETL
Lineage
Multiproduct Metadata ETL
Oozie Execution
Oracle Dataset
Quick Start with VM
Set Up New Metadata ETL Jobs
Teradata Dataset
Clone
8
Teradata Dataset
Mars Lan edited this page 2017-07-26 16:48:08 -07:00
Table of Contents
This doc is for older versions (v0.2.1 and before) of WhereHows. Please refer to this for the latest version.
Collect dataset metadata from Teradata.
configuration
List of properties required for the ETL process:
configuration key | description |
---|---|
teradata.databases | the databases that will be collected, comma separated, e.g., financial,manufacturing |
teradata.db.driver | Teradata database driver, e.g., com.teradata.jdbc.TeraDriver |
teradata.db.jdbc.url | Teradata database JDBC URL (not including username and password), e.g., jdbc:teradata://localhost/DBS_PORT=1025 |
teradata.db.username | Teradata database username |
teradata.db.password | Teradata database password |
teradata.default_database | default database for connection |
teradata.field_metadata | place to store the field metadata CSV file, e.g., var/tmp/wherehows/field_metadata |
teradata.log | place to store the log file, e.g., /var/tmp/wherehows/teradata_log |
teradata.metadata | place to store metadata CSV file |
teradata.sample.skip.list | the tables you want to skip for sample data collecting (e.g., for security reasons) |
teradata.sample_output | place to store sample data file |
teradata.schema_output | place to store schema data file |
Extract
Major related file: TeradataExtract.py
Get metadata from Teradata DBC databases and store it in a local JSON file.
Major source tables: DBC.Tables, DBC.Columns, DBC.ColumnStatsV, DBC.TableSize, DBC.Indieces, DBC.IndexConstraints
'Select top 10' to get sample data.
Transform
Major related file: TeradataTransform.py
Transform the JSON output into CSV format for easy loading.
Load
Major related file: TeradataLoad.py
Load into MySQL database.
Related tables: dict_dataset, dict_field_detail, dict_dataset_sample