Table of Contents

Configuration
Extract
Transform
Load

This doc covers older versions of WhereHows (v0.2.1 and earlier). Please refer to the current documentation for the latest version.

Collect dataset metadata from Oracle DB.

Configuration

List of properties in the wh_etl_job_property table that are required for the Oracle dataset ETL process:

Configuration key	Description
oracle.db.driver	Oracle database JDBC driver
oracle.db.name	Oracle database name
oracle.db.username	Oracle database user name
oracle.db.password	Oracle database password
oracle.db.jdbc.url	Oracle database JDBC URL
oracle.metadata	Local file path where the Oracle dataset metadata CSV file is stored
oracle.field_metadata	Local file path where the Oracle dataset fields CSV file is stored
oracle.sample_data	Local file path where the Oracle dataset sample data CSV file is stored
oracle.load_sample	true/false, whether to fetch sample data
oracle.exclude_db	List of Oracle databases to exclude
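As a rough illustration of how these properties might be checked before a job runs, the sketch below validates a property map against the keys in the table. The helper names (`missing_keys`, `parse_bool`, `parse_exclude_list`) are hypothetical and not part of WhereHows, and the comma-separated format for oracle.exclude_db is an assumption, since the table does not specify a delimiter.

```python
# Required keys, copied from the configuration table above.
REQUIRED_KEYS = [
    "oracle.db.driver",
    "oracle.db.name",
    "oracle.db.username",
    "oracle.db.password",
    "oracle.db.jdbc.url",
    "oracle.metadata",
    "oracle.field_metadata",
    "oracle.sample_data",
    "oracle.load_sample",
    "oracle.exclude_db",
]


def missing_keys(props):
    """Return the required keys that are absent or blank in props (hypothetical helper)."""
    return [k for k in REQUIRED_KEYS if not str(props.get(k, "")).strip()]


def parse_bool(value):
    """Interpret the true/false string used by oracle.load_sample."""
    return str(value).strip().lower() == "true"


def parse_exclude_list(value):
    """Split oracle.exclude_db into names.

    ASSUMPTION: the value is comma-separated; the source does not say.
    """
    return [name.strip() for name in value.split(",") if name.strip()]
```

A job wrapper could call `missing_keys(props)` first and stop early if the returned list is not empty.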

Extract

Major related file: OracleExtract.py

Connects to the Oracle database and fetches all table, column, and comment information, skipping the databases in the exclude list. Extra table information, including indices, constraints, and partitions, is also fetched. The results are formatted and stored in two CSV files: one for table records and one for field records.

Major source tables: ALL_TABLES, ALL_TAB_COLUMNS, ALL_COL_COMMENTS, ALL_INDEXES, ALL_IND_COLUMNS, ALL_CONSTRAINTS, ALL_PART_KEY_COLUMNS
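The field-level part of this step can be sketched roughly as follows. This is not the code in OracleExtract.py. It is a minimal illustration that builds a query over ALL_TAB_COLUMNS and ALL_COL_COMMENTS and writes rows to CSV. The function names are hypothetical, and it assumes that an excluded "database" corresponds to the OWNER column and that names are matched in upper case.

```python
import csv
import io


def build_column_query(excluded):
    """Build a column-metadata query plus named bind values (hypothetical helper).

    ASSUMPTION: entries of the exclude list are matched against OWNER,
    upper-cased because Oracle stores unquoted identifiers in upper case.
    """
    names = ["ex%d" % i for i in range(len(excluded))]
    binds = dict((n, v.upper()) for n, v in zip(names, excluded))
    sql = (
        "SELECT c.OWNER, c.TABLE_NAME, c.COLUMN_NAME, c.DATA_TYPE, "
        "c.NULLABLE, m.COMMENTS "
        "FROM ALL_TAB_COLUMNS c "
        "LEFT JOIN ALL_COL_COMMENTS m "
        "ON m.OWNER = c.OWNER AND m.TABLE_NAME = c.TABLE_NAME "
        "AND m.COLUMN_NAME = c.COLUMN_NAME"
    )
    if names:
        sql += " WHERE c.OWNER NOT IN (" + ", ".join(":" + n for n in names) + ")"
    sql += " ORDER BY c.OWNER, c.TABLE_NAME, c.COLUMN_ID"
    return sql, binds


def write_csv(rows, out):
    """Write result rows to a file-like object, one CSV record per row."""
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        # Missing comments arrive as None; the csv module writes them as empty fields.
        writer.writerow(row)


# Demo with a made-up row standing in for a database result.
sql, binds = build_column_query(["sys", "system"])
buf = io.StringIO()
write_csv([("HR", "EMP", "ID", "NUMBER", "N", None)], buf)
```

With a database cursor that accepts named bind parameters, `sql` and `binds` would be passed to the cursor's execute call and the fetched rows handed to `write_csv`. The table-level file (ALL_TABLES plus index, constraint, and partition details) would follow the same pattern.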

Transform

Not needed.

Load

Major related file: OracleLoad.py

Loads the extracted CSV files into the MySQL database, in the same way as HiveLoad or HdfsLoad.

Related tables: dict_dataset
