Grant Nicholas fa58c2d161
fix(metadata-ingestion): Fix auditStamp unix timestamp format in sql etl ingestion (#1918)
Datahub was expecting this timestamp to be in milliseconds since epoch, not seconds. This change makes the lastModified timestamp render correctly in the UI when it is converted to a date time string.
2020-10-06 11:13:02 -07:00

SQL-Based Metadata Ingestion

This directory contains example ETL scripts that use SQLAlchemy to ingest basic metadata from a wide range of commonly used SQL-based data systems, including MySQL, PostgreSQL, Oracle, MS SQL, Redshift, BigQuery, and Snowflake.

Requirements

You'll need to install both the common requirements (common.txt) and the system-specific driver requirements for the script you want to run (e.g. mysql_etl.txt for mysql_etl.py). Some drivers also require additional dependencies, so please check the driver's official project page for more details.

Example

Here's an example of how to ingest metadata from MySQL.

Install requirements

pip install --user -r common.txt -r mysql_etl.txt

Modify these variables in mysql_etl.py to match your environment

URL       # Connection URL in the form of mysql+pymysql://username:password@hostname:port
OPTIONS   # Additional connection options for the driver
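
As a rough sketch of what filled-in values might look like: the helper build_mysql_url below is hypothetical (it is not part of mysql_etl.py), and the username, password, and host are placeholders. It only shows one way to assemble a URL in the documented form while percent-encoding characters such as @ or / in the credentials, which would otherwise break URL parsing. Treating OPTIONS as a dict of extra driver options is likewise an assumption; check the script for the exact type it expects.

```python
from urllib.parse import quote_plus


def build_mysql_url(username, password, hostname, port=3306):
    """Hypothetical helper: build mysql+pymysql://username:password@hostname:port.

    Credentials are percent-encoded so characters like '@' or '/' in a
    password do not get mistaken for URL delimiters.
    """
    return (
        f"mysql+pymysql://{quote_plus(username)}:{quote_plus(password)}"
        f"@{hostname}:{port}"
    )


# Placeholder credentials for illustration only.
URL = build_mysql_url("datahub", "p@ss/word", "localhost")

# Assumption: extra driver options are given as a dict; empty means defaults.
OPTIONS = {}

print(URL)
```

You would then copy the resulting values into the URL and OPTIONS variables in mysql_etl.py.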

Run the ETL script

python mysql_etl.py