mirror of https://github.com/datahub-project/datahub.git, synced 2025-11-04 04:39:10 +00:00
datahub Ingestion Tool
Introduction
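A set of tools for ingesting [jdbc-database-schema] and [etl-lineage] metadata into datahub.
I split the ingestion procedure into two parts: the [datahub-producer] and the various [metadata-generator]s. Each generator writes metadata records to standard output, and the producer reads them from standard input and loads them into datahub.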
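The generator/producer split can be sketched in a few lines of Python. This is only an illustration, not the tool's code: the record fields, the one-JSON-record-per-line framing, and the `send` callback are all assumptions made for the example.

```python
import io
import json
from typing import Callable, Iterable, Iterator, TextIO


def generate_records() -> Iterator[dict]:
    """Hypothetical metadata generator: yields one record at a time."""
    # Field names are placeholders, not the real datahub schema.
    yield {"dataset": "db.orders", "fields": ["id", "amount"]}
    yield {"dataset": "db.customers", "fields": ["id", "name"]}


def write_records(records: Iterable[dict], out: TextIO) -> None:
    """Generator side: serialize each record as one JSON line."""
    for record in records:
        out.write(json.dumps(record) + "\n")


def read_records(inp: TextIO) -> Iterator[dict]:
    """Producer side: parse one JSON record per non-empty input line."""
    for line in inp:
        line = line.strip()
        if line:
            yield json.loads(line)


def produce(inp: TextIO, send: Callable[[dict], None]) -> int:
    """Hand every parsed record to `send`, a stand-in for the publish step."""
    count = 0
    for record in read_records(inp):
        send(record)
        count += 1
    return count


if __name__ == "__main__":
    # Simulate `generator | producer` in one process with an in-memory pipe.
    pipe = io.StringIO()
    write_records(generate_records(), pipe)
    pipe.seek(0)
    sent = []
    print("records sent:", produce(pipe, sent.append))
```

Because the two halves only share a text stream, a new generator can be added without touching the producer.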
Roadmap
- datahub-producer loads json avro data
- add lineage-hive generator
- add dataset-jdbc generator (includes mysql, mssql, postgresql, oracle drivers)
- add dataset-hive generator
- *> add lineage-oracle generator
- enhance lineage-jdbc generator to lazy iterator mode
- enhance avro parser to show error information
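As a rough illustration of what "lazy iterator mode" could look like, the Python sketch below streams query results in small batches instead of loading them all at once. It uses the standard-library `sqlite3` module as a stand-in for a JDBC connection; `iter_rows` and its `batch_size` parameter are hypothetical names, not part of this tool.

```python
import sqlite3
from typing import Iterator, Tuple


def iter_rows(conn: sqlite3.Connection, query: str,
              batch_size: int = 100) -> Iterator[Tuple]:
    """Yield rows one at a time, fetching at most `batch_size` per round trip."""
    cursor = conn.cursor()
    cursor.execute(query)
    try:
        while True:
            batch = cursor.fetchmany(batch_size)
            if not batch:
                break
            # Rows are handed out lazily; the next batch is fetched only
            # when the consumer asks for more.
            yield from batch
    finally:
        cursor.close()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (n INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5)])
    for row in iter_rows(conn, "SELECT n FROM t ORDER BY n", batch_size=2):
        print(row)
```

With this shape, memory use is bounded by the batch size rather than by the size of the result set.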
 
Quickstart
- install nix and add the nixpkgs channel
 
  sudo install -d -m755 -o $(id -u) -g $(id -g) /nix
  curl -L https://nixos.org/nix/install | sh
  
  nix-channel --add https://nixos.org/channels/nixos-20.03 nixpkgs
  nix-channel --update nixpkgs
- [optional] download the dependencies of a given script in advance; otherwise they are downloaded automatically at run time.
 
  nix-shell bin/[datahub-producer].hs.nix
  nix-shell bin/[datahub-producer].py.nix
  ...
- load json data to datahub
 
    cat sample/mce.json.dat | bin/datahub-producer.hs config
- parse hive sql and load the resulting lineage to datahub
 
    ls sample/hive_*.sql | bin/lineage_hive_generator.hs | bin/datahub-producer.hs config
- load jdbc schema (mysql, mssql, postgresql, oracle) to datahub
 
    bin/dataset-jdbc-generator.hs | bin/datahub-producer.hs config
- load hive schema to datahub
 
    bin/dataset-hive-generator.py | bin/datahub-producer.hs config
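To give a feel for what the hive lineage step produces, here is a deliberately naive Python sketch that pulls a target table and its source tables out of one `INSERT ... SELECT` statement with regular expressions. It is not how the generators in this repository work, and a real SQL parser handles far more (subqueries, CTEs, comments, quoting). The `Lineage` type and `extract_lineage` function are hypothetical names used only for this example.

```python
import re
from typing import FrozenSet, NamedTuple, Optional


class Lineage(NamedTuple):
    target: str
    sources: FrozenSet[str]


# Matches `INSERT OVERWRITE TABLE x`, `INSERT INTO TABLE x`, `INSERT INTO x`.
_TARGET = re.compile(
    r"\binsert\s+(?:overwrite|into)\s+(?:table\s+)?([\w.]+)", re.IGNORECASE
)
# Matches the table name that follows FROM or JOIN.
_SOURCE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def extract_lineage(sql: str) -> Optional[Lineage]:
    """Return (target, sources) for one statement, or None if nothing is written."""
    target = _TARGET.search(sql)
    if target is None:
        return None
    name = target.group(1).lower()
    sources = {m.group(1).lower() for m in _SOURCE.finditer(sql)}
    sources.discard(name)  # a table reading from itself is not an upstream
    return Lineage(name, frozenset(sources))


if __name__ == "__main__":
    statement = """
        INSERT OVERWRITE TABLE dw.daily_sales
        SELECT o.day, SUM(o.amount)
        FROM staging.orders o
        JOIN staging.customers c ON o.customer_id = c.id
        GROUP BY o.day
    """
    print(extract_lineage(statement))
```

A record like this, serialized and written to standard output, is the kind of thing a lineage generator would pass down the pipe to the producer.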
Reference
- hive/presto/vertica SQL Parser
  uber/queryparser [https://github.com/uber/queryparser.git]
- oracle procedure syntax
  https://docs.oracle.com/cd/E11882_01/server.112/e41085/sqlqr01001.htm#SQLQR110
- postgresql procedure parser
  SQream/hssqlppp [https://github.com/JakeWheat/hssqlppp.git]