14 Commits

Author SHA1 Message Date
Shuya Tsukamoto
e0f324c8e4 Implement ParquetFileAnalyzer for hadoop-dataset-extractor-standalone (#483)
* Implement ParquetFileAnalyzer for hadoop-dataset-extractor-standalone
* Move location
* Update build.gradle
2017-05-04 10:08:51 -07:00
Eric Sun
7b36d09b58 Add get_schema_literal_from_url() to fetch schema literal based on schema url (#268)
* use schema_url_helper to fetch avro schema from hdfs or http location
* trim space
* add dfs.namenode.kerberos.principal.pattern; include htrace for SchemaUrlHelper
2016-11-07 08:14:45 -08:00
Yi Wang
664e4072bb Upgrade to play 2.4.8 2016-10-19 17:42:28 -07:00
Eric Sun
53d40c8392 add a few new hdfs directory patterns 2016-08-03 16:16:58 -07:00
Eric Sun
1cd5872369 temp fix for hdfs_schema_crawler getRuntime().exec() hangs problem; exclude log4j 2016-08-03 15:50:00 -07:00
Eric Sun
1573fdb212 rename hive dependency to hive_exec; reuse metadata-etl/extralibs; test travis ci; 2016-06-28 18:03:02 -07:00
SunZhaonan
4a4894a192 Use Kerberos login 2016-03-17 12:31:58 -07:00
SunZhaonan
a0b7cb9d57 Fix process hanging bug. Add hive field ETL process. 2016-03-16 19:12:21 -07:00
SunZhaonan
aff8f323e4 Scheduler check previous job is finished. Redirect remote outputstream into log. Fix avro parser bugs 2016-03-16 19:09:53 -07:00
SunZhaonan
dfeefba213 Parameterize dataset source derived process. 2016-02-17 16:06:24 -08:00
SunZhaonan
de4d4cd0c1 Add documentation on important Constants and Classes 2016-02-12 16:57:12 -08:00
SunZhaonan
b5d7c38b7d Eclipse integration. Resolve circular dependency. wherehows-common-test configure. 2016-02-09 15:50:49 -08:00
Zhen Chen
5bfb5adb71 close input stream and fix import older version jar in the standalone module 2015-12-03 16:59:38 -08:00
SunZhaonan
d5c3d87d00 Initial commit 2015-11-19 14:39:21 -08:00