9 Commits

e0f324c8e4  Shuya Tsukamoto  2017-05-04 10:08:51 -07:00
  Implement ParquetFileAnalyzer for hadoop-dataset-extractor-standalone (#483)
  * Implement ParquetFileAnalyzer for hadoop-dataset-extractor-standalone
  * Move location
  * Update build.gradle

7b36d09b58  Eric Sun  2016-11-07 08:14:45 -08:00
  Add get_schema_literal_from_url() to fetch schema literal based on schema url (#268)
  * use schema_url_helper to fetch avro schema from hdfs or http location
  * trim space
  * add dfs.namenode.kerberos.principal.pattern; include htrace for SchemaUrlHelper

53d40c8392  Eric Sun  2016-08-03 16:16:58 -07:00
  add a few new hdfs directory patterns

4a4894a192  SunZhaonan  2016-03-17 12:31:58 -07:00
  Use Kerberos login

a0b7cb9d57  SunZhaonan  2016-03-16 19:12:21 -07:00
  Fix process hanging bug. Add hive field ETL process.

aff8f323e4  SunZhaonan  2016-03-16 19:09:53 -07:00
  Scheduler check previous job is finished. Redirect remote outputstream into log. Fix avro parser bugs

dfeefba213  SunZhaonan  2016-02-17 16:06:24 -08:00
  Parameterize dataset source derived process.

de4d4cd0c1  SunZhaonan  2016-02-12 16:57:12 -08:00
  Add documentation on important Constants and Classes

d5c3d87d00  SunZhaonan  2015-11-19 14:39:21 -08:00
  Initial commit
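
Commit 7b36d09b58 (#268) above describes a get_schema_literal_from_url() helper that resolves a schema URL and returns the Avro schema literal, fetching it from either an HDFS path or an HTTP location. The Java sketch below only illustrates that idea under those assumptions; the class and method names are hypothetical and do not reproduce the repository's actual SchemaUrlHelper.

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Hypothetical sketch (not the repository's SchemaUrlHelper): given a schema URL
 * pointing at an Avro schema file on HDFS or an HTTP(S) endpoint, return the
 * schema literal as a string.
 */
public class SchemaUrlFetchSketch {

  // Hadoop configuration; on a kerberized cluster this would also carry settings
  // such as dfs.namenode.kerberos.principal.pattern, mentioned in commit 7b36d09b58.
  private final Configuration conf;

  public SchemaUrlFetchSketch(Configuration conf) {
    this.conf = conf;
  }

  /** Fetch the Avro schema literal behind the given hdfs:// or http(s):// URL. */
  public String getSchemaLiteralFromUrl(String schemaUrl) throws Exception {
    InputStream in;
    if (schemaUrl.startsWith("hdfs://")) {
      // Read the schema file through the Hadoop FileSystem API.
      FileSystem fs = FileSystem.get(URI.create(schemaUrl), conf);
      in = fs.open(new Path(schemaUrl));
    } else if (schemaUrl.startsWith("http://") || schemaUrl.startsWith("https://")) {
      // Plain HTTP(S) fetch of the schema document.
      in = new URL(schemaUrl).openStream();
    } else {
      throw new IllegalArgumentException("Unsupported schema url: " + schemaUrl);
    }
    try (BufferedReader reader =
        new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
      // Return the content trimmed, in line with the "trim space" note in the same commit.
      return reader.lines().collect(Collectors.joining("\n")).trim();
    }
  }
}
```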