Hive >> mail # user >> Dynamic Partitioning not working in HCatalog 0.11?


Dynamic Partitioning not working in HCatalog 0.11?
Here's some simple Pig that reads from one Hive table and writes to another
(same data, same schema):

-- read one partition from the source table and copy it into signals_orc
sigs_in = LOAD 'signals' USING org.apache.hcatalog.pig.HCatLoader();
sigs = FILTER sigs_in BY datetime_partition == '2013-10-07_0000';
STORE sigs INTO 'signals_orc' USING org.apache.hcatalog.pig.HCatStorer();
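
For what it's worth, dumping the schema just before the store shows the partition column is present (my understanding from the HCatalog docs is that HCatStorer expects dynamic partition columns to come last in the relation's schema):

-- sanity check: the schema HCatStorer will see
DESCRIBE sigs;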

The signals_orc table is defined with datetime_partition as its partition
column:

create external table signals_orc (
 signal_id string,
 ...
) partitioned by (datetime_partition string)
stored as ORC
location '/user/hive/external/signals_orc'
tblproperties ("orc.compress"="snappy");
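
For reference, the partitions that HCatStorer actually registered in the metastore can be listed from the Hive CLI:

-- list the partitions recorded for the target table
SHOW PARTITIONS signals_orc;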

After running the job, I end up with the following directory in HDFS:

/user/hive/external/signals_orc/datetime_partition=__HIVE_DEFAULT_PARTITION__

Yet the filter in the Pig script clearly proves that the datetime_partition
field in my data is valid. If I change the STORE clause in my Pig script to:

STORE sigs INTO 'signals_orc' USING
org.apache.hcatalog.pig.HCatStorer('datetime_partition=2013-10-07_0000');

then I get the correct output:

/user/hive/external/signals_orc/datetime_partition=2013-10-07_0000

So it appears to me that there's something broken in the dynamic
partitioning code in HCatalog 0.11.0.
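
For completeness, these are the standard Hive settings that normally gate dynamic partitioning (assuming HCatalog consults the same properties):

-- Hive-side settings that govern dynamic partitioning
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;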

Thanks.
Tim