Hive, mail # user - Dynamic Partitioning not working in HCatalog 0.11?


Dynamic Partitioning not working in HCatalog 0.11?
Timothy Potter 2013-10-10, 20:43
Here's a simple Pig script that reads from one Hive table and writes to
another (same data, same schema):

sigs_in = load 'signals' using org.apache.hcatalog.pig.HCatLoader();
sigs = filter sigs_in by datetime_partition == '2013-10-07_0000';
STORE sigs INTO 'signals_orc' USING org.apache.hcatalog.pig.HCatStorer();

The signals_orc table is defined with datetime_partition as its partition
column, as in:

create external table signals_orc (
 signal_id string,
 ...
) partitioned by (datetime_partition string)
stored as ORC
location '/user/hive/external/signals_orc'
tblproperties ("orc.compress"="snappy");

After running the job, I end up with the following directory in HDFS:

/user/hive/external/signals_orc/datetime_partition=__HIVE_DEFAULT_PARTITION__

This happens even though the filter in the Pig script shows that the
datetime_partition field in my data is valid. If I change the store clause
in my Pig script to:

STORE sigs INTO 'signals_orc' USING
org.apache.hcatalog.pig.HCatStorer('datetime_partition=2013-10-07_0000');

then I get the correct output:

/user/hive/external/signals_orc/datetime_partition=2013-10-07_0000

So it appears to me that there's something broken in the dynamic
partitioning code in HCatalog 0.11.0.
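
For anyone trying to reproduce this, here is a minimal sketch of how the
value reaching the STORE could be checked. In Hive, __HIVE_DEFAULT_PARTITION__
is the directory name used when a dynamic partition value is NULL or empty;
that HCatStorer follows the same convention is an assumption here, not
something I have confirmed. The aliases parts and uniq_parts are only for
illustration.

```pig
-- Same load and filter as in the script above.
sigs_in = load 'signals' using org.apache.hcatalog.pig.HCatLoader();
sigs = filter sigs_in by datetime_partition == '2013-10-07_0000';

-- Print the schema Pig hands to the storer, to confirm that
-- datetime_partition is present and what type it has.
describe sigs;

-- List the distinct partition values that would be written.
parts = foreach sigs generate datetime_partition;
uniq_parts = distinct parts;
dump uniq_parts;
```

If the dump shows the expected value and the schema lists the column, the
data going into the STORE is fine and the problem is more likely on the
storer side.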

Thanks.
Tim