Pig >> mail # user >> "Exploding" a Hive array<string> in Pig from an RCFile

"Exploding" a Hive array<string> in Pig from an RCFile
I'm storing data in a partitioned Hive table in RCFile format,
but I want to use Pig to aggregate that data.

In my array<string> column in Hive, I have colon-delimited data, e.g.


With the lateral view and explode functions in Hive, I can output each value
as a separate row.

In Pig, I think I need to use flatten, but it just outputs the array as a
single field, and I can't see where to specify that the colon is the
value separator.

register /opt/pig/trunk/bin/piggybank.jar
mt = LOAD '/hrly_sub_smry/year_month_day=20120329/hour=04/*' USING
org.apache.pig.piggybank.storage.HiveColumnarLoader('C_SUB_ID string,seg_ids
opt = foreach mt generate C_SUB_ID, flatten(seg_ids) as s_seg_id;
dump opt;
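
One thing that might be worth trying, as a rough sketch only: if seg_ids reaches Pig as a single colon-delimited chararray (an assumption; I have not confirmed what type HiveColumnarLoader actually produces for a Hive array<string>), the built-in TOKENIZE can split it into a bag before flatten is applied. The input file 'seg_sample.tsv' and its two-column layout below are hypothetical stand-ins for the RCFile data, and the delimiter argument to TOKENIZE may not exist in older Pig releases.

```pig
-- Hypothetical tab-separated sample file, NOT the RCFile from the script above.
-- Assumed layout: <C_SUB_ID> TAB <colon-delimited seg_ids>
mt = LOAD 'seg_sample.tsv' USING PigStorage('\t')
     AS (C_SUB_ID:chararray, seg_ids:chararray);

-- TOKENIZE with an explicit delimiter returns a bag of single-field tuples;
-- FLATTEN on a bag emits one output row per tuple in the bag.
opt = FOREACH mt GENERATE C_SUB_ID,
      FLATTEN(TOKENIZE(seg_ids, ':')) AS s_seg_id;

DUMP opt;
```

The reason for going through a bag is that flatten on a tuple or a plain field does not add rows, whereas flatten on a bag behaves like Hive's explode and produces one row per element.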