Pig >> mail # user >> "Exploding" a Hive array<string> in Pig from an RCFile


"Exploding" a Hive array<string> in Pig from an RCFile
Hi,
I'm storing data in a partitioned Hive table in RCFile format, but I want
to use Pig to aggregate that data.

The array<string> column in Hive holds colon-delimited data, e.g.

:0:12:21:99:

With the lateral view and explode functions in Hive, I can output each value
as a separate row.
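
For reference, a rough sketch of the Hive query described above. The table
name hrly_sub_smry and the partition columns year_month_day and hour are
assumptions read off the load path in the Pig script below, the partition
values are assumed to be strings, and the aliases t and seg_id are made up
for illustration:

```sql
-- Sketch only: table and partition names are assumed from the HDFS path.
-- explode() emits one row per element of the seg_ids array.
SELECT c_sub_id, seg_id
FROM hrly_sub_smry
LATERAL VIEW explode(seg_ids) t AS seg_id
WHERE year_month_day = '20120329'
  AND hour = '04';
```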

In Pig, I think I need to use flatten, but it just outputs the array as a
single field, and I can't see where to specify that the colon is the
delimiter between values.

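-- Piggybank provides HiveColumnarLoader for reading Hive RCFiles
register /opt/pig/trunk/bin/piggybank.jar;
-- Load one hourly partition, declaring the Hive column types
mt = LOAD '/hrly_sub_smry/year_month_day=20120329/hour=04/*' USING
    org.apache.pig.piggybank.storage.HiveColumnarLoader('C_SUB_ID string,seg_ids array<string>');
-- Attempt to get one row per segment id; currently yields a single field
opt = foreach mt generate C_SUB_ID, flatten(seg_ids) as s_seg_id;
dump opt;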

Thanks

Malc
Norbert Burger 2012-04-06, 15:00
Malcolm Tye 2012-04-11, 22:59
Norbert Burger 2012-04-12, 03:14
Aniket Mokashi 2012-04-12, 09:38
Malcolm Tye 2012-05-03, 12:29