NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Pig user mailing list: "Exploding" a Hive array<string> in Pig from an RCFile


Malcolm Tye 2012-04-05, 12:58
Re: "Exploding" a Hive array<string> in Pig from an RCFile
Malcolm, typically you'd use STRSPLIT and an optional FLATTEN to tokenize
a chararray on some delimiter. So the following should work:

opt = foreach mt generate C_SUB_ID, flatten(STRSPLIT(seg_ids, ':')) as s_seg_id;

Norbert
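To make the behaviour of the suggestion above concrete, here is a small plain-Python model (not Pig code) of what `flatten(STRSPLIT(seg_ids, ':'))` does to the sample value `:0:12:21:99:`. It assumes that STRSPLIT delegates to Java's `String.split(regex, 0)`, which keeps a leading empty field but drops trailing empty ones, and that FLATTEN of a tuple widens the record into extra columns rather than extra rows. The names `java_split` and `strsplit_flatten` and the subscriber ID `sub1` are hypothetical.

```python
from typing import List, Tuple


def java_split(s: str, delim: str) -> List[str]:
    """Model Java's String.split(delim, 0) for a literal, non-regex delimiter."""
    if delim not in s:
        # Java returns the input unchanged when the delimiter never matches.
        return [s]
    parts = s.split(delim)  # Python keeps every empty field, including trailing ones
    while parts and parts[-1] == "":
        parts.pop()  # emulate Java's removal of trailing empty strings (limit 0)
    return parts


def strsplit_flatten(c_sub_id: str, seg_ids: str) -> List[Tuple[str, ...]]:
    """Model `generate C_SUB_ID, flatten(STRSPLIT(seg_ids, ':'))` for one record.

    Assumption: STRSPLIT yields a single tuple, and FLATTEN of a tuple turns its
    fields into extra columns, so one input record stays one output record.
    """
    return [(c_sub_id, *java_split(seg_ids, ":"))]


if __name__ == "__main__":
    print(strsplit_flatten("sub1", ":0:12:21:99:"))
```

Under these assumptions the result is a single record, `('sub1', '', '0', '12', '21', '99')`: extra columns, including an empty first field from the leading colon, rather than one row per value.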

On Thu, Apr 5, 2012 at 8:58 AM, Malcolm Tye <[EMAIL PROTECTED]> wrote:

> Hi,
>    I'm storing data into a partitioned table using Hive in RCFile format,
> but I want to use Pig to do the aggregation of that data.
>
> In my array<string> column in Hive, I have colon-delimited data, e.g.
>
> :0:12:21:99:
>
> With the lateral view and explode functions in Hive, I can output each
> value as a separate row.
>
> In Pig, I think I need to use flatten, but it just outputs the array as a
> single field, and I can't see where to specify that the colon is the
> delimiter/value separator.
>
> register /opt/pig/trunk/bin/piggybank.jar
> mt = LOAD '/hrly_sub_smry/year_month_day=20120329/hour=04/*' USING
> org.apache.pig.piggybank.storage.HiveColumnarLoader('C_SUB_ID string,seg_ids array<string>');
> opt = foreach mt generate C_SUB_ID, flatten(seg_ids) as s_seg_id;
> dump opt;
>
>
>
> Thanks
>
> Malc
>
>
>
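If the goal is one row per value, as with Hive's LATERAL VIEW explode, the split has to yield a bag rather than a tuple, because FLATTEN of a bag emits one output record per element (TOKENIZE, for example, returns a bag). The plain-Python sketch below models that row-producing shape for the sample value; skipping the empty tokens produced by the leading and trailing colons is an assumption about the desired output, and `explode_rows` is a hypothetical name.

```python
from typing import List, Tuple


def explode_rows(c_sub_id: str, seg_ids: str, delim: str = ":") -> List[Tuple[str, str]]:
    """Model FLATTEN over a bag: one (C_SUB_ID, s_seg_id) row per value."""
    rows = []
    for token in seg_ids.split(delim):
        # Assumption: empty tokens from leading/trailing delimiters are skipped.
        if token:
            rows.append((c_sub_id, token))
    return rows


if __name__ == "__main__":
    for row in explode_rows("sub1", ":0:12:21:99:"):
        print(row)
```

With the sample value this yields four rows, `('sub1', '0')` through `('sub1', '99')`, which is the per-value shape the Hive query produces.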
Malcolm Tye 2012-04-11, 22:59
Norbert Burger 2012-04-12, 03:14
Aniket Mokashi 2012-04-12, 09:38
Malcolm Tye 2012-05-03, 12:29