Pig user mailing list: "Exploding" a Hive array<string> in Pig from an RCFile


Malcolm Tye 2012-04-05, 12:58
Re: "Exploding" a Hive array<string> in Pig from an RCFile
Malcolm -- typically, you'd use STRSPLIT and an optional FLATTEN to tokenize
a chararray on some delimiter.  So the following should work:

opt = foreach mt generate C_SUB_ID, flatten(STRSPLIT(seg_ids,':')) as s_seg_id;

Norbert

On Thu, Apr 5, 2012 at 8:58 AM, Malcolm Tye <[EMAIL PROTECTED]> wrote:

> Hi,
>    I'm storing data into a partitioned table using Hive in RCFile format,
> but I want to use Pig to do the aggregation of that data.
>
> In my array<string> column in Hive, I have colon-delimited data, e.g.
>
> :0:12:21:99:
>
> With the lateral view and explode functions in Hive, I can output each
> value as a separate row.
>
> In Pig, I think I need to use flatten, but it just outputs the array as a
> single field, and I can't see where to specify that the colon is the
> delimiter/value separator.
>
> register /opt/pig/trunk/bin/piggybank.jar
> mt = LOAD '/hrly_sub_smry/year_month_day=20120329/hour=04/*'
>     USING org.apache.pig.piggybank.storage.HiveColumnarLoader('C_SUB_ID string,seg_ids array<string>');
> opt = foreach mt generate C_SUB_ID, flatten(seg_ids) as s_seg_id;
> dump opt;
>
>
>
> Thanks
>
> Malc
>
>
>
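
Note on the expected result: Hive's lateral view/explode produces one row per array element, so for the sample value ':0:12:21:99:' the target is four rows (0, 12, 21, 99). In Pig, FLATTEN un-nests a bag into rows but expands a tuple into columns, and STRSPLIT returns a tuple, so the output shape of the STRSPLIT suggestion is worth checking with DESCRIBE and DUMP. The sketch below is plain Python, not Pig. It only models the row-per-value output, assuming seg_ids arrives as a single colon-delimited string; the function name explode_rows and the C_SUB_ID value 'sub1' are hypothetical.

```python
def explode_rows(rows, delimiter=":"):
    """Yield one (sub_id, seg_id) pair per non-empty token in seg_ids.

    This mimics the row-per-value shape of Hive's explode, or of a Pig
    FLATTEN applied to a bag. It is an illustration, not Pig code.
    """
    for sub_id, seg_ids in rows:
        for token in seg_ids.split(delimiter):
            if token:  # drop empty strings from the leading/trailing delimiter
                yield (sub_id, token)


# 'sub1' is a made-up C_SUB_ID; the seg_ids value is the sample from the thread.
rows = [("sub1", ":0:12:21:99:")]
exploded = list(explode_rows(rows))
for row in exploded:
    print(row)
```

The leading and trailing colons in the sample produce empty tokens when split, so the sketch filters them out; a Pig solution would likely need to handle them as well.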
Malcolm Tye 2012-04-11, 22:59
Norbert Burger 2012-04-12, 03:14
Aniket Mokashi 2012-04-12, 09:38
Malcolm Tye 2012-05-03, 12:29