Pig >> mail # user >> Flattening nested bags


David Parks 2013-06-05, 00:46
Re: Flattening nested bags
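-- Step 1: flatten the outer bag; each tuple in 'things' becomes its own row
B = foreach A generate item, d, flatten(things);
-- Step 2: flatten the inner bag 'values' of each resulting row
C = foreach B generate item, d, thing, d1, flatten(values);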
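The reason two FLATTEN steps produce exactly the two desired rows can be shown with a small Python model of FLATTEN's semantics. This is not Pig itself: the helper `flatten_bag` and the list-of-tuples representation of a relation are illustrative assumptions. Each call unnests one level, replacing a row with one output row per tuple in the chosen bag field.

```python
from typing import Any, List, Tuple

Row = Tuple[Any, ...]


def flatten_bag(rows: List[Row], bag_index: int) -> List[Row]:
    """Hypothetical model of Pig's FLATTEN applied to one bag field.

    Each input row is replaced by one output row per tuple in the bag, with
    that tuple's fields spliced in where the bag was. An empty bag yields
    no output rows for that input row.
    """
    out: List[Row] = []
    for row in rows:
        before, bag, after = row[:bag_index], row[bag_index], row[bag_index + 1:]
        for inner in bag:
            out.append(before + tuple(inner) + after)
    return out


# Sample record from the question, laid out as (item, d, things)
a = [("item1", 111, [("thing1", 222, [("value1",), ("value2",)])])]

# Like Pig's B: flatten 'things' (field 2) -> (item, d, thing, d1, values)
b = flatten_bag(a, 2)
# Like Pig's C: flatten 'values' (now field 4) -> (item, d, thing, d1, v)
c = flatten_bag(b, 4)

for row in c:
    print(row)
```

Because each FLATTEN unnests only one level, flattening `things` alone still leaves `values` as a bag inside each row. Data with two levels of nesting therefore needs two flattens.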


On Jun 4, 2013, at 5:46 PM, "David Parks" <[EMAIL PROTECTED]> wrote:

> We've been working on our first real use case with Pig for quite some time
> now and still haven't succeeded. I wonder if someone can answer this greatly
> simplified version of our problem:
>
> Input data:
> ---------------
> 'item1' 111     { ('thing1', 222, {('value1'),('value2')}) }
>
> Load statement for above data:
> ----------------------------------------
> A = load 'data6' as ( item:chararray, d:int, things:bag{(thing:chararray,
> d1:int, values:bag{(v:chararray)})} );
>
> Desired result:
> ------------------
> ('item1'        111    thing1    222    value1)
> ('item1'        111    thing1    222    value2)
>
> Questions:
> ----------------
> - Is there a single step I can use to flatten this? Or will it require
> doing 2 steps: first flatten 'things', and then take those results and
> flatten 'values'?
> - We're really looking for the right syntax. I've posted a number of
> questions here and on Stack Overflow and received lots of good suggestions,
> and I've read through the O'Reilly book online, but none of it has gotten me
> past constant errors like "Cannot find field v in
> values:bag{:tuple(v:chararray)}"
> - Should I be working on converting our data to SQL-like table formats
> rather than this more Object-Oriented format with nested collections?
>
> Pseudo-code attempt (I've tried 50+ versions of this in every form I can
> glean from examples on the internet, with no success):
> ----------------------------------------------------
> B = FOREACH A GENERATE item, d, things.thing as thing, d1,
> FLATTEN(things.values.v) as v;
David Parks 2013-06-05, 23:57
Pradeep Gollakota 2013-06-06, 05:19