

Re: Flattening nested bags
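-- Two steps: flatten the outer bag first, then the inner one.
B = foreach A generate item, d, flatten(things);
-- After the first flatten, thing, d1 and values are top-level fields of B.
C = foreach B generate item, d, thing, d1, flatten(values);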
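
For anyone who wants to see why two flattens give the desired rows, here is a rough plain-Python model of what FLATTEN does to a bag. It is only a sketch and not Pig code: bags are modeled as lists of tuples, and the helper name flatten_bag is made up for illustration.

```python
# Plain-Python model of Pig's FLATTEN applied to a bag; this is not Pig code.
# Assumption: a bag is modeled as a list of tuples.

def flatten_bag(rows, idx):
    """Replace the bag at position idx with one output row per tuple in it.

    In this model an empty bag contributes no output rows.
    """
    out = []
    for row in rows:
        for inner in row[idx]:
            # Splice the inner tuple's fields into the outer row.
            out.append(row[:idx] + tuple(inner) + row[idx + 1:])
    return out

# One record shaped like the sample input:
# (item, d, things:{(thing, d1, values:{(v)})})
A = [("item1", 111, [("thing1", 222, [("value1",), ("value2",)])])]

# Models: B = foreach A generate item, d, flatten(things);
B = flatten_bag(A, 2)
# Models: C = foreach B generate item, d, thing, d1, flatten(values);
C = flatten_bag(B, 4)

for row in C:
    print(row)
# ('item1', 111, 'thing1', 222, 'value1')
# ('item1', 111, 'thing1', 222, 'value2')
```

The first call turns the single input record into one row per tuple in things, with values still a nested bag in the last position. The second call then expands that bag, which is why the inner flatten has to run on B rather than on A.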


On Jun 4, 2013, at 5:46 PM, "David Parks" <[EMAIL PROTECTED]> wrote:

> We've been at our first real use case with Pig for quite some time now, and
> still not successful. I wonder if someone can provide an answer to this very
> much simplified version of our problem:
>
> Input data:
> ---------------
> 'item1' 111     { ('thing1', 222, {('value1'),('value2')}) }
>
> Load statement for above data:
> ----------------------------------------
> A = load 'data6' as ( item:chararray, d:int, things:bag{(thing:chararray,
> d1:int, values:bag{(v:chararray)})} );
>
> Desired result:
> ------------------
> ('item1'        111    thing1    222    value1)
> ('item1'        111    thing1    222    value2)
>
> Questions:
> ----------------
> - Is there a single step I can use to flatten this? Or will it require
> doing 2 steps: first flatten 'things', and then take those results and
> flatten 'values'?
> - We're really looking for the syntax to get this right. I've posted a
> number of questions here and on Stack Overflow with lots of good
> suggestions, and read through the O'Reilly book online, none of which,
> though, have gotten me past constant errors like "Cannot find field v in
> values:bag{:tuple(v:chararray)}"
> - Should I be working on converting our data to SQL-like table formats
> rather than this more Object-Oriented format with nested collections?
>
> Pseudo-code attempt (I've tried 50+ versions of this in every form I can
> glean from examples out on the internet with no success):
> ----------------------------------------------------
> B = FOREACH A GENERATE item, d, things.thing as thing, d1,
> FLATTEN(things.values.v) as v;
>
>
>