Pig >> mail # user >> FLATTEN() behavior difference in 0.8.1 and 0.10.0 ?


Re: FLATTEN() behavior difference in 0.8.1 and 0.10.0 ?
Yang -- I think you'll get the representation you're looking for by
applying the FLATTEN a second time.  Each instance of a FLATTEN strips off
a single layer.

Norbert
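
The "one layer per FLATTEN" idea can be sketched outside Pig with a small Python model. This is only an illustration, not Pig's implementation: a bag is assumed to be a Python list, a tuple a Python tuple, and `flatten_field` is a hypothetical helper that does not exist in Pig. It follows the layering seen in the 0.8.1 schema, where a tuple layer remains after the first FLATTEN.

```python
# Toy model only: a Pig bag is a Python list, a Pig tuple is a Python tuple.
def flatten_field(rows, idx):
    """Apply a FLATTEN-like step to field `idx` of every row, removing one layer."""
    out = []
    for row in rows:
        field = row[idx]
        before, after = row[:idx], row[idx + 1:]
        if isinstance(field, list):
            # Bag: emit one output row per element of the bag.
            for item in field:
                out.append(before + (item,) + after)
        elif isinstance(field, tuple):
            # Tuple: splice its fields directly into the row.
            out.append(before + field + after)
        else:
            # Scalar: nothing to strip.
            out.append(row)
    return out


# One input row whose single field is a bag of (x, y) tuples, like the UDF output.
rows = [([(1, 2), (3, 4)],)]
once = flatten_field(rows, 0)   # bag layer removed: each row holds one tuple
twice = flatten_field(once, 0)  # tuple layer removed: x and y are top-level
```

In this model `once` is `[((1, 2),), ((3, 4),)]`, where x is still wrapped in a tuple, and `twice` is `[(1, 2), (3, 4)]`, where x can be addressed directly.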

On Sun, Jun 24, 2012 at 5:57 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:

> generate K.(x1), K.(x2), K.(x3) .... , K.(x100); and generate
> K.(x1,...,x100) are actually very different.
>
> The latter is a bag, with columns x1, x2..x100. This is generally what is
> desired.
>
> The former is a bag of column x1, then a bag of column x2, then a bag of
> column x3, etc. Each will be unordered and independent.
>
> 2012/6/24 yonghu <[EMAIL PROTECTED]>
>
> > You can also write like
> >
> > K1.(x1,x2,...,x100).
> >
> > regards!
> >
> > Yong
> >
> > On Sun, Jun 24, 2012 at 8:40 PM, Yang <[EMAIL PROTECTED]> wrote:
> > > thanks,
> > >
> > > but this is a bit more cumbersome: if I have
> > >
> > > generate K.(x1), K.(x2), K.(x3) .... , K.(x100);
> > >
> > > I'd have to re-write each xn by adding K.( )
> > >
> > >
> > > it would be nice if the schema of K can strip off the surrounding {( )}.
> > > actually it should,
> > > since this is after a FLATTEN()
> > >
> > >
> > > Yang
> > >
> > > On Sun, Jun 24, 2012 at 11:17 AM, yonghu <[EMAIL PROTECTED]> wrote:
> > >
> > >> So, I think you want to project the x in K. You can write the pig as:
> > >>
> > >> M = foreach K generate K.(x) as X;
> > >>
> > >> Hope this can help you.
> > >>
> > >> Yong
> > >>
> > >> On Sun, Jun 24, 2012 at 12:40 PM, Yang <[EMAIL PROTECTED]> wrote:
> > >> > my UDF returns a bag of tuples : mybag:bag{ mytuple: tuple ( x: int, y:int)}
> > >> >
> > >> > in my pig script:
> > >> >
> > >> > I do
> > >> >
> > >> > K = foreach blah generate UDF( xxx);
> > >> >
> > >> > M = foreach K generate x;
> > >> >
> > >> >
> > >> > here PIG 0.8.1 says x can not be found in schema, since
> > >> >
> > >> > describe K
> > >> >
> > >> > shows:
> > >> > { mytuple:tuple(x:int , y:int) }
> > >> >
> > >> > while 0.10.0
> > >> >
> > >> > shows
> > >> > {x:int, y:int}
> > >>
> >
>
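
Jonathan's distinction between K.(x1), K.(x2) and K.(x1, x2) can be illustrated with another small Python model. As before, this is a sketch under assumptions rather than Pig itself: the bag K is assumed to be a list of dict rows, the sample values are made up, and `project` is a hypothetical stand-in for bag projection.

```python
# Toy model only: the bag K is a list of rows, each row a dict of column values.
K = [
    {"x1": 1, "x2": "a", "x3": True},
    {"x1": 2, "x2": "b", "x3": False},
]


def project(bag, *cols):
    """Model of K.(c1, c2, ...): one bag whose tuples keep the named columns together."""
    return [tuple(row[c] for c in cols) for row in bag]


# Like K.(x1, x2): a single bag, and each tuple still pairs x1 with its x2.
together = project(K, "x1", "x2")

# Like K.(x1), K.(x2): two independent bags with no link between their elements.
separate = (project(K, "x1"), project(K, "x2"))
```

Here `together` is `[(1, 'a'), (2, 'b')]`, while `separate` is `([(1,), (2,)], [('a',), ('b',)])`. A Python list keeps its order, but Pig bags are unordered, so in the second form there is no reliable way to pair an x1 value with the x2 value from the same original tuple.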