Pig >> mail # user >> FLATTEN() behavior difference in 0.8.1 and 0.10.0 ?


Re: FLATTEN() behavior difference in 0.8.1 and 0.10.0 ?
Actually, FLATTEN(FLATTEN(....)) is not syntactically valid, at least in
0.8. Semantically it's not what I wanted either, because FLATTEN works on
bags, while I wanted to project ALL fields of a tuple.

I ended up adding a T:tuple(  ) to the AS clause, and adding an explicit
projection after the UDF call.

Thanks
Yang
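
P.S. For reference, the difference between separate and combined bag
projections discussed in the quoted messages can be sketched in Pig Latin
roughly as follows. This is only an illustrative sketch: the input path
'k_data', the aliases separate_cols and combined_cols, and the output names
xs, ys and xy are made up, and the bag schema is copied from the UDF output
described at the start of the thread. It has not been checked against 0.8.1
or 0.10.0.

```pig
-- Hypothetical input; the bag schema mirrors the UDF output from the thread.
K = LOAD 'k_data' AS (mybag: bag{mytuple: tuple(x:int, y:int)});

-- Separate projections: each record carries two independent bags,
-- one holding (x) tuples and one holding (y) tuples.
separate_cols = FOREACH K GENERATE mybag.x AS xs, mybag.y AS ys;

-- Combined projection: each record carries a single bag of (x, y) tuples.
combined_cols = FOREACH K GENERATE mybag.(x, y) AS xy;

-- Print both schemas to compare the two shapes.
DESCRIBE separate_cols;
DESCRIBE combined_cols;
```

Comparing the two DESCRIBE outputs should make the shape difference visible
without running a full job.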

On Mon, Jun 25, 2012 at 6:45 AM, Yang <[EMAIL PROTECTED]> wrote:

> thanks Norbert, I'll try it
> On Jun 25, 2012 3:56 AM, "Norbert Burger" <[EMAIL PROTECTED]>
> wrote:
>
>> Yang -- I think you'll get the representation you're looking for by
>> applying the FLATTEN a second time.  Each instance of a FLATTEN strips off
>> a single layer.
>>
>> Norbert
>>
>> On Sun, Jun 24, 2012 at 5:57 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
>>
>> > generate K.(x1), K.(x2), K.(x3) .... , K.(x100); and generate
>> > K.(x1,...,x100) are actually very different.
>> >
>> > The latter is a bag, with columns x1, x2..x100. This is generally what is
>> > desired.
>> >
>> > The former is a bag of column x1, then a bag of column x2, then a bag of
>> > column x3, etc. Each will be unordered and independent.
>> >
>> > 2012/6/24 yonghu <[EMAIL PROTECTED]>
>> >
>> > > You can also write like
>> > >
>> > > K1.(x1,x2,...,x100).
>> > >
>> > > regards!
>> > >
>> > > Yong
>> > >
>> > > On Sun, Jun 24, 2012 at 8:40 PM, Yang <[EMAIL PROTECTED]> wrote:
>> > > > thanks,
>> > > >
>> > > > but this is a bit more cumbersome: if I have
>> > > >
>> > > > generate K.(x1), K.(x2), K.(x3) .... , K.(x100);
>> > > >
>> > > > I'd have to re-write each xn by adding K.( )
>> > > >
>> > > >
>> > > > it would be nice if the schema of K can strip off the surrounding {( )}.
>> > > > actually it should, since this is after a FLATTEN()
>> > > >
>> > > >
>> > > > Yang
>> > > >
>> > > > On Sun, Jun 24, 2012 at 11:17 AM, yonghu <[EMAIL PROTECTED]> wrote:
>> > > >
>> > > >> So, I think you want to project the x in K. You can write the pig as:
>> > > >>
>> > > >> M = foreach K generate K.(x) as X;
>> > > >>
>> > > >> Hope this can help you.
>> > > >>
>> > > >> Yong
>> > > >>
>> > > >> On Sun, Jun 24, 2012 at 12:40 PM, Yang <[EMAIL PROTECTED]> wrote:
>> > > >> > my UDF returns a bag of tuples : mybag:bag{ mytuple: tuple ( x: int, y:int)}
>> > > >> >
>> > > >> > in my pig script:
>> > > >> >
>> > > >> > I do
>> > > >> >
>> > > >> > K = foreach blah generate UDF( xxx);
>> > > >> >
>> > > >> > M = foreach K generate x;
>> > > >> >
>> > > >> >
>> > > >> > here PIG 0.8.1 says x can not be found in schema, since
>> > > >> >
>> > > >> > describe K
>> > > >> >
>> > > >> > shows:
>> > > >> > { mytuple:tuple(x:int , y:int) }
>> > > >> >
>> > > >> > while 0.10.0
>> > > >> >
>> > > >> > shows
>> > > >> > {x:int, y:int}
>> > > >>
>> > >
>> >
>>
>