-
FLATTEN() behavior difference in 0.8.1 and 0.10.0 ?
Yang 2012-06-24, 10:40
My UDF returns a bag of tuples: mybag:bag{ mytuple: tuple ( x: int, y:int)}
In my Pig script I do:
K = foreach blah generate UDF( xxx);
M = foreach K generate x;
Here Pig 0.8.1 says x cannot be found in the schema, since
describe K
shows: { mytuple:tuple(x:int , y:int) }
while 0.10.0
shows {x:int, y:int}
-
Re: FLATTEN() behavior difference in 0.8.1 and 0.10.0 ?
yonghu 2012-06-24, 18:17
So, I think you want to project x in K. You can write the Pig statement as:
M = foreach K generate K.(x) as X;
Hope this can help you.
Yong
-
Re: FLATTEN() behavior difference in 0.8.1 and 0.10.0 ?
Yang 2012-06-24, 18:40
Thanks,
but this is a bit more cumbersome. If I have
generate K.(x1), K.(x2), K.(x3) .... , K.(x100);
I'd have to rewrite each xn by adding K.( ).
It would be nice if the schema of K could strip off the surrounding {( )}. Actually it should, since this is after a FLATTEN().
Yang
-
Re: FLATTEN() behavior difference in 0.8.1 and 0.10.0 ?
yonghu 2012-06-24, 20:33
You can also write it like this:
K1.(x1,x2,...,x100).
regards!
Yong
-
Re: FLATTEN() behavior difference in 0.8.1 and 0.10.0 ?
Jonathan Coveney 2012-06-24, 21:57
generate K.(x1), K.(x2), K.(x3) .... , K.(x100); and generate K.(x1,...,x100) are actually very different.
The latter is a single bag whose tuples have columns x1, x2..x100. This is generally what is desired.
The former is a bag of column x1, then a bag of column x2, then a bag of column x3, etc. Each bag will be unordered and independent of the others.
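A minimal sketch of the difference, assuming a relation with a single bag column. The file name 'input.txt', the field names and the aliases `together` and `separate` are hypothetical and only serve the illustration; the thread's own relation is called K.

```pig
-- Hypothetical input: each row holds one bag of (x1, x2, x3) tuples.
blah = LOAD 'input.txt' AS (mybag:bag{mytuple:tuple(x1:int, x2:int, x3:int)});

-- One bag per row; each tuple in it keeps x1 and x2 paired together.
together = FOREACH blah GENERATE mybag.(x1, x2);

-- Two bags per row: one holding only x1 values, one holding only x2 values.
-- Nothing ties an element of the first bag to an element of the second.
separate = FOREACH blah GENERATE mybag.x1, mybag.x2;

DESCRIBE together;
DESCRIBE separate;
```

Comparing the two DESCRIBE outputs should show one bag column in the first case and two bag columns in the second.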
-
Re: FLATTEN() behavior difference in 0.8.1 and 0.10.0 ?
Norbert Burger 2012-06-25, 10:55
Yang, I think you'll get the representation you're looking for by applying FLATTEN a second time. Each FLATTEN strips off a single layer.
Norbert
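A rough sketch of the "one layer per FLATTEN" idea, written as two separate FOREACH statements. It assumes a bag whose tuples wrap an inner tuple; the file name 'input.txt', the field name `wrapper` and the alias K2 are hypothetical, and the input is loaded with a declared schema instead of coming from the UDF in the original question.

```pig
-- Hypothetical input: a bag whose tuples each wrap an inner (x, y) tuple.
blah = LOAD 'input.txt'
       AS (mybag:bag{wrapper:tuple(mytuple:tuple(x:int, y:int))});

-- First FLATTEN un-nests the bag: one output row per tuple in the bag,
-- each row still carrying the inner tuple as a single field.
K = FOREACH blah GENERATE FLATTEN(mybag);

-- Second FLATTEN, in its own statement, promotes the inner tuple's
-- fields to top-level columns.
K2 = FOREACH K GENERATE FLATTEN(mytuple);

-- x can now be referenced directly.
M = FOREACH K2 GENERATE x;
DESCRIBE K2;
```

Running DESCRIBE after each step is a cheap way to check which layer is still present in a given Pig version.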
-
Re: FLATTEN() behavior difference in 0.8.1 and 0.10.0 ?
Yang 2012-06-25, 13:45
Thanks Norbert, I'll try it.
-
Re: FLATTEN() behavior difference in 0.8.1 and 0.10.0 ?
Yang 2012-07-17, 19:51
Actually FLATTEN(FLATTEN(....)) is not syntactically correct, at least in 0.8. Semantically it's not what I wanted either, because FLATTEN works on bags, while I wanted to project ALL fields of a tuple.
I ended up adding a T:tuple( ) to the AS clause, and adding an explicit projection after the UDF call.
Thanks,
Yang