Haitao Yao 2012-02-15, 06:19
Hi all, here's my Pig script:

A = load 'input' as (b:bag{t:(x:int, y:int)});
B = foreach A generate AVG(b.x);
describe B;

It works well. If b.x is a chararray, problems arise:

A = load 'input' as (b:bag{t:(x:chararray, y:int)});
B = foreach A generate AVG((int)b.x);

2012-02-15 14:17:17,937 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1052:
<line 4, column 28> Cannot cast bag with schema :bag{:tuple(x:chararray)} to int
Details at logfile: /tmp/pig_1329286634873.log

Why? How can I calculate the average of b.x if b.x must be a chararray?

Here's the running snapshot in Grunt:

grunt> A = load 'input' as (b:bag{t:(x:int, y:int)});
grunt> B = foreach A generate AVG(b.x);
grunt> describe B;
B: {double}
grunt> A = load 'input' as (b:bag{t:(x:chararray, y:int)});
grunt> B = foreach A generate AVG((int)b.x);
2012-02-15 14:17:17,937 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1052:
<line 4, column 28> Cannot cast bag with schema :bag{:tuple(x:chararray)} to int
Details at logfile: /tmp/pig_1329286634873.log
grunt>

Thanks.
Jonathan Coveney 2012-02-15, 08:04
The issue is that (int)b.x does not cast each x inside the bag to an int; it tries to cast the bag itself. Short of flattening out the bag and projecting x as an int, which is inefficient, I suppose you could write a UDF that calculates the average of chararrays by casting each one to an int... but then that raises the question of why you couldn't just load it as x:int in the first place.
So generally, you need to do something like "foreach rel generate (int)x". In this case that doesn't work as efficiently, but this is kind of a weird case.
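To make the UDF idea concrete, here is a minimal sketch of the per-bag logic in plain Python. It is not wired into Pig's UDF API and is not anyone's actual code from this thread: the function name avg_of_chararrays is hypothetical, the bag is modeled as a list of 1-tuples of strings, and skipping null or non-numeric values is an assumption of the sketch rather than documented AVG behavior.

```python
def avg_of_chararrays(bag):
    """Average the first field of each tuple in a bag of strings.

    `bag` is modeled as a list of 1-tuples, e.g. [("1",), ("2",)].
    Null and non-numeric values are skipped (an assumption of this sketch).
    Returns None for an empty bag or when no value can be parsed.
    """
    total = 0.0
    count = 0
    for t in bag:
        value = t[0] if t else None
        if value is None:
            continue  # ignore nulls instead of failing
        try:
            total += float(value)  # the cast happens per element, not on the bag
        except ValueError:
            continue  # ignore strings that are not numbers
        count += 1
    if count == 0:
        return None
    return total / count


print(avg_of_chararrays([("1",), ("2",), ("3",)]))  # 2.0
```

The point of the sketch is the placement of the cast: it is applied to each element while iterating, which is what (int)b.x does not do.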
Haitao Yao 2012-02-15, 08:59
I solved this problem by extending the built-in AVG function to accept a chararray bag as input and calculate the result.
Why can't the built-in AVG accept a chararray bag, convert the values to double, and calculate the result?
Prashant Kommireddi 2012-02-15, 09:21
AVG over chararrays is not a usual case, simply because it does not make sense in most cases. For example, what would the average be if it were a bag of first or last names? AVG would fail if it tried to convert such a String to an Integer or Double.
In your case it's best to declare the field as int/long if you know the data type beforehand.
Thanks, Prashant
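The objection about names can be illustrated with a small plain-Python sketch. The function strict_avg is hypothetical and only models a strict string-to-number conversion; it makes no claim about how Pig's AVG is implemented. With numeric strings the average is well defined, while a bag of names fails only once the data is actually read.

```python
def strict_avg(values):
    """Average a list of strings, converting each one strictly to a float."""
    numbers = [float(v) for v in values]  # raises ValueError on e.g. "Smith"
    if not numbers:
        return None
    return sum(numbers) / len(numbers)


print(strict_avg(["10", "20"]))  # 15.0

try:
    strict_avg(["Alice", "Bob"])
except ValueError as err:
    # The failure shows up at run time, when the values are converted.
    print("cannot average names:", err)
```

Declaring the field as int or long in the load schema moves this type decision to where the script is written instead of where the data is processed.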
Jonathan Coveney 2012-02-15, 16:42
I agree with Prashant. I am hard pressed to find a case where it would be useful, and I would much rather it fail on parse than while running.