Pig, mail # user - question about AVG


Re: question about AVG
Haitao Yao 2012-02-15, 08:59
I solved this problem by extending the built-in AVG function to accept a bag of chararrays as input and calculate the result.

Why can't the built-in AVG accept a chararray bag, convert the values to double, and calculate the result?
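
For readers who want to see the shape of such a workaround, here is a minimal sketch of the core logic in Python. It is not the actual UDF from this thread. It assumes the bag is modeled as a list of one-field tuples (the way Pig's Jython UDF support hands bags to Python functions); the name avg_of_chararrays is hypothetical, and skipping nulls and unparseable strings is a design assumption, not confirmed behavior of any existing function.

```python
def avg_of_chararrays(bag):
    """Average the first field of each tuple in `bag`, parsing strings as doubles.

    `bag` is modeled as a list of 1-field tuples, e.g. [("1",), ("2",)].
    Returns None for a missing, empty, or all-null bag.
    """
    if bag is None:
        return None
    total = 0.0
    count = 0
    for t in bag:
        # Assumption: null tuples and null fields are skipped, not counted.
        if t is None or len(t) == 0 or t[0] is None:
            continue
        try:
            total += float(t[0])
        except ValueError:
            # Assumption: strings that do not parse as numbers are skipped.
            continue
        count += 1
    return total / count if count else None


if __name__ == "__main__":
    # Hypothetical sample bag of chararray values.
    print(avg_of_chararrays([("1",), ("2",), ("6",)]))
```

To use a function like this from a Pig script through Jython, it would additionally need Pig's @outputSchema decorator and a "register ... using jython" statement; both are left out so the sketch runs as plain Python.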

On 2012-2-15, at 4:04 PM, Jonathan Coveney wrote:

> the issue is that doing (int)b.x does not cast each column to an int, but
> rather, it tries to cast the bag itself. Short of flattening out the bag
> and projecting it as an int, which is inefficient, I suppose you could make
> a UDF that calculates the average of chararrays by casting to an int...but
> then that raises the question of why you couldn't just load it as an x:int
> in the first place.
>
> So generally, you need to do something like "foreach rel generate (int)x".
> In this case that doesn't work as efficiently, but this is kind of a weird
> case.
>
> 2012/2/14 Haitao Yao <[EMAIL PROTECTED]>
>
>> hi, all
>>       here's my pig script:
>>
>> A = load 'input' as (b:bag{t:(x:int, y:int)});
>> B = foreach A generate AVG(b.x);
>> describe B;
>>
>> it works well.
>> But if b.x is a chararray, problems arise:
>> A = load 'input' as (b:bag{t:(x:chararray, y:int)});
>> B = foreach A generate AVG((int)b.x);
>> 2012-02-15 14:17:17,937 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>> ERROR 1052:
>> <line 4, column 28> Cannot cast bag with schema :bag{:tuple(x:chararray)}
>> to int
>> Details at logfile: /tmp/pig_1329286634873.log
>>
>> Why?  How can I calculate the avg of b.x if b.x must be a chararray?
>>
>>
>> here's the running snapshot in Grunt:
>>
>> grunt> A = load 'input' as (b:bag{t:(x:int, y:int)});
>> grunt> B = foreach A generate AVG(b.x);
>> grunt> describe B;
>> B: {double}
>> grunt> A = load 'input' as (b:bag{t:(x:chararray, y:int)});
>> grunt> B = foreach A generate AVG((int)b.x);
>> 2012-02-15 14:17:17,937 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>> ERROR 1052:
>> <line 4, column 28> Cannot cast bag with schema :bag{:tuple(x:chararray)}
>> to int
>> Details at logfile: /tmp/pig_1329286634873.log
>> grunt>
>>
>> thanks.
>>
>>