Pig >> mail # user >> question about AVG


Re: question about AVG
I solved this problem by extending the built-in AVG function to accept a bag of chararrays as input and calculate the result.

Why can't the built-in AVG accept a bag of chararrays, convert the values to double, and calculate the result?

On 2012-2-15, at 4:04 PM, Jonathan Coveney wrote:

> the issue is that doing (int)b.x does not cast each column to an int, but
> rather, it tries to cast the bag itself. Short of flattening out the bag
> and projecting it as an int, which is inefficient, I suppose you could make
> a UDF that calculates the average of chararrays by casting to an int...but
> then that raises the question of why you couldn't just load it as an x:int
> in the first place.
>
> So generally, you need to do something like "foreach rel generate (int)x".
> In this case that doesn't work as efficiently, but this is kind of a weird
> case.
>
> 2012/2/14 Haitao Yao <[EMAIL PROTECTED]>
>
>> hi, all
>>       here's my pig script:
>>
>> A = load 'input' as (b:bag{t:(x:int, y:int)});
>> B = foreach A generate AVG(b.x);
>> describe B;
>>
>> it works well.
>> But if b.x is a chararray, problems arise:
>> A = load 'input' as (b:bag{t:(x:chararray, y:int)});
>> B = foreach A generate AVG((int)b.x);
>> 2012-02-15 14:17:17,937 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>> ERROR 1052:
>> <line 4, column 28> Cannot cast bag with schema :bag{:tuple(x:chararray)}
>> to int
>> Details at logfile: /tmp/pig_1329286634873.log
>>
>> Why?  How can I calculate the avg of b.x if b.x must be a chararray?
>>
>>
>> here's the running snapshot in Grunt:
>>
>> grunt> A = load 'input' as (b:bag{t:(x:int, y:int)});
>> grunt> B = foreach A generate AVG(b.x);
>> grunt> describe B;
>> B: {double}
>> grunt> A = load 'input' as (b:bag{t:(x:chararray, y:int)});
>> grunt> B = foreach A generate AVG((int)b.x);
>> 2012-02-15 14:17:17,937 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>> ERROR 1052:
>> <line 4, column 28> Cannot cast bag with schema :bag{:tuple(x:chararray)}
>> to int
>> Details at logfile: /tmp/pig_1329286634873.log
>> grunt>
>>
>> thanks.
>>
>>
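
For reference, the "flatten out the bag and project it as an int" workaround mentioned in the reply might look roughly like the sketch below. It assumes each input row carries a key to regroup on after flattening; the `id` field is hypothetical and is not part of the schema in the original script. As noted in the thread, this is less efficient than loading x as an int or using a UDF, because every bag is exploded into rows and grouped again.

```pig
-- Sketch only: 'id' is a hypothetical per-row key, not in the original schema.
A = load 'input' as (id:chararray, b:bag{t:(x:chararray, y:int)});

-- Flatten the bag so each x becomes its own row next to its key.
F = foreach A generate id, flatten(b.x) as x;

-- The cast now applies to a scalar chararray rather than to the bag itself.
G = foreach F generate id, (int)x as x;

-- Regroup by key and average the int values.
H = group G by id;
R = foreach H generate group, AVG(G.x);
describe R;
```

The cast succeeds here because (int) is applied to a single field per row, which avoids ERROR 1052 from trying to cast the whole bag.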