Hive >> mail # user >> Help to solve UDAF errors!


Re: Help to solve UDAF errors!
Hi Mark,

Thanks for the response!
UDAFPercentile.java has two terminate() methods because it handles two
different input types through its two inner classes,
PercentileLongEvaluator and PercentileLongArrayEvaluator.
I am passing only a single input type, a double from one table column, to
the iterate() method, and I want to return an ArrayList<DoubleWritable>
from the terminate() method.
What is wrong in my class?
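One likely problem, judging from Hive's bundled UDAFs such as UDAFPercentile, which declare their evaluators as `public static class ... implements UDAFEvaluator`: TopNPercentEvaluator in the code quoted below is declared `public class`, i.e. a non-static inner class. Hive's bridge for old-style UDAFs appears to create evaluators reflectively through a no-argument constructor (an assumption worth confirming in the Hive source), and a non-static inner class has no such constructor because javac adds a hidden parameter for the enclosing instance. This plain-Java sketch uses no Hive classes and hypothetical class names; it only shows the difference between the two declarations:

```java
import java.lang.reflect.Constructor;

// Plain-Java illustration (no Hive dependency) of why a UDAF evaluator
// should be a static nested class. All class names here are hypothetical.
public class NestedEvaluatorDemo {

    // Non-static inner class: its only constructor secretly takes an
    // enclosing NestedEvaluatorDemo instance as a parameter.
    public class InnerEvaluator {
        public InnerEvaluator() { }
    }

    // Static nested class: has a genuine no-argument constructor.
    public static class StaticEvaluator {
        public StaticEvaluator() { }
    }

    // Assumption: the framework instantiates evaluators through the
    // no-argument constructor, as reflective frameworks commonly do.
    public static String tryInstantiate(Class<?> cls) {
        try {
            Constructor<?> ctor = cls.getDeclaredConstructor();
            ctor.newInstance();
            return "ok";
        } catch (NoSuchMethodException e) {
            return "no no-arg constructor";
        } catch (ReflectiveOperationException e) {
            return "failed: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println("StaticEvaluator: " + tryInstantiate(StaticEvaluator.class));
        System.out.println("InnerEvaluator:  " + tryInstantiate(InnerEvaluator.class));
    }
}
```

Running it reports "ok" for the static class and "no no-arg constructor" for the inner one. In the UDAF itself, the corresponding change would be declaring `public static class TopNPercentEvaluator implements UDAFEvaluator`.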

Moreover, is there any way for a UDF/UDAF/UDTF to process all the rows of
the table and output only a subset of them, based on some aggregation
function over one column? My case is exactly this: compute the
top-n-percent of one column and output the entire set of filtered rows,
with all the other columns from the table.
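An aggregate function returns one value per group rather than passing rows through, so one common way to get "the full rows whose value is in the top n percent" is two passes: aggregate once to find the cutoff value, then filter the original rows against it (in HiveQL, typically a subquery or join that compares each row's value with the cutoff). The Java sketch below only illustrates that two-pass logic in memory; `Row`, `cutoff()` and `filter()` are hypothetical names, not Hive APIs, and rows tied with the cutoff value are all kept.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// In-memory sketch of the two-pass "top n percent of rows" idea.
// Row and its fields are hypothetical stand-ins for table columns.
public class TopNPercentFilter {

    public static final class Row {
        public final String name;   // stands in for "all other columns"
        public final double value;  // the column being ranked
        public Row(String name, double value) { this.name = name; this.value = value; }
        @Override public String toString() { return name + "=" + value; }
    }

    // Pass 1 (aggregate): the smallest value still inside the top
    // floor(percent% of row count) values; +infinity if that count is 0.
    public static double cutoff(List<Row> rows, int percent) {
        List<Double> values = new ArrayList<>();
        for (Row r : rows) values.add(r.value);
        Collections.sort(values);  // ascending
        int keep = Math.min(values.size(), Math.max(0, percent) * values.size() / 100);
        if (keep == 0) return Double.POSITIVE_INFINITY;
        return values.get(values.size() - keep);
    }

    // Pass 2 (filter): keep every full row whose value reaches the cutoff.
    // Rows tied with the cutoff are all kept, so ties can enlarge the result.
    public static List<Row> filter(List<Row> rows, int percent) {
        double min = cutoff(rows, percent);
        List<Row> out = new ArrayList<>();
        for (Row r : rows) {
            if (r.value >= min) out.add(r);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        for (int i = 1; i <= 10; i++) rows.add(new Row("r" + i, i * 1.5));
        System.out.println(filter(rows, 20));  // top 20% of 10 rows
    }
}
```

With ten rows valued 1.5 through 15.0 and percent = 20, the cutoff is 13.5 and the result is `[r9=13.5, r10=15.0]`.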

Thanks,
Abhishek

On Sun, Feb 10, 2013 at 12:36 PM, Mark Grover
<[EMAIL PROTECTED]> wrote:

> Hi Abhishek,
> The code looks incomplete.
>
> See the comment at
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/UDAF.java#L22
> Those are all the methods your UDAF class needs to implement, but you seem
> to be missing them.
>
> Mark
>
> On Sat, Feb 9, 2013 at 11:08 PM, Abhishek Bhattacharya <[EMAIL PROTECTED]> wrote:
>
>> Thanks for the response.
>> The link to the code is:
>> https://github.com/Abhishek2301/Hive/blob/master/src/UDAFTopNPercent.java
>> Please let me know how to fix it!
>>
>> Thanks,
>> Abhishek
>>
>>
>>
>> On Fri, Feb 8, 2013 at 5:02 PM, Mark Grover <[EMAIL PROTECTED]> wrote:
>>
>>> Abhishek,
>>> The code doesn't seem to be complete.
>>>
>>> Look at
>>> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDAFPercentile.java
>>> for reference. It has two terminate() methods: one for the UDAF and one
>>> for the Evaluator.
>>>
>>> Do you mind posting your complete code on GitHub somewhere so it's
>>> easier to analyze?
>>>
>>> Mark
>>>
>>> On Fri, Feb 8, 2013 at 2:05 PM, Abhishek Bhattacharya <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have implemented a simple UDAF for top-n-percent as follows:
>>>> import java.util.ArrayList;
>>>> import java.util.Collections;
>>>>
>>>> import org.apache.hadoop.hive.ql.exec.UDAF;
>>>> import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;
>>>>
>>>> public class UDAFTopNPercent extends UDAF{
>>>>
>>>>     public static class Result {
>>>>         ArrayList<Double> list;
>>>>         double min;
>>>>     }
>>>>
>>>>     public class TopNPercentEvaluator implements UDAFEvaluator {
>>>>
>>>>         private Result res;
>>>>         private int rowIndex;
>>>>         private int percent;
>>>>
>>>>         public TopNPercentEvaluator() {
>>>>             super();
>>>>             res = new Result();
>>>>             init();
>>>>             rowIndex = 0;
>>>>         }
>>>>         @Override
>>>>         public void init() {
>>>>             res.list = new ArrayList<Double>();
>>>>             res.min = Double.MAX_VALUE;
>>>>         }
>>>>
>>>>         public boolean iterate(Double rowVal, int pct) {
>>>>             ArrayList<Double> resList = res.list;
>>>>             rowIndex++;
>>>>             resList.add(rowVal);
>>>>             percent = pct;
>>>>             return true;
>>>>         }
>>>>
>>>>         public ArrayList<Double> terminatePartial() {
>>>>             ArrayList<Double> resList = res.list;
>>>>             Collections.sort(resList);
>>>>             return resList;
>>>>         }
>>>>
>>>>         public boolean merge(ArrayList<Double> otherList) {
>>>>             ArrayList<Double> resList = res.list;
>>>>             resList.addAll(otherList);
>>>>             return true;
>>>>         }
>>>>
>>>>         public ArrayList<Double> terminate() {
>>>>             ArrayList<Double> resList = res.list;
>>>>             double num_rows = (double)percent/100.0*rowIndex;
>>>>             Collections.sort(resList);
>>>>             int lastIdx = resList.size()- (int) num_rows;
>>>>             if(lastIdx <= 0) {
>>>>                 return resList;
Thanks and Regards,

Abhishek Bhattacharya
PhD Computer Science
School of Computing and Information Sciences
Florida International University
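Separately from how the class is declared, the quoted evaluator keeps `percent` and `rowIndex` only in the evaluator instance that called iterate(). terminatePartial() ships just the sorted list, and merge() appends lists without updating either field, so an evaluator that only receives merged partials (for example on the reduce side) reaches terminate() with `percent == 0` and `rowIndex == 0`; init() also never resets `rowIndex`. The sketch below is plain Java with no Hive dependency: it keeps the quoted method names but carries the percentage inside a hypothetical `PartialState` object and counts rows from the merged list size. In a real Hive UDAF the evaluator would be a public static nested class implementing UDAFEvaluator, and the partial-result type would have to be one Hive can serialize; the partial-state class in UDAFPercentile is a reasonable model to compare against.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Plain-Java sketch of top-n-percent state that survives the
// terminatePartial() -> merge() hand-off. PartialState is a hypothetical
// name; no Hive classes are used here.
public class TopNPercentState {

    // Everything the final step needs must travel in the partial result.
    public static class PartialState {
        public List<Double> values = new ArrayList<>();
        public int percent = -1;  // -1 means "not seen yet"
    }

    public static class Evaluator {
        private PartialState state;

        public Evaluator() { init(); }

        // Reset all per-group state, not just the list.
        public void init() { state = new PartialState(); }

        public boolean iterate(Double rowVal, int pct) {
            if (rowVal != null) state.values.add(rowVal);  // skip NULLs
            state.percent = pct;
            return true;
        }

        public PartialState terminatePartial() { return state; }

        public boolean merge(PartialState other) {
            if (other == null) return true;
            state.values.addAll(other.values);
            if (other.percent >= 0) state.percent = other.percent;
            return true;
        }

        // Largest floor(percent% of row count) values, in ascending order.
        public List<Double> terminate() {
            List<Double> sorted = new ArrayList<>(state.values);
            Collections.sort(sorted);
            int n = sorted.size();
            int keep = Math.min(n, Math.max(0, state.percent) * n / 100);
            return new ArrayList<>(sorted.subList(n - keep, n));
        }
    }

    public static void main(String[] args) {
        // Two "mappers" each see part of the column.
        Evaluator m1 = new Evaluator(), m2 = new Evaluator();
        for (double v : new double[] {5, 1, 9, 3, 7}) m1.iterate(v, 40);
        for (double v : new double[] {2, 8, 4, 10, 6}) m2.iterate(v, 40);
        // A "reducer" that never calls iterate() still knows the percentage.
        Evaluator reducer = new Evaluator();
        reducer.merge(m1.terminatePartial());
        reducer.merge(m2.terminatePartial());
        System.out.println(reducer.terminate());
    }
}
```

Here the reducer prints `[7.0, 8.0, 9.0, 10.0]`, the top 40% of the ten values. Like the original, this still gathers every value of the column into one list, so memory grows with the number of rows.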