Hive user mailing list: UDAF terminatePartial structure


Re: UDAF terminatePartial structure
There are limitations as to what can be passed between terminatePartial() and merge(). I'm not sure that you can pass Java arrays (i.e. your double[] c1;) through all the Hive reflection gubbins. Try using ArrayList<>s instead, but be warned, you need to make explicit deep copies of anything passed in to merge().

Robin
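
A minimal sketch of that suggestion (hypothetical class and field names, not a complete evaluator): keep the partial state in ArrayLists rather than double[] / double[][], and make explicit deep copies of anything retained in merge(), since, per the warning above, the object passed in may be reused.

import java.util.ArrayList;

public class ArrayListPartialSketch {
    public static class PartialResult {
        ArrayList<Double> c1;            // was double[]
        ArrayList<ArrayList<Double>> c2; // was double[][]
    }

    private PartialResult state; // stands in for the evaluator's accumulated state

    public boolean merge(PartialResult other) {
        if (other == null) return true;
        PartialResult copy = new PartialResult();
        copy.c1 = new ArrayList<Double>(other.c1);   // new list, not a shared reference
        copy.c2 = new ArrayList<ArrayList<Double>>();
        for (ArrayList<Double> row : other.c2) {
            copy.c2.add(new ArrayList<Double>(row)); // copy each inner row as well
        }
        state = copy; // a real evaluator would fold copy into its accumulated state
        return true;
    }
}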

From: Ritesh Agrawal <[EMAIL PROTECTED]>
Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Date: Monday, July 29, 2013 9:12 PM
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Subject: Re: UDAF terminatePartial structure

Hi Robin, Igor,

Thanks for the suggestions and links. Based on the examples I found, below is my UDAF. However, I am getting the following error when trying to run it, and I'm not sure what the error means.

============= ERROR ===================
FAILED: Hive Internal Error: java.lang.RuntimeException(java.lang.NoSuchMethodException: [D.<init>())
java.lang.RuntimeException: java.lang.NoSuchMethodException: [D.<init>()
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
        at org.apache.hadoop.hive.serde2.objectinspector.ReflectionStructObjectInspector.create(ReflectionStructObjectInspector.java:170)
        at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.<init>(ObjectInspectorConverters.java:225)
        at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:127)
        at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.<init>(ObjectInspectorConverters.java:221)
        at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:127)
============= UDF CODE =================
package com.netflix.hive.udaf;

import java.io.IOException;
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;
import org.apache.hadoop.hive.serde2.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;

@Description(
    name = "MFFoldIn",
    value = "_FUNC_(expr, nb) - Computes latent features for a given item/user based on user/item vectors",
    extended = "Example:\n"
)
public class MFFoldIn extends UDAF {

    public static class MFFoldInEvaluator implements UDAFEvaluator {

        // Partial aggregation state passed between terminatePartial() and merge().
        public static class PartialResult {
            double[] c1;
            double[][] c2;
            double[][] c3;
            double wm;
            double lambda;
            int itemCount;
            double[][] varco;
            Set<Long> observedShows;

            public int getDimensionsCount() throws Exception {
                if (c1 != null) return c1.length;
                throw new Exception("Unknown dimension count");
            }
        }

        private UserVecBuilder builder;

        public void init() {
            builder = null;
        }

        public boolean iterate(DoubleWritable wm, DoubleWritable lambda,
                IntWritable itemCount, String itemSquaredFile,
                DoubleWritable weight, List<Double> lf,
                Long item) throws IOException {
            // Unbox the latent-factor vector into a primitive array.
            double[] lflist = new double[lf.size()];
            for (int i = 0; i < lf.size(); i++)
                lflist[i] = lf.get(i).doubleValue();

            if (builder == null) builder = new UserVecBuilder();
            if (!builder.isReady()) {
                builder.setW_m(wm.get());
                builder.setLambda(lambda.get());
                builder.setItemRowCount(itemCount.get());
                builder.readItemCovarianceMatFiles(itemSquaredFile, lflist.length);
            }

            builder.add(item, lflist, weight.get());
            return true;
        }

        public PartialResult terminatePartial() {
            PartialResult partial = new PartialResult();
            partial.c1 = builder.getComponent1();
            partial.c2 = builder.getComponent2();
            partial.c3 = builder.getComponent3();
            partial.wm = builder.getW_m();
            partial.lambda = builder.getLambda();
            partial.observedShows = builder.getObservedShows();
            partial.itemCount = builder.getItemRowCount();
            partial.varco = builder.getVarCovar();
            return partial;
        }

        public boolean merge(PartialResult other) {
            if (other == null) return true;
            if (builder == null) builder = new UserVecBuilder();
            if (!builder.isReady()) {
                // First partial seen: initialize the builder from it.
                builder.setW_m(other.wm);
                builder.setLambda(other.lambda);
                builder.setItemRowCount(other.itemCount);
                builder.setItemCovarianceMat(other.varco);
                builder.setComponent1(other.c1);
                builder.setComponent2(other.c2);
                builder.setComponent3(other.c3);
                builder.setObservedShows(other.observedShows);
            } else {
                builder.merge(other.c1, other.c2, other.c3, other.observedShows);
            }
            return true;
        }

        public double[] terminate() {
            if (builder == null) return null;
            return builder.build();
        }
    }
}
===================

On Jul 29, 2013, at 4:37 PM, Igor Tatarinov wrote:

I found this Cloudera example helpful:
http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/org.apache.hadoop.hive/hive-contrib/0.7.0-cdh3u0/org/apache/hadoop/hive/contrib/udaf/example/UDAFExampleMaxMinNUtil.java#UDAFExampleMaxMinNUtil.Evaluator

igor
decide.com

On Mon, Jul 29, 2013 at 4:32 PM, Ritesh Agrawal <[EMAIL PROTECTED]> wrote:
Hi Robin,

Thanks for the suggestion. I did find such an example in the Hadoop: The Definitive Guide book. However, I am now totally confused.

The book extends UDAF instead of AbstractGenericUDAFResolver. Which one is recommended?
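
For comparison, here is a minimal sketch of the resolver-style interface (a hypothetical sum aggregate, not the fold-in logic above); UDAF/UDAFEvaluator is the older reflection-based bridge, while AbstractGenericUDAFResolver with GenericUDAFEvaluator is the interface generally recommended for new code:

import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.parse.SemanticException;
import org.apache.hadoop.hive.ql.udf.generic.AbstractGenericUDAFResolver;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;

public class GenericUDAFSumSketch extends AbstractGenericUDAFResolver {

    @Override
    public GenericUDAFEvaluator getEvaluator(TypeInfo[] parameters) throws SemanticException {
        return new SumEvaluator();
    }

    public static class SumEvaluator extends GenericUDAFEvaluator {
        private PrimitiveObjectInspector inputOI;

        // Aggregation state lives in an AggregationBuffer, not in evaluator fields.
        static class SumBuffer implements AggregationBuffer {
            double sum;
        }

        @Override
        public ObjectInspector init(Mode m, ObjectInspector[] parameters) throws HiveException {
            super.init(m, parameters);
            // In PARTIAL1/COMPLETE this is the raw input; in PARTIAL2/FINAL it is
            // the partial result, which for this aggregate is also a double.
            inputOI = (PrimitiveObjectInspector) parameters[0];
            return PrimitiveObjectInspectorFactory.javaDoubleObjectInspector;
        }

        @Override
        public AggregationBuffer getNewAggregationBuffer() throws HiveException {
            return new SumBuffer();
        }

        @Override
        public void reset(AggregationBuffer agg) throws HiveException {
            ((SumBuffer) agg).sum = 0.0;
        }

        @Override
        public void iterate(AggregationBuffer agg, Object[] parameters) throws HiveException {
            if (parameters[0] != null) {
                ((SumBuffer) agg).sum += PrimitiveObjectInspectorUtils.getDouble(parameters[0], inputOI);
            }
        }

        @Override
        public Object terminatePartial(AggregationBuffer agg) throws HiveException {
            return terminate(agg);
        }

        @Override
        public void merge(AggregationBuffer agg, Object partial) throws HiveException {
            if (partial != null) {
                ((SumBuffer) agg).sum += PrimitiveObjectInspectorUtils.getDouble(partial, inputOI);
            }
        }

        @Override
        public Object terminate(AggregationBuffer agg) throws HiveException {
            return Double.valueOf(((SumBuffer) agg).sum);
        }
    }
}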

Also, the example in the book uses DoubleWritable as the return type of the "terminate" function. However, I will be returning an ArrayList of doubles. Do I always need to return objects that are derived from Writable?
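
In the classic UDAF interface the return value goes through Hive's reflection-based object inspectors, which generally accept boxed primitives and ArrayLists as well as Writables, so a Writable return type is not strictly required; raw arrays such as double[] are the problematic case. A sketch of the corresponding change (a method fragment only, reusing the builder field from the fold-in class above and assuming builder.build() returns double[]):

// Returns a boxed list instead of double[], which the reflection
// object inspectors cannot instantiate.
public ArrayList<Double> terminate() {
    if (builder == null) return null;
    double[] raw = builder.build();
    ArrayList<Double> out = new ArrayList<Double>(raw.length);
    for (double v : raw) {
        out.add(v); // autobox each element
    }
    return out;
}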

Ritesh
On Jul 29, 2013, at 4:15 PM, Robin Morris wrote:
