Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Hive double-precision question


Copy link to this message
-
Re: Hive double-precision question
Hi Mark,
   Thanks for the pointers. I looked at the code and it looks like my Java
code and the Hive code are similar...(I am a basic-level Java guy). The UDF
below uses Math.sin....which is what I used to test "linux + Java" result.
I have to see what this DoubleWritable and Serde2 is all about...

package org.apache.hadoop.hive.ql.udf;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.serde2.io.DoubleWritable;

/**
* UDFSin.
*
*/
@Description(name = "sin",
    value = "_FUNC_(x) - returns the sine of x (x is in radians)",
    extended = "Example:\n "
    + " > SELECT _FUNC_(0) FROM src LIMIT 1;\n" + " 0")
public class UDFSin extends UDF {
  private DoubleWritable result = new DoubleWritable();

  public UDFSin() {
  }

public DoubleWritable evaluate(DoubleWritable a) {
    if (a == null) {
      return null;
    } else {
      result.set(Math.sin(a.get()));
      return result;
    }
  }
}

On Fri, Dec 7, 2012 at 2:02 PM, Mark Grover <[EMAIL PROTECTED]>wrote:

> Periya:
> If you want to see what the built in Hive UDFs are doing, the code is here:
>
> https://github.com/apache/hive/tree/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic
> and
>
> https://github.com/apache/hive/tree/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf
>
> You can find out which UDF name maps to what class by looking at
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
>
> If my memory serves me right, there was some "interesting" stuff Hive does
> when mapping Java types to Hive datatypes. I am not sure how relevant it is
> to this discussion but I will have to look further to comment more.
>
> In the meanwhile take a look at the UDF code and see if your personal Java
> code on Linux is equivalent to the Hive UDF code.
>
> Keep us posted!
> Mark
>
> On Fri, Dec 7, 2012 at 1:27 PM, Periya.Data <[EMAIL PROTECTED]> wrote:
>
>> Hi Hive Users,
>>     I recently noticed an interesting behavior with Hive and I am unable
>> to find the reason for it. Your insights into this is much appreciated.
>>
>> I am trying to compute the distance between two zip codes. I have the
>> distances computed in various 'platforms' - SAS, R, Linux+Java, Hive UDF
>> and using Hive's built-in functions. There are some discrepancies from the
>> 3rd decimal place when I see the output got from using Hive UDF and Hive's
>> built-in functions. Here is an example:
>>
>> zip1          zip 2          Hadoop Built-in function
>> SAS                      R                                       Linux +
>> Java
>> 00501   11720   4.49493083698542000 4.49508858 4.49508858054005
>> 4.49508857976933000
>> The formula used to compute distance is this (UDF):
>>
>>         double long1 = Math.atan(1)/45 * ux;
>>         double lat1 = Math.atan(1)/45 * uy;
>>         double long2 = Math.atan(1)/45 * mx;
>>         double lat2 = Math.atan(1)/45 * my;
>>
>>         double X1 = long1;
>>         double Y1 = lat1;
>>         double X2 = long2;
>>         double Y2 = lat2;
>>
>>         double distance = 3949.99 * Math.acos(Math.sin(Y1) *
>>                 Math.sin(Y2) + Math.cos(Y1) * Math.cos(Y2) * Math.cos(X1
>> - X2));
>>
>>
>> The one used using built-in functions (same as above):
>> 3949.99*acos(  sin(u_y_coord * (atan(1)/45 )) *
>>         sin(m_y_coord * (atan(1)/45 )) + cos(u_y_coord * (atan(1)/45 ))*
>>         cos(m_y_coord * (atan(1)/45 ))*cos(u_x_coord *
>>         (atan(1)/45) - m_x_coord * (atan(1)/45)) )
>>
>>
>>
>>
>> - The Hive's built-in functions used are acos, sin, cos and atan.
>> - for another try, I used Hive UDF, with Java's math library (Math.acos,
>> Math.atan etc)
>> - All variables used are double.
>>
>> I expected the value from Hadoop UDF (and Built-in functions) to be
>> identical with that got from plain Java code in Linux. But they are not.
>> The built-in function (as well as UDF) gives 49493083698542000 whereas