Re: Hive double-precision question
Hi Mark,
   Thanks for the pointers. I looked at the code, and it looks like my Java
code and the Hive code are similar (I am a basic-level Java guy). The UDF
below uses Math.sin, which is what I used to test the "Linux + Java" result.
I still have to see what this DoubleWritable and serde2 stuff is all about...

package org.apache.hadoop.hive.ql.udf;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.serde2.io.DoubleWritable;

/**
* UDFSin.
*
*/
@Description(name = "sin",
    value = "_FUNC_(x) - returns the sine of x (x is in radians)",
    extended = "Example:\n "
    + " > SELECT _FUNC_(0) FROM src LIMIT 1;\n" + " 0")
public class UDFSin extends UDF {
  private DoubleWritable result = new DoubleWritable();

  public UDFSin() {
  }

  public DoubleWritable evaluate(DoubleWritable a) {
    if (a == null) {
      return null;
    } else {
      result.set(Math.sin(a.get()));
      return result;
    }
  }
}
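
For what it's worth, here is a quick standalone way to check the DoubleWritable
part (a minimal sketch of my own, not Hive code; it assumes only that the hive
serde2 jar is on the classpath). It compares the bits of Math.sin on a plain
double against the same call routed through DoubleWritable's set()/get(), the
way the UDF does it:

import org.apache.hadoop.hive.serde2.io.DoubleWritable;

// Standalone check: does routing a value through DoubleWritable change the
// result of Math.sin at all? DoubleWritable should just wrap a primitive
// double, but this prints the exact bit patterns so there is no guessing.
public class SinBitsCheck {
  public static void main(String[] args) {
    double x = 0.123456789;                 // arbitrary test value, in radians

    // Plain "Linux + Java" path
    double plain = Math.sin(x);

    // Same path the UDF takes: DoubleWritable in, DoubleWritable out
    DoubleWritable in = new DoubleWritable();
    in.set(x);
    DoubleWritable out = new DoubleWritable();
    out.set(Math.sin(in.get()));
    double viaWritable = out.get();

    System.out.printf("plain        = %.17g (bits %016x)%n",
        plain, Double.doubleToLongBits(plain));
    System.out.printf("via writable = %.17g (bits %016x)%n",
        viaWritable, Double.doubleToLongBits(viaWritable));
  }
}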

On Fri, Dec 7, 2012 at 2:02 PM, Mark Grover <[EMAIL PROTECTED]> wrote:

> Periya:
> If you want to see what the built in Hive UDFs are doing, the code is here:
>
> https://github.com/apache/hive/tree/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic
> and
>
> https://github.com/apache/hive/tree/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf
>
> You can find out which UDF name maps to what class by looking at
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
>
> If my memory serves me right, there was some "interesting" stuff Hive does
> when mapping Java types to Hive datatypes. I am not sure how relevant it is
> to this discussion but I will have to look further to comment more.
>
> In the meanwhile take a look at the UDF code and see if your personal Java
> code on Linux is equivalent to the Hive UDF code.
>
> Keep us posted!
> Mark
>
> On Fri, Dec 7, 2012 at 1:27 PM, Periya.Data <[EMAIL PROTECTED]> wrote:
>
>> Hi Hive Users,
>>     I recently noticed an interesting behavior with Hive, and I am unable
>> to find the reason for it. Your insights into this are much appreciated.
>>
>> I am trying to compute the distance between two zip codes. I have the
>> distance computed on various 'platforms' - SAS, R, Linux + Java, a Hive UDF,
>> and Hive's built-in functions. The output I get from the Hive UDF and from
>> Hive's built-in functions differs from the others starting at the 3rd
>> decimal place. Here is an example:
>>
>> zip1     zip2      Hadoop built-in function     SAS           R                   Linux + Java
>> 00501    11720     4.49493083698542000          4.49508858    4.49508858054005    4.49508857976933000
>> The formula used to compute distance is this (UDF):
>>
>>         double long1 = Math.atan(1)/45 * ux;
>>         double lat1 = Math.atan(1)/45 * uy;
>>         double long2 = Math.atan(1)/45 * mx;
>>         double lat2 = Math.atan(1)/45 * my;
>>
>>         double X1 = long1;
>>         double Y1 = lat1;
>>         double X2 = long2;
>>         double Y2 = lat2;
>>
>>         double distance = 3949.99 * Math.acos(Math.sin(Y1) *
>>                 Math.sin(Y2) + Math.cos(Y1) * Math.cos(Y2) * Math.cos(X1
>> - X2));
>>
>>
>> The one using Hive's built-in functions (same computation as above):
>> 3949.99*acos(  sin(u_y_coord * (atan(1)/45 )) *
>>         sin(m_y_coord * (atan(1)/45 )) + cos(u_y_coord * (atan(1)/45 ))*
>>         cos(m_y_coord * (atan(1)/45 ))*cos(u_x_coord *
>>         (atan(1)/45) - m_x_coord * (atan(1)/45)) )
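>>
>> Side note: atan(1)/45 above is just PI/180, the degrees-to-radians factor.
>> A quick standalone check (plain Java, nothing Hive-specific) shows whether
>> the two spellings even produce bit-identical doubles:
>>
>>         double f1 = Math.atan(1) / 45;   // the factor as written above
>>         double f2 = Math.PI / 180;       // the usual degrees-to-radians constant
>>         System.out.println(f1 == f2);                   // bit-identical or not
>>         System.out.println(Math.toRadians(1.0) == f1);  // same check against toRadians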
>>
>> - The Hive built-in functions used are acos, sin, cos, and atan.
>> - For another try, I used a Hive UDF with Java's math library (Math.acos,
>> Math.atan, etc.).
>> - All variables used are double.
>>
>> I expected the value from the Hadoop UDF (and the built-in functions) to be
>> identical to the one from plain Java code on Linux. But they are not.
>> The built-in function (as well as the UDF) gives 4.49493083698542000, whereas
>> plain Java on Linux gives 4.49508857976933000.
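
For a clean side-by-side comparison, a small standalone program (plain Java, no
Hive; the coordinates below are placeholders, not the real coordinates for zips
00501 and 11720) can evaluate the same formula and print every significant
digit the double carries, plus its raw bit pattern:

// Standalone diagnostic (plain Java, no Hive): evaluate the same distance
// formula and print the full precision of the result and its raw bits, so the
// value can be compared digit-for-digit with what Hive, SAS, and R return.
public class DistanceBits {
  public static void main(String[] args) {
    // Placeholder coordinates -- substitute the actual long/lat values used
    // for zips 00501 and 11720; these numbers are NOT from the thread.
    double ux = -73.0, uy = 40.8;   // hypothetical x/y for zip 1
    double mx = -73.1, my = 40.9;   // hypothetical x/y for zip 2

    double f = Math.atan(1) / 45;   // same degrees-to-radians factor as the UDF
    double x1 = f * ux, y1 = f * uy;
    double x2 = f * mx, y2 = f * my;

    double distance = 3949.99 * Math.acos(
        Math.sin(y1) * Math.sin(y2)
        + Math.cos(y1) * Math.cos(y2) * Math.cos(x1 - x2));

    System.out.printf("distance = %.17g%n", distance);
    System.out.printf("bits     = %016x%n", Double.doubleToLongBits(distance));
  }
}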