Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Trouble with sum function


Copy link to this message
-
Re: Trouble with sum function
>From the code here

http://svn.apache.org/viewvc/hive/branches/branch-0.7/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java?view=markup

For float , doble and string the implementation points to common function
GenericUDAFSumDouble()

if (parameters[0].getCategory() != ObjectInspector.Category.PRIMITIVE) {
54     throw new UDFArgumentTypeException(0,
55     "Only primitive type arguments are accepted but "
56     + parameters[0].getTypeName() + " is passed.");
57     }
58     switch (((PrimitiveTypeInfo) parameters[0]).getPrimitiveCategory()) {
59     case BYTE:
60     case SHORT:
61     case INT:
62     case LONG:
63     return new GenericUDAFSumLong();
*64  *   *case FLOAT:
65     case DOUBLE:
66     case STRING:
67     return new GenericUDAFSumDouble();*

On Mon, Jun 11, 2012 at 3:43 PM, Gabi D <[EMAIL PROTECTED]> wrote:

> float is known to have precision issues, because of the way it is
> implemented. If you are working with money data you should definitely move
> to double.
> google 'float precision' and you'll find a bunch of explanations.
>
>
> On Mon, Jun 11, 2012 at 12:49 PM, Guillaume Polaert <[EMAIL PROTECTED]>wrote:
>
>> Hi,
>>
>> We're expecting some issue with the sum function in Hive 0.7.1.
>> The precision of float number isn't correct (0.320484484676 instead of
>> 0.32)
>> We aren't expecting this error with double format.
>>
>> For instance, "select id, sum(col1), sum(col2) from test_table group by
>> id" returns incorrect values.
>>
>> ------
>> CREATE TABLE test_table  ( id int,  col1 double,  col2 float )
>> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\073'
>> LINES TERMINATED BY '\n'
>> STORED AS TEXTFILE;
>>
>> LOAD DATA INPATH '/user/hive/xxxxx/import.csv' INTO TABLE `test _table`
>>
>> Import.csv
>> 1;1.012;1.012
>> 2;2.1;2.1
>> 3;3.2;3.2
>> 4;4.323;4.323
>> 5;5;5
>> ----
>>
>> Is it a bug  ? Or maybe did we make a error ?
>>
>>
>> Guillaume Polaert | Cyrès Conseil
>>
>>
>
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB