Hive, mail # dev - Review Request: float and double calculation is inaccurate in Hive


Re: Review Request: float and double calculation is inaccurate in Hive
Mark Grover 2012-12-18, 00:38

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8653/#review14625
-----------------------------------------------------------

http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPDivide.java
<https://reviews.apache.org/r/8653/#comment31047>

    10 seems to be a rather arbitrary number for scale. Any particular reason you are using it? Maybe we should invoke the method where no scale needs to be specified.
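
To illustrate the trade-off behind the scale question, here is a minimal sketch of three `java.math.BigDecimal` division variants. The class and method names are hypothetical and are not taken from the patch under review; only the `BigDecimal.divide` overloads are real JDK calls. Note that the overload without a scale returns the exact quotient and throws `ArithmeticException` when the decimal expansion does not terminate, which may be one reason a fixed scale was chosen.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

// Hypothetical sketch, not the code from the patch.
public class DivideScaleSketch {

    // Fixed scale: always 10 digits after the decimal point, rounded half-up.
    static BigDecimal fixedScale(BigDecimal a, BigDecimal b) {
        return a.divide(b, 10, RoundingMode.HALF_UP);
    }

    // No scale given: exact quotient, or ArithmeticException if the
    // quotient has a non-terminating decimal expansion (e.g. 1/3).
    static BigDecimal exact(BigDecimal a, BigDecimal b) {
        return a.divide(b);
    }

    // Precision-based: 16 significant digits (MathContext.DECIMAL64).
    static BigDecimal byPrecision(BigDecimal a, BigDecimal b) {
        return a.divide(b, MathContext.DECIMAL64);
    }

    public static void main(String[] args) {
        BigDecimal one = BigDecimal.ONE;
        BigDecimal three = new BigDecimal("3");

        System.out.println(fixedScale(one, three));   // 0.3333333333
        System.out.println(byPrecision(one, three));  // 0.3333333333333333
        System.out.println(exact(new BigDecimal("48308.98"), BigDecimal.TEN)); // 4830.898

        try {
            exact(one, three);
        } catch (ArithmeticException e) {
            System.out.println("1/3 has no exact decimal representation");
        }
    }
}
```

A `MathContext` bounds the number of significant digits rather than the digits after the decimal point, so it behaves more uniformly across large and small operands than a fixed scale of 10.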

http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPMultiply.java
<https://reviews.apache.org/r/8653/#comment31048>

    You seem to be doing
    DoubleWritable->String->BigDecimal
    
    There probably is a way to do:
    DoubleWritable->Double->BigDecimal
    
    I am not sure if it's any more efficient than the present case, so take this suggestion with a grain of salt :-)
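
As a rough sketch of the two conversion paths mentioned above, the following compares going through a `String` with going directly from a `double`. The class and method names are hypothetical; a plain `double` stands in for the value that `DoubleWritable.get()` would return. One thing worth noting is that the two direct routes are not equivalent: `new BigDecimal(double)` captures the exact binary value, while `BigDecimal.valueOf(double)` is documented to go through `Double.toString` internally, so it gives the same result as the string path.

```java
import java.math.BigDecimal;

// Hypothetical sketch; a primitive double stands in for DoubleWritable.get().
public class DoubleToBigDecimalSketch {

    // double -> String -> BigDecimal (the path the review describes).
    static BigDecimal viaString(double d) {
        return new BigDecimal(Double.toString(d));
    }

    // double -> BigDecimal using the canonical string form of the double.
    static BigDecimal viaValueOf(double d) {
        return BigDecimal.valueOf(d);
    }

    // double -> BigDecimal using the exact binary value of the double.
    static BigDecimal viaConstructor(double d) {
        return new BigDecimal(d);
    }

    public static void main(String[] args) {
        double d = 0.1;
        System.out.println(viaString(d));      // 0.1
        System.out.println(viaValueOf(d));     // 0.1
        // Prints the long exact binary expansion of 0.1, not 0.1 itself.
        System.out.println(viaConstructor(d));
    }
}
```

If the goal of the patch is to get decimal-looking results, the constructor path would reintroduce the binary representation error, so `BigDecimal.valueOf` is probably the closer substitute for the string round trip.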
    
- Mark Grover
On Dec. 18, 2012, 12:37 a.m., Johnny Zhang wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8653/
> -----------------------------------------------------------
>
> (Updated Dec. 18, 2012, 12:37 a.m.)
>
>
> Review request for hive.
>
>
> Description
> -------
>
> I found this while debugging the e2e test failures: Hive miscalculates float and double values. Take a float calculation as an example:
> hive> select f from all100k limit 1;
> 48308.98
> hive> select f/10 from all100k limit 1;
> 4830.898046875 <-- extra 046875 at the end
> hive> select f*1.01 from all100k limit 1;
> 48792.0702734375 <--should be 48792.0698
> It might be essentially the same problem as http://effbot.org/pyfaq/why-are-floating-point-calculations-so-inaccurate.htm. But since the e2e tests compare the results with MySQL, and MySQL seems to get this right, it is worth fixing in Hive.
>
>
> This addresses bug HIVE-3715.
>     https://issues.apache.org/jira/browse/HIVE-3715
>
>
> Diffs
> -----
>
>   http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPDivide.java 1423224
>   http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPMultiply.java 1423224
>
> Diff: https://reviews.apache.org/r/8653/diff/
>
>
> Testing
> -------
>
> I ran a test comparing the results against MySQL with its default float precision setting; the results are identical.
>
> query:          select f, f*1.01, f/10 from all100k limit 1;
> mysql result:   48309       48792.0702734375    4830.898046875
> hive result:    48308.98    48792.0702734375    4830.898046875
>
>
> I applied this patch and ran the Hive e2e tests; they all pass (without the patch there are 5 related failures).
>
>
> Thanks,
>
> Johnny Zhang
>
>
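
The extra digits in the description's example can be reproduced outside Hive. The sketch below assumes the column `f` is stored as a 32-bit float and that the arithmetic is carried out in double precision after widening; the class and method names are hypothetical. It shows that the nearest float to 48308.98 is not 48308.98 itself, and that the trailing digits come from that stored value rather than from the division or multiplication.

```java
import java.math.BigDecimal;

// Hypothetical sketch of float-to-double widening; not Hive code.
public class FloatWideningSketch {

    // Exact decimal expansion of the binary value a float holds.
    // Widening float -> double is lossless, so this exposes the stored value.
    static BigDecimal exactValue(float f) {
        return new BigDecimal((double) f);
    }

    // Divide in double precision after widening, as assumed above.
    static double divideByTen(float f) {
        return (double) f / 10;
    }

    // Multiply in double precision after widening, as assumed above.
    static double timesOnePointZeroOne(float f) {
        return (double) f * 1.01;
    }

    public static void main(String[] args) {
        float f = 48308.98f;
        System.out.println(f);                       // shortest float repr: 48308.98
        System.out.println(exactValue(f));           // 48308.98046875
        System.out.println(divideByTen(f));          // compare with f/10 in the description
        System.out.println(timesOnePointZeroOne(f)); // compare with f*1.01 in the description
    }
}
```

Floats in this range are spaced 1/256 apart, so 48308.98 is stored as 48308.98046875, and the results quoted in the description are the correct double-precision results for that stored value.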