Hive, mail # user - A bug of auto convert join with intermediate table?


Re: A bug of auto convert join with intermediate table?
Zhong Wang 2013-02-07, 07:24
Hello Abdelrhman,

I am using Hive 0.8.1, which already includes that fix. I think this is an
integer overflow bug in AliasFileSizePair, an inner class of
ConditionalResolverCommonJoin. Its compareTo() method can overflow:

    public int compareTo(AliasFileSizePair o) {
      if (o == null) {
        return 1;
      }
      return (int)(size - o.size);
    }

because size and o.size are long integers, so casting their difference to
int can truncate it and even flip its sign. Can anyone confirm this?
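
To make the suspected overflow concrete, here is a minimal standalone sketch. The class name CompareOverflowDemo, the helper names, and the sizes are hypothetical and chosen only for illustration; this is not Hive's code. castCompare mirrors the cast-based comparison quoted above, and safeCompare uses Long.compare, which assumes Java 7 or later (on Java 6 an explicit less-than/greater-than check would do the same job).

```java
class CompareOverflowDemo {
    // Mirrors the cast-based comparison: subtract two longs, then narrow to int.
    static int castCompare(long size, long otherSize) {
        return (int) (size - otherSize);
    }

    // Overflow-safe alternative: compares the longs directly, no subtraction.
    static int safeCompare(long size, long otherSize) {
        return Long.compare(size, otherSize);
    }

    public static void main(String[] args) {
        long small = 50L * 1024;                        // about 50 KB
        long large = small + 3L * 1024 * 1024 * 1024;   // hypothetical: 3 GiB larger
        long aligned = small + (1L << 32);              // hypothetical: exactly 4 GiB larger

        // The 3 GiB difference does not fit in an int, so the sign flips
        // and the larger file appears to be the smaller one.
        System.out.println("cast, 3 GiB apart: " + castCompare(large, small));

        // A difference of exactly 2^32 truncates to 0, so the files look equal.
        System.out.println("cast, 4 GiB apart: " + castCompare(aligned, small));

        // Long.compare reports the larger file as greater in both cases.
        System.out.println("safe, 3 GiB apart: " + safeCompare(large, small));
        System.out.println("safe, 4 GiB apart: " + safeCompare(aligned, small));
    }
}
```

If the pairs are sorted with the cast-based comparison, any two sizes more than about 2 GB apart can end up in the wrong order, which may explain why a small intermediate table is not recognized as the small side.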

Zhong

On Thu, Feb 7, 2013 at 4:40 AM, Abdelrhman Shettia <[EMAIL PROTECTED]> wrote:

> Hi Zhong,
>
> Is it possible that you are hitting the following Hive bug? You may want to
> upgrade your current Hive client.
>
>
> https://issues.apache.org/jira/browse/HIVE-2095
>
>
> Thanks
> -Abdelrhman
>
>
> Hortonworks, Inc.
> Technical Support Engineer
> Abdelrahman Shettia
> [EMAIL PROTECTED]
>
>
> On Feb 6, 2013, at 5:28 AM, Zhong Wang <[EMAIL PROTECTED]> wrote:
>
> Hi all,
>
> I am running tests on Hive auto convert join. From the source code, it
> seems the conditional task takes the intermediate table size into account
> and runs the local task that builds the hash table on the intermediate
> table if it is smaller than the hive.mapjoin.smalltable.filesize threshold.
> However, I ran a very simple query based on TPC-H:
>
> set hive.auto.convert.join=true;
>
> insert overwrite table q3_tmp
> select c_custkey, o_orderkey, o_orderdate
> from orders o join customer c on c.c_mktsegment = 'BUILDING' and
> c.c_custkey = o.o_custkey
> join lineitem l on l.l_orderkey = o.o_orderkey
> where c.c_custkey < 1000;
>
> The intermediate table of c join o is very small (50 KB), far below the
> threshold. However, both map joins of the intermediate table and lineitem
> are filtered out by the conditional task. Is this a bug in auto convert
> join, or is something wrong with my usage/analysis?
>
> Zhong
>
>
>