Hive >> mail # user >> A bug of auto convert join with intermediate table?


Zhong Wang 2013-02-06, 13:28
Re: A bug of auto convert join with intermediate table?
Hi Zhong,

Is it possible that you are hitting the following Hive bug? You may want to upgrade your Hive client.
https://issues.apache.org/jira/browse/HIVE-2095
Thanks
-Abdelrahman
Hortonworks, Inc.
Technical Support Engineer
Abdelrahman Shettia
[EMAIL PROTECTED]
Office phone: (708) 689-9609
How am I doing?   Please feel free to provide feedback to my manager Rick Morris at [EMAIL PROTECTED]
On Feb 6, 2013, at 5:28 AM, Zhong Wang <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> I am running tests on Hive's auto convert join. From the source code, it seems the conditional task considers the size of the intermediate table and runs the local task that generates the hash table on it if the table is smaller than the threshold set by hive.mapjoin.smalltable.filesize. However, I ran a very simple query based on TPC-H:
>
> set hive.auto.convert.join=true;
>
> insert overwrite table q3_tmp
> select c_custkey, o_orderkey, o_orderdate
> from orders o join customer c on c.c_mktsegment = 'BUILDING' and
> c.c_custkey = o.o_custkey
> join lineitem l on l.l_orderkey = o.o_orderkey
> where c.c_custkey < 1000;
>
> The intermediate table of c join o is very small (50KB), which is far below the threshold. However, both map joins between the intermediate table and lineitem are filtered out by the conditional task. Is this a bug in auto convert join, or is something wrong with my usage or analysis?
>
> Zhong
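
One way to check what the conditional task is being asked to choose between is to print the query plan with the relevant settings made explicit. The following is only a sketch under assumptions: it reuses the TPC-H tables from the query above, it sets hive.mapjoin.smalltable.filesize to 25000000 bytes (the documented default, which may differ on a given installation), and it drops the insert so that only the join plan is shown.

```sql
-- Let Hive try to convert common joins into map joins.
set hive.auto.convert.join=true;

-- Small-table threshold in bytes. 25000000 is the documented default;
-- it is set explicitly here only so the value in effect is unambiguous.
set hive.mapjoin.smalltable.filesize=25000000;

-- Print the plan without running the query. The select is the one from
-- the original statement, minus the insert overwrite.
explain
select c_custkey, o_orderkey, o_orderdate
from orders o join customer c on c.c_mktsegment = 'BUILDING' and
c.c_custkey = o.o_custkey
join lineitem l on l.l_orderkey = o.o_orderkey
where c.c_custkey < 1000;
```

If the plan for the second join lists map-join alternatives next to a common-join backup, the conversion was planned and the choice is being made at run time, which is where the size of the intermediate table would matter. Whether the 50KB intermediate result is actually picked up at that point depends on the Hive version in use.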

Zhong Wang 2013-02-07, 07:24