A bug of auto convert join with intermediate table?
Hi all,

I am testing Hive's auto convert join. From the source code, it seems that
the conditional task takes the size of the intermediate table into account
and runs the local task that builds the hash table on the intermediate table
if it is smaller than the threshold set by hive.mapjoin.smalltable.filesize.
However, I ran a very simple query based on TPC-H:

set hive.auto.convert.join=true;

insert overwrite table q3_tmp
select c_custkey, o_orderkey, o_orderdate
from orders o join customer c on c.c_mktsegment = 'BUILDING' and
c.c_custkey = o.o_custkey
join lineitem l on l.l_orderkey = o.o_orderkey
where c.c_custkey < 1000;

The intermediate table produced by joining c and o is very small (50KB), far
below the threshold. However, both map join candidates for the join between
the intermediate table and lineitem are filtered out by the conditional task.
Is this a bug in auto convert join, or is something wrong with my usage or
analysis?
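
For anyone trying to reproduce this, here is a minimal sketch of how the setup can be inspected from the Hive CLI. It assumes the TPC-H tables and q3_tmp from the query above already exist, and it only prints the configured threshold and the query plan; it does not change any behavior.

```sql
-- Print the currently configured small-table threshold (in bytes).
set hive.mapjoin.smalltable.filesize;

set hive.auto.convert.join=true;

-- Show the plan for the same query. With auto convert join enabled, the
-- stage dependencies should include a conditional stage whose alternatives
-- are the map join candidates and the common join fallback.
explain
insert overwrite table q3_tmp
select c_custkey, o_orderkey, o_orderdate
from orders o join customer c on c.c_mktsegment = 'BUILDING' and
c.c_custkey = o.o_custkey
join lineitem l on l.l_orderkey = o.o_orderkey
where c.c_custkey < 1000;
```

Note that EXPLAIN only shows the candidate stages. Which branch actually runs is decided when the conditional task executes, so the job output of the real query is still needed to see which candidates were filtered out.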