A bug of auto convert join with intermediate table?
Zhong Wang 2013-02-06, 13:28
Hi all,
I am running tests on Hive's auto convert join. From the source code, it seems the conditional task considers the size of the intermediate table and runs the local task that generates the hashtable on it if the table is smaller than the threshold set by hive.mapjoin.smalltable.filesize.

However, I ran a very simple query based on TPC-H:

    set hive.auto.convert.join=true;

    insert overwrite table q3_tmp
    select c_custkey, o_orderkey, o_orderdate
    from orders o
    join customer c
      on c.c_mktsegment = 'BUILDING' and c.c_custkey = o.o_custkey
    join lineitem l
      on l.l_orderkey = o.o_orderkey
    where c.c_custkey < 1000;

The intermediate table of c join o is very small (50KB), which is far below the threshold. However, both of the map joins of the intermediate table and lineitem are filtered out by the conditional task.

Is this a bug in auto convert join, or is something wrong with my usage or analysis?

Zhong
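
For anyone trying to reproduce this, the effective settings and the generated plan can be inspected before running the job. The following is only a minimal sketch, assuming the Hive CLI and the same TPC-H tables and q3_tmp target as in the query; it does not show which branch the conditional task picks at run time, since that decision is made during execution.

```sql
-- A `set` with a property name and no value echoes the current setting.
set hive.auto.convert.join;
set hive.mapjoin.smalltable.filesize;

-- Enable auto convert join, then print the plan without executing it.
set hive.auto.convert.join=true;

explain
insert overwrite table q3_tmp
select c_custkey, o_orderkey, o_orderdate
from orders o
join customer c
  on c.c_mktsegment = 'BUILDING' and c.c_custkey = o.o_custkey
join lineitem l
  on l.l_orderkey = o.o_orderkey
where c.c_custkey < 1000;
```

The plan should list the conditional stages together with their candidate map-join stages and the backup common-join stages, which makes it easier to see which candidates exist for the second join before comparing against what actually ran.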