This is my understanding of both. Wait for the Hive gurus to correct me if
I have made any mistakes.
In Hive, when an inner join runs as a regular (reduce-side) join, the table
in the last position on the right is streamed through the reducers. This is
the default.
So say you have a query like
select blah blah from t1 join t2 join t3 join t4 on (blah blah)
The key/value pairs that the maps emit for tables t1, t2 and t3 are sent to
the reducers and buffered in memory there, but the records of table t4 are
streamed through the reducers instead of being buffered, for better memory
management. That's why it's advised to put the largest table on the right.
This default behavior can be changed with the STREAMTABLE(t1) hint, which
lets you tell Hive which table's data you want to be streamed.
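A rough sketch of what that looks like, assuming all four tables are joined
on a hypothetical column called key and each has a hypothetical column
called val (neither name comes from the original question):

```sql
-- Default: no hint, so t4 (the last table in the join) is streamed
-- through the reducers and t1, t2, t3 are buffered.
SELECT t1.val, t2.val, t3.val, t4.val
FROM t1
JOIN t2 ON (t1.key = t2.key)
JOIN t3 ON (t2.key = t3.key)
JOIN t4 ON (t3.key = t4.key);

-- With the hint: t1 is streamed instead, so it can be the largest
-- table even though it is written first.
SELECT /*+ STREAMTABLE(t1) */ t1.val, t2.val, t3.val, t4.val
FROM t1
JOIN t2 ON (t1.key = t2.key)
JOIN t3 ON (t2.key = t3.key)
JOIN t4 ON (t3.key = t4.key);
```

The hint goes right after the SELECT keyword as a /*+ ... */ comment. The
example uses the same join key everywhere on purpose, since as far as I know
that is the case where Hive runs the whole join as a single map/reduce job.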
On the other hand, a map join is a join where no reducers are involved. The
smaller table is loaded into the memory of each map task, and the join is
then performed by the maps themselves. As the smaller table's data is
available in memory and the reduce step is removed completely, these jobs
are usually very fast.
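A minimal sketch of the MAPJOIN hint, assuming a large table big_t and a
small table small_t that fits in memory (both table names and the columns
key and val are made up for illustration):

```sql
-- small_t is loaded into memory by every map task;
-- big_t is read by the maps and joined there, with no reduce phase.
SELECT /*+ MAPJOIN(small_t) */ big_t.key, big_t.val, small_t.val
FROM big_t
JOIN small_t ON (big_t.key = small_t.key);
```

The table named in the hint is the one that gets held in memory, so it
should be the small one. I believe Hive can also choose a map join by itself
when hive.auto.convert.join is set to true, but check that against the
version you are running.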
On Tue, Dec 3, 2013 at 2:47 PM, Baahu <[EMAIL PROTECTED]> wrote:
> What is the difference between hints STREAMTABLE ,MAPJOIN .