Hive, mail # user - Efficient partition usage in join


Efficient partition usage in join
Dima Datsenko 2012-11-22, 13:56
Hi Guys,

I wonder if you could help me.

I have a huge Hive table, A, partitioned by a field, partition_id. It has thousands of partitions.
I also have a small table, B, containing a few dozen partition IDs. I'd like to read the data from only those partitions of A.

However, when I run
SELECT * FROM A JOIN B ON (A.partition_id = B.partition_id);
Hive reads all the data from A, then from B, and performs the join in the reduce stage.

I tried the /*+ MAPJOIN */ hint. It ran faster because it skipped the reduce stage, but it still read the whole of table A.
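
For reference, the hinted query presumably looked something like the sketch below. The message does not give its exact text, so the table named inside the hint (B, the small table) is an assumption; A, B and partition_id are the names used in the query above.

```sql
-- Sketch only: the exact hinted query was not included in the message.
-- MAPJOIN(B) asks Hive to load the small table B into memory on each mapper,
-- so the join runs map-side and no reduce stage is needed.
-- As described above, this did not stop Hive from scanning every partition of A.
SELECT /*+ MAPJOIN(B) */ *
FROM A JOIN B ON (A.partition_id = B.partition_id);
```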

Is there a more efficient way to run this query without reading all of A?
Thanks
Dima