Hive >> mail # user >> Efficient partition usage in join


Efficient partition usage in join
Hi Guys,

I wonder if you could help me.

I have a huge Hive table partitioned by some field. It has thousands of partitions.
Now I have another small table containing a few dozen partition IDs, and I'd like to get the data only from those partitions.

However, when I run

SELECT * FROM A JOIN B ON (A.partition_id = B.partition_id);

it reads all the data from A, then from B, and performs the join in the reduce stage.
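To see why the plain join touches every partition, here is a toy Python model. The data is hypothetical and the code only mimics the shape of a reduce-side (shuffle) join, not Hive's implementation. Because the join key values come from B at run time, nothing lets the scan of A skip partitions: every row of both tables is read and grouped by key, and only then does the "reduce" step pair the rows up.

```python
from collections import defaultdict

# Hypothetical toy data: table A has 1000 partitions of 3 rows each,
# stored as partition_id -> rows; table B holds a few partition IDs.
A = {pid: [(pid, f"row{pid}_{i}") for i in range(3)] for pid in range(1000)}
B = [(3,), (17,), (42,)]


def reduce_side_join(a, b):
    """Toy shuffle join: read every row of both tables, group by key, then combine."""
    rows_read = 0
    # key -> (rows from A, rows from B); stands in for the shuffle to reducers
    groups = defaultdict(lambda: ([], []))
    for rows in a.values():  # no pruning: every partition of A is scanned
        for row in rows:
            rows_read += 1
            groups[row[0]][0].append(row)
    for row in b:
        rows_read += 1
        groups[row[0]][1].append(row)
    # "Reduce" stage: emit every A row paired with every B row sharing its key
    joined = [ra + rb for left, right in groups.values() for ra in left for rb in right]
    return joined, rows_read


joined, rows_read = reduce_side_join(A, B)
print(len(joined), rows_read)  # 9 3003
```

Only 9 rows survive the join, yet all 3003 input rows are read.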

I tried the /*+ MAPJOIN */ hint. It ran faster because it skipped the reduce stage, but it still read the whole of table A.
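A map join loads the small table into an in-memory hash table and joins each row of the large table inside the mappers, which is why the reduce stage disappears. The toy sketch below uses hypothetical data and only models the idea, not Hive's code. It shows that the large table is still streamed in full: the hash lookup saves the shuffle, not the scan.

```python
# Hypothetical toy data: table A has 1000 partitions of 3 rows each,
# stored as partition_id -> rows; table B holds a few partition IDs.
A = {pid: [(pid, f"row{pid}_{i}") for i in range(3)] for pid in range(1000)}
B = [(3,), (17,), (42,)]


def map_side_join(a, b):
    """Toy map-side join: hash the small table, then stream the big one."""
    small = {}
    for row in b:  # build the in-memory hash table from the small table
        small.setdefault(row[0], []).append(row)
    rows_read = len(b)
    joined = []
    for rows in a.values():  # still scans every partition of A
        for row in rows:
            rows_read += 1
            for rb in small.get(row[0], []):  # probe; no shuffle or reduce step
                joined.append(row + rb)
    return joined, rows_read


joined, rows_read = map_side_join(A, B)
print(len(joined), rows_read)  # 9 3003
```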

Is there a more efficient way to run this query without reading all of A?
Thanks
Dima
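As I understand the Hive versions of this era, partition pruning is decided when the query is compiled, and only from predicates that compare the partition column with constants. A join condition such as A.partition_id = B.partition_id does not qualify. One common workaround, not necessarily what was suggested later in this thread, is a two-step query. First fetch the IDs from B (for example with SELECT DISTINCT partition_id FROM B). Then splice those IDs into a WHERE clause on A as literals. The following is a minimal Python sketch that assumes integer partition IDs. `build_pruned_query` and `pruned_scan` are hypothetical helpers, and the scan only counts rows in a toy model rather than measuring Hive.

```python
# Hypothetical toy data: table A has 1000 partitions of 3 rows each.
A = {pid: [(pid, f"row{pid}_{i}") for i in range(3)] for pid in range(1000)}


def build_pruned_query(table, partition_col, partition_ids):
    """Build a HiveQL query whose partition filter uses literal constants only."""
    # Assumes integer IDs; int() raises on non-numeric strings, so arbitrary
    # text is not spliced into the query.
    ids = sorted({int(p) for p in partition_ids})
    if not ids:
        raise ValueError("no partition IDs to filter on")
    values = ", ".join(str(p) for p in ids)
    return f"SELECT * FROM {table} WHERE {partition_col} IN ({values})"


def pruned_scan(a, partition_ids):
    """Toy model of a pruned scan: only the listed partitions are read."""
    rows = [row for pid in sorted(set(partition_ids)) for row in a.get(pid, [])]
    return rows, len(rows)


# Step 1 is run separately; suppose it returned these IDs (duplicates allowed).
ids_from_b = [42, 3, 17, 3]
print(build_pruned_query("A", "partition_id", ids_from_b))
# SELECT * FROM A WHERE partition_id IN (3, 17, 42)
rows, rows_read = pruned_scan(A, ids_from_b)
print(rows_read)  # 9
```

Pruning behaviour can differ between Hive versions. Running EXPLAIN EXTENDED on the generated query and checking which partition directories appear as inputs is a reasonable way to confirm that only the listed partitions are read.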
Bennie Schut 2012-11-22, 14:27
Dima Datsenko 2012-11-22, 15:06
Bennie Schut 2012-11-23, 10:50
Dean Wampler 2012-11-23, 13:40
Dima Datsenko 2012-11-23, 13:46