Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Effecient partitions usage in join


Copy link to this message
-
Effecient partitions usage in join
Hi Guys,

I wonder if you could help me.

I have a huge Hive table partitioned by some field. It has thousands of partitions.
Now I have another small table containing tens of partitions id. I'd like to get the data only from those partitions.

However when I run
Select * from A join B on (A.partition_id = B.partition_id),
It reads all data from A, then from B and on reduce stage performs join.

I tried /*+ MAPJOIN*/ it ran faster sparing reduce operation, but still read the whole A table.

Is there a more efficient way to perform the query w/o reading the whole A content?
Thanks
Dima
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB