Hive >> mail # user >> Question on bucketed map join


Avrilia Floratou 2012-01-19, 15:53
Bejoy Ks 2012-01-19, 17:22
Re: Question on bucketed map join
Corrected a few typos in previous mail

Hi Avrilia
       AFAIK, the bucketed map join is not the default in Hive; it happens only when the configuration parameter hive.optimize.bucketmapjoin is set to true. You may be getting the same execution plan because hive.optimize.bucketmapjoin is already set to true in the Hive configuration XML file. To cross-check, could you explicitly set it to false (set hive.optimize.bucketmapjoin = false;) in your Hive session and get the query execution plan from the EXPLAIN command?
Please find some pointers inline.
1. Should I see sth different in the explain extended output if I set and unset the hive.optimize.bucketmapjoin option?
[Bejoy] Yes, you should see different plans for the two settings.
Try running EXPLAIN on your join query after setting this:
set hive.optimize.bucketmapjoin = false;
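
A minimal sketch of that comparison, to be run in one Hive session or script. The table names table_a and table_b and their columns are hypothetical placeholders for your two bucketed tables. The MAPJOIN hint is an assumption on my side: the bucket map join example in the Hive language manual requests the map join this way, but your query may be written differently.

```sql
-- Print the value currently in effect for this session.
set hive.optimize.bucketmapjoin;

-- Plan with the optimization switched off.
set hive.optimize.bucketmapjoin = false;
EXPLAIN EXTENDED
SELECT /*+ MAPJOIN(b) */ a.key, a.value
FROM table_a a JOIN table_b b ON a.key = b.key;

-- Plan with the optimization switched on.
set hive.optimize.bucketmapjoin = true;
EXPLAIN EXTENDED
SELECT /*+ MAPJOIN(b) */ a.key, a.value
FROM table_a a JOIN table_b b ON a.key = b.key;
```

If the bucketed map join is picked up, I would expect the second plan to carry extra bucket mapping details (a "Bucket Mapjoin Context" section, as far as I remember) that the first plan lacks. Treat the exact label as something to verify on your Hive version.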

2. Should I see something different in the output of hive while running the query if again I set and unset the hive.optimize.bucketmapjoin?
[Bejoy] No, the Hive output should be the same. Whatever the execution plan for a join, the end result should be the same.

3. Is it possible that even though I set bucketmapjoin to true, Hive will still perform a normal map-side join for some reason? How can I check if this has actually happened?
[Bejoy] Hive would perform a plain map-side join only if the following parameter is enabled (it is disabled by default):
set hive.auto.convert.join = true;
You need to check this value in your configuration. If it is enabled, Hive would always try a map join irrespective of the table size, and would fall back to a normal join only after the map join attempt fails.
AFAIK, if the number of buckets is the same in both tables, or one is a multiple of the other, and the join is on the bucketed columns, then with bucketmapjoin enabled Hive shouldn't execute a plain map-side join; a bucketed map-side join would be triggered instead.
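
To check those preconditions, something along these lines may help. It is only a sketch: table_a and table_b, their columns, and the 8-bucket layout (taken from your description) are illustrative, and the DDL is shown just to make clear what "bucketed on the join key" means here, not as a copy of your actual tables.

```sql
-- Print the current value of the auto-conversion setting.
set hive.auto.convert.join;

-- Hypothetical tables, both bucketed on the join column,
-- with the same number of buckets.
CREATE TABLE table_a (key INT, value STRING)
CLUSTERED BY (key) INTO 8 BUCKETS;

CREATE TABLE table_b (key INT, value STRING)
CLUSTERED BY (key) INTO 8 BUCKETS;

-- Inspect the bucketing metadata Hive has recorded for each table.
DESCRIBE FORMATTED table_a;
DESCRIBE FORMATTED table_b;
```

In the DESCRIBE FORMATTED output, the number of buckets and the bucket columns of the two tables should line up with the join condition. Note that the metadata only says how the table was declared; the data also has to have been loaded into buckets accordingly (for example with hive.enforce.bucketing set to true while inserting, on Hive versions that have that setting).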

Hope it helps!
Regards
Bejoy K S

-----Original Message-----
From: Bejoy Ks <[EMAIL PROTECTED]>
Date: Thu, 19 Jan 2012 09:22:08
To: [EMAIL PROTECTED]<[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Subject: Re: Question on bucketed map join

Hi Avrila
       AFAIK the bucketed map join is not default in hive and it happens only when the values is set to true. It could be because the same value is already set in the hive configuration xml file. To cross confirm the same could you explicitly set this to false

(set hive.optimize.bucketmapjoin = false;)and get the query execution plan from explain command.
Please some pointers in line

1. Should I see sth different in the explain extended output if I set and unset the hive.optimize.bucketmapjoin option?
[Bejoy] you should be seeing the same
Try EXPLAIN your join query after setting this
set hive.optimize.bucketmapjoin = false;
2. Should I see something different in the output of hive while running the query if again I set and unset the hive.optimize.bucketmapjoin?
[Bejoy] No,Hive output should be the same. What ever is the execution plan for an join, optimally the end result should be same.
3. Is it possible that even though I set bucketmapjoin to true, Hive will still perform a normal map-side join for some reason? How can I check if this has actually happened?
[Bejoy] Hive would perform a plain map side join only if the following parameter is enabled. (default it is disabled)

set hive.auto.convert.join = true; you need to check this value in your configurations.
If it is enabled irrespective of the table size hive would always try a map join, it would come to a normal join only after the map join attempt fails.
AFAIK, if the number of buckets are same or multiples between the two tables involved in a join and if the join is on the same columns that are bucketed, with bucketmapjoin enabled it shouldn't execute a plain mapside join a bucketed map side join would be triggered.

Hope it helps!..

Regards
Bejoy.K.S

________________________________
 From: Avrilia Floratou <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Thursday, January 19, 2012 9:23 PM
Subject: Question on bucketed map join
 
Hi,

I have two tables with 8 buckets each on the same key and want to join them.
I ran "explain extended" and got the plan produced by Hive, which shows that a map-side join is a possible plan.

I then set the hive.optimize.bucketmapjoin option to true in my script and reran the "explain extended" query. I got exactly the same plan as output.

I ran the query with and without the bucketmapjoin optimization and saw no difference in the running time.

I have the following questions:

1. Should I see sth different in the explain extended output if I set and unset the hive.optimize.bucketmapjoin option?

2. Should I see something different in the output of hive while running the query if again I set and unset the hive.optimize.bucketmapjoin?

3. Is it possible that even though I set bucketmapjoin to true, Hive will still perform a normal map-side join for some reason? How can I check if this has actually happened?

Thanks,
Avrilia
Avrilia Floratou 2012-01-24, 17:09
Bejoy Ks 2012-01-25, 17:28
Amit Sharma 2012-03-27, 00:54