Re: Map side join
Hi Souvik

Apart from Hive jobs, are normal MapReduce jobs such as wordcount running fine on your cluster?

If they are, do you see anything suspicious in the task, TaskTracker, or JobTracker logs for the Hive jobs?
Regards
Bejoy KS

Sent from remote device, Please excuse typos

-----Original Message-----
From: Souvik Banerjee <[EMAIL PROTECTED]>
Date: Tue, 11 Dec 2012 17:12:20
To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Subject: Re: Map side join

Hello Everybody,

I need help with a Hive join. Since we were talking about the map side join,
I tried that.
I set the flag: set hive.auto.convert.join=true;

I saw that Hive converts the query to a map join while launching the job. But
the problem is that none of the map tasks make progress in my case. I made
the dataset smaller; it is now only 512 MB joined with 25 MB. I was expecting
it to finish very quickly.
No luck with any change of settings.
When it failed to progress with the default settings, I changed these settings:
set hive.mapred.local.mem=1024; // Initially it was 216 I guess
set hive.join.cache.size=100000; // Initially it was 25000

Also, on the Hadoop side I made this change:

mapred.child.java.opts -Xmx1073741824

But I don't see any progress. After running for more than 40 minutes, the job
is still at 0% map completion.
Can you please shed some light on this?

Thanks a lot once again.

Regards,
Souvik.
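
For readers following along, the session-level settings discussed in this message can be sketched in HiveQL as below. This is only an illustration under stated assumptions, not a verified fix for the stalled job: the threshold value 30000000 is a made-up example, and hive.mapjoin.smalltable.filesize is not mentioned in the thread (it is, as far as I know, the property Hive consults when deciding whether a table is small enough for automatic conversion).

```sql
-- Let Hive convert an eligible common join into a map join automatically
-- (the flag used in the message above).
set hive.auto.convert.join=true;

-- Assumption: illustrative threshold in bytes for what counts as the
-- "small" table; the value 30000000 is an example, not a recommendation.
set hive.mapjoin.smalltable.filesize=30000000;

-- Hadoop-side child JVM heap, set for jobs launched from this session.
-- -Xmx1024m is the same size as the -Xmx1073741824 quoted in the message.
set mapred.child.java.opts=-Xmx1024m;

-- Issuing "set <property>;" without a value prints the current setting,
-- which helps confirm what the session is actually using.
set hive.auto.convert.join;
set mapred.child.java.opts;
```

Comments are placed on their own lines starting with "--" because a trailing "//" note after a statement is not Hive comment syntax.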

On Fri, Dec 7, 2012 at 2:32 PM, Souvik Banerjee <[EMAIL PROTECTED]> wrote:

> Hi Bejoy,
>
> That's wonderful. Thanks for your reply.
> What I was wondering is whether Hive can do a map side join with more than
> one condition in the JOIN clause.
> I'll simply try it out and post the result.
>
> Thanks once again.
>
> Regards,
> Souvik.
>
>  On Fri, Dec 7, 2012 at 2:10 PM, <[EMAIL PROTECTED]> wrote:
>
>> Hi Souvik
>>
>> In earlier versions of Hive you had to give the map join hint. In later
>> versions you can just set hive.auto.convert.join = true;
>> Hive then automatically selects the smaller table. It is better to give the
>> smaller table as the first one in the join.
>>
>> You can use a map join if you are joining a small table with a large one,
>> in terms of data size. By small, I mean the smaller table should preferably
>> be in the range of MBs.
>> Regards
>> Bejoy KS
>>
>> Sent from remote device, Please excuse typos
>> ------------------------------
>> From: Souvik Banerjee <[EMAIL PROTECTED]>
>> Date: Fri, 7 Dec 2012 13:58:25 -0600
>> To: <[EMAIL PROTECTED]>
>> ReplyTo: [EMAIL PROTECTED]
>> Subject: Map side join
>>
>> Hello everybody,
>>
>> I have a question. I didn't come across any post that says something
>> about this.
>> I have two tables, let's say A and B.
>> I want to join A and B in Hive. I am currently using Hive version 0.9.
>> The join would be on a few columns, like
>> ON (A.id1 = B.id1) AND (A.id2 = B.id2) AND (A.id3 = B.id3)
>>
>> Can I ask Hive to use a map side join in this scenario? Should I give a
>> hint to Hive by saying /*+mapjoin(B)*/?
>>
>> Get back to me if you want any more information in this regard.
>>
>> Thanks and regards,
>> Souvik.
>>
>
>
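
As a rough sketch of the query shape asked about in the original message, a hinted map join over several equality conditions could look like the following. Tables A and B and the columns id1, id2, id3 come from the thread; the column payload is hypothetical, and B is assumed to be the small table that fits in memory.

```sql
-- Hinted map join on three equality conditions.
-- Assumption: B is the small table; "payload" is a hypothetical column.
SELECT /*+ MAPJOIN(B) */
       A.id1, A.id2, A.id3, B.payload
FROM A
JOIN B
  ON (A.id1 = B.id1 AND A.id2 = B.id2 AND A.id3 = B.id3);

-- Prefixing the same query with EXPLAIN shows the plan without running it,
-- which is one way to check whether a map join was actually chosen.
EXPLAIN
SELECT /*+ MAPJOIN(B) */
       A.id1, A.id2, A.id3, B.payload
FROM A
JOIN B
  ON (A.id1 = B.id1 AND A.id2 = B.id2 AND A.id3 = B.id3);
```

With hive.auto.convert.join=true the hint should not be necessary, as noted in the reply above; it is shown here only because the question asked about it.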