Re: Map side join
Hi Souvik

Are your input files compressed using a non-splittable compression codec?

Do you have enough free slots while this job is running?

Make sure that the job is not running locally.

Bejoy KS

Sent from remote device, Please excuse typos
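
A rough way to check the last point from the Hive CLI, assuming a Hadoop 1.x (JobTracker-based) cluster. Issuing `set <property>;` without a value prints the current setting:

```sql
-- Prints the configured JobTracker address. A value of "local" means
-- jobs run in the local job runner instead of on the cluster.
set mapred.job.tracker;

-- If this is true, Hive may decide on its own to run small jobs locally.
set hive.exec.mode.local.auto;
```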

-----Original Message-----
From: Souvik Banerjee <[EMAIL PROTECTED]>
Date: Wed, 12 Dec 2012 14:27:27
Subject: Re: Map side join

Hi Bejoy,

Yes, I ran the pi example. It was fine.
Regarding the HIVE job, I found that it took 4 hours for the first map
job to complete.
The map tasks were doing their work but only reported status after
completion. It is indeed taking too long to finish. I could not find
anything relevant in the logs.

Thanks and regards,

On Wed, Dec 12, 2012 at 8:04 AM, <[EMAIL PROTECTED]> wrote:

> Hi Souvik
> Apart from Hive jobs, are normal MapReduce jobs like wordcount
> running fine on your cluster?
> If they are, do you see anything suspicious in the task, TaskTracker, or
> JobTracker logs for the Hive jobs?
> Regards
> Bejoy KS
> Sent from remote device, Please excuse typos
> ------------------------------
> *From: * Souvik Banerjee <[EMAIL PROTECTED]>
> *Date: *Tue, 11 Dec 2012 17:12:20 -0600
> *Subject: *Re: Map side join
> Hello Everybody,
> I need help with a HIVE join. Since we were talking about the map-side join,
> I tried that.
> I set the flag: set hive.auto.convert.join=true;
> I saw Hive convert the join to a map join while launching the job. But the
> problem is that none of the map tasks progress in my case. I made the
> dataset smaller. Now it's only 512 MB joined with 25 MB. I was expecting it to be
> done very quickly.
> No luck with any change of settings.
> When it failed to progress with the default settings, I changed these settings:
> set hive.mapred.local.mem=1024; // Initially it was 216, I guess
> set hive.join.cache.size=100000; // Initially it was 25000
> Also, on the Hadoop side I made this change:
> mapred.child.java.opts -Xmx1073741824
> But I don't see any progress. After more than 40 minutes of running, I am at 0%
> map completion.
> Can you please shed some light on this?
> Thanks a lot once again.
> Regards,
> Souvik.
> On Fri, Dec 7, 2012 at 2:32 PM, Souvik Banerjee <[EMAIL PROTECTED]> wrote:
>> Hi Bejoy,
>> That's wonderful. Thanks for your reply.
>> What I was wondering is whether HIVE can do a map-side join with more than one
>> condition in the JOIN clause.
>> I'll simply try it out and post the result.
>> Thanks once again.
>> Regards,
>> Souvik.
>>  On Fri, Dec 7, 2012 at 2:10 PM, <[EMAIL PROTECTED]> wrote:
>>> Hi Souvik
>>> In earlier versions of hive you had to give the map join hint. But in
>>> later versions just set hive.auto.convert.join = true;
>>> Hive automatically selects the smaller table. It is better to give the
>>> smaller table as the first one in join.
>>> You can use a map join if you are joining a small table with a large
>>> one, in terms of data size. By small, I mean it is better for the smaller
>>> table to be in the range of MBs.
>>> Regards
>>> Bejoy KS
>>> Sent from remote device, Please excuse typos
>>> ------------------------------
>>> *From: *Souvik Banerjee <[EMAIL PROTECTED]>
>>> *Date: *Fri, 7 Dec 2012 13:58:25 -0600
>>> *ReplyTo: *[EMAIL PROTECTED]
>>> *Subject: *Map side join
>>> Hello everybody,
>>> I have a question. I haven't come across any post that says
>>> something about this.
>>> I have two tables, let's say A and B.
>>> I want to join A & B in HIVE. I am currently using HIVE 0.9.
>>> The join would be on a few columns, like (A.id1 = B.id1) AND (A.id2 = B.id2) AND (A.id3 = B.id3)
>>> Can I ask HIVE to use map side join in this scenario? Should I give a
>>> hint to HIVE by saying /*+mapjoin(B)*/
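
For illustration, the two approaches discussed in this thread might look as follows in HiveQL. This is only a sketch: it uses the table names A and B and the columns id1 to id3 from the question, assumes all three join conditions are equalities, and assumes B is the small table.

```sql
-- Option 1: explicit hint, as required in earlier Hive versions.
-- B is named in the hint because it is assumed to be the small table.
SELECT /*+ MAPJOIN(B) */ A.*
FROM A
JOIN B
  ON (A.id1 = B.id1) AND (A.id2 = B.id2) AND (A.id3 = B.id3);

-- Option 2: let Hive convert the join to a map join automatically.
set hive.auto.convert.join=true;

SELECT A.*
FROM A
JOIN B
  ON (A.id1 = B.id1) AND (A.id2 = B.id2) AND (A.id3 = B.id3);
```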