Re: Map side join
bejoy_ks@... 2012-12-13, 18:06
For the new HDFS block size to take effect on files that already exist, you
need to copy them into HDFS again, because a file keeps the block size it was
written with. To change the number of mappers, you can set a smaller value,
such as 64 MB (67108864 bytes), for both the minimum and the maximum split
size: mapred.min.split.size and mapred.max.split.size.
Sent from remote device, Please excuse typos
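A minimal sketch of how the split size, and therefore the mapper count, follows from these settings. It assumes the rule used by Hadoop's new-API `FileInputFormat` (split size = max(minSize, min(maxSize, blockSize)), with the last split allowed to run about 10% over). If Hive is using `CombineHiveInputFormat`, splits are grouped differently, so treat the numbers as an estimate. The helper names are hypothetical, and `block_size` means the block size the file was written with, not the current cluster setting.

```python
MB = 1024 * 1024
LONG_MAX = 2**63 - 1  # default maximum split size (Java Long.MAX_VALUE)
SPLIT_SLOP = 1.1      # the last split may be up to 10% larger than the split size


def split_size(block_size, min_split=1, max_split=LONG_MAX):
    """max(minSize, min(maxSize, blockSize)), as in new-API FileInputFormat."""
    return max(min_split, min(max_split, block_size))


def count_splits(file_size, size):
    """Number of splits (one map task each) for one splittable file."""
    splits = 0
    remaining = file_size
    # Cut full-size splits while more than SPLIT_SLOP * size bytes remain.
    while remaining / size > SPLIT_SLOP:
        splits += 1
        remaining -= size
    # Whatever is left (at most 1.1 * size) becomes the final split.
    if remaining > 0:
        splits += 1
    return splits


# A 512 MB file written with 128 MB blocks, default split settings.
print(count_splits(512 * MB, split_size(128 * MB)))
# The same file with mapred.min.split.size = mapred.max.split.size = 64 MB.
print(count_splits(512 * MB, split_size(128 * MB, 64 * MB, 64 * MB)))
```

Under these assumptions the first call gives 4 splits and the second gives 8. Lowering the cluster block size changes nothing for the existing file; only after it is copied in again with 32 MB blocks does `split_size(32 * MB)` apply, giving 16 splits.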
From: Souvik Banerjee <[EMAIL PROTECTED]>
Date: Thu, 13 Dec 2012 12:00:16
To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Subject: Re: Map side join
The input files are uncompressed text files.
There are enough free slots in the cluster.
Can you please let me know how I can increase the number of mappers?
I tried reducing the HDFS block size from 128 MB to 32 MB, expecting to get
more mappers, but the job still launches the same number of mappers as it
did with the 128 MB block size. I have enough map slots available but am
not able to use them.
Thanks and regards,
On Thu, Dec 13, 2012 at 11:12 AM, <[EMAIL PROTECTED]> wrote:
> Hi Souvik
> Are your input files compressed with a non-splittable compression codec?
> Do you have enough free slots while this job is running?
> Make sure that the job is not running locally.
> Bejoy KS
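A rough sketch for the compression question: classify input files by suffix and see how many maps each would get. It assumes Hadoop-style suffix-based codec detection (as done by `CompressionCodecFactory`) and lists only codecs shipped with Apache Hadoop; the table and helper names are hypothetical. A file that cannot be split goes to a single mapper regardless of block or split size.

```python
import os.path

# Hypothetical table modelled on Hadoop's suffix-based codec lookup.
# Value: (codec class name, splittable?)
CODECS = {
    ".gz": ("GzipCodec", False),
    ".deflate": ("DefaultCodec", False),
    ".snappy": ("SnappyCodec", False),
    # Assumed splittable: true only where BZip2Codec implements
    # SplittableCompressionCodec in the Hadoop version in use.
    ".bz2": ("BZip2Codec", True),
}


def is_splittable(path):
    """Guess splittability from the file suffix."""
    _, suffix = os.path.splitext(path)
    codec = CODECS.get(suffix)
    # Suffixes not in the table are assumed to be uncompressed text.
    return codec is None or codec[1]


def maps_for_file(path, splits_if_splittable):
    """One map per split for splittable input, otherwise one map per file."""
    return splits_if_splittable if is_splittable(path) else 1


print(is_splittable("/data/part-00000"))
print(is_splittable("/data/part-00000.gz"))
```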
> From: Souvik Banerjee <[EMAIL PROTECTED]>
> Date: Wed, 12 Dec 2012 14:27:27 -0600
> To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
> Reply-To: [EMAIL PROTECTED]
> Subject: Re: Map side join
> Hi Bejoy,
> Yes, I ran the pi example, and it was fine.
> Regarding the Hive job, I found that the first map job took 4 hours to
> complete.
> Its map tasks were doing their work but reported status only on
> completion. It is indeed taking far too long to finish, and I could not
> find anything relevant in the logs.
> Thanks and regards,
> On Wed, Dec 12, 2012 at 8:04 AM, <[EMAIL PROTECTED]> wrote:
>> Hi Souvik
>> Apart from Hive jobs, are normal MapReduce jobs such as wordcount
>> running fine on your cluster?
>> If they are, do you see anything suspicious in the task, TaskTracker or
>> JobTracker logs for the Hive jobs?
>> Bejoy KS
>> From: Souvik Banerjee <[EMAIL PROTECTED]>
>> Date: Tue, 11 Dec 2012 17:12:20 -0600
>> To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
>> Reply-To: [EMAIL PROTECTED]
>> Subject: Re: Map side join
>> Hello Everybody,
>> I need help with a Hive join. Following our discussion of map-side
>> joins, I tried one.
>> I set the flag with set hive.auto.convert.join=true;
>> Hive converted the join to a map join when launching the job, but none
>> of the map tasks make any progress. I made the dataset smaller; it is
>> now only 512 MB joined with 25 MB, so I was expecting it to finish very
>> quickly.
>> No change of settings has helped.
>> Since the job failed to progress with the default settings, I changed
>> these:
>> set hive.mapred.local.mem=1024; // initially it was 216, I think
>> set hive.join.cache.size=100000; // initially it was 25000
>> On the Hadoop side I also made this change:
>> mapred.child.java.opts=-Xmx1073741824 (a 1 GB heap)
>> But I don't see any progress: after running for more than 40 minutes,
>> the job is still at 0% map completion.
>> Can you please shed some light on this?
>> Thanks a lot once again.
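The sizes in this message can be sanity-checked with a little arithmetic. The sketch below parses a JVM `-Xmx` value (plain bytes or a k/m/g suffix) and applies a simplified version of the size check behind `hive.auto.convert.join`: the small table is a map-join candidate when its file size is below `hive.mapjoin.smalltable.filesize`, whose default is assumed here to be 25000000 bytes. Real Hive versions add further conditions, and the helper names are hypothetical.

```python
MIB = 1024 * 1024
# Assumed default of hive.mapjoin.smalltable.filesize, in bytes.
SMALLTABLE_FILESIZE = 25_000_000
_UNITS = {"k": 1024, "m": MIB, "g": 1024 * MIB}


def parse_xmx(option):
    """Heap size in bytes from a JVM option such as -Xmx1073741824 or -Xmx1g."""
    if not option.startswith("-Xmx"):
        raise ValueError(f"not an -Xmx option: {option!r}")
    value = option[len("-Xmx"):]
    unit = _UNITS.get(value[-1].lower())
    if unit is not None:
        return int(value[:-1]) * unit
    return int(value)  # no suffix: the value is already in bytes


def is_map_join_candidate(small_table_bytes, threshold=SMALLTABLE_FILESIZE):
    """Simplified size check: the small side must be below the threshold."""
    return small_table_bytes < threshold


print(parse_xmx("-Xmx1073741824") // MIB)
print(is_map_join_candidate(25 * MIB))
```

`-Xmx1073741824` works out to 1024 MiB. Note that 25 MiB is 26214400 bytes, slightly above 25000000, so a table described as "25 MB" sits right at the assumed default threshold and the units matter.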
>> On Fri, Dec 7, 2012 at 2:32 PM, Souvik Banerjee <[EMAIL PROTECTED]> wrote:
>>> Hi Bejoy,
>>> That's wonderful. Thanks for your reply.
>>> What I was wondering is whether Hive can do a map-side join with more
>>> than one condition in the JOIN clause.
>>> I'll simply try it out and post the result.
>>> Thanks once again.
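Conceptually, a map-side join loads the small table into an in-memory hash table on each mapper and streams the large table past it, so an equality condition on several columns (the equivalent of `ON a.k1 = b.k1 AND a.k2 = b.k2`) only turns the hash key into a composite key. The sketch below illustrates that idea in plain Python; it is not Hive's implementation, the function and column names are hypothetical, and it covers only inner joins on ANDed equality conditions.

```python
from collections import defaultdict


def map_side_join(small_rows, large_rows, keys):
    """Inner join on all columns in `keys`, hashing the small side in memory."""
    # Build phase: index the small table by its composite join key.
    index = defaultdict(list)
    for row in small_rows:
        index[tuple(row[k] for k in keys)].append(row)
    # Probe phase: stream the large table, one hash lookup per row.
    for row in large_rows:
        for match in index.get(tuple(row[k] for k in keys), []):
            yield {**match, **row}


small = [
    {"region": "us", "year": 2012, "rate": 0.1},
    {"region": "eu", "year": 2012, "rate": 0.2},
]
large = [
    {"region": "us", "year": 2012, "amount": 100},
    {"region": "us", "year": 2011, "amount": 50},  # no match on year
    {"region": "eu", "year": 2012, "amount": 70},
]
for joined in map_side_join(small, large, ["region", "year"]):
    print(joined)
```

Only the build side has to fit in memory, which is why the smaller table is the one loaded into the hash table.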
>>> On Fri, Dec 7, 2012 at 2:10 PM, <[EMAIL PROTECTED]> wrote: