Hive >> mail # user >> Map side join


Souvik Banerjee 2012-12-07, 19:58
bejoy_ks@... 2012-12-07, 20:10
Souvik Banerjee 2012-12-07, 20:32
Souvik Banerjee 2012-12-11, 23:12
bejoy_ks@... 2012-12-12, 14:04
Souvik Banerjee 2012-12-12, 20:27
bejoy_ks@... 2012-12-13, 17:12
Souvik Banerjee 2012-12-13, 18:00
bejoy_ks@... 2012-12-13, 18:06
Re: Map side join
Thanks for the help.
What I did earlier was change the configuration in HDFS and then create
the table. I expected the block size of the new table to be 32 MB.
But I found that when using Cloudera Manager you need to deploy the
configuration change for both HDFS and MapReduce. (I had done it only for HDFS.)
I have now deleted the old table and recreated it, and I can launch more
mappers.
Thanks a lot once again. I will post what happens with more mappers.

Thanks and regards,
Souvik.
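
For readers following the thread, here is a rough sketch of why a smaller block size or a smaller max split size yields more mappers. It assumes the split size is computed as max(min, min(max, block)), which is my understanding of Hadoop's FileInputFormat, and that the 512 MB input is a single splittable text file. The function names are hypothetical, and Hive's combining input formats may group splits differently, so treat the numbers as estimates only.

```python
import math

MB = 1024 * 1024

def split_size(block_size, min_size=1, max_size=2**63 - 1):
    # Assumed formula: clamp the block size to the [min, max] split size range.
    return max(min_size, min(max_size, block_size))

def estimate_mappers(file_size, block_size, min_size=1, max_size=2**63 - 1):
    # Rough split count for one splittable file; ignores the small slack
    # Hadoop allows on the last split.
    return math.ceil(file_size / split_size(block_size, min_size, max_size))

# 512 MB uncompressed text input, as described in the thread
print(estimate_mappers(512 * MB, 128 * MB))                    # 4
print(estimate_mappers(512 * MB, 32 * MB))                     # 16
print(estimate_mappers(512 * MB, 128 * MB, max_size=64 * MB))  # 8
```

Note that the split size properties take values in bytes, so 64 MB would be written as 67108864.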

On Thu, Dec 13, 2012 at 12:06 PM, <[EMAIL PROTECTED]> wrote:

> Hi Souvik
>
> To have the new HDFS block size take effect on already existing files,
> you need to re-copy them into HDFS.
>
> To play with the number of mappers, you can set a smaller value, such as
> 64 MB, for the min and max split size:
>
> mapred.min.split.size and mapred.max.split.size
>
> Regards
> Bejoy KS
>
> Sent from remote device, Please excuse typos
> ------------------------------
> From: Souvik Banerjee <[EMAIL PROTECTED]>
> Date: Thu, 13 Dec 2012 12:00:16 -0600
> To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
> Subject: Re: Map side join
>
> Hi Bejoy,
>
> The input files are uncompressed text files.
> There are enough free slots in the cluster.
>
> Can you please let me know how I can increase the number of mappers?
> I tried reducing the HDFS block size from 128 MB to 32 MB. I was expecting
> to get more mappers, but it is still launching the same number of mappers
> as it did when the HDFS block size was 128 MB. I have enough map slots
> available, but I am not able to utilize them.
>
>
> Thanks and regards,
> Souvik.
>
>
> On Thu, Dec 13, 2012 at 11:12 AM, <[EMAIL PROTECTED]> wrote:
>
>> Hi Souvik
>>
>> Are your input files compressed using some non-splittable compression
>> codec?
>>
>> Do you have enough free slots while this job is running?
>>
>> Make sure that the job is not running locally.
>>
>> Regards
>> Bejoy KS
>>
>> Sent from remote device, Please excuse typos
>> ------------------------------
>> From: Souvik Banerjee <[EMAIL PROTECTED]>
>> Date: Wed, 12 Dec 2012 14:27:27 -0600
>> To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
>> ReplyTo: [EMAIL PROTECTED]
>> Subject: Re: Map side join
>>
>> Hi Bejoy,
>>
>> Yes, I ran the pi example. It was fine.
>> Regarding the Hive job, what I found is that it took 4 hrs for the first
>> map job to complete.
>> Those map tasks were doing their work and only reported status after
>> completion. It is indeed taking too long to finish. I could find nothing
>> relevant in the logs.
>>
>> Thanks and regards,
>> Souvik.
>>
>> On Wed, Dec 12, 2012 at 8:04 AM, <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Souvik
>>>
>>> Apart from Hive jobs, are normal MapReduce jobs like wordcount
>>> running fine on your cluster?
>>>
>>> If they are, do you see anything suspicious in the task, TaskTracker,
>>> or JobTracker logs for the Hive jobs?
>>>
>>>
>>> Regards
>>> Bejoy KS
>>>
>>> Sent from remote device, Please excuse typos
>>> ------------------------------
>>> From: Souvik Banerjee <[EMAIL PROTECTED]>
>>> Date: Tue, 11 Dec 2012 17:12:20 -0600
>>> To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
>>> ReplyTo: [EMAIL PROTECTED]
>>> Subject: Re: Map side join
>>>
>>> Hello Everybody,
>>>
>>> I need help with a Hive join. Since we were talking about the map-side
>>> join, I tried that.
>>> I set the flag: set hive.auto.convert.join=true;
>>>
>>> I saw that Hive converts the query to a map join while launching the job. But
>>> the problem is that none of the map jobs make progress in my case. I made the
>>> dataset smaller. Now it's only 512 MB cross 25 MB. I was expecting it to be
>>> done very quickly.
>>> No luck with any change of settings.
>>> After it failed to progress with the default settings, I changed these settings:
>>> set hive.mapred.local.mem=1024; // Initially it was 216, I guess
>>> set hive.join.cache.size=100000; // Initially it was 25000
>>>
>>> Also, on the Hadoop side I made these changes
Souvik Banerjee 2012-12-27, 23:05