Hive >> mail # user >> Map side join


Re: Map side join
Thanks for the help.
What I did earlier was change the configuration in HDFS and then create the
table, expecting the block size of the new table to be 32 MB. But I found
that when using Cloudera Manager you need to deploy the configuration
change for both HDFS and MapReduce (I had done it only for HDFS).
Now I have deleted the old table and recreated it, and I can launch more
mappers.
Thanks a lot once again. I will let you know what happens with more mappers.

Thanks and regards,
Souvik.
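For reference, files already loaded into HDFS keep the block size they were written with, so re-copying is what makes a new block size take effect. A rough sketch of such a re-copy (the paths are hypothetical, and on the Hadoop 1.x releases this thread assumes the property is dfs.block.size; newer releases call it dfs.blocksize):

```shell
# 32 MB = 33554432 bytes; paths are illustrative, not from this thread.
hadoop fs -D dfs.block.size=33554432 \
  -put /tmp/table_data.txt /user/hive/warehouse/mytable/

# Inspect the block layout of the copied file to confirm the block size:
hadoop fsck /user/hive/warehouse/mytable/table_data.txt -files -blocks
```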

On Thu, Dec 13, 2012 at 12:06 PM, <[EMAIL PROTECTED]> wrote:

> Hi Souvik
>
> To have the new HDFS block size take effect on already existing files,
> you need to re-copy them into HDFS.
>
> To play with the number of mappers, you can set a smaller value, such as
> 64 MB, for both the minimum and maximum split sizes:
>
> mapred.min.split.size and mapred.max.split.size
>
> Regards
> Bejoy KS
>
> Sent from remote device, Please excuse typos
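Bejoy's suggestion, written out as it would be entered in a Hive session (64 MB = 67108864 bytes; the values are illustrative, and the property names are the Hadoop 1.x mapred.* ones used throughout this thread):

```sql
-- Lower both split bounds to 64 MB so more splits (and mappers) are created.
set mapred.min.split.size=67108864;
set mapred.max.split.size=67108864;
```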
> ------------------------------
> From: Souvik Banerjee <[EMAIL PROTECTED]>
> Date: Thu, 13 Dec 2012 12:00:16 -0600
> To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
> Subject: Re: Map side join
>
> Hi Bejoy,
>
> The input files are non-compressed text files.
> There are enough free slots in the cluster.
>
> Can you please let me know how I can increase the number of mappers?
> I tried reducing the HDFS block size from 128 MB to 32 MB, expecting to
> get more mappers, but it still launches the same number of mappers as it
> did when the block size was 128 MB. I have enough map slots available,
> but I am not able to utilize them.
>
>
> Thanks and regards,
> Souvik.
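As a sanity check on the mapper counts discussed here: the number of map tasks is roughly the input size divided by the effective split size (which is bounded by the block size and the min/max split settings, which is why only changing the block size of existing files has no effect). A back-of-the-envelope calculation with the sizes mentioned in this thread:

```shell
# mappers ≈ ceil(file_size / split_size); sizes in MB, taken from the thread.
file_mb=512
for split_mb in 128 32; do
  mappers=$(( (file_mb + split_mb - 1) / split_mb ))
  echo "split=${split_mb}MB -> ${mappers} mappers"
done
# prints 4 mappers for 128 MB splits and 16 mappers for 32 MB splits
```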
>
>
> On Thu, Dec 13, 2012 at 11:12 AM, <[EMAIL PROTECTED]> wrote:
>
>> Hi Souvik
>>
>> Are your input files compressed using a non-splittable compression
>> codec?
>>
>> Do you have enough free slots while this job is running?
>>
>> Make sure that the job is not running locally.
>>
>> Regards
>> Bejoy KS
>>
>> Sent from remote device, Please excuse typos
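One way to check Bejoy's "not running locally" point from inside the Hive CLI is to print the job tracker setting; this uses the Hadoop 1.x property name assumed elsewhere in this thread:

```sql
-- If this prints "local", jobs run in-process instead of on the cluster.
set mapred.job.tracker;
```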
>> ------------------------------
>> From: Souvik Banerjee <[EMAIL PROTECTED]>
>> Date: Wed, 12 Dec 2012 14:27:27 -0600
>> To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
>> Reply-To: [EMAIL PROTECTED]
>> Subject: Re: Map side join
>>
>> Hi Bejoy,
>>
>> Yes, I ran the pi example; it was fine.
>> Regarding the Hive job, what I found is that it took 4 hours for the
>> first map task to complete.
>> Those map tasks were doing their job and only reported status after
>> completion. It is indeed taking too long to finish. I could not find
>> anything relevant in the logs.
>>
>> Thanks and regards,
>> Souvik.
>>
>> On Wed, Dec 12, 2012 at 8:04 AM, <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Souvik
>>>
>>> Apart from Hive jobs, are normal MapReduce jobs like wordcount running
>>> fine on your cluster?
>>>
>>> If they are, are you seeing anything suspicious in the task,
>>> TaskTracker, or JobTracker logs for the Hive jobs?
>>>
>>>
>>> Regards
>>> Bejoy KS
>>>
>>> Sent from remote device, Please excuse typos
>>> ------------------------------
>>> From: Souvik Banerjee <[EMAIL PROTECTED]>
>>> Date: Tue, 11 Dec 2012 17:12:20 -0600
>>> To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
>>> Reply-To: [EMAIL PROTECTED]
>>> Subject: Re: Map side join
>>>
>>> Hello Everybody,
>>>
>>> Need some help with a Hive join. As we were talking about the map-side
>>> join, I tried that.
>>> I set the flag: set hive.auto.convert.join=true;
>>>
>>> I saw Hive convert the query to a map join while launching the job, but
>>> the problem is that none of the map tasks progresses in my case. I made
>>> the dataset smaller; now it's only 512 MB joined against 25 MB. I was
>>> expecting it to be done very quickly.
>>> No luck with any change of settings.
>>> Failing to progress with the defaults, I changed these settings:
>>> set hive.mapred.local.mem=1024; // initially it was 216, I guess
>>> set hive.join.cache.size=100000; // initially it was 25000
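For completeness, besides the hive.auto.convert.join flag, a map join can also be requested explicitly with a query hint. The table and column names below are made up for illustration, not taken from this thread:

```sql
-- MAPJOIN(d) asks Hive to load the small table d into each mapper's memory,
-- avoiding the shuffle/reduce phase. fact/dim and their columns are hypothetical.
SELECT /*+ MAPJOIN(d) */ f.id, d.label
FROM fact f JOIN dim d ON f.dim_key = d.dim_key;
```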
>>>
>>> Also, on the Hadoop side I made these changes: