Hadoop >> mail # user >> Re: Mapper and Reducer takes longer than usual for a HBase table aggregation task
Re: Mapper and Reducer takes longer than usual for a HBase table aggregation task
Jens, can I set a smaller value in my application? Is the following valid?
conf.setInt("mapred.max.split.size", 50);
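For context, a minimal driver sketch of how such an override would be wired in (class and job names here are hypothetical; it assumes the old `mapred.*` property name is still honored by this Hadoop version). Note that `mapred.max.split.size` is measured in bytes, so 50 would mean 50-byte splits, and that for HBase's `TableInputFormat` the mapper count is driven by region count rather than this property:

```java
// Hypothetical driver sketch: overriding the max split size before
// job submission. The property must be set on the Configuration
// BEFORE Job.getInstance(conf), because the job takes its own copy
// of the configuration at that point.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class AggregationDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // mapred.max.split.size is in BYTES; 64 MB is a more typical
        // value than 50.
        conf.setLong("mapred.max.split.size", 64L * 1024 * 1024);
        Job job = Job.getInstance(conf, "hbase-aggregation");
        // ... set mapper/reducer classes, input/output formats,
        // then job.waitForCompletion(true) ...
    }
}
```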

This is our mapred-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>ip-10-10-100170.eu-east-1.compute.internal:8021</value>
  </property>
  <property>
    <name>mapred.job.tracker.http.address</name>
    <value>0.0.0.0:50030</value>
  </property>
  <property>
    <name>mapreduce.job.counters.max</name>
    <value>120</value>
  </property>
  <property>
    <name>mapred.output.compress</name>
    <value>false</value>
  </property>
  <property>
    <name>mapred.output.compression.type</name>
    <value>BLOCK</value>
  </property>
  <property>
    <name>mapred.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec</value>
  </property>
  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
  <property>
    <name>zlib.compress.level</name>
    <value>DEFAULT_COMPRESSION</value>
  </property>
  <property>
    <name>io.sort.factor</name>
    <value>64</value>
  </property>
  <property>
    <name>io.sort.record.percent</name>
    <value>0.05</value>
  </property>
  <property>
    <name>io.sort.spill.percent</name>
    <value>0.8</value>
  </property>
  <property>
    <name>mapred.reduce.parallel.copies</name>
    <value>10</value>
  </property>
  <property>
    <name>mapred.submit.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>6</value>
  </property>
  <property>
    <name>mapred.userlog.retain.hours</name>
    <value>24</value>
  </property>
  <property>
    <name>io.sort.mb</name>
    <value>112</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value> -Xmx471075479</value>
  </property>
  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>1</value>
  </property>
  <property>
    <name>mapred.map.tasks.speculative.execution</name>
    <value>false</value>
  </property>
  <property>
    <name>mapred.reduce.tasks.speculative.execution</name>
    <value>false</value>
  </property>
  <property>
    <name>mapred.reduce.slowstart.completed.maps</name>
    <value>0.8</value>
  </property>
</configuration>
Please suggest ways to override the default value.
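One common way to make per-job overrides possible without touching mapred-site.xml is to have the driver implement `Tool`, so that `-D key=value` flags on the command line are parsed into the job configuration. A sketch under that assumption (class name is hypothetical; properties marked `<final>true</final>` in the site file cannot be overridden this way):

```java
// Sketch: a Tool-based driver. ToolRunner/GenericOptionsParser put any
// -D overrides into the Configuration returned by getConf(), e.g.:
//   hadoop jar job.jar AggregationTool -D mapred.reduce.tasks=12 ...
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class AggregationTool extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // getConf() already reflects any -D flags at this point.
        Job job = Job.getInstance(getConf(), "hbase-aggregation");
        // ... configure mapper/reducer here ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int rc = ToolRunner.run(new Configuration(),
                                new AggregationTool(), args);
        System.exit(rc);
    }
}
```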
On Mon, Aug 26, 2013 at 9:38 AM, anil gupta <[EMAIL PROTECTED]> wrote:

> Hi Pavan,
>
> Is this a standalone cluster? How many RegionServers are you running? What
> are you trying to achieve in the MR job? Have you tried increasing scanner
> caching? "Slow" is hard to diagnose unless we know more details about your setup.
>
> ~Anil
>
>
>
> On Sun, Aug 25, 2013 at 5:52 PM, 李洪忠 <[EMAIL PROTECTED]> wrote:
>
>> You need to share your map code here so we can analyze the problem. Generally,
>> when you map/reduce over HBase, a scanner with filter(s) is used, so the mapper
>> count equals the region count of your HBase table.
>> As for why your reduce is so slow, my guess is that you have a disastrous join
>> across the three tables, which produces too many rows.
>>
>> On 2013/8/26 4:36, Pavan Sudheendra wrote:
>>
>>> Another question: why does it indicate the number of mappers as 1? Can I
>>> change it so that multiple mappers perform the computation?
>>>
>>
>>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>
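For reference, Anil's scanner-caching suggestion can be sketched as below (a sketch only; the table name and mapper are hypothetical placeholders). As 李洪忠 notes, with `TableInputFormat` one mapper is created per region, so a table with a single region yields a single mapper; pre-splitting the table is what raises the mapper count, not split-size settings:

```java
// Sketch: building a TableMapper job with a larger scanner cache so each
// RPC to the RegionServer fetches a batch of rows instead of one.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class ScanSetup {

    // Hypothetical mapper stub for illustration.
    static class MyAggregationMapper extends TableMapper<Text, LongWritable> {
    }

    public static Job buildJob() throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "hbase-aggregation");

        Scan scan = new Scan();
        scan.setCaching(500);        // rows fetched per RPC (older HBase defaults to 1)
        scan.setCacheBlocks(false);  // recommended off for full-table MR scans

        TableMapReduceUtil.initTableMapperJob(
            "my_table",                  // hypothetical table name
            scan,
            MyAggregationMapper.class,
            Text.class,
            LongWritable.class,
            job);
        return job;
    }
}
```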

--
Regards-
Pavan