Hadoop >> mail # user >> Re: Mapper and Reducer takes longer than usual for a HBase table aggregation task


Re: Mapper and Reducer takes longer than usual for a HBase table aggregation task
Jens, can I set a smaller value in my application? Is this valid?
conf.setInt("mapred.max.split.size", 50);
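As an aside (not from the original thread): `mapred.max.split.size` is measured in bytes, so a value of 50 would cap each split at 50 bytes — and for an HBase `TableInputFormat` job the split count is typically driven by the table's region count rather than this property anyway. A quick plain-Java sanity check of what a byte cap implies for a size-based input format (the numbers are illustrative):

```java
public class SplitCount {
    // Rough split count for a size-based input format: ceil(inputBytes / maxSplitBytes).
    static long splits(long inputBytes, long maxSplitBytes) {
        return (inputBytes + maxSplitBytes - 1) / maxSplitBytes;
    }

    public static void main(String[] args) {
        long oneGiB = 1024L * 1024 * 1024;
        // A 50-byte cap on 1 GiB of input implies ~21.5 million splits.
        System.out.println(splits(oneGiB, 50));
        // A 64 MB cap implies 16 splits.
        System.out.println(splits(oneGiB, 64L * 1024 * 1024));
    }
}
```

So a tiny value would, at best, flood the JobTracker with splits rather than speed anything up.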

This is our mapred-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>ip-10-10-100170.eu-east-1.compute.internal:8021</value>
  </property>
  <property>
    <name>mapred.job.tracker.http.address</name>
    <value>0.0.0.0:50030</value>
  </property>
  <property>
    <name>mapreduce.job.counters.max</name>
    <value>120</value>
  </property>
  <property>
    <name>mapred.output.compress</name>
    <value>false</value>
  </property>
  <property>
    <name>mapred.output.compression.type</name>
    <value>BLOCK</value>
  </property>
  <property>
    <name>mapred.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec</value>
  </property>
  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
  <property>
    <name>zlib.compress.level</name>
    <value>DEFAULT_COMPRESSION</value>
  </property>
  <property>
    <name>io.sort.factor</name>
    <value>64</value>
  </property>
  <property>
    <name>io.sort.record.percent</name>
    <value>0.05</value>
  </property>
  <property>
    <name>io.sort.spill.percent</name>
    <value>0.8</value>
  </property>
  <property>
    <name>mapred.reduce.parallel.copies</name>
    <value>10</value>
  </property>
  <property>
    <name>mapred.submit.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>6</value>
  </property>
  <property>
    <name>mapred.userlog.retain.hours</name>
    <value>24</value>
  </property>
  <property>
    <name>io.sort.mb</name>
    <value>112</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value> -Xmx471075479</value>
  </property>
  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>1</value>
  </property>
  <property>
    <name>mapred.map.tasks.speculative.execution</name>
    <value>false</value>
  </property>
  <property>
    <name>mapred.reduce.tasks.speculative.execution</name>
    <value>false</value>
  </property>
  <property>
    <name>mapred.reduce.slowstart.completed.maps</name>
    <value>0.8</value>
  </property>
</configuration>
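A side note on the values above (my own arithmetic, not from the thread): with no unit suffix, `-Xmx471075479` means bytes, and `io.sort.mb` must fit comfortably inside that child heap. A quick check:

```java
public class HeapCheck {
    public static void main(String[] args) {
        long childHeapBytes = 471075479L;           // from mapred.child.java.opts (-Xmx with no suffix = bytes)
        long childHeapMb = childHeapBytes / (1024 * 1024);
        System.out.println(childHeapMb);            // roughly 449 MB per task JVM
        int ioSortMb = 112;                         // from io.sort.mb
        // The sort buffer should leave most of the heap for the task itself.
        System.out.println(ioSortMb <= childHeapMb / 2);
    }
}
```

Here 112 MB of sort buffer inside a ~449 MB heap looks consistent; a larger `io.sort.mb` without a matching heap bump would risk OOMs in the map tasks.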
Please suggest ways to override the default value.
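As a general note (hedged, not specific to this cluster): a job-level setting such as `conf.setLong(...)`, or a `-D` flag passed to a `ToolRunner`/`GenericOptionsParser`-based driver, overrides the mapred-site.xml default — unless the site file pins the property with `<final>true</final>`. For example, a site entry like this would block per-job overrides:

```xml
<property>
  <name>mapred.max.split.size</name>
  <value>67108864</value>
  <!-- final=true pins the site value; per-job conf.set(...) and -D are ignored -->
  <final>true</final>
</property>
```

Conversely, if the property is absent or not final, `hadoop jar myjob.jar -D mapred.max.split.size=67108864 ...` takes effect for that job only.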
On Mon, Aug 26, 2013 at 9:38 AM, anil gupta <[EMAIL PROTECTED]> wrote:

> Hi Pavan,
>
> Standalone cluster? How many region servers are you running? What are you
> trying to achieve in the MR job? Have you tried increasing scanner caching?
> "Slow" is relative unless we know some more details about your setup.
>
> ~Anil
>
>
>
> On Sun, Aug 25, 2013 at 5:52 PM, 李洪忠 <[EMAIL PROTECTED]> wrote:
>
>> You need to post your map code here so we can analyze the problem.
>> Generally, when you run MapReduce over an HBase table, a scanner with
>> filter(s) is used, so the mapper count equals the region count of your
>> HBase table.
>> As for why your reduce is so slow, my guess is that you have a disastrous
>> join across the three tables, which produces too many rows.
>>
>> On 2013/8/26 4:36, Pavan Sudheendra wrote:
>>
>>> Another question: why does it indicate the number of mappers as 1? Can I
>>> change it so that multiple mappers perform the computation?
>>>
>>
>>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>
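To make the "too many rows" point above concrete (a hypothetical illustration, not Pavan's actual data): an inner join emits, per key, the product of the matching row counts on each side, so a single hot key can blow up the output:

```java
import java.util.HashMap;
import java.util.Map;

public class JoinExplosion {
    // Output rows of an inner join on a key = sum over keys of countA(k) * countB(k).
    static long joinedRows(Map<String, Long> countsA, Map<String, Long> countsB) {
        long total = 0;
        for (Map.Entry<String, Long> e : countsA.entrySet()) {
            Long b = countsB.get(e.getKey());
            if (b != null) total += e.getValue() * b;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> a = new HashMap<>();
        Map<String, Long> b = new HashMap<>();
        a.put("user1", 1000L);
        b.put("user1", 1000L);
        // One key with 1000 matching rows on each side already yields a million joined rows.
        System.out.println(joinedRows(a, b)); // prints 1000000
    }
}
```

With three tables joined, a third factor multiplies in again, which is exactly the kind of blow-up that can stall the reduce phase.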

--
Regards-
Pavan