Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Hadoop >> mail # user >> Re: Mapper and Reducer takes longer than usual for a HBase table aggregation task


Pavan Sudheendra 2013-08-26, 06:33
Re: Mapper and Reducer takes longer than usual for a HBase table aggregation task
Ted and lhztop, here is a gist of my code: http://pastebin.com/mxY4AqBA

Can you suggest a few ways of optimizing it? I know I am re-initializing the
conf object in the map function every time it's called; I'll change that.
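Hadoop's new-API Mapper (`org.apache.hadoop.mapreduce.Mapper`) calls `setup(Context)` once per task and `map(...)` once per input record, so expensive objects such as a `Configuration` or an `HTable` belong in `setup()` (and are closed in `cleanup(Context)`). The plain-Java model below imitates that lifecycle without Hadoop on the classpath; `ExpensiveClient` is a hypothetical placeholder that only counts how often it is constructed.

```java
import java.util.List;

/**
 * Plain-Java model of the Mapper lifecycle: setup() runs once per task,
 * map() runs once per record. Not Hadoop code; names are hypothetical.
 */
public class SetupOnceDemo {

    /** Hypothetical stand-in for a costly object (Configuration, HTable, ...). */
    static class ExpensiveClient {
        static int instances = 0;
        ExpensiveClient() { instances++; }
        String lookup(String key) { return "value-for-" + key; }
    }

    /** Anti-pattern: a new client is built for every record. */
    static class PerRecordMapper {
        void map(String rowKey) {
            ExpensiveClient client = new ExpensiveClient();
            client.lookup(rowKey);
        }
    }

    /** Preferred: the client is built once in setup() and reused in map(). */
    static class SetupOnceMapper {
        private ExpensiveClient client;
        void setup() { client = new ExpensiveClient(); }
        void map(String rowKey) { client.lookup(rowKey); }
    }

    /** Returns how many clients were constructed for the given rows. */
    static int runPerRecord(List<String> rows) {
        ExpensiveClient.instances = 0;
        PerRecordMapper mapper = new PerRecordMapper();
        for (String row : rows) mapper.map(row);
        return ExpensiveClient.instances;
    }

    /** Returns how many clients were constructed for the given rows. */
    static int runSetupOnce(List<String> rows) {
        ExpensiveClient.instances = 0;
        SetupOnceMapper mapper = new SetupOnceMapper();
        mapper.setup();
        for (String row : rows) mapper.map(row);
        return ExpensiveClient.instances;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("r1", "r2", "r3", "r4", "r5");
        System.out.println("per-record constructions: " + runPerRecord(rows)); // 5
        System.out.println("setup-once constructions: " + runSetupOnce(rows)); // 1
    }
}
```

With 19m input rows, the per-record version would build 19m objects where the setup-once version builds one per map task.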

Anil Gupta: it's a 6-node cluster, so 6 RegionServers. I am basically trying to
do a partial join across 3 tables, perform some computation on the result, and
write it into another table.
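One common way to do this kind of partial join in plain MapReduce, sketched here under loud assumptions, is a map-side hash join: load the two smaller tables (1m and 2.5m rows) into hash maps once per task in `setup()`, then probe them while scanning the large table, rather than, for example, issuing a separate Get per input row. All class, field, and key names below are hypothetical. The approach only works if both lookup tables fit in each mapper's heap; if they do not, batching lookups with `HTable.get(List<Get>)` is the usual fallback.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Map-side hash join sketch; every name here is hypothetical. */
public class MapSideJoinSketch {

    /** A row from the large table: its key, two foreign keys and a metric. */
    static final class FactRow {
        final String rowKey, userId, itemId;
        final long amount;
        FactRow(String rowKey, String userId, String itemId, long amount) {
            this.rowKey = rowKey; this.userId = userId; this.itemId = itemId; this.amount = amount;
        }
    }

    /**
     * Inner-joins each fact row against two preloaded lookup maps and returns
     * "rowKey,userName,itemName,amount" strings. Rows without a match in both
     * lookups are dropped.
     */
    static List<String> join(List<FactRow> facts,
                             Map<String, String> users,
                             Map<String, String> items) {
        List<String> out = new ArrayList<>();
        for (FactRow f : facts) {
            String user = users.get(f.userId);   // O(1) in-memory probe
            String item = items.get(f.itemId);
            if (user == null || item == null) {
                continue; // no match in one of the small tables
            }
            out.add(f.rowKey + "," + user + "," + item + "," + f.amount);
        }
        return out;
    }

    public static void main(String[] args) {
        // In a real mapper these maps would be filled once in setup(),
        // e.g. by scanning the two smaller HBase tables.
        Map<String, String> users = new HashMap<>();
        users.put("u1", "alice");
        users.put("u2", "bob");
        Map<String, String> items = new HashMap<>();
        items.put("i1", "book");

        List<FactRow> facts = new ArrayList<>();
        facts.add(new FactRow("r1", "u1", "i1", 10));
        facts.add(new FactRow("r2", "u2", "i9", 5));  // i9 missing: dropped
        facts.add(new FactRow("r3", "u2", "i1", 7));

        // prints r1,alice,book,10 and r3,bob,book,7
        join(facts, users, items).forEach(System.out::println);
    }
}
```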

The first table has around 19m rows, the 2nd one 1m rows and the 3rd table
2.5m rows. I know we could use Hive/Pig for this, but I have to write it
as a MapReduce application. For the first table, I created a smaller
subset of 100,000 rows and ran it. The output was in my first message in this
thread; that run took one hour to complete. For 19m rows, I cannot imagine it
finishing in a reasonable time. Please suggest something.
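A quick linear extrapolation from the numbers above shows the scale of the problem. This back-of-envelope sketch assumes throughput stays constant as the input grows, which may not hold (for example, if the full table spreads across more regions, and hence more parallel map tasks, than the subset did).

```java
/** Back-of-envelope runtime extrapolation; assumes linear scaling. */
public class RuntimeEstimate {

    /** Scales the sample runtime by the ratio of full to sample row counts. */
    static double estimateHours(long sampleRows, double sampleHours, long fullRows) {
        return sampleHours * ((double) fullRows / sampleRows);
    }

    public static void main(String[] args) {
        double hours = estimateHours(100_000, 1.0, 19_000_000);
        System.out.printf("~%.0f hours (~%.1f days)%n", hours, hours / 24);
    }
}
```

With the thread's numbers this comes to about 190 hours, roughly 8 days, which is why per-record overhead matters so much here.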
On Mon, Aug 26, 2013 at 12:03 PM, Pavan Sudheendra <[EMAIL PROTECTED]> wrote:

> Jens, can I set a smaller value in my application?
> Is this valid?
> conf.setInt("mapred.max.split.size", 50);
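For reference, `mapred.max.split.size` is measured in bytes, so 50 means 50-byte splits for file-based inputs. Hadoop's new-API `FileInputFormat` picks a split size of max(minSize, min(maxSize, blockSize)); the sketch below mirrors that formula in plain Java, with an assumed 64 MB block size. This only applies to `FileInputFormat`-based inputs: as far as we know, HBase's `TableInputFormat` ignores it and creates one split (one map task) per region, so on a table scan this setting is unlikely to change the number of mappers.

```java
/** Plain-Java mirror of FileInputFormat's split-size formula. */
public class SplitSizeSketch {

    /** max(minSize, min(maxSize, blockSize)), all values in bytes. */
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Splits for one file of fileSize bytes (ignores Hadoop's 10% slop rule). */
    static long splitCount(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;                 // assumed 64 MB HDFS block
        long split = computeSplitSize(blockSize, 1L, 50L);  // mapred.max.split.size = 50
        System.out.println("split size: " + split + " bytes"); // 50
        System.out.println("splits for a 1 GiB file: " + splitCount(1L << 30, split));
    }
}
```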
>
> This is our mapred-site.xml:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <configuration>
>   <property>
>     <name>mapred.job.tracker</name>
>     <value>ip-10-10-100170.eu-east-1.compute.internal:8021</value>
>   </property>
>   <property>
>     <name>mapred.job.tracker.http.address</name>
>     <value>0.0.0.0:50030</value>
>   </property>
>   <property>
>     <name>mapreduce.job.counters.max</name>
>     <value>120</value>
>   </property>
>   <property>
>     <name>mapred.output.compress</name>
>     <value>false</value>
>   </property>
>   <property>
>     <name>mapred.output.compression.type</name>
>     <value>BLOCK</value>
>   </property>
>   <property>
>     <name>mapred.output.compression.codec</name>
>     <value>org.apache.hadoop.io.compress.DefaultCodec</value>
>   </property>
>   <property>
>     <name>mapred.map.output.compression.codec</name>
>     <value>org.apache.hadoop.io.compress.SnappyCodec</value>
>   </property>
>   <property>
>     <name>mapred.compress.map.output</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>zlib.compress.level</name>
>     <value>DEFAULT_COMPRESSION</value>
>   </property>
>   <property>
>     <name>io.sort.factor</name>
>     <value>64</value>
>   </property>
>   <property>
>     <name>io.sort.record.percent</name>
>     <value>0.05</value>
>   </property>
>   <property>
>     <name>io.sort.spill.percent</name>
>     <value>0.8</value>
>   </property>
>   <property>
>     <name>mapred.reduce.parallel.copies</name>
>     <value>10</value>
>   </property>
>   <property>
>     <name>mapred.submit.replication</name>
>     <value>2</value>
>   </property>
>   <property>
>     <name>mapred.reduce.tasks</name>
>     <value>6</value>
>   </property>
>   <property>
>     <name>mapred.userlog.retain.hours</name>
>     <value>24</value>
>   </property>
>   <property>
>     <name>io.sort.mb</name>
>     <value>112</value>
>   </property>
>   <property>
>     <name>mapred.child.java.opts</name>
>     <value> -Xmx471075479</value>
>   </property>
>   <property>
>     <name>mapred.job.reuse.jvm.num.tasks</name>
>     <value>1</value>
>   </property>
>   <property>
>     <name>mapred.map.tasks.speculative.execution</name>
>     <value>false</value>
>   </property>
>   <property>
>     <name>mapred.reduce.tasks.speculative.execution</name>
>     <value>false</value>
>   </property>
>   <property>
>     <name>mapred.reduce.slowstart.completed.maps</name>
>     <value>0.8</value>
>   </property>
> </configuration>
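Two values in this file interact: `io.sort.mb` (112) is the map-side sort buffer, allocated inside each child task's heap, and `mapred.child.java.opts` caps that heap at `-Xmx471075479` bytes. The small calculation below puts them side by side; the spill threshold is approximated as `io.sort.mb` × `io.sort.spill.percent` and ignores the share reserved for record metadata (`io.sort.record.percent`).

```java
/** Compares the configured sort buffer with the child JVM heap. */
public class SortBufferCheck {
    static final long MIB = 1024L * 1024L;

    /** Approximate MiB of map output buffered before a spill starts. */
    static double spillThresholdMiB(int ioSortMb, double spillPercent) {
        return ioSortMb * spillPercent;
    }

    /** Converts an -Xmx value given in bytes to MiB. */
    static double heapMiB(long xmxBytes) {
        return (double) xmxBytes / MIB;
    }

    public static void main(String[] args) {
        long xmx = 471_075_479L; // mapred.child.java.opts: -Xmx471075479 (bytes)
        int ioSortMb = 112;      // io.sort.mb
        double spill = 0.8;      // io.sort.spill.percent
        double heap = heapMiB(xmx);
        System.out.printf("child heap     : %.1f MiB%n", heap);
        System.out.printf("sort buffer    : %d MiB (%.0f%% of heap)%n", ioSortMb, 100.0 * ioSortMb / heap);
        System.out.printf("spill threshold: ~%.1f MiB%n", spillThresholdMiB(ioSortMb, spill));
    }
}
```

Roughly a quarter of the ~449 MiB heap goes to the sort buffer, which matters if the mapper also caches lookup data in memory.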
>
>
> Please suggest ways to override the default value.
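Values in `mapred-site.xml` act as defaults for each job: a job can override them in the driver (for example `conf.setInt(...)` before the `Job` is created, since the `Job` takes a copy of the configuration) or with `-D key=value` on the command line when the driver implements `Tool` and runs through `ToolRunner`. The exception is a property declared `<final>true</final>` in a site file, which later resources cannot override; none of the properties posted here are marked final. The plain-Java model below (not Hadoop's `Configuration` class) illustrates that layering rule; the `final` flag on `io.sort.mb` is purely hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Model of layered settings: later layers override earlier ones, except
 * keys that an earlier layer marked final.
 */
public class ConfigLayersSketch {

    static class Layer {
        final Map<String, String> values = new HashMap<>();
        final Set<String> finals = new HashSet<>();
        Layer set(String key, String value) { values.put(key, value); return this; }
        Layer setFinal(String key, String value) { values.put(key, value); finals.add(key); return this; }
    }

    /** Applies layers in order and returns the effective settings. */
    static Map<String, String> resolve(List<Layer> layers) {
        Map<String, String> result = new HashMap<>();
        Set<String> locked = new HashSet<>();
        for (Layer layer : layers) {
            for (Map.Entry<String, String> e : layer.values.entrySet()) {
                if (locked.contains(e.getKey())) continue; // final earlier: ignore
                result.put(e.getKey(), e.getValue());
            }
            locked.addAll(layer.finals); // lock this layer's final keys
        }
        return result;
    }

    public static void main(String[] args) {
        Layer defaults = new Layer().set("mapred.reduce.tasks", "1").set("io.sort.mb", "100");
        Layer site = new Layer().set("mapred.reduce.tasks", "6").setFinal("io.sort.mb", "112");
        Layer job = new Layer().set("mapred.reduce.tasks", "12").set("io.sort.mb", "256");
        Map<String, String> conf = resolve(List.of(defaults, site, job));
        System.out.println(conf.get("mapred.reduce.tasks")); // 12: job override wins
        System.out.println(conf.get("io.sort.mb"));          // 112: final in site layer
    }
}
```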
>
>
> On Mon, Aug 26, 2013 at 9:38 AM, anil gupta <[EMAIL PROTECTED]> wrote:
>
>> Hi Pavan,
>>
>> Standalone cluster? How many RegionServers are you running? What are you trying to
>> achieve in MR? Have you tried increasing scanner caching?
>> "Slow" is hard to judge unless we know some more details of your setup.
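The scanner-caching question is easy to quantify. HBase scanner caching is the number of rows returned per `next()` RPC to a RegionServer; it can be set per scan with `Scan.setCaching(int)` or cluster-wide with `hbase.client.scanner.caching`, and HBase's MapReduce examples pair `setCaching(500)` with `setCacheBlocks(false)`. The sketch below assumes the 0.94-era default of 1 row per RPC as the baseline and only counts round trips for the 19m-row table; it ignores per-row processing time and larger response sizes.

```java
/** Counts scanner RPC round trips for a full-table scan. */
public class ScannerCachingEstimate {

    /** Round trips needed to stream `rows` rows at `caching` rows per RPC. */
    static long rpcCount(long rows, int caching) {
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 19_000_000L; // size of the first table, from the thread
        for (int caching : new int[] {1, 100, 500}) {
            System.out.printf("caching=%-4d -> %,d RPCs%n", caching, rpcCount(rows, caching));
        }
    }
}
```

Going from 1 to 500 rows per RPC cuts the round trips from 19,000,000 to 38,000.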
>>
>> ~Anil
>>
>>
>>
>> On Sun, Aug 25, 2013 at 5:52 PM, 李洪忠 <[EMAIL PROTECTED]> wrote:
>>
>>> You need to post your map code here so we can analyze the problem. Generally,

Regards-
Pavan