Re: Mapper and Reducer takes longer than usual for a HBase table aggregation task
Pavan Sudheendra 2013-08-26, 06:49
Ted and lhztop, here is a gist of my code: http://pastebin.com/mxY4AqBA

Can you suggest a few ways of optimizing it? I know I am re-initializing the
conf object in the map function every time it is called; I'll change that.
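To illustrate what I mean, here is a rough sketch of that change (this is not the
actual code from the pastebin; the class and table names below are made up):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.Text;

public class JoinMapper extends TableMapper<ImmutableBytesWritable, Text> {

    private Configuration conf;
    private HTable lookupTable;   // handle to one of the smaller join tables

    @Override
    protected void setup(Context context) throws IOException {
        // Build these once per task instead of once per map() call.
        conf = context.getConfiguration();
        lookupTable = new HTable(conf, "lookup_table");
    }

    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
        // Per-row join/aggregation logic goes here, reusing conf and lookupTable
        // instead of creating a new Configuration for every row.
    }

    @Override
    protected void cleanup(Context context) throws IOException {
        lookupTable.close();
    }
}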

Anil Gupta, it is a 6-node cluster, so 6 region servers. I am basically trying to
do a partial join across 3 tables, perform some computation on it, and dump the
result into another table.

The first table is somewhere around 19m rows, the 2nd one 1m rows, and the 3rd
table is 2.5m rows. I know we could use Hive/Pig for this, but I am supposed to
write it as a map/reduce application. For the first table, I created a smaller
subset of 100,000 rows and ran the job against it; that run, which completed in
one hour, is the one I described in my first message in this thread. For 19m
rows, I cannot imagine it finishing in any reasonable time. Please suggest something.
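For context, the driver side of my job looks roughly like the sketch below (again,
not the actual code from the pastebin; the table names, column family, and class
names are made up, and the caching value is only an example), including the scanner
caching Anil asked about:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class JoinJobDriver {

    // Made-up reducer that just counts the values per key and writes the count back.
    public static class JoinReducer
            extends TableReducer<ImmutableBytesWritable, Text, ImmutableBytesWritable> {
        @Override
        protected void reduce(ImmutableBytesWritable key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            long count = 0;
            for (Text ignored : values) {
                count++;
            }
            Put put = new Put(key.get());
            put.add(Bytes.toBytes("d"), Bytes.toBytes("count"), Bytes.toBytes(count));
            context.write(key, put);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "hbase-partial-join");
        job.setJarByClass(JoinJobDriver.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // return more rows per scanner RPC (default is 1)
        scan.setCacheBlocks(false);  // do not fill the block cache during a full scan

        TableMapReduceUtil.initTableMapperJob(
                "big_table",                  // made-up name for the ~19m row source table
                scan,
                JoinMapper.class,             // the mapper sketched above
                ImmutableBytesWritable.class,
                Text.class,
                job);

        TableMapReduceUtil.initTableReducerJob(
                "output_table",               // made-up name for the destination table
                JoinReducer.class,
                job);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

That is the general shape of it; the real logic is in the pastebin.
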
On Mon, Aug 26, 2013 at 12:03 PM, Pavan Sudheendra <[EMAIL PROTECTED]> wrote:

> Jens, can I set a smaller value in my application?
> Is this valid?
> conf.setInt("mapred.max.split.size", 50);
>
> This is our mapred-site.xml:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <configuration>
>   <property>
>     <name>mapred.job.tracker</name>
>     <value>ip-10-10-100170.eu-east-1.compute.internal:8021</value>
>   </property>
>   <property>
>     <name>mapred.job.tracker.http.address</name>
>     <value>0.0.0.0:50030</value>
>   </property>
>   <property>
>     <name>mapreduce.job.counters.max</name>
>     <value>120</value>
>   </property>
>   <property>
>     <name>mapred.output.compress</name>
>     <value>false</value>
>   </property>
>   <property>
>     <name>mapred.output.compression.type</name>
>     <value>BLOCK</value>
>   </property>
>   <property>
>     <name>mapred.output.compression.codec</name>
>     <value>org.apache.hadoop.io.compress.DefaultCodec</value>
>   </property>
>   <property>
>     <name>mapred.map.output.compression.codec</name>
>     <value>org.apache.hadoop.io.compress.SnappyCodec</value>
>   </property>
>   <property>
>     <name>mapred.compress.map.output</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>zlib.compress.level</name>
>     <value>DEFAULT_COMPRESSION</value>
>   </property>
>   <property>
>     <name>io.sort.factor</name>
>     <value>64</value>
>   </property>
>   <property>
>     <name>io.sort.record.percent</name>
>     <value>0.05</value>
>   </property>
>   <property>
>     <name>io.sort.spill.percent</name>
>     <value>0.8</value>
>   </property>
>   <property>
>     <name>mapred.reduce.parallel.copies</name>
>     <value>10</value>
>   </property>
>   <property>
>     <name>mapred.submit.replication</name>
>     <value>2</value>
>   </property>
>   <property>
>     <name>mapred.reduce.tasks</name>
>     <value>6</value>
>   </property>
>   <property>
>     <name>mapred.userlog.retain.hours</name>
>     <value>24</value>
>   </property>
>   <property>
>     <name>io.sort.mb</name>
>     <value>112</value>
>   </property>
>   <property>
>     <name>mapred.child.java.opts</name>
>     <value> -Xmx471075479</value>
>   </property>
>   <property>
>     <name>mapred.job.reuse.jvm.num.tasks</name>
>     <value>1</value>
>   </property>
>   <property>
>     <name>mapred.map.tasks.speculative.execution</name>
>     <value>false</value>
>   </property>
>   <property>
>     <name>mapred.reduce.tasks.speculative.execution</name>
>     <value>false</value>
>   </property>
>   <property>
>     <name>mapred.reduce.slowstart.completed.maps</name>
>     <value>0.8</value>
>   </property>
> </configuration>
>
>
> Please suggest ways to override the default value.
>
>
> On Mon, Aug 26, 2013 at 9:38 AM, anil gupta <[EMAIL PROTECTED]> wrote:
>
>> Hi Pavan,
>>
>> Standalone cluster? How many region servers are you running? What are you trying to
>> achieve in the MR job? Have you tried increasing scanner caching?
>> "Slow" is hard to judge unless we know some more details of your setup.
>>
>> ~Anil
>>
>>
>>
>> On Sun, Aug 25, 2013 at 5:52 PM, 李洪忠 <[EMAIL PROTECTED]> wrote:
>>
>>> You need to share your map code here so we can analyze the problem. Generally,

Regards-
Pavan