HBase, mail # user - Poor HBase map-reduce scan performance


Re: Poor HBase map-reduce scan performance
ramkrishna vasudevan 2013-05-01, 07:29
Sorry, I think someone hijacked this thread and I replied to it.
Naidu,
Please post your queries in a new thread and do not hijack this
one.

Regards
Ram
On Wed, May 1, 2013 at 12:57 PM, ramkrishna vasudevan
<[EMAIL PROTECTED]> wrote:

> This happens when your Java process is running in debug mode and the
> suspend=y option is selected.
>
> Regards
> Ram
>
>
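For reference, the situation Ram describes is a daemon launched with a JDWP
debug agent set to suspend=y: the process blocks at startup waiting for a
debugger, and jps can then report "Could not synchronize with target" when it
tries to attach. A minimal sketch, assuming the option were added through
HADOOP_OPTS in conf/hadoop-env.sh (the variable and port here are only
illustrative):

    # Daemon starts suspended, waiting for a debugger on port 8000;
    # jps may then fail to synchronize with it.
    export HADOOP_OPTS="$HADOOP_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000"

    # With suspend=n the daemon starts normally and merely accepts a
    # debugger connection later.
    export HADOOP_OPTS="$HADOOP_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000"
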
> On Wed, May 1, 2013 at 12:55 PM, Naidu MS <[EMAIL PROTECTED]>
> wrote:
>
>> Hi, I have two questions regarding HDFS and the jps utility.
>>
>> I am new to Hadoop and started learning it over the past week.
>>
>> 1. Whenever I run start-all.sh and then jps in the console, it shows the
>> processes that were started:
>>
>> naidu@naidu:~/work/hadoop-1.0.4/bin$ jps
>> 22283 NameNode
>> 23516 TaskTracker
>> 26711 Jps
>> 22541 DataNode
>> 23255 JobTracker
>> 22813 SecondaryNameNode
>> Could not synchronize with target
>>
>> But along with the list of started processes, it always shows "Could not
>> synchronize with target" in the jps output. What does "Could not
>> synchronize with target" mean? Can someone explain why this is happening?
>>
>>
>> 2. Is it possible to format the namenode multiple times? When I enter the
>> namenode -format command, it does not format the namenode and shows the
>> following output:
>>
>> naidu@naidu:~/work/hadoop-1.0.4/bin$ hadoop namenode -format
>> Warning: $HADOOP_HOME is deprecated.
>>
>> 13/05/01 12:08:04 INFO namenode.NameNode: STARTUP_MSG:
>> /************************************************************
>> STARTUP_MSG: Starting NameNode
>> STARTUP_MSG:   host = naidu/127.0.0.1
>> STARTUP_MSG:   args = [-format]
>> STARTUP_MSG:   version = 1.0.4
>> STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r
>> 1393290; compiled by 'hortonfo' on Wed Oct  3 05:13:58 UTC 2012
>> ************************************************************/
>> Re-format filesystem in /home/naidu/dfs/namenode ? (Y or N) y
>> Format aborted in /home/naidu/dfs/namenode
>> 13/05/01 12:08:05 INFO namenode.NameNode: SHUTDOWN_MSG:
>> /************************************************************
>> SHUTDOWN_MSG: Shutting down NameNode at naidu/127.0.0.1
>>
>> ************************************************************/
>>
>> Can someone help me understand this? Why is it not possible to format the
>> namenode multiple times?
>>
>>
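On the format question: in the transcript above the prompt was answered with a
lowercase "y", and in Hadoop 1.0.x the confirmation check is, as far as I
recall, case-sensitive, so anything other than an uppercase "Y" aborts the
format. Re-formatting multiple times is possible, but it assigns a new
namespace ID, so existing DataNode data directories usually have to be cleared
before the DataNodes will rejoin. A minimal sketch, using the paths from the
transcript above:

    # Stop the daemons first so nothing is holding the name directory.
    bin/stop-all.sh

    # Answer the prompt with an uppercase Y; a lowercase y aborts in 1.0.x.
    bin/hadoop namenode -format
    # Re-format filesystem in /home/naidu/dfs/namenode ? (Y or N) Y
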
>> On Wed, May 1, 2013 at 12:22 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:
>>
>> > Not that it's a long-term solution, but try major-compacting before
>> > running the benchmark.  If the LSM tree is CPU bound in merging
>> > HFiles/KeyValues through the PriorityQueue, then reducing to a single
>> > file per region should help.  The merging of HFiles during a scan is
>> > not heavily optimized yet.
>> >
>> >
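A quick way to try the major compaction Matt suggests is from the HBase shell;
the table name below is a placeholder:

    # Interactive HBase shell; 'mytable' is a placeholder.
    hbase shell
    major_compact 'mytable'

    # Or non-interactively:
    echo "major_compact 'mytable'" | hbase shell

    # The compaction runs asynchronously; check the region server UI or
    # metrics to confirm regions are down to a single HFile before
    # re-running the benchmark.
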
>> > On Tue, Apr 30, 2013 at 11:21 PM, lars hofhansl <[EMAIL PROTECTED]>
>> > wrote:
>> >
>> > > If you can, try 0.94.4+; it should significantly reduce the amount of
>> > > bytes copied around in RAM during scanning, especially if you have
>> > > wide rows and/or large key portions. That in turn makes scans scale
>> > > better across cores, since RAM is a shared resource between cores
>> > > (much like disk).
>> > >
>> > >
>> > > It's not hard to build the latest HBase against Cloudera's version of
>> > > Hadoop. I can send along a simple patch to pom.xml to do that.
>> > >
>> > > -- Lars
>> > >
>> > >
>> > >
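Lars's pom.xml patch is not included here, but a rough sketch of the usual
approach is to point the HBase build at the cluster's Hadoop artifacts,
assuming the Cloudera Maven repository is reachable and the version string
matches the installed CDH release (the version below is only an example):

    # Build HBase against a CDH4 (Hadoop 2 based) release; for 0.94 the
    # Hadoop 2 profile is selected with -Dhadoop.profile=2.0.
    mvn clean install -DskipTests -Dhadoop.profile=2.0 -Dhadoop.version=2.0.0-cdh4.2.0
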
>> > > ________________________________
>> > >  From: Bryan Keller <[EMAIL PROTECTED]>
>> > > To: [EMAIL PROTECTED]
>> > > Sent: Tuesday, April 30, 2013 11:02 PM
>> > > Subject: Re: Poor HBase map-reduce scan performance
>> > >
>> > >
>> > > The table has hashed keys so rows are evenly distributed amongst the
>> > > regionservers, and load on each regionserver is pretty much the same.
>> > > I also have per-table balancing turned on. I get mostly data-local
>> > > mappers with only a few rack-local ones (maybe 10 of the 250 mappers).
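
For readers unfamiliar with the layout Bryan describes, hashed (salted) keys
simply prefix each row key with a hash of itself so that writes and scans
spread evenly across regions; the key format below is made up purely for
illustration:

    # Made-up key format, only to show the idea of a hash prefix.
    orig_key="user1234-20130430"
    prefix=$(printf '%s' "$orig_key" | md5sum | cut -c1-4)
    echo "${prefix}-${orig_key}"   # e.g. a3f1-user1234-20130430

A full-table scan still has to visit every region, but no single region server
becomes a hotspot.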