HBase, mail # user - HMaster and HRegionServer going down


Re: HMaster and HRegionServer going down
Azuryy Yu 2013-06-05, 12:38
GC logs are not produced by default; they need some configuration. Do you
have any batch reads or writes to HBase?
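For reference, on HBase versions of this era GC logging is usually turned on
in conf/hbase-env.sh; a minimal sketch using standard HotSpot flags (the log
path is an example, not taken from this thread):

```shell
# Sketch: enable verbose GC logging for the HBase daemons (hbase-env.sh).
# These are standard JDK 6/7 HotSpot options; adjust the log path to taste.
export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps -Xloggc:/var/log/hbase/gc-hbase.log"
```

After a restart, the pause lengths recorded in gc-hbase.log show whether GC
is behind any ZooKeeper timeouts.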

--Send from my Sony mobile.
On Jun 5, 2013 8:25 PM, "Vimal Jain" <[EMAIL PROTECTED]> wrote:

> I don't have GC logs. Do you get them by default, or do they have to be
> configured?
> After I learned about the crash, I checked which processes were still
> running using "jps".
> It displayed 4 processes: "namenode", "datanode", "secondarynamenode" and
> "HQuorumPeer".
> So I stopped DFS by running $HADOOP_HOME/bin/stop-dfs.sh and then stopped
> HBase by running $HBASE_HOME/bin/stop-hbase.sh
>
>
> On Wed, Jun 5, 2013 at 5:49 PM, Azuryy Yu <[EMAIL PROTECTED]> wrote:
>
> > Do you have GC logs? What were you doing during the crash, and what are
> > your GC options?
> >
> > As for the data node error, that's generally a network issue, because the
> > data node received an incomplete packet.
> >
> > --Send from my Sony mobile.
> > On Jun 5, 2013 8:10 PM, "Vimal Jain" <[EMAIL PROTECTED]> wrote:
> >
> > > Yes, that's true.
> > > There are some errors in all 3 logs during the same period, i.e. data
> > > node, master and region.
> > > But I am unable to deduce the exact cause of the error.
> > > Can you please help in detecting the problem?
> > >
> > > So far I suspect the following:
> > > I have a 1 GB heap (the default) allocated for all 3 processes, i.e.
> > > Master, Region and ZooKeeper.
> > > Both Master and Region took more time for GC (as inferred from lines in
> > > the logs like "slept more time than configured one", etc.).
> > > Due to this, the ZooKeeper connection timed out for both Master and
> > > Region, and hence both went down.
> > >
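The two symptoms described above can be checked for directly; a sketch,
assuming default-style log locations (LOGS is a hypothetical path pattern —
point it at your actual master and region server logs):

```shell
# Sketch: scan HBase logs for long-GC-pause warnings and ZooKeeper session
# expiry. LOGS is a placeholder; substitute your real log files.
LOGS="/var/log/hbase/*.log"
grep -Ei "slept|session expired" $LOGS 2>/dev/null || true
```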
> > > I am a newbie to HBase, so my findings may not be correct.
> > > I want to be 100% sure before increasing the heap space for both Master
> > > and Region (to around 2 GB each) to solve this.
> > > At present I have restarted the cluster with the default heap space
> > > only (1 GB).
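If the GC-pause theory holds, the heap increase contemplated above is a
one-line change; a sketch for conf/hbase-env.sh on 0.9x-era HBase, where
HBASE_HEAPSIZE is interpreted in megabytes:

```shell
# Sketch: raise the HBase daemon heap from the 1000 MB default to 2 GB
# (hbase-env.sh; on these versions the value is a number of MB).
export HBASE_HEAPSIZE=2048
```

A longer zookeeper.session.timeout in hbase-site.xml is the usual companion
change, so that a single long pause does not expire the session.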
> > >
> > >
> > >
> > > On Wed, Jun 5, 2013 at 5:23 PM, Azuryy Yu <[EMAIL PROTECTED]> wrote:
> > >
> > > > There are errors in your data node log, and the error times match
> > > > the RS log error times.
> > > >
> > > > --Send from my Sony mobile.
> > > > On Jun 5, 2013 5:06 PM, "Vimal Jain" <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > I don't think so, as I don't find any issues in the data node logs.
> > > > > Also, there are a lot of exceptions like "session expired" and
> > > > > "slept more than configured time". What are these?
> > > > >
> > > > >
> > > > > On Wed, Jun 5, 2013 at 2:27 PM, Azuryy Yu <[EMAIL PROTECTED]>
> > wrote:
> > > > >
> > > > > > Because your data node 192.168.20.30 broke down, which led to
> > > > > > the RS going down.
> > > > > >
> > > > > >
> > > > > > On Wed, Jun 5, 2013 at 3:19 PM, Vimal Jain <[EMAIL PROTECTED]>
> > wrote:
> > > > > >
> > > > > > > Here is the complete log:
> > > > > > >
> > > > > > > http://bin.cakephp.org/saved/103001 - Hregion
> > > > > > > http://bin.cakephp.org/saved/103000 - Hmaster
> > > > > > > http://bin.cakephp.org/saved/103002 - Datanode
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Jun 5, 2013 at 11:58 AM, Vimal Jain <[EMAIL PROTECTED]>
> > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > > I have set up HBase in pseudo-distributed mode.
> > > > > > > > It was working fine for 6 days, but suddenly this morning
> > > > > > > > both the HMaster and HRegionServer processes went down.
> > > > > > > > I checked the logs of both Hadoop and HBase.
> > > > > > > > Please help here.
> > > > > > > > Here are the snippets:
> > > > > > > > *Datanode logs:*
> > > > > > > > 2013-06-05 05:12:51,436 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_1597245478875608321_2818 java.io.EOFException: while trying to read 2347 bytes
> > > > > > > > 2013-06-05 05:12:51,442 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_1597245478875608321_2818 received exception