HBase, mail # user - HDFS Restart with Replication


Re: HDFS Restart with Replication
Asaf Mesika 2013-08-07, 05:36
Yep. That's a confusing one.
When you stop the master (bin/hbase-daemon.sh stop master), it sets the
shutdown flag in ZK. The region servers watch this flag, and once they see
it set, they shut themselves down. Once they are all down, the master goes
down as well.
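For reference, a minimal sketch of the restart order the thread converges on. Script names and paths assume a plain tarball layout ($HBASE_HOME, $HADOOP_HOME); CDH packaging usually wraps these in service scripts, so adjust as needed:

    # 1. Stop HBase from the master node; the master signals the region
    #    servers through ZK and they shut themselves down first.
    $HBASE_HOME/bin/stop-hbase.sh        # or: bin/hbase-daemon.sh stop master

    # 2. Only once HBase is fully down, stop HDFS.
    $HADOOP_HOME/sbin/stop-dfs.sh        # the thread uses stop-all.sh

    # 3. Bring HDFS back and wait for it to leave safe mode.
    $HADOOP_HOME/sbin/start-dfs.sh
    hdfs dfsadmin -safemode wait

    # 4. Start HBase last.
    $HBASE_HOME/bin/start-hbase.sh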

On Saturday, August 3, 2013, Jean-Daniel Cryans wrote:

> Ah, then doing "bin/hbase-daemon.sh stop master" on the master node is
> the equivalent, but don't stop the region servers themselves, as the
> master will take care of it. Doing a stop on both the master and the
> region servers will screw things up.
>
> J-D
>
> On Fri, Aug 2, 2013 at 3:28 PM, Patrick Schless
> <[EMAIL PROTECTED]> wrote:
> > Doesn't stop-hbase.sh (and its ilk) require the server to be able to
> > manage the clients (using unpassworded SSH keys, for instance)? I don't
> > have that set up (for security reasons). I use capistrano for all these
> > sorts of coordination tasks.
> >
> >
> > On Fri, Aug 2, 2013 at 12:07 PM, Jean-Daniel Cryans
> > <[EMAIL PROTECTED]> wrote:
> >
> >> Doing a bin/stop-hbase.sh is the way to go, then on the Hadoop side
> >> you do stop-all.sh. I think your ordering is correct but I'm not sure
> >> you are using the right commands.
> >>
> >> J-D
> >>
> >> On Fri, Aug 2, 2013 at 8:27 AM, Patrick Schless
> >> <[EMAIL PROTECTED]> wrote:
> >> > Ah, I bet the issue is that I'm stopping the HMaster, but not the
> >> > region servers, then restarting HDFS. What's the correct order of
> >> > operations for bouncing everything?
> >> >
> >> >
> >> > On Thu, Aug 1, 2013 at 5:21 PM, Jean-Daniel Cryans
> >> > <[EMAIL PROTECTED]> wrote:
> >> >
> >> >> Can you follow the life of one of those blocks through the Namenode
> >> >> and datanode logs? I'd suggest you start by doing an fsck on one of
> >> >> those files with the option that gives the block locations first.
> >> >>
> >> >> By the way why do you have split logs? Are region servers dying every
> >> >> time you try out something?
> >> >>
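For concreteness, a hedged example of the fsck J-D describes above (the path is a placeholder for one of the files reported as corrupt; -files -blocks -locations prints each file's blocks and the datanodes holding them):

    # Placeholder path; substitute a file flagged by the corrupt-files report.
    hdfs fsck /path/to/corrupt/file -files -blocks -locations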
> >> >> On Thu, Aug 1, 2013 at 3:16 PM, Patrick Schless
> >> >> <[EMAIL PROTECTED]> wrote:
> >> >> > Yup, 14 datanodes, all check back in. However, all of the corrupt
> >> >> > files seem to be splitlogs from data05. This is true even though
> >> >> > I've done several restarts (each restart adding a few missing
> >> >> > blocks). There's nothing special about data05, and it seems to be
> >> >> > in the cluster the same as any other node.
> >> >> >
> >> >> >
> >> >> > On Thu, Aug 1, 2013 at 5:04 PM, Jean-Daniel Cryans
> >> >> > <[EMAIL PROTECTED]> wrote:
> >> >> >
> >> >> >> I can't think of how your missing blocks would be related to
> >> >> >> HBase replication; there's something else going on. Are all the
> >> >> >> datanodes checking back in?
> >> >> >>
> >> >> >> J-D
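A quick way to answer that question (a sketch, not from the thread): hdfs dfsadmin -report lists live and dead datanodes, so you can confirm that every node checked back in after the restart.

    # Summarize cluster capacity plus live/dead datanodes.
    hdfs dfsadmin -report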
> >> >> >>
> >> >> >> On Thu, Aug 1, 2013 at 2:17 PM, Patrick Schless
> >> >> >> <[EMAIL PROTECTED]> wrote:
> >> >> >> > I'm running:
> >> >> >> > CDH4.1.2
> >> >> >> > HBase 0.92.1
> >> >> >> > Hadoop 2.0.0
> >> >> >> >
> >> >> >> > Is there an issue with restarting a standby cluster with
> >> >> >> > replication running? I am doing the following on the standby
> >> >> >> > cluster:
> >> >> >> >
> >> >> >> > - stop hmaster
> >> >> >> > - stop name_node
> >> >> >> > - start name_node
> >> >> >> > - start hmaster
> >> >> >> >
> >> >> >> > When the name node comes back up, it's reliably missing blocks.
> >> >> >> > I started with 0 missing blocks, and have run through this
> >> >> >> > scenario a few times, and am up to 46 missing blocks, all from
> >> >> >> > the table that is the standby for our production table (in a
> >> >> >> > different datacenter). The missing blocks are all from the same
> >> >> >> > table, and look like:
> >> >> >>