HBase user mailing list

Re: HDFS Restart with Replication
Running bin/stop-hbase.sh is the way to go; then, on the Hadoop side,
run stop-all.sh. I think your ordering is correct, but I'm not sure
you are using the right commands.
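The stop/start order described above, written out as a minimal Python sketch. It assumes a tarball-style layout with the HBase scripts under <hbase_home>/bin and the Hadoop 2 scripts under <hadoop_home>/sbin; those directories and the helper names restart_plan and run_plan are hypothetical, so adjust them to your install (packaged installs may manage daemons through service scripts instead). It defaults to a dry run that only prints the commands.

```python
import subprocess
from pathlib import PurePosixPath


def restart_plan(hbase_home, hadoop_home):
    """Full restart: HBase goes down before HDFS and comes up after it.

    The bin/ and sbin/ locations are assumptions about the install layout.
    """
    hbase_bin = PurePosixPath(hbase_home) / "bin"
    hadoop_sbin = PurePosixPath(hadoop_home) / "sbin"
    return [
        [str(hbase_bin / "stop-hbase.sh")],   # shuts down the master and region servers
        [str(hadoop_sbin / "stop-all.sh")],   # then stop the Hadoop daemons
        [str(hadoop_sbin / "start-all.sh")],  # bring Hadoop back first
        [str(hbase_bin / "start-hbase.sh")],  # and start HBase last
    ]


def run_plan(plan, dry_run=True):
    for cmd in plan:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            # check=True raises on the first failing step, so the sequence
            # stops instead of taking HDFS down while HBase may still be up
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder install directories; dry run only prints the commands.
    run_plan(restart_plan("/opt/hbase", "/opt/hadoop"))
```

Starting HDFS before HBase mirrors the shutdown order: HBase keeps its data and write-ahead logs in HDFS, so it should only run while HDFS is up.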

J-D

On Fri, Aug 2, 2013 at 8:27 AM, Patrick Schless
<[EMAIL PROTECTED]> wrote:
> Ah, I bet the issue is that I stopped the HMaster, but not the region
> servers, and then restarted HDFS. What's the correct order of operations
> for bouncing everything?
>
>
> On Thu, Aug 1, 2013 at 5:21 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
>
>> Can you follow the life of one of those blocks through the NameNode and
>> DataNode logs? I'd suggest you start by running fsck on one of those
>> files with the option that prints the block locations.
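A minimal Python sketch of that approach, assuming the hdfs command is on the PATH: build the fsck command with -files -blocks -locations (the fsck options that list each block of a file and the DataNodes holding it), pull the block IDs out of its output, then scan the NameNode and DataNode logs for one of them. The helper names and the log file paths you pass in are hypothetical.

```python
import re
import subprocess

# Block IDs look like blk_-2036986832155369224 (legacy IDs can be negative).
BLOCK_RE = re.compile(r"blk_-?\d+")


def fsck_command(path):
    # -files -blocks -locations: list the file, its blocks, and the
    # DataNodes that hold each replica
    return ["hdfs", "fsck", path, "-files", "-blocks", "-locations"]


def run_fsck(path):
    result = subprocess.run(fsck_command(path), capture_output=True, text=True)
    return result.stdout


def block_ids(fsck_output):
    """Unique block IDs found in fsck output, sorted."""
    return sorted(set(BLOCK_RE.findall(fsck_output)))


def lines_mentioning(block_id, log_lines):
    # The (?!\d) guard keeps blk_12 from matching blk_123, while still
    # matching blk_12_1001 (block ID followed by a generation stamp).
    pattern = re.compile(re.escape(block_id) + r"(?!\d)")
    return [line for line in log_lines if pattern.search(line)]


def trace_block(block_id, log_paths):
    # log_paths: NameNode and DataNode log files; where they live depends
    # on the install, so these paths are assumptions supplied by the caller.
    for log_path in log_paths:
        with open(log_path, errors="replace") as f:
            for line in lines_mentioning(block_id, f):
                print(f"{log_path}: {line.rstrip()}")
```

Reading the matching lines in timestamp order across all the logs gives the block's history: when it was allocated, which DataNodes received it, and whether any of them later reported or deleted it.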
>>
>> By the way, why do you have split logs? Are region servers dying every
>> time you try something out?
>>
>> On Thu, Aug 1, 2013 at 3:16 PM, Patrick Schless
>> <[EMAIL PROTECTED]> wrote:
>> > Yup, 14 datanodes, and all of them check back in. However, all of the
>> > corrupt files seem to be split logs from data05, even though I've done
>> > several restarts (each restart adding a few missing blocks). There's
>> > nothing special about data05, and it seems to be in the cluster just
>> > like every other node.
>> >
>> >
>> > On Thu, Aug 1, 2013 at 5:04 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]>
>> > wrote:
>> >
>> >> I can't see how your missing blocks would be related to
>> >> HBase replication; there's something else going on. Are all the
>> >> datanodes checking back in?
>> >>
>> >> J-D
>> >>
>> >> On Thu, Aug 1, 2013 at 2:17 PM, Patrick Schless
>> >> <[EMAIL PROTECTED]> wrote:
>> >> > I'm running:
>> >> > CDH4.1.2
>> >> > HBase 0.92.1
>> >> > Hadoop 2.0.0
>> >> >
>> >> > Is there an issue with restarting a standby cluster with replication
>> >> > running? I am doing the following on the standby cluster:
>> >> >
>> >> > - stop hmaster
>> >> > - stop name_node
>> >> > - start name_node
>> >> > - start hmaster
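One way to measure the missing blocks after each pass through these steps, sketched in Python under several assumptions: hdfs is on the PATH, dfsadmin -safemode wait is used to let the NameNode leave safe mode first, and fsck -list-corruptfileblocks prints one block ID and path per line, like the listing quoted below. The function names are hypothetical, and the parsing pattern may need adjusting for your Hadoop version.

```python
import re
import subprocess

# One "blk_<id> <path>" pair per line; [ \t]+ keeps a match on a single line.
LISTING_RE = re.compile(r"^(blk_-?\d+)[ \t]+(\S+)", re.MULTILINE)


def list_corrupt_blocks():
    # Wait for the NameNode to leave safe mode, then ask fsck for the
    # blocks it reports as missing or corrupt.
    subprocess.run(["hdfs", "dfsadmin", "-safemode", "wait"], check=True)
    # check=False: we want the listing regardless of fsck's exit status.
    result = subprocess.run(
        ["hdfs", "fsck", "/", "-list-corruptfileblocks"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def parse_listing(text):
    """Map block ID -> file path; lines that don't match are ignored."""
    return dict(LISTING_RE.findall(text))


def newly_missing(before_text, after_text):
    """Blocks reported after a restart that were not reported before it."""
    before = parse_listing(before_text)
    return {blk: path for blk, path in parse_listing(after_text).items()
            if blk not in before}
```

Saving the listing before a restart and diffing it against the one taken afterwards shows which blocks that particular restart lost.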
>> >> >
>> >> > When the name node comes back up, it's reliably missing blocks. I started
>> >> > with 0 missing blocks, and have run through this scenario a few times,
>> >> > and am up to 46 missing blocks, all from the table that is the standby
>> >> > for our production table (in a different datacenter). They look like
>> >> > this:
>> >> >
>> >> > blk_-2036986832155369224 /hbase/splitlog/data01.sea01.staging.tdb.com,60020,1372703317824_hdfs%3A%2F%2Fname-node.sea01.staging.tdb.com%3A8020%2Fhbase%2F.logs%2Fdata05.sea01.staging.tdb.com%2C60020%2C1373557074890-splitting%2Fdata05.sea01.staging.tdb.com%252C60020%252C1373557074890.1374960698485/tempodb-data/c9cdd64af0bfed70da154c219c69d62d/recovered.edits/0000000001366319450.temp
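The splitlog path in that entry is URL-encoded. A minimal Python sketch that decodes it once and pulls out its parts, assuming the layout /hbase/splitlog/<worker>_<encoded WAL path>/<table>/<region>/recovered.edits/<file>. That layout, and reading the prefix as the server doing the split, are our interpretation of HBase 0.92 distributed log splitting rather than something stated in the thread; the function names are hypothetical.

```python
from collections import Counter
from urllib.parse import unquote

# The path from the listing above, split only for line length.
SAMPLE = ("/hbase/splitlog/data01.sea01.staging.tdb.com,60020,1372703317824_"
          "hdfs%3A%2F%2Fname-node.sea01.staging.tdb.com%3A8020%2Fhbase%2F.logs%2F"
          "data05.sea01.staging.tdb.com%2C60020%2C1373557074890-splitting%2F"
          "data05.sea01.staging.tdb.com%252C60020%252C1373557074890.1374960698485"
          "/tempodb-data/c9cdd64af0bfed70da154c219c69d62d/recovered.edits/"
          "0000000001366319450.temp")


def parse_splitlog_path(path):
    # Assumed layout (see lead-in):
    # /hbase/splitlog/<worker>_<url-encoded WAL path>/<table>/<region>/recovered.edits/<file>
    parts = path.split("/")
    worker, _, encoded_wal = parts[3].partition("_")
    wal = unquote(encoded_wal)    # one decode: %2F -> /, %252C -> %2C
    wal_dir = wal.split("/")[-2]  # "<host>,<port>,<startcode>-splitting"
    suffix = "-splitting"
    origin = wal_dir[:-len(suffix)] if wal_dir.endswith(suffix) else wal_dir
    return {"worker": worker, "wal": wal, "origin_server": origin,
            "table": parts[4], "region": parts[5]}


def missing_by_origin_host(listing_lines):
    """Count '<block> <path>' splitlog entries by the host whose WAL was split."""
    counts = Counter()
    for line in listing_lines:
        fields = line.split()
        if len(fields) == 2 and fields[1].startswith("/hbase/splitlog/"):
            origin = parse_splitlog_path(fields[1])["origin_server"]
            counts[origin.split(",")[0]] += 1  # host part of host,port,startcode
    return counts
```

For the entry above, the decoded WAL lives under .logs/data05.sea01.staging.tdb.com,60020,1373557074890-splitting, which fits the observation that the corrupt files are all split logs from data05, while the prefix names data01.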
>> >> >
>> >> > Do I have to stop replication before restarting the standby?
>> >> >
>> >> > Thanks,
>> >> > Patrick
>> >>
>>