HBase, mail # user - HDFS Restart with Replication


Re: HDFS Restart with Replication
Jean-Daniel Cryans 2013-08-01, 22:21
Can you follow the life of one of those blocks through the NameNode and
DataNode logs? I'd suggest you start by running fsck on one of those
files with the options that report the block locations.
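
For reference, a minimal sketch of that fsck call. It assumes the `hdfs` command is on the PATH and uses the standard `-files -blocks -locations` flags; the helper name and the example path are only illustrative.

```python
import shlex


def fsck_block_locations_cmd(hdfs_path):
    """Build the fsck command that lists a file's blocks and where they live.

    -files     print each file that is checked
    -blocks    print the block report for each file
    -locations print the datanodes holding each block
    """
    return ["hdfs", "fsck", hdfs_path, "-files", "-blocks", "-locations"]


if __name__ == "__main__":
    # Illustrative path only: substitute one of the files reported as corrupt.
    print(shlex.join(fsck_block_locations_cmd("/hbase/splitlog")))
```

The block IDs in the output can then be searched for in the NameNode and DataNode logs.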

By the way, why do you have split logs? Are region servers dying every
time you try something out?
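
A rough sketch for reading the split-log path quoted further down. It assumes the directory under /hbase/splitlog is named "<worker server>_<URL-encoded path of the log being split>"; that layout is inferred from the example path itself, not confirmed against the HBase source, and the function name is made up for illustration.

```python
from urllib.parse import unquote


def describe_splitlog_path(path):
    """Split a /hbase/splitlog/... path into worker, WAL path and source server."""
    parts = path.split("/")
    # The path component right after "splitlog" carries both server names.
    component = parts[parts.index("splitlog") + 1]
    # Assumption: the text before the first "_" is the server doing the split.
    worker, _, encoded_wal = component.partition("_")
    wal_path = unquote(encoded_wal)
    # The directory under .logs names the server whose log is being split.
    source_dir = wal_path.split("/.logs/", 1)[1].split("/", 1)[0]
    return {
        "worker_host": worker.split(",", 1)[0],
        "wal_path": wal_path,
        "source_host": source_dir.split(",", 1)[0],
    }


# The example path from the original report, with the mail line wraps removed.
EXAMPLE = (
    "/hbase/splitlog/data01.sea01.staging.tdb.com,60020,1372703317824"
    "_hdfs%3A%2F%2Fname-node.sea01.staging.tdb.com%3A8020%2Fhbase%2F.logs"
    "%2Fdata05.sea01.staging.tdb.com%2C60020%2C1373557074890-splitting"
    "%2Fdata05.sea01.staging.tdb.com%252C60020%252C1373557074890.1374960698485"
    "/tempodb-data/c9cdd64af0bfed70da154c219c69d62d"
    "/recovered.edits/0000000001366319450.temp"
)

if __name__ == "__main__":
    for key, value in describe_splitlog_path(EXAMPLE).items():
        print(key, "=", value)
```

Under that assumption, the example decodes to a log from data05 being split by data01.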

On Thu, Aug 1, 2013 at 3:16 PM, Patrick Schless
<[EMAIL PROTECTED]> wrote:
> Yup, 14 datanodes, all check back in. However, all of the corrupt files
> seem to be splitlogs from data05. This is true even though I've done
> several restarts (each restart adding a few missing blocks). There's
> nothing special about data05, and it seems to be in the cluster, the same
> as anyone else.
>
>
> On Thu, Aug 1, 2013 at 5:04 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
>
>> I can't think of how your missing blocks would be related to
>> HBase replication; there's something else going on. Are all the
>> datanodes checking back in?
>>
>> J-D
>>
>> On Thu, Aug 1, 2013 at 2:17 PM, Patrick Schless
>> <[EMAIL PROTECTED]> wrote:
>> > I'm running:
>> > CDH4.1.2
>> > HBase 0.92.1
>> > Hadoop 2.0.0
>> >
>> > Is there an issue with restarting a standby cluster with replication
>> > running? I am doing the following on the standby cluster:
>> >
>> > - stop hmaster
>> > - stop name_node
>> > - start name_node
>> > - start hmaster
>> >
>> > When the name node comes back up, it's reliably missing blocks. I started
>> > with 0 missing blocks, and have run through this scenario a few times, and
>> > am up to 46 missing blocks, all from the table that is the standby for our
>> > production table (in a different datacenter). The missing blocks all are
>> > from the same table, and look like:
>> >
>> > blk_-2036986832155369224 /hbase/splitlog/data01.sea01.staging.tdb.com,60020,1372703317824_hdfs%3A%2F%2Fname-node.sea01.staging.tdb.com%3A8020%2Fhbase%2F.logs%2Fdata05.sea01.staging.tdb.com%2C60020%2C1373557074890-splitting%2Fdata05.sea01.staging.tdb.com%252C60020%252C1373557074890.1374960698485/tempodb-data/c9cdd64af0bfed70da154c219c69d62d/recovered.edits/0000000001366319450.temp
>> >
>> > Do I have to stop replication before restarting the standby?
>> >
>> > Thanks,
>> > Patrick
>>