Aleksandr Shulman 2013-01-14, 18:32
Ted Yu 2013-01-14, 19:01
Aleksandr Shulman 2013-01-14, 19:15
Andrew Purtell 2013-01-14, 23:15
Jonathan Hsieh 2013-01-15, 01:27
Andrew Purtell 2013-01-15, 02:47
Jonathan Hsieh 2013-01-15, 08:55
I would be +1 on killing datanodes during the tests. I think we tend to
underanalyze the impact of an HDFS error on HBase.
See, for example, HBASE-6738 <https://issues.apache.org/jira/browse/HBASE-6738>:
in distributed log splitting, we considered a task dead if the split was
not done within 25s. If you had to go to the dead DN to read the WAL, 25s was
far from enough, and we ended up doing the same split on multiple servers.
HDFS is a nice buddy, but it can't hide everything.
On Tue, Jan 15, 2013 at 9:55 AM, Jonathan Hsieh <[EMAIL PROTECTED]> wrote:
> My counter-argument here is that this would be a bug in HDFS as
> opposed to HBase. It is good to know, but ideally shouldn't be exposed
> at the HBase level. This test won't really make sense if there were a
> different FS underneath.
> That said, if you insist, we can add this and report on it (at lower
> priority than the HBase-level problems, though).
> On Mon, Jan 14, 2013 at 6:47 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
> > If a datanode goes down and it has an indirect bad effect on snapshots,
> > this would be useful to know.
> > I threw in the HA NN item for completeness' sake. Ideally, a
> > client like HBase wouldn't notice.
> > On Mon, Jan 14, 2013 at 5:27 PM, Jonathan Hsieh <[EMAIL PROTECTED]> wrote:
> >> I think killing datanodes and killing the HA NN are out of scope from
> >> an HBase point of view.
> > --
> > Best regards,
> > - Andy
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> // Jonathan Hsieh (shay)
> // Software Engineer, Cloudera
> // [EMAIL PROTECTED]
Jean-Marc Spaggiari 2013-01-15, 19:01