-
HBase CheckSum vs Hadoop CheckSum
Jean-Marc Spaggiari 2013-02-26, 12:03
Hi,
Quick question.
When activating short-circuit reads in HBase, it's recommended to activate HBase checksums instead of the Hadoop ones. This is done in the HBase configuration.
I'm wondering what the impact is on the DataNode Block Scanner.
Is it going to be stopped because the checksums can't be used anymore? Or will Hadoop continue to store and use its own checksums, while HBase simply stops looking at them and stores and uses its own?
Since it's an HBase configuration (hbase.regionserver.checksum.verify), I expect this to have no impact on the Block Scanner, but I'm looking for confirmation.
Thanks,
JM
-
Re: HBase CheckSum vs Hadoop CheckSum
谢良 (Liang Xie) 2013-02-26, 12:24
Comments inline.
> Is it going to be stopped because the checksums can't be used anymore? Or will Hadoop continue to store and use its own checksums, while HBase simply stops looking at them and stores and uses its own?
[liang xie]: Yes, the checksum is still stored in the meta file in the current community version. By the way, Facebook's hadoop-fb20 branch has an inline checksum feature, IIRC.
> Since it's an HBase configuration (hbase.regionserver.checksum.verify), I expect this to have no impact on the Block Scanner, but I'm looking for confirmation.
[liang xie]: Yes, there is no impact on HDFS's DataBlockScanner. You can check the details in the DataNode's BlockPoolSliceScanner.verifyBlock():
    blockSender = new BlockSender(block, 0, -1, false, true, true, datanode, null);
Regards,
Liang
-
RE: HBase CheckSum vs Hadoop CheckSum
Anoop Sam John 2013-02-26, 12:45
I was typing a reply when Liang's reply arrived :) Yes, I agree with him. Only the HDFS client (at the RegionServer) stops doing checksum verification based on the HDFS-stored checksums. Instead, HBase alone checks correctness by comparing against its own stored checksum values. The periodic block scanning in HDFS will still continue. I think we can turn it off by configuring its period with a negative value.
-Anoop-
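To make the "compare with stored checksum values" idea concrete, here is a minimal sketch of application-level checksum verification. It is not HBase or HDFS code: the class name AppChecksumSketch, its methods, and the choice of CRC32 are assumptions made purely for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical illustration only; not HBase or HDFS code. */
final class AppChecksumSketch {

    /** Writer side: compute a CRC32 over the payload before storing it. */
    static long checksumOf(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return crc.getValue();
    }

    /** Reader side: recompute the checksum and compare it with the stored value. */
    static boolean verify(byte[] payload, long storedChecksum) {
        return checksumOf(payload) == storedChecksum;
    }

    public static void main(String[] args) {
        byte[] block = "row1/cf:q/value".getBytes(StandardCharsets.UTF_8);
        long stored = checksumOf(block);
        System.out.println("intact block verifies: " + verify(block, stored));

        byte[] corrupted = block.clone();
        corrupted[0] ^= 0x01; // flip a single bit to simulate corruption
        System.out.println("corrupted block verifies: " + verify(corrupted, stored));
    }
}
```

The point of the sketch is only that the reader can detect corruption by itself, without consulting the checksums the file system keeps separately; the file system's own periodic scanning is an independent mechanism.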
-
Re: HBase CheckSum vs Hadoop CheckSum
Jean-Marc Spaggiari 2013-02-26, 13:34
Thanks for your replies. For a few seconds I was feeling insecure ;)
It seems the default period for the DataBlockScanner is 3 weeks:
    static final long DEFAULT_SCAN_PERIOD_HOURS = 21*24L;
And I have not found any way to modify that. I will keep searching and might drop a message on the Hadoop list if I still can't find one.
Thanks,
JM
-
RE: HBase CheckSum vs Hadoop CheckSum
Anoop Sam John 2013-02-26, 13:53
JM,
Please check "dfs.datanode.scan.period.hours".
-Anoop-
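As a rough sketch of how a value for "dfs.datanode.scan.period.hours" might be interpreted, the snippet below models the setting in plain Java. It is not Hadoop source code: the class ScanPeriodSketch and the method effectiveScanPeriodHours are hypothetical, and the rules it encodes (negative disables the scanner, zero falls back to the default) are assumptions that differ between Hadoop releases, so check the hdfs-default.xml of your version before relying on them.

```java
/** Hypothetical model of the scan-period setting; not Hadoop source code. */
final class ScanPeriodSketch {

    /** Default quoted in this thread: 21 days * 24 hours = 504 hours. */
    static final long DEFAULT_SCAN_PERIOD_HOURS = 21 * 24L;

    /** Marker returned when scanning is considered disabled. */
    static final long DISABLED = -1L;

    /**
     * Returns the effective scan period in hours, or DISABLED.
     * Assumed rules: negative disables scanning, zero means "use the default".
     */
    static long effectiveScanPeriodHours(long configuredHours) {
        if (configuredHours < 0) {
            return DISABLED;
        }
        if (configuredHours == 0) {
            return DEFAULT_SCAN_PERIOD_HOURS;
        }
        return configuredHours;
    }

    public static void main(String[] args) {
        for (long hours : new long[] {-1L, 0L, 48L}) {
            System.out.println(hours + " -> " + effectiveScanPeriodHours(hours));
        }
    }
}
```

In a real cluster the value would be set as a property in the DataNode's hdfs-site.xml rather than computed in code.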
-
Re: HBase CheckSum vs Hadoop CheckSum
Jean-Marc Spaggiari 2013-02-27, 01:31
Oh, cool! Thanks Anoop!