Doing HBase-level checksums (as opposed to HDFS-level) will mostly yield improvements for random gets.
Scans (like row counting and similar) will probably see a negligible improvement.
In HDFS a block and its checksum are stored in different local files on each datanode. So loading a block requires 2 IOs.
With the checksum handled by HBase only one IO is needed per block.
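
To make the one-IO point concrete, here is a minimal sketch of the general idea of keeping the checksum inline with the block. It is not HBase's actual on-disk format; the helper names `pack_block` and `read_block`, the 4-byte trailer, and the use of CRC32 are all assumptions made for illustration.

```python
import struct
import zlib

CHECKSUM_LEN = 4  # assumed trailer size for this sketch: one 32-bit CRC


def pack_block(data: bytes) -> bytes:
    """Return the payload followed by its CRC32, so both sit in one buffer."""
    crc = zlib.crc32(data) & 0xFFFFFFFF
    return data + struct.pack(">I", crc)


def read_block(buf: bytes) -> bytes:
    """Verify and return the payload using only the single buffer read."""
    if len(buf) < CHECKSUM_LEN:
        raise ValueError("buffer too short to hold a checksum")
    data = buf[:-CHECKSUM_LEN]
    (stored,) = struct.unpack(">I", buf[-CHECKSUM_LEN:])
    if zlib.crc32(data) & 0xFFFFFFFF != stored:
        raise ValueError("checksum mismatch")
    return data
```

With the data and checksum in the same buffer, one read is enough to both fetch and verify the block, whereas a separate checksum file means a second read.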
From: Robert Dyer <[EMAIL PROTECTED]>
To: Hbase-User <[EMAIL PROTECTED]>
Sent: Friday, February 1, 2013 11:37 AM
Subject: Re: HBase Checksum
Yes that log is a debug level log, as I saw in the source. But I too
enabled DEBUG and still never saw that log message.
But I, unlike you, see absolutely no change in performance.
One test I did, however, makes me think it is actually enabled: if I
submit from another user I start getting security warnings about that user
not having permission for shortcircuit. So perhaps it is working, but I
have no clue why that log fails to show anywhere.
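
For anyone wanting to repeat the log check, a small sketch of the kind of scan I mean. It assumes only that a relevant log line would contain the class name `BlockReaderLocal`; the exact message text is not known here, and the function name `find_mentions` is made up for this example.

```python
from typing import Iterable, List


def find_mentions(lines: Iterable[str], needle: str = "BlockReaderLocal") -> List[str]:
    """Return every line that contains the given substring, newline stripped."""
    return [line.rstrip("\n") for line in lines if needle in line]
```

It can be fed an open log file, e.g. `find_mentions(open(path))` with `path` pointing at a region server log; an empty result only means that no line mentions the class, not that short-circuit reads are off.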
Regarding enabling checksums, that is an interesting question. Do I have to
do a major compaction after enabling so HBase writes the checksum? Or will
it detect the setting change and do that automatically? What if I disable,
will it remove the checksums?
On Fri, Feb 1, 2013 at 6:30 AM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
> Hi Robert,
> That's perfectly fine, it was my next question ;)
> Anoop, I saw a 5% performance increase by activating HBase Checksum.
> Can I disable it again to retry the baseline and see the difference?
> Or now that it's there, is it too late?
> Also, regarding BlockReaderLocal, I don't find that in my logs, but
> after I have activated the shortcircuit, I saw a 41% performance
> increase, so I'm almost sure it's working, but I don't know either how
> to check that.
> What's the best way to see that in the logs? It's not displayed when
> HBase is starting. It's not even displayed when I'm doing major
> I turned org.apache.hadoop.hdfs.BlockReaderLocal loglevel to debug and
> still can't see anything. Not in the region server, and not in the
> Also, to check with HDFS level logs whether the checksum meta file is
> getting read to the DFS client, I'm not really sure how to achieve
> 2013/2/1, Robert Dyer <[EMAIL PROTECTED]>:
> > Ok grepping the RS logs I see nothing with 'local' in any of them.
> > for that hint.
> > For the test I was using, I know it is data local. Every map task
> > data local, and no regions were moving recently.
> > I think I've hijacked this thread enough, I'll move my issues to another.
> > ;-)
> > On Thu, Jan 31, 2013 at 11:51 PM, Anoop Sam John <[EMAIL PROTECTED]>
> > wrote:
> >> Hi Robert
> >> When HDFS is doing a local short-circuit read, it will use the
> >> BlockReaderLocal class for reading. There should be some logs on the
> >> client side (RS) which tell about creating a new BlockReaderLocal. If you
> >> can see this, then the local read is surely happening.
> >> Also check the DN log. If local reads are happening, then you will not see read
> >> request related logs for the HFile on the DN side.
> >> You can check the number of HFiles and their names when checking the logs.
> >> Are you sure that when you tested, you had data locality? Region
> >> across RSs can break the full data locality.
> >> -Anoop-
> >> ________________________________________
> >> From: Robert Dyer [[EMAIL PROTECTED]]
> >> Sent: Friday, February 01, 2013 11:10 AM
> >> To: Hbase-User
> >> Subject: Re: HBase Checksum
> >> Not trying to hijack your thread here...
> >> But can you verify via logs that the shortcircuit is working? Because I
> >> enabled shortcircuit but I sure didn't see any performance increase.
> >> I haven't tried enabling hbase checksum yet but I'd like to be able to
> >> verify that works too.
> >> On Thu, Jan 31, 2013 at 9:55 PM, Anoop Sam John <[EMAIL PROTECTED]>