Re: HBase Checksum
Agreed. One should be able to monitor these things.
Mind filing a jira describing your experience?
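
Until something like that exists, one rough check is to read the flags back from the site configuration. Below is a minimal Python sketch, not an official tool. The property names `hbase.regionserver.checksum.verify` and `dfs.client.read.shortcircuit` are assumed here to be the relevant settings (the thread does not name them), and the sample XML is made up for illustration. It only reports what the file says, not what a running region server actually picked up.

```python
import xml.etree.ElementTree as ET

# Assumed property names; the thread itself does not spell them out.
CHECKED = (
    "hbase.regionserver.checksum.verify",
    "dfs.client.read.shortcircuit",
)

# Made-up sample in the usual Hadoop <configuration> layout.
SAMPLE_SITE_XML = """
<configuration>
  <property>
    <name>hbase.regionserver.checksum.verify</name>
    <value>true</value>
  </property>
</configuration>
"""


def read_properties(xml_text):
    """Return {name: value} for every <property> in a site file."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def enabled_flags(xml_text, names=CHECKED):
    """Map each checked property to True only if it is set to 'true'."""
    props = read_properties(xml_text)
    return {name: props.get(name, "").lower() == "true" for name in names}


if __name__ == "__main__":
    for name, enabled in enabled_flags(SAMPLE_SITE_XML).items():
        print(name, "enabled" if enabled else "not set to true")
```

A property that is absent from the file is reported as not set to true here; whether the server then falls back to an enabled or disabled default depends on the version in use.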

 From: Jean-Marc Spaggiari <[EMAIL PROTECTED]>
Sent: Friday, February 1, 2013 1:09 PM
Subject: Re: HBase Checksum
Thanks for the clarification Lars.

Is there any UI or specific startup log we can check to validate that
it's activated? If not, would it be nice to have something like that?
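
For the log side, the only hint so far in this thread is to look for mentions of BlockReaderLocal in the region server logs. A minimal Python sketch of that search follows; the function names are hypothetical, and it assumes nothing about the log line format beyond the class name appearing somewhere in the line.

```python
def count_mentions(lines, needle="BlockReaderLocal"):
    """Count the lines that contain `needle`."""
    return sum(1 for line in lines if needle in line)


def scan_logs(paths, needle="BlockReaderLocal"):
    """Return {path: number of matching lines} for each log file."""
    counts = {}
    for path in paths:
        # errors="replace" keeps the scan going on odd bytes in the log.
        with open(path, encoding="utf-8", errors="replace") as handle:
            counts[path] = count_mentions(handle, needle)
    return counts
```

Calling `scan_logs` with a list of region server log paths gives a per-file count. A count of zero is not proof that short-circuit reads are off: as reported below, the message is a debug level log and may not show up even with DEBUG enabled.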

2013/2/1, lars hofhansl <[EMAIL PROTECTED]>:
> Doing HBase level checksums (as opposed to HDFS level) will mostly yield
> improvements for random gets.
> Scans (like rowcounting and similar) will probably see a negligible
> improvement.
> In HDFS a block and its checksum are stored in different local files on each
> datanode. So loading a block requires 2 IOs.
> With the checksum handled by HBase only one IO is needed per block.
> ________________________________
>  From: Robert Dyer <[EMAIL PROTECTED]>
> To: Hbase-User <[EMAIL PROTECTED]>
> Sent: Friday, February 1, 2013 11:37 AM
> Subject: Re: HBase Checksum
> Yes that log is a debug level log, as I saw in the source.  But I too
> enabled DEBUG and still never saw that log message.
> But I, unlike you, see absolutely no change in performance.
> One test I did, however, makes me think it is actually enabled: if I
> submit from another user I start getting security warnings about that user
> not having permission for shortcircuit.  So perhaps it is working, but I
> have no clue why that log fails to show anywhere.
> Regarding enabling checksums that is an interesting question.  Do I have to
> do a major compaction after enabling so HBase writes the checksum?  Or will
> it detect the setting change and do that automatically?  What if I disable,
> will it remove the checksums?
> On Fri, Feb 1, 2013 at 6:30 AM, Jean-Marc Spaggiari <[EMAIL PROTECTED]>
> wrote:
>> Hi Robert,
>> That's perfectly fine, it was my next question ;)
>> Anoop, I saw a 5% performance increase by activating HBase Checksum.
>> Can I disable it again to retry the baseline and see the difference?
>> Or, now that it's there, is it too late?
>> Also, regarding BlockReaderLocal, I don't find that in my logs, but
>> after I have activated the shortcircuit, I saw a 41% performance
>> increase, so I'm almost sure it's working, but I don't know how to
>> check that either.
>> What's the best way to see that in the logs? It's not displayed when
>> HBase is starting. It's not even displayed when I'm doing major
>> compactions.
>> I turned org.apache.hadoop.hdfs.BlockReaderLocal loglevel to debug and
>> still can't see anything. Not in the region server, and not in the
>> datanode.
>> Also, as for checking with HDFS level logs whether the checksum meta
>> file is being read by the DFS client, I'm not really sure how to
>> achieve that.
>> JM
>> 2013/2/1, Robert Dyer <[EMAIL PROTECTED]>:
>> > Ok grepping the RS logs I see nothing with 'local' in any of them.
>> > Thanks for that hint.
>> >
>> > For the test I was using, I know it is data local.  Every map task
>> > launched data local, and no regions were moving recently.
>> >
>> > I think I've hijacked this thread enough, I'll move my issues to
>> > another.
>> > ;-)
>> >
>> >
>> > On Thu, Jan 31, 2013 at 11:51 PM, Anoop Sam John <[EMAIL PROTECTED]>
>> > wrote:
>> >
>> >> Hi Robert
>> >> When HDFS is doing the local short circuit read, it will use the
>> >> BlockReaderLocal class for reading.  There should be some logs on the
>> >> DFS client side (RS) which tell about creating a new BlockReaderLocal.
>> >> If you can see these, then the local read is surely happening.
>> >>
>> >> Also check the DN log.  If local reads are happening, then you will
>> >> not see read request related logs for the HFile on the DN side.
>> >> You can check the number and names of your HFiles to know what to
>> >> look for in the logs.
>> >>
>> >> Are you sure that when you tested, you had data locality? Region