

Jean-Marc Spaggiari 2013-01-31, 22:46
Anoop Sam John 2013-02-01, 03:55
Robert Dyer 2013-02-01, 05:40
Anoop Sam John 2013-02-01, 05:51
Robert Dyer 2013-02-01, 06:23
Jean-Marc Spaggiari 2013-02-01, 12:30
Robert Dyer 2013-02-01, 19:37
Re: HBase Checksum
Doing checksums at the HBase level (as opposed to the HDFS level) will mostly pay off for random gets.
Scans (such as row counting) will probably see only a negligible improvement.

In HDFS a block and its checksum are stored in different local files on each datanode. So loading a block requires 2 IOs.
With the checksum handled by HBase only one IO is needed per block.
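
To make the IO counting concrete, here is a minimal sketch in Python. It is a toy model, not HBase or HDFS code: CountingStore, the ".meta" suffix, and the single CRC32 per block are all assumptions made for illustration. It only shows that a checksum kept in a separate file costs a second read, while a checksum stored together with the block data does not.

```python
import struct
import zlib


class CountingStore:
    """In-memory stand-in for local files that counts read calls."""

    def __init__(self):
        self.files = {}
        self.reads = 0

    def write(self, name, payload):
        self.files[name] = payload

    def read(self, name):
        self.reads += 1
        return self.files[name]


def crc(data):
    """Return the CRC32 of data as 4 big-endian bytes."""
    return struct.pack(">I", zlib.crc32(data) & 0xFFFFFFFF)


def write_separate(store, name, data):
    """Store the block and its checksum as two separate files."""
    store.write(name, data)
    store.write(name + ".meta", crc(data))


def read_separate(store, name):
    """Read the block, then read its checksum file: two reads."""
    data = store.read(name)
    if crc(data) != store.read(name + ".meta"):
        raise ValueError("checksum mismatch")
    return data


def write_inline(store, name, data):
    """Store the checksum in the same file, right after the block data."""
    store.write(name, data + crc(data))


def read_inline(store, name):
    """Read block and checksum together: one read."""
    blob = store.read(name)
    data, stored = blob[:-4], blob[-4:]
    if crc(data) != stored:
        raise ValueError("checksum mismatch")
    return data
```

After one write_separate and one read_separate on a fresh store, store.reads is 2; with write_inline and read_inline it is 1. A long sequential scan amortizes that extra read over many blocks, which fits the expectation above that random gets benefit most.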

________________________________
 From: Robert Dyer <[EMAIL PROTECTED]>
To: Hbase-User <[EMAIL PROTECTED]>
Sent: Friday, February 1, 2013 11:37 AM
Subject: Re: HBase Checksum
 
Yes that log is a debug level log, as I saw in the source.  But I too
enabled DEBUG and still never saw that log message.

But I, unlike you, see absolutely no change in performance.

One test I did however that makes me think it is actually enabled: if I
submit from another user I start getting security warnings about that user
not having permission for shortcircuit.  So perhaps it is working, but I
have no clue why that log fails to show anywhere.
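
A small script can help rule out a missed match when searching the logs by hand. The sketch below, in Python, only counts lines that contain a given substring. It assumes the class name BlockReaderLocal appears in the message, which this thread suggests but does not confirm, and the function names are made up for illustration.

```python
def count_mentions(lines, needle="BlockReaderLocal"):
    """Count the lines that contain the given substring."""
    return sum(1 for line in lines if needle in line)


def scan_log(path, needle="BlockReaderLocal"):
    """Count matching lines in one log file, tolerating undecodable bytes."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_mentions(handle, needle)
```

Calling scan_log on a region server log (the path depends on the installation) returns 0 when no line mentions the class. A zero count does not by itself prove that short-circuit reads are disabled; it only confirms that the message is absent from that file.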

Regarding enabling checksums that is an interesting question.  Do I have to
do a major compaction after enabling so HBase writes the checksum?  Or will
it detect the setting change and do that automatically?  What if I disable,
will it remove the checksums?
On Fri, Feb 1, 2013 at 6:30 AM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:

> Hi Robert,
>
> That's perfectly fine, it was my next question ;)
>
>
> Anoop, I saw a 5% performance increase by activating HBase Checksum.
> Can I disable it again to retry the baseline and see the difference?
> Or now that it's there, is it too late?
>
> Also, regarding BlockReaderLocal, I don't find that in my logs, but
> after I have activated the shortcircuit, I saw a 41% performance
> increase, so I'm almost sure it's working, but I don't know either how
>  to check that.
>
> What's the best way to see that in the logs? It's not displayed when
> HBase is starting. It's not even displayed when I'm doing major
> compactions.
>
> I turned org.apache.hadoop.hdfs.BlockReaderLocal loglevel to debug and
> still can't see anything. Not in the region server, and not in the
> datanode.
>
> Also, to check with HDFS level logs whether the checksum meta file is
> getting read by the DFS client, I'm not really sure how to achieve
> that.
>
> JM
>
> 2013/2/1, Robert Dyer <[EMAIL PROTECTED]>:
> > Ok grepping the RS logs I see nothing with 'local' in any of them.  Thanks
> > for that hint.
> >
> > For the test I was using, I know it is data local.  Every map task launched
> > data local, and no regions were moving recently.
> >
> > I think I've hijacked this thread enough, I'll move my issues to another.
> > ;-)
> >
> >
> > On Thu, Jan 31, 2013 at 11:51 PM, Anoop Sam John <[EMAIL PROTECTED]>
> > wrote:
> >
> >> Hi Robert
> >> When HDFS is doing the local short circuit read, it will use the
> >> BlockReaderLocal class for reading.  There should be some logs on the DFS
> >> client side (RS) that mention creating a new BlockReaderLocal.  If you
> >> can see these, then the local read is certainly happening.
> >>
> >> Also check the DN log.  If a local read is happening, then you will not see
> >> read request related logs for the HFile on the DN side.
> >> You can use the number and names of your HFiles when checking the logs.
> >>
> >> Are you sure that when you tested, you had data locality? Region movements
> >> across RSs can break the full data locality.
> >>
> >> -Anoop-
> >> ________________________________________
> >> From: Robert Dyer [[EMAIL PROTECTED]]
> >> Sent: Friday, February 01, 2013 11:10 AM
> >> To: Hbase-User
> >> Subject: Re: HBase Checksum
> >>
> >> Not trying to hijack your thread here...
> >>
> >> But can you verify via logs that the shortcircuit is working?  Because I
> >> enabled shortcircuit but I sure didn't see any performance increase.
> >>
> >> I haven't tried enabling hbase checksum yet but I'd like to be able to
> >> verify that works too.
> >>
> >>
> >> On Thu, Jan 31, 2013 at 9:55 PM, Anoop Sam John <[EMAIL PROTECTED]>
Robert Dyer
[EMAIL PROTECTED]
Jean-Marc Spaggiari 2013-02-01, 21:09
lars hofhansl 2013-02-01, 22:51
Jean-Marc Spaggiari 2013-02-01, 20:09