HBase >> mail # dev >> Puzzling behaviour with HBase checksums


Re: Puzzling behaviour with HBase checksums
Okay - I guess I now know what's going on here. Essentially, there is a 7-byte
header for each block, which is always read first, irrespective of whether
this is a checksum or no-checksum read. Some version checking is done there.
From what I can see, the FB branch (which I guess is more optimized for
performance) only reads the header if checksum verification is on. I wonder
if that should be done here too.

However, it is probably also the case that most of this is already in the
page cache, since it's just the first 7 bytes of a file which otherwise holds
100s of kilobytes of checksums.
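
A minimal sketch of decoding such a 7-byte header. It assumes the header is laid out as a 2-byte version, a 1-byte checksum type id, and a 4-byte bytes-per-checksum value, all big-endian. That layout is an assumption based on the HDFS block metadata header and is not stated in this thread; the names `MetaHeader`, `parse_meta_header`, and `read_meta_header` are hypothetical.

```python
import struct
from typing import NamedTuple

# Assumed layout (not confirmed in this thread): big-endian
# int16 version, uint8 checksum type id, int32 bytes per checksum.
META_HEADER = struct.Struct(">hBi")  # 2 + 1 + 4 = 7 bytes


class MetaHeader(NamedTuple):
    version: int
    checksum_type: int
    bytes_per_checksum: int


def parse_meta_header(raw: bytes) -> MetaHeader:
    """Decode the first 7 bytes of a .meta file under the assumed layout."""
    if len(raw) < META_HEADER.size:
        raise ValueError(
            f"need {META_HEADER.size} bytes, got {len(raw)}"
        )
    return MetaHeader(*META_HEADER.unpack(raw[:META_HEADER.size]))


def read_meta_header(path: str) -> MetaHeader:
    """Read and decode only the header, leaving the checksums untouched."""
    with open(path, "rb") as f:
        return parse_meta_header(f.read(META_HEADER.size))
```

Calling `read_meta_header` on a copy of a `blk_*.meta` file would show whether the version and checksum fields look sane, without reading the checksum data that follows the header.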
On Fri, Jul 5, 2013 at 5:21 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:

> I just set this value in hbase-site.xml, but the 7-byte reads and
> lseek(s) still persist.
>
>
> On Fri, Jul 5, 2013 at 4:22 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
>> What value did you set for dfs.client.read.shortcircuit.skip.checksum ?
>>
>> Cheers
>>
>> On Fri, Jul 5, 2013 at 2:55 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
>>
>> > Hi,
>> >
>> > We are running HBase with hbase.regionserver.checksum.verify set to true,
>> > but we are seeing an equal number of seeks for .meta files on HDFS and data
>> > blocks. This is rather puzzling and I don't know if it's broken. The HBase
>> > jar is compiled against 2.0.3-alpha and this behaviour occurs for both
>> > 0.94.3 and 0.94.7. Short-circuit local reads are enabled and working well,
>> > since only the region server is accessing the disk.
>> >
>> > We run an strace limited to lseek calls and get the following:
>> >
>> > 28162 lseek(668, 0, SEEK_SET)           = 0
>> > 28162 lseek(635, 57479463, SEEK_SET)    = 57479463
>> > 28162 lseek(2255, 0, SEEK_SET)          = 0
>> > 28162 lseek(1938, 29285843, SEEK_SET)   = 29285843
>> >
>> > Then we use lsof to find the underlying files and match them against the
>> > corresponding file descriptors...
>> >
>> > java    27947 hbase   668u   REG   202,32     1048583   36176608
>> > /data/xvdc/hadoop/dfs/data/current/BP-1854623640-10.158.62.78-1363075060974/current/finalized/subdir54/blk_5081211948968918615_597521.meta
>> >
>> > java    27947 hbase   635u   REG   202,32   134217728   36176607
>> > /data/xvdc/hadoop/dfs/data/current/BP-1854623640-10.158.62.78-1363075060974/current/finalized/subdir54/blk_5081211948968918615
>> >
>> > java    27947 hbase  2255u   REG   202,16      802375   32768850
>> > /mnt/hadoop/dfs/data/current/BP-1854623640-10.158.62.78-1363075060974/current/finalized/subdir40/blk_2670783290218647110_614641.meta
>> >
>> > java    27947 hbase  1938u   REG   202,16   102702747   32768849
>> > /mnt/hadoop/dfs/data/current/BP-1854623640-10.158.62.78-1363075060974/current/finalized/subdir40/blk_2670783290218647110
>> >
>> > The pattern in strace is pretty clear: first the .meta file is read, and then
>> > the block is accessed. I am wondering if there are other places, apart from
>> > the checksum, where the .meta file for the HDFS block is being accessed, or
>> > if the checksum stuff is simply broken? From more strace output, it seems we
>> > are reading 7-byte values from these .meta files. Is there a way I can find
>> > out if the checksums were actually written out to HFiles in the first place?
>> >
>> > Thanks
>> > Varun
>> >
>>
>
>
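
The matching of strace `lseek` calls against lsof output described in the original message can be sketched as a small script. This is an illustration only, not the tooling used in the thread: the function names are hypothetical, the fd-to-file mapping is typed in by hand from the lsof listing above (file names only), and a file counts as a checksum file simply because its name ends in `.meta`.

```python
import re

# Matches lines such as: "28162 lseek(668, 0, SEEK_SET)   = 0"
LSEEK_RE = re.compile(r"^\d+\s+lseek\((\d+),\s*(-?\d+),\s*\w+\)\s*=\s*-?\d+")

# Sample data copied from the thread (emphasis markers removed).
STRACE_SAMPLE = [
    "28162 lseek(668, 0, SEEK_SET)           = 0",
    "28162 lseek(635, 57479463, SEEK_SET)    = 57479463",
    "28162 lseek(2255, 0, SEEK_SET)          = 0",
    "28162 lseek(1938, 29285843, SEEK_SET)   = 29285843",
]
FD_TO_FILE = {
    668: "blk_5081211948968918615_597521.meta",
    635: "blk_5081211948968918615",
    2255: "blk_2670783290218647110_614641.meta",
    1938: "blk_2670783290218647110",
}


def parse_lseek(line):
    """Return (fd, offset) for an lseek line, or None if it does not match."""
    m = LSEEK_RE.match(line.strip())
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def label_seeks(strace_lines, fd_to_file):
    """Tag each seek as 'meta', 'data', or 'unknown' based on the fd's file."""
    labelled = []
    for line in strace_lines:
        parsed = parse_lseek(line)
        if parsed is None:
            continue  # skip non-lseek lines
        fd, offset = parsed
        name = fd_to_file.get(fd)
        if name is None:
            kind = "unknown"
        elif name.endswith(".meta"):
            kind = "meta"
        else:
            kind = "data"
        labelled.append((fd, offset, kind))
    return labelled


if __name__ == "__main__":
    for fd, offset, kind in label_seeks(STRACE_SAMPLE, FD_TO_FILE):
        print(f"fd={fd:<5} offset={offset:<10} {kind}")
```

On the four sample lines, the labels alternate between `meta` at offset 0 and `data` at a large offset, which is the pattern the message describes.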