HBase, mail # dev - Hlog Group Commit Question: SequenceFileLogReader


Re: Hlog Group Commit Question: SequenceFileLogReader
Nicolas Spiegelberg 2010-01-26, 23:50
Sounds like I also have 0.20.3RC merging to do.  Version mistake on my
part.  We have a 0.20.3-dev branch that must've missed the 'syncfs' changes;
I was merging based solely on the 0.21 trunk.  I'll look into the 0.20.3 code
some more.

I have done the HDFS-200 merge & the trunk group commit, now I need to
reconcile that with the 0.20.3RC code since we don't currently plan on
merging HDFS-265.

Thanks!
Nicolas Spiegelberg

On 1/26/10 2:02 PM, "Stack" <[EMAIL PROTECTED]> wrote:

> HBase 0.20 had a hack that would recognize the presence of Dhruba's
> HDFS-200.  If it had been applied, then we'd do the open-for-append,
> close, and reopen to recover edits written to an unclosed WAL/HLog
> file (Grep 'syncfs' in HLog on the 0.20 branch).
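
[Editor's note: the open-for-append/close/reopen trick described above can be modeled in miniature like so. `FakeNamenode`, `FakeDatanode`, and `recoverLength` are hypothetical stand-ins for illustration only, not the actual HDFS or HBase classes; this is a sketch of the idea, not the real shim.]

```java
// Toy model of the HDFS-200-era recovery trick: the namenode's recorded
// length for an unclosed WAL lags behind the bytes the datanodes actually
// hold.  Briefly opening the file for append and closing it forces the
// namenode to reconcile with the datanodes.  All names are hypothetical.
class FakeNamenode {
    long recordedLength;                 // stale metadata for the unclosed file
    FakeNamenode(long len) { recordedLength = len; }
}

class FakeDatanode {
    long flushedBytes;                   // bytes really persisted via syncFs()
    FakeDatanode(long len) { flushedBytes = len; }
}

public class AppendCloseShim {
    // Stand-in for "open for append, close, reopen": the append open makes
    // the namenode ask the datanodes for the true length of the last block.
    static long recoverLength(FakeNamenode nn, FakeDatanode dn) {
        nn.recordedLength = dn.flushedBytes;   // namenode refreshes its metadata
        return nn.recordedLength;
    }

    public static void main(String[] args) {
        FakeNamenode nn = new FakeNamenode(512);    // thinks 512 bytes exist
        FakeDatanode dn = new FakeDatanode(1024);   // actually flushed 1024
        System.out.println(recoverLength(nn, dn));  // prints 1024
    }
}
```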
>
> In HBase TRUNK, the above hackery was stripped out.  In TRUNK we are
> leaning on the new hflush/HDFS-265 rather than HDFS-200.  For hflush,
> when we do FSDataInputStream::available(), it's returning the 'right'
> answer (WALReaderFsDataInputStream::getPos() was added before an API
> was available.  HBASE-2069 is about using the new API instead of this
> getPos fancy-dancing).
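
[Editor's note: the getPos()/available() workaround can be sketched with plain java.io streams. `StaleLengthStream` and the lengths below are hypothetical, standing in for FSDataInputStream under a namenode whose recorded file length is stale; the real code lives in SequenceFileLogReader.]

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Miniature stand-in for FSDataInputStream when the namenode's idea of an
// unclosed file's length is stale.  Hypothetical class, for illustration.
class StaleLengthStream extends InputStream {
    private final ByteArrayInputStream in;
    private final long reportedLength;   // what a stale namenode would report
    private long pos = 0;                // bytes consumed so far

    StaleLengthStream(byte[] data, long reportedLength) {
        this.in = new ByteArrayInputStream(data);
        this.reportedLength = reportedLength;
    }

    @Override
    public int read() throws IOException {
        int b = in.read();
        if (b >= 0) pos++;
        return b;
    }

    @Override
    public int available() { return in.available(); }

    long reportedLength() { return reportedLength; }

    // The getPos "fancy-dancing": pos + available() reveals how many bytes
    // are really visible, even when the reported file length is stale.
    long visibleLength() { return Math.max(reportedLength, pos + available()); }
}

public class GetPosSketch {
    public static void main(String[] args) {
        byte[] hflushedEdits = new byte[1024];   // bytes actually hflushed
        StaleLengthStream s = new StaleLengthStream(hflushedEdits, 512);
        System.out.println(s.reportedLength());  // prints 512 (stale)
        System.out.println(s.visibleLength());   // prints 1024 (recovered)
    }
}
```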
>
> It sounds like you need to do a bit of merging of TRUNK group commit
> and the old hbase code that exploited HDFS-200?
>
> St.Ack
>
> On Tue, Jan 26, 2010 at 12:35 PM, Nicolas Spiegelberg
> <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> I am trying to backport the HLog group commit functionality to HBase 0.20.
>>  For proper reliability, I am working with Dhruba to get the 0.21 syncFs()
>> changes from HDFS ported back to HDFS 0.20 as well.  When going through a
>> peer review of the modified code, my group had a question about the
>> SequenceFileLogReader.java (WALReader).  I am hoping that you guys could be
>> of assistance.
>>
>> I know that there is an open issue [HBASE-2069] where HLog::splitLog() does
>> not call DFSDataInputStream::getVisibleLength(), which would properly sync
>> hflushed, but unclosed, file lengths.  I believe the current workaround is to
>> open an HDFS file in append mode & then close, which would cause the namenode
>> to get updates from the datanodes.  However, I don't see that shim present in
>> HLog::splitLog() on the 0.21 trunk.  Is this a pending issue to fix or is
>> calling FSDataInputStream::available() within
>> WALReaderFsDataInputStream::getPos() sufficient to force the namenode to sync
>> up with the datanodes?
>>
>> Nicolas Spiegelberg
>>