
Hadoop dev mailing list: bug in SequenceFile.sync()?


Christopher Ng 2013-06-24, 09:20
Re: bug in SequenceFile.sync()?
Hi Chris,

Thanks for the report.  I filed
https://issues.apache.org/jira/browse/HADOOP-9667 for this.

Colin
Software Engineer, Cloudera
On Mon, Jun 24, 2013 at 2:20 AM, Christopher Ng <[EMAIL PROTECTED]> wrote:
> Cross-posting this from the cdh-users group, where it received little interest:
>
> Is there a bug in SequenceFile.sync()? This is from CDH 4.3.0:
>
>     /** Seek to the next sync mark past a given position.*/
>     public synchronized void sync(long position) throws IOException {
>       if (position+SYNC_SIZE >= end) {
>         seek(end);
>         return;
>       }
>
>       if (position < headerEnd) {
>         // seek directly to first record
>         in.seek(headerEnd);  // <=== should this not call seek() (i.e. this.seek) instead?
>         // note the sync marker "seen" in the header
>         syncSeen = true;
>         return;
>       }
>
> The problem is that when you sync to the start of a compressed file,
> noBufferedKeys and valuesDecompressed aren't reset, so a block read isn't
> triggered. When you subsequently call next(), you may get keys from the
> buffer, which still holds keys from the previous position in the file.
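To make the failure mode concrete, here is a minimal, hypothetical Java model of a block-compressed reader. It is not Hadoop code: `SyncBugSketch`, `RawStream`, `ToyBlockReader`, `syncToStart()` and the two-block layout are invented for illustration, and only the `noBufferedKeys` counter is modelled. It assumes, as the report implies, that the reader's own `seek()` resets the buffered-key state while a raw `in.seek()` only moves the underlying stream.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical toy model of the reported sync() behaviour; NOT SequenceFile.Reader. */
public class SyncBugSketch {

  /** Stand-in for the underlying input stream: only tracks the next block index. */
  static final class RawStream {
    int block;
    void seek(int block) { this.block = block; }
  }

  static final class ToyBlockReader {
    private final List<List<String>> blocks;   // each inner list = keys of one compressed block
    private final RawStream in = new RawStream();
    private final Deque<String> buffer = new ArrayDeque<>();
    private int noBufferedKeys = 0;            // keys left in the decompressed buffer
    private final boolean useReaderSeek;       // true = proposed fix, false = reported behaviour

    ToyBlockReader(List<List<String>> blocks, boolean useReaderSeek) {
      this.blocks = blocks;
      this.useReaderSeek = useReaderSeek;
    }

    /** Assumed to mirror Reader.seek(): move the stream AND invalidate the key buffer. */
    void seek(int block) {
      in.seek(block);
      noBufferedKeys = 0;                      // forces next() to read a fresh block
    }

    /** Mirrors only the "position < headerEnd" branch of sync(): jump to the first block. */
    void syncToStart() {
      if (useReaderSeek) {
        seek(0);                               // like this.seek(headerEnd)
      } else {
        in.seek(0);                            // like in.seek(headerEnd): buffer state untouched
      }
    }

    /** Returns the next key, reading a new block only when the buffer is exhausted. */
    String next() {
      if (noBufferedKeys == 0) {
        if (in.block >= blocks.size()) {
          return null;                         // end of "file"
        }
        buffer.clear();
        buffer.addAll(blocks.get(in.block++));
        noBufferedKeys = buffer.size();
      }
      noBufferedKeys--;
      return buffer.poll();
    }
  }

  public static void main(String[] args) {
    List<List<String>> blocks = List.of(List.of("a", "b", "c"), List.of("d", "e", "f"));
    for (boolean fixed : new boolean[] {false, true}) {
      ToyBlockReader r = new ToyBlockReader(blocks, fixed);
      for (int i = 0; i < 4; i++) {
        r.next();                              // consume a, b, c, d: now inside block 1
      }
      r.syncToStart();
      System.out.println((fixed ? "this.seek: " : "in.seek:   ") + r.next());
    }
  }
}
```

In this model, the raw stream seek leaves `noBufferedKeys` at 2, so the first key after the sync is `e`, left over from the block that was being read. Routing the call through the reader's own `seek()` zeroes the counter, forces a fresh block read, and returns `a`.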
Christopher Ng 2013-06-24, 17:20
Jean-Baptiste Onofré 2013-06-24, 09:25