Hadoop, mail # dev - bug in SequenceFile.sync()?


Christopher Ng 2013-06-24, 09:20
Colin McCabe 2013-06-24, 16:39
Christopher Ng 2013-06-24, 17:20
Re: bug in SequenceFile.sync()?
Jean-Baptiste Onofré 2013-06-24, 09:25
Hi Christopher,

Indeed, I think that noBufferedKeys and valuesDecompressed should be
reset.

Regards
JB

On 06/24/2013 11:20 AM, Christopher Ng wrote:
> cross-posting this from cdh-users group where it received little interest:
>
> is there a bug in SequenceFile.sync()?  This is from cdh4.3.0:
>
>      /** Seek to the next sync mark past a given position.*/
>      public synchronized void sync(long position) throws IOException {
>        if (position+SYNC_SIZE >= end) {
>          seek(end);
>          return;
>        }
>
>        if (position < headerEnd) {
>          // seek directly to first record
>          in.seek(headerEnd);  // <=== should this not call seek (i.e. this.seek) instead?
>          // note the sync marker "seen" in the header
>          syncSeen = true;
>          return;
>        }
>
> the problem is that when you sync to the start of a compressed file,
> noBufferedKeys and valuesDecompressed aren't reset, so a block read isn't
> triggered.  When you subsequently call next(), you may get keys from the
> buffer, which still contains keys from the previous position in the
> file.
>
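
To make the stale-buffer scenario described above concrete, here is a minimal toy sketch in Java. It is not the real SequenceFile.Reader: ToyReader, rawSeek, seek and firstKeyAfterRewind are hypothetical names invented for illustration, block indices stand in for file offsets, and a queue of keys plays the role of the noBufferedKeys / valuesDecompressed state. It only assumes that keys are handed out from a buffer that is refilled when empty.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class StaleBufferSketch {

    /** Toy stand-in for a block-buffered reader (hypothetical, not Hadoop code). */
    static class ToyReader {
        private final List<List<String>> blocks;  // each block holds several keys
        private int pos = 0;                      // next block to load; stands in for the file offset
        private final Deque<String> buffered = new ArrayDeque<>(); // keys already loaded from a block

        ToyReader(List<List<String>> blocks) {
            this.blocks = blocks;
        }

        /** Moves only the underlying position, analogous to calling in.seek(...) directly. */
        void rawSeek(int block) {
            pos = block;
        }

        /** Moves the position and drops buffered keys, so the next read loads a fresh block. */
        void seek(int block) {
            pos = block;
            buffered.clear();
        }

        /** Returns the next key, or null when no blocks remain. */
        String next() {
            if (buffered.isEmpty()) {
                if (pos >= blocks.size()) {
                    return null;
                }
                buffered.addAll(blocks.get(pos++));
            }
            return buffered.poll();
        }
    }

    /** Reads into the second block, rewinds to block 0, and returns the next key. */
    static String firstKeyAfterRewind(boolean resetBuffer) {
        ToyReader r = new ToyReader(Arrays.asList(
                Arrays.asList("a1", "a2"),
                Arrays.asList("b1", "b2")));
        r.next(); // a1
        r.next(); // a2
        r.next(); // b1, leaving b2 in the buffer
        if (resetBuffer) {
            r.seek(0);
        } else {
            r.rawSeek(0);
        }
        return r.next();
    }

    public static void main(String[] args) {
        System.out.println("raw seek:       " + firstKeyAfterRewind(false)); // prints b2 (stale key)
        System.out.println("resetting seek: " + firstKeyAfterRewind(true));  // prints a1
    }
}
```

In this toy model the raw seek returns a leftover key from the old position, while the seek that clears the buffer returns the first key of the file, which is the behaviour the question about calling this.seek is getting at.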

--
Jean-Baptiste Onofré
[EMAIL PROTECTED]
http://blog.nanthrax.net
Talend - http://www.talend.com