Hadoop >> mail # dev >> bug in SequenceFile.sync()?


Christopher Ng 2013-06-24, 09:20
Re: bug in SequenceFile.sync()?
Hi Chris,

Thanks for the report.  I filed
https://issues.apache.org/jira/browse/HADOOP-9667 for this.

Colin
Software Engineer, Cloudera
On Mon, Jun 24, 2013 at 2:20 AM, Christopher Ng <[EMAIL PROTECTED]> wrote:
> cross-posting this from cdh-users group where it received little interest:
>
> is there a bug in SequenceFile.sync()?  This is from cdh4.3.0:
>
>     /** Seek to the next sync mark past a given position.*/
>     public synchronized void sync(long position) throws IOException {
>       if (position+SYNC_SIZE >= end) {
>         seek(end);
>         return;
>       }
>
>       if (position < headerEnd) {
>         // seek directly to first record
>         in.seek(headerEnd);  // <=== should this not call seek (i.e. this.seek) instead?
>         // note the sync marker "seen" in the header
>         syncSeen = true;
>         return;
>       }
>
> the problem is that when you sync to the start of a block-compressed file,
> noBufferedKeys and valuesDecompressed aren't reset, so a block read isn't
> triggered.  When you subsequently call next() you're potentially getting
> keys from the buffer, which still contains keys from the previous position
> in the file.
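
The stale-buffer behaviour described in the report can be illustrated with a small toy model. This is only a sketch and does not use the Hadoop API: ToyBlockReader, rawSeek(), the integer block index, and the key strings are all hypothetical stand-ins. rawSeek() plays the role of calling in.seek() directly, and seek() plays the role of a seek that also discards buffered keys, which is what the report suggests this.seek would do.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class StaleBufferSketch {
    /** Toy stand-in for a block-compressed reader; NOT the Hadoop class. */
    static class ToyBlockReader {
        // Each inner list models one compressed block of keys.
        private final List<List<String>> blocks;
        // Models the position of the underlying stream (hypothetical: a block index).
        private int blockIndex = 0;
        // Models the buffer of already-decompressed keys.
        private final Deque<String> bufferedKeys = new ArrayDeque<>();

        ToyBlockReader(List<List<String>> blocks) {
            this.blocks = blocks;
        }

        /** Moves only the stream position, like calling in.seek() directly. */
        void rawSeek(int block) {
            blockIndex = block;
        }

        /** Moves the position and drops buffered keys, so next() must read a block. */
        void seek(int block) {
            blockIndex = block;
            bufferedKeys.clear();
        }

        /** Returns the next key, loading a new block only when the buffer is empty. */
        String next() {
            if (bufferedKeys.isEmpty()) {
                if (blockIndex >= blocks.size()) {
                    return null; // end of data
                }
                bufferedKeys.addAll(blocks.get(blockIndex++));
            }
            return bufferedKeys.poll();
        }
    }

    static ToyBlockReader newReader() {
        return new ToyBlockReader(Arrays.asList(
                Arrays.asList("a1", "a2"),
                Arrays.asList("b1", "b2")));
    }

    public static void main(String[] args) {
        ToyBlockReader stale = newReader();
        stale.seek(1);
        stale.next();      // consumes "b1"; "b2" stays buffered
        stale.rawSeek(0);  // position moves, buffer is untouched
        System.out.println("raw seek:   " + stale.next());

        ToyBlockReader fresh = newReader();
        fresh.seek(1);
        fresh.next();
        fresh.seek(0);     // buffer is cleared along with the move
        System.out.println("reset seek: " + fresh.next());
    }
}
```

In this model the first reader returns a key left over from the second block even though it was repositioned to the start, while the second reader returns the first key of the first block. Whether the real Reader behaves the same way depends on its actual seek() implementation, which is not quoted in this thread.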
Christopher Ng 2013-06-24, 17:20
Jean-Baptiste Onofré 2013-06-24, 09:25