Avro user mailing list: Avro file - Seek to specific offset and read

Re: Avro file - Seek to specific offset and read
Hi Doug,

Moving the start and end offsets (returned by DataFileWriter.sync()) back by 16 bytes (DataFileConstants.SYNC_SIZE) fixed the issue.
I inferred this from the DataFileReader.pastSync() implementation, which reports a position as passed only once blockStart is at least SYNC_SIZE bytes beyond it (or at the end of the file):

  public boolean pastSync(long position) throws IOException {
    return ((blockStart >= position+SYNC_SIZE)||(blockStart >= sin.length()));
  }

Let me know if this assumption is correct.

Thanks
Venkat
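The offset arithmetic in the reply above can be checked with a simplified model in plain Java. This is not Avro code: it assumes that every block ends with a 16-byte sync marker, that DataFileWriter.sync() returns the offset just past that marker (the start of the next block), and that the reader's blockStart moves to the next block's start only after the last record of the current block has been returned. The class name, the block layout in BOUNDARIES, and the helper blocksRead() are hypothetical; pastSync() copies the condition shown above with the reader state passed in explicitly.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the pastSync() range check; this is NOT Avro code.
// Assumptions: every block ends with a 16-byte sync marker, DataFileWriter.sync()
// returns the offset just past that marker (the start of the next block), and
// blockStart moves to the next block's start only after a block's last record.
final class PastSyncModel {
    static final int SYNC_SIZE = 16; // same value as DataFileConstants.SYNC_SIZE

    // Hypothetical layout: block i spans [BOUNDARIES[i], BOUNDARIES[i + 1]),
    // including its trailing sync marker; the last entry is the file length.
    static final long[] BOUNDARIES = {100, 180, 260, 340, 420};

    // Same condition as DataFileReader.pastSync(), with the reader state passed in.
    static boolean pastSync(long blockStart, long fileLength, long position) {
        return blockStart >= position + SYNC_SIZE || blockStart >= fileLength;
    }

    // Returns the indexes of the blocks whose records the read loop would return
    // after seeking to startOffset and checking pastSync(endArg) before each record.
    // Checking once per block suffices here because blockStart only changes at
    // block boundaries.
    static List<Integer> blocksRead(long startOffset, long endArg) {
        long fileLength = BOUNDARIES[BOUNDARIES.length - 1];
        long blockStart = startOffset; // seek() sets blockStart to the given offset
        List<Integer> read = new ArrayList<>();
        for (int i = 0; i + 1 < BOUNDARIES.length; i++) {
            if (BOUNDARIES[i] < startOffset) {
                continue; // block lies before the requested range
            }
            if (pastSync(blockStart, fileLength, endArg)) {
                break; // loop condition fails before this block's first record
            }
            read.add(i);
            blockStart = BOUNDARIES[i + 1]; // updated after the block's last record
        }
        return read;
    }

    public static void main(String[] args) {
        long start = 180; // hypothetical value returned by sync() before block 1
        long end = 340;   // hypothetical value returned by sync() after block 2
        System.out.println("pastSync(end):             " + blocksRead(start, end));
        System.out.println("pastSync(end - SYNC_SIZE): " + blocksRead(start, end - SYNC_SIZE));
    }
}
```

Running main prints [1, 2, 3] for pastSync(end), one block beyond the intended range, and [1, 2] for pastSync(end - SYNC_SIZE). The model covers only the end check; it starts from the unadjusted start offset, as DataFileReader.seek() would be called with a value taken from DataFileWriter.sync().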
________________________________
 From: Doug Cutting <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]; Venkat <[EMAIL PROTECTED]>
Sent: Friday, February 22, 2013 12:01 PM
Subject: Re: Avro file - Seek to specific offset and read
 
Venkat,

That should work.  It's hard for me to guess what's going wrong,
whether there's a bug in Avro, in your program, or perhaps just
unclear documentation.  Could you post a complete program that
demonstrates the issue?

Thanks,

Doug

On Wed, Feb 20, 2013 at 12:23 PM, Venkat <[EMAIL PROTECTED]> wrote:
> Hi All,
>
> Using DataFileReader, I'm trying to read data from a specific [start-offset]
> to an [end-offset]. Both the start and end offsets are marked with
> synchronization markers using DataFileWriter.sync().
>
> The following is the snippet I use to read the data back:
>
>         DataFileReader<GenericRecord> fileReader =
>             new DataFileReader<GenericRecord>(input, reader);
>         fileReader.seek(startOffset);  // set to the start-offset
>         while(fileReader.hasNext() && !fileReader.pastSync(endOffset))
>         {
>             GenericRecord gr = fileReader.next();
>         }
>
> This, however, reads and returns more records than I wrote between the
> two offsets.
>
> Appreciate your help regarding this.
>
> Thanks
>