Avro file - Seek to specific offset and read
Venkat 2013-02-20, 20:23
Hi All,
Using DataFileReader, I'm trying to read data from a specific [start-offset] to an [end-offset]. Both the start and end offsets are marked with synchronization markers using DataFileWriter.sync().
The following is the snippet I use to read the data back:
DataFileReader<GenericRecord> fileReader = new DataFileReader<GenericRecord>(input, reader);
fileReader.seek(startOffset); // set to the start-offset
while (fileReader.hasNext() && !fileReader.pastSync(endOffset)) {
    GenericRecord gr = fileReader.next();
}
This, however, reads and returns more records than I wrote between the two offsets. I'd appreciate any help with this.
Thanks
Re: Avro file - Seek to specific offset and read
Doug Cutting 2013-02-22, 20:01
Venkat,
That should work. It's hard for me to guess what's going wrong, whether there's a bug in Avro, in your program, or perhaps just unclear documentation. Could you post a complete program that demonstrates the issue?
Thanks,
Doug
Re: Avro file - Seek to specific offset and read
Venkat 2013-02-23, 02:09
Hi Doug,
Adjusting the start and end offsets (returned by DataFileWriter.sync()) back by 16 bytes (DataFileConstants.SYNC_SIZE) fixed the issue. This assumption is based on looking at the DataFileReader.pastSync() implementation.
public boolean pastSync(long position) throws IOException {
    return ((blockStart >= position+SYNC_SIZE)||(blockStart >= sin.length()));
}
Let me know if this assumption is correct.
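One way to sanity-check this reasoning is a small stand-alone model of the comparison above. The sketch below does not use Avro: PastSyncModel, blocksRead, and the block layout are hypothetical names and numbers chosen for illustration. It assumes that DataFileWriter.sync() returns the position just past the marker it writes, and that the reader's blockStart is the start of the block currently being read and advances only once that block is consumed. It covers the end offset only, not the start offset.

```java
// Stand-alone model of the pastSync() comparison quoted above; it does not use Avro.
class PastSyncModel {
    // Marker size, cited in the thread as DataFileConstants.SYNC_SIZE.
    static final int SYNC_SIZE = 16;

    // Same comparison as the quoted pastSync(), with reader state passed in explicitly.
    static boolean pastSync(long blockStart, long position, long fileLength) {
        return blockStart >= position + SYNC_SIZE || blockStart >= fileLength;
    }

    // Counts whole blocks consumed by the read loop from the original post.
    // ASSUMPTION: blockStart is the start of the block being read and only
    // advances once that block has been fully consumed.
    static int blocksRead(long[] blockStarts, long fileLength, int first, long endOffset) {
        int count = 0;
        int i = first;
        long blockStart = blockStarts[first];
        // "i < blockStarts.length" stands in for hasNext().
        while (i < blockStarts.length && !pastSync(blockStart, endOffset, fileLength)) {
            count++; // every record of block i is returned
            i++;
            blockStart = (i < blockStarts.length) ? blockStarts[i] : fileLength;
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical layout: four 200-byte blocks, each ending in a 16-byte marker.
        long[] blockStarts = {100, 300, 500, 700};
        long fileLength = 900;
        // ASSUMPTION: 700 is what sync() returned after the second wanted block.
        long endOffset = 700;
        System.out.println(blocksRead(blockStarts, fileLength, 1, endOffset));
        System.out.println(blocksRead(blockStarts, fileLength, 1, endOffset - SYNC_SIZE));
    }
}
```

In this model, starting at the block at offset 300 with the unadjusted end offset consumes three blocks although only two were wanted, while subtracting SYNC_SIZE from the end offset consumes exactly two. That is consistent with the extra records reported in the original post, under the assumptions stated above.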
Thanks
Venkat