Avro >> mail # user >> Avro Read with sync() {java.io.IOException: Invalid sync}

Re: Avro Read with sync() {java.io.IOException: Invalid sync}
This sounds like a bug.

I wonder if it is similar to a related bug in Hadoop?

https://issues.apache.org/jira/browse/HADOOP-9307

If so, please file an issue in Jira.

Doug

On Sat, Dec 21, 2013 at 4:35 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <[EMAIL PROTECTED]> wrote:
> Hello,
> I have a 340 MB Avro data file whose records are sorted and identified by
> a unique id (duplicate records exist). At the beginning of every unique
> record, a synchronization point is created with DataFileWriter.sync(). (I
> cannot, or do not want to, save the sync points, and I do not want to use
> SortedKeyValueFile as the output format for the M/R job.)
>
> There are at least 25k synchronization points in the 340 MB file.
>
> Ex:
> Marker1_RecordA1_RecordA2_RecordA3_Marker2_RecordB1_RecordB2
>
>
> As the records are sorted, a binary search is performed for efficient
> retrieval, using the attached code.
>
> Most of the time the search is successful, but at times the code throws
> the following exception:
> ------
> org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync!
>   at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)
> ------
>
>
>
> Questions:
> 1) Is it OK to have 25k sync points in a 300 MB file? Does it cost
> performance while reading?
> 2) I note the position that was used to invoke fileReader.sync(mid). If I
> catch the AvroRuntimeException, close and reopen the file, and call
> sync(mid) again, I do not see the exception. Why would Avro throw the
> exception the first time and not the second?
> 3) Is there a limit on the number of times sync() can be invoked?
> 4) When sync(position) is invoked, is any position with
> 0 <= position <= file.size() valid? If yes, why do I see the
> AvroRuntimeException (#2)?
>
> Regards,
> Deepak
>
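
The attached code is not part of this archive, so the following is only a sketch of the kind of binary search over sync points described above. It does not use the Avro API. FakeFile, its fields, and find() are hypothetical stand-ins, and its sync(position) is a simplified model that returns the first block starting at or after the given byte position. It also assumes one id per block, with ids in ascending order.

```java
import java.util.Arrays;

/** Sketch of a binary search over sync-delimited blocks. NOT the Avro API. */
class SyncBinarySearch {

    /** Hypothetical in-memory model of a data file with sync points. */
    static final class FakeFile {
        final long[] blockOffsets; // ascending byte offsets where blocks start
        final long[] blockIds;     // ascending ids, one per block (assumption)
        final long size;           // total file size in bytes

        FakeFile(long[] blockOffsets, long[] blockIds, long size) {
            this.blockOffsets = blockOffsets;
            this.blockIds = blockIds;
            this.size = size;
        }

        /**
         * Simplified analogue of sync(position): index of the first block
         * that starts at or after position, or -1 if there is none.
         */
        int sync(long position) {
            int i = Arrays.binarySearch(blockOffsets, position);
            if (i < 0) {
                i = -i - 1; // convert "not found" result to insertion point
            }
            return i < blockOffsets.length ? i : -1;
        }
    }

    /** Returns the start offset of the block holding id, or -1 if absent. */
    static long find(FakeFile file, long id) {
        long lo = 0;
        long hi = file.size; // exclusive upper bound on probe positions
        while (lo < hi) {
            long mid = lo + (hi - lo) / 2;
            int block = file.sync(mid);
            if (block < 0 || file.blockIds[block] > id) {
                // Every probe at or after mid lands on a block that is too large.
                hi = mid;
            } else if (file.blockIds[block] < id) {
                // Every probe up to this block's start lands on a block that is too small.
                lo = file.blockOffsets[block] + 1;
            } else {
                return file.blockOffsets[block];
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        FakeFile f = new FakeFile(
            new long[] {16, 120, 4000, 9000},
            new long[] {10, 20, 30, 40},
            12000);
        System.out.println(find(f, 30)); // id present
        System.out.println(find(f, 25)); // id absent
    }
}
```

With a real reader, each probe would instead call fileReader.sync(mid) and read the next record to compare ids, so every probe position in the range 0 to the file size gets exercised sooner or later. That is the situation question 4 asks about.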