Re: Avro Read with sync() {java.io.IOException: Invalid sync}
Yes, Avro. A similar bug may exist in Avro's input buffering code.

Doug
On Dec 23, 2013 8:50 PM, "ÐΞ€ρ@Ҝ (๏̯͡๏)" <[EMAIL PROTECTED]> wrote:

> Hi Doug,
> Do you want me to raise a bug against Avro or Hadoop-Core? My guess is Avro.
> Regards,
> Deepak
>
>
> On Tue, Dec 24, 2013 at 12:10 AM, Doug Cutting <[EMAIL PROTECTED]> wrote:
>
>> This sounds like a bug.
>>
>> I wonder if it is similar to a related bug in Hadoop?
>>
>> https://issues.apache.org/jira/browse/HADOOP-9307
>>
>> If so, please file an issue in Jira.
>>
>> Doug
>>
>> On Sat, Dec 21, 2013 at 4:35 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <[EMAIL PROTECTED]> wrote:
>> > Hello,
>> > I have a 340 MB Avro data file that contains records sorted and identified
>> > by a unique id (duplicate records exist). At the beginning of every unique
>> > record a synchronization point is created with DataFileWriter.sync(). (I
>> > cannot, or do not want to, save the sync points, and I do not want to use
>> > SortedKeyValueFile as the output format for the M/R job.)
>> >
>> > There are at least 25k synchronization points in the 340 MB file.
>> >
>> > Ex:
>> > Marker1_RecordA1_RecordA2_RecordA3_Marker2_RecordB1_RecordB2
>> >
>> >
>> > As the records are sorted, a binary search is performed for efficient
>> > retrieval, using the attached code.
>> >
>> > Most of the time the search is successful, but at times the code throws
>> > the following exception:
>> > ------
>> > org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync!
>> >   at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)
>> > ------
>> >
>> >
>> >
>> > Questions
>> > 1) Is it OK to have 25k sync points in a 300 MB file? Does it cost
>> > performance while reading?
>> > 2) I note down the position that was used to invoke fileReader.sync(mid).
>> > If I catch the AvroRuntimeException, close and reopen the file, and call
>> > sync(mid) again, I do not see the exception. Why would Avro throw the
>> > exception the first time and not the second?
>> > 3) Is there a limit on the number of times sync() can be invoked?
>> > 4) When sync(position) is invoked, is any position with
>> > 0 <= position <= file.size() valid? If yes, why do I see the
>> > AvroRuntimeException (#2)?
>> >
>> > Regards,
>> > Deepak
>> >
>>
>
>
>
> --
> Deepak
>
>
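
The code attached to the original message is not included in this thread. For readers following along, here is a rough sketch of the search idea described above: binary search over byte positions, where each probe jumps forward to the next sync marker and compares the id found there. It is a toy model in plain Java and does not use Avro. The class SyncSearchSketch, the methods sync and findBlock, and the arrays of marker offsets and ids are all hypothetical stand-ins; it assumes one block per unique id with ids ascending through the file, as in the message.

```java
import java.util.Arrays;

/** Toy model of binary search over byte positions; this is NOT Avro's API. */
class SyncSearchSketch {

    /**
     * Models "seek to the next sync point": returns the index of the first
     * block whose marker offset is >= pos, or markerOffsets.length if none.
     */
    static int sync(long[] markerOffsets, long pos) {
        int i = Arrays.binarySearch(markerOffsets, pos);
        return i >= 0 ? i : -i - 1; // a negative result encodes the insertion point
    }

    /**
     * Returns the index of the block whose id equals targetId, or -1.
     * Assumes markerOffsets is ascending, every offset is below fileSize,
     * and blockIds[i] (the id shared by all records in block i) is ascending.
     */
    static int findBlock(long[] markerOffsets, long[] blockIds,
                         long fileSize, long targetId) {
        long lo = 0;
        long hi = fileSize;
        // Find the smallest position whose next block has id >= targetId.
        while (lo < hi) {
            long mid = lo + (hi - lo) / 2;
            int b = sync(markerOffsets, mid);
            if (b == markerOffsets.length || blockIds[b] >= targetId) {
                hi = mid; // answer is at mid or to the left
            } else {
                lo = markerOffsets[b] + 1; // skip past the block that was too small
            }
        }
        int b = sync(markerOffsets, lo);
        return (b < markerOffsets.length && blockIds[b] == targetId) ? b : -1;
    }

    public static void main(String[] args) {
        long[] offsets = {0, 100, 250, 400}; // made-up marker positions
        long[] ids = {10, 20, 30, 40};       // made-up ids, one per block
        System.out.println(findBlock(offsets, ids, 500, 30)); // prints 2
        System.out.println(findBlock(offsets, ids, 500, 25)); // prints -1
    }
}
```

In this model every position between two markers maps to the same following block, so any probe position from 0 up to the file size is a legal argument. Whether the same holds for a real reader is exactly what question 4 asks, and the sketch does not answer it.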