Thank you Scott - that did the trick. It seems that I may need to reduce
my sync value as well.
On 01/08/2013 04:14 AM, Scott Carey wrote:
> A sync marker delimits each block in the avro file. If you want to start
> reading data from the middle of a 100GB file, DataFileReader will seek to
> the middle and find the next sync marker. Each block can be individually
> compressed, and by default when writing a file the writer will not
> compress the block or flush it to disk until the block has gotten as
> large as the sync interval in bytes. Alternatively, you can call sync()
> manually.
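[Editor's note: the seek-and-scan behavior described above can be sketched as a toy scan over an in-memory buffer. This is a simplified model, not the real DataFileReader code; the class and method names are made up for illustration. The one hard fact it leans on is that an Avro sync marker is 16 bytes.]

```java
import java.util.Arrays;

// Toy version of "seek to the middle, then scan forward to the next
// sync marker": find where the next block starts at or after 'from'.
class SyncScan {
    // Returns the index just past the first occurrence of 'marker' at
    // or after 'from', or -1 if no marker is found.
    static int nextBlockStart(byte[] data, byte[] marker, int from) {
        for (int i = from; i + marker.length <= data.length; i++) {
            byte[] window = Arrays.copyOfRange(data, i, i + marker.length);
            if (Arrays.equals(window, marker)) {
                return i + marker.length; // block data begins right after the marker
            }
        }
        return -1;
    }
}
```

The real reader does the same thing against a file: seek to the split point, scan forward until the file's (randomly generated, 16-byte) sync marker appears, and begin decoding from the block that follows it.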
> If you have a 1000000 byte sync interval, you may not see any data reach
> disk until that many bytes have been written (or sync() is called).
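[Editor's note: a simplified model of that buffering, not the actual DataFileWriter internals; the class name and record sizes are made up for illustration. It shows why 30,000 small records can sit entirely in memory when the sync interval is 1000000 bytes.]

```java
import java.io.ByteArrayOutputStream;

// Toy model of a block-buffering writer: records accumulate in an
// in-memory block, and nothing reaches "disk" until the block grows
// past the sync interval -- or sync() is called explicitly.
class BufferingWriterModel {
    private final int syncInterval;
    private final ByteArrayOutputStream block = new ByteArrayOutputStream();
    private int bytesOnDisk = 0; // stands in for the on-disk file length

    BufferingWriterModel(int syncInterval) {
        this.syncInterval = syncInterval;
    }

    void append(byte[] record) {
        block.write(record, 0, record.length);
        if (block.size() >= syncInterval) {
            sync(); // auto-flush once the block is large enough
        }
    }

    void sync() {
        bytesOnDisk += block.size(); // flush the block (plus a sync marker)
        block.reset();
    }

    int bytesOnDisk() {
        return bytesOnDisk;
    }
}
```

With a 1000000-byte interval, 30,000 records of roughly 30 bytes each total about 900,000 bytes and never cross the threshold, so the file on disk stays empty (and unreadable) until sync() is invoked by hand.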
> Your problem is likely that the first block in the file has not been
> flushed to disk yet, and therefore the file is corrupt and missing a
> trailing sync marker.
> On 1/3/13 12:36 PM, "Terry Healy" <[EMAIL PROTECTED]> wrote:
>> I'm upgrading a logging program to append GenericRecords to a .avro file
>> instead of text (.tsv). I have a working schema that is used to convert
>> existing .tsv of the same format into .avro and that works fine.
>> When I run a test writing 30,000 bogus records, it runs but when I try
>> to use "avro-tools-1.7.3.jar tojson" on the output file, it reports:
>> "AvroRuntimeException: java.io.IOException: Invalid sync!"
>> The file is still open at this point since the logging program is
>> running. Is this expected behavior because it is still open? (getmeta
>> and getschema work fine).
>> I'm not sure if it has any bearing, since I never really understood the
>> function of the Avro sync interval; in this and the working programs
>> it is set to 1000000.
>> Any ideas appreciated.