Avro >> mail # user >> Appending to .avro log files


Terry Healy 2013-01-03, 20:36
Scott Carey 2013-01-08, 09:14
Re: Appending to .avro log files
Thank you Scott - that did the trick. It seems that I may need to reduce
my sync value as well.
On 01/08/2013 04:14 AM, Scott Carey wrote:
> A sync marker delimits each block in the avro file.  If you want to start
> reading data from the middle of a 100GB file, DataFileReader will seek to
> the middle and find the next sync marker.  Each block can be individually
> compressed, and by default when writing a file the writer will not
> compress the block and flush to disk until a block has gotten as large as
> the sync interval in bytes.  Alternatively, you can manually sync().
>
> If you have a 1000000 byte sync interval, you may not see any data reach
> disk until that many bytes have been written (or sync() is called
> manually).
>
> Your problem is likely that the first block in the file has not been
> flushed to disk yet, and therefore the file is corrupt and missing a
> trailing sync marker.
>
> On 1/3/13 12:36 PM, "Terry Healy" <[EMAIL PROTECTED]> wrote:
>
>> Hello-
>>
>> I'm upgrading a logging program to append GenericRecords to a .avro file
>> instead of text (.tsv). I have a working schema that is used to convert
>> existing .tsv of the same format into .avro and that works fine.
>>
>> When I run a test writing 30,000 bogus records, it runs but when I try
>> to use "avro-tools-1.7.3.jar tojson" on the output file, it reports:
>>
>> "AvroRuntimeException: java.io.IOException: Invalid sync!"
>>
>> The file is still open at this point since the logging program is
>> running. Is this expected behavior because it is still open? (getmeta
>> and getschema work fine).
>>
>> I'm not sure if it has any bearing, since I never really understood the
>> function of the Avro sync interval; in this and the working programs
>> it is set to 1000000.
>>
>> Any ideas appreciated.
>>
>> -Terry
>
>
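
For anyone finding this thread later, the buffering behaviour Scott describes can be sketched with a toy model. This is not Avro's implementation: ToyBlockWriter and its methods are hypothetical names made up for illustration, the "disk" is just an in-memory byte array, and the 17-byte record is an invented stand-in for one log line. It only shows why 30,000 small records under a 1000000 byte sync interval may leave no complete block for a reader, and how a smaller interval or a manual sync() changes that. In the real Java API the corresponding calls on DataFileWriter are setSyncInterval(int), sync() and flush().

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Toy model of block buffering. Hypothetical class, NOT Avro's real writer.
class ToyBlockWriter {
    // Avro uses a 16-byte sync marker per file; zeros are enough for this sketch.
    private static final byte[] SYNC_MARKER = new byte[16];

    private final int syncInterval;                                           // bytes per block before auto-flush
    private final ByteArrayOutputStream block = new ByteArrayOutputStream();  // records still in memory
    private final ByteArrayOutputStream disk = new ByteArrayOutputStream();   // stands in for the file

    ToyBlockWriter(int syncInterval) {
        this.syncInterval = syncInterval;
    }

    // Buffer one record; write the block out once it reaches the sync interval.
    void append(byte[] record) {
        block.write(record, 0, record.length);
        if (block.size() >= syncInterval) {
            sync();
        }
    }

    // Write the buffered block followed by a sync marker, then start a new block.
    void sync() {
        if (block.size() == 0) {
            return; // nothing buffered, nothing to write
        }
        byte[] data = block.toByteArray();
        disk.write(data, 0, data.length);
        disk.write(SYNC_MARKER, 0, SYNC_MARKER.length);
        block.reset();
    }

    int bytesOnDisk() { return disk.size(); }

    int bytesBuffered() { return block.size(); }

    public static void main(String[] args) {
        // Invented 17-byte record standing in for one log line.
        byte[] record = "2013-01-03\tbogus\n".getBytes(StandardCharsets.US_ASCII);

        ToyBlockWriter large = new ToyBlockWriter(1_000_000);
        ToyBlockWriter small = new ToyBlockWriter(64_000);
        for (int i = 0; i < 30_000; i++) {
            large.append(record);
            small.append(record);
        }

        System.out.println("interval 1000000: on disk=" + large.bytesOnDisk()
                + " buffered=" + large.bytesBuffered());
        System.out.println("interval 64000:   on disk=" + small.bytesOnDisk()
                + " buffered=" + small.bytesBuffered());

        large.sync(); // manual sync forces the partial block out
        System.out.println("after manual sync: on disk=" + large.bytesOnDisk());
    }
}
```

In this model the large interval leaves every record in memory until sync() is called, while the smaller interval gets complete blocks out at the cost of more sync markers and smaller units for compression. That is the trade-off behind reducing the sync value for a long-running logger.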