Avro, mail # user - Is it possible to append to an already existing avro file


Re: Is it possible to append to an already existing avro file
Scott Carey 2012-02-21, 17:02

On 2/21/12 7:29 AM, "Vyacheslav Zholudev" <[EMAIL PROTECTED]>
wrote:

>Yep, I saw that method as well as the stackoverflow post. However, I'm
>interested in how to append to a file on an arbitrary file system, not
>only on the local one.
>
>I want to get an OutputStream based on the Path and the FileSystem
>implementation and then pass it for appending to avro methods.
>
>Is that possible?

It is not possible without modifying DataFileWriter. Please open a JIRA
ticket.  

It could not simply append to an OutputStream, since it must either:
* Seek to the start to validate the schemas match and find the sync
marker, or
* Trust that the schemas match and find the sync marker from the last block
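
To illustrate the two requirements above, here is a minimal sketch using a toy container format. It is NOT Avro's real layout or API: ToyContainer and all of its methods are hypothetical names, and the format is reduced to a header (schema string plus 16-byte sync marker) followed by blocks that each end with the same marker. It only shows why blindly writing to the end of an existing file breaks the reader, and why the writer has to re-read the header first.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy format (an assumption for illustration, not Avro): header = schema + sync; block = length + payload + sync. */
class ToyContainer {
    static final int SYNC_SIZE = 16;

    static void writeBlock(DataOutputStream out, String record, byte[] sync) throws IOException {
        byte[] payload = record.getBytes(StandardCharsets.UTF_8);
        out.writeInt(payload.length);
        out.write(payload);
        out.write(sync); // every block is terminated by the file's sync marker
    }

    /** Builds a new file image: fresh header with a random sync marker, then one block per record. */
    static byte[] create(String schema, String... records) throws IOException {
        byte[] sync = new byte[SYNC_SIZE];
        new SecureRandom().nextBytes(sync);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeUTF(schema);
        out.write(sync);
        for (String r : records) writeBlock(out, r, sync);
        return bytes.toByteArray();
    }

    /** Naive append: writes a second header (with a different sync marker) after the existing data. */
    static byte[] naiveAppend(byte[] existing, String schema, String... records) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        bytes.write(existing);
        bytes.write(create(schema, records));
        return bytes.toByteArray();
    }

    /** Header-aware append: re-reads the header, checks the schema, and reuses the existing sync marker. */
    static byte[] headerAwareAppend(byte[] existing, String schema, String... records) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(existing));
        if (!in.readUTF().equals(schema)) throw new IOException("Schema mismatch");
        byte[] sync = new byte[SYNC_SIZE];
        in.readFully(sync);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        bytes.write(existing);
        DataOutputStream out = new DataOutputStream(bytes);
        for (String r : records) writeBlock(out, r, sync);
        return bytes.toByteArray();
    }

    /** Reads every record, failing if a block is malformed or ends with the wrong sync marker. */
    static List<String> readAll(byte[] file) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        in.readUTF(); // skip the schema
        byte[] sync = new byte[SYNC_SIZE];
        in.readFully(sync);
        List<String> records = new ArrayList<>();
        byte[] marker = new byte[SYNC_SIZE];
        while (in.available() > 0) {
            int length = in.readInt();
            if (length < 0 || length > in.available()) throw new IOException("Corrupt block length: " + length);
            byte[] payload = new byte[length];
            in.readFully(payload);
            in.readFully(marker);
            if (!Arrays.equals(marker, sync)) throw new IOException("Invalid sync!");
            records.add(new String(payload, StandardCharsets.UTF_8));
        }
        return records;
    }
}
```

In this toy, readAll succeeds on the output of headerAwareAppend but throws an IOException on the output of naiveAppend, because the reader runs into the second header where it expects a block.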

DataFileWriter cannot refer to Hadoop classes such as FileSystem, but we
could add something to the mapred module that takes a Path and FileSystem
and returns something that implements an interface that DataFileWriter can
append to.  This would be something that is both a SeekableInput
(http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html)
and an OutputStream, or has both an InputStream from the start of the
existing file and an OutputStream at the end.
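
As a rough sketch of the second shape (an InputStream from the start plus an OutputStream at the end), something like the following could work. AppendTarget and LocalFileAppendTarget are hypothetical names, not existing Avro or Hadoop classes, and the implementation shown only covers the local file system so that it stays self-contained.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical abstraction (not part of Avro): the two streams an appending writer would need. */
interface AppendTarget {
    /** Stream positioned at the start of the existing file, for reading the header. */
    InputStream openFromStart() throws IOException;

    /** Stream positioned at the end of the existing file, for writing new blocks. */
    OutputStream openAtEnd() throws IOException;
}

/** Local-file implementation; a Hadoop-backed one would wrap a Path and a FileSystem instead. */
class LocalFileAppendTarget implements AppendTarget {
    private final File file;

    LocalFileAppendTarget(File file) {
        this.file = file;
    }

    @Override
    public InputStream openFromStart() throws IOException {
        return new FileInputStream(file);
    }

    @Override
    public OutputStream openAtEnd() throws IOException {
        // The second constructor argument opens the file in append mode.
        return new FileOutputStream(file, true);
    }
}
```

An implementation living in the mapred module could presumably return fs.open(path) and fs.append(path) from the two methods, which would keep Hadoop classes out of DataFileWriter itself.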
>
>Thanks,
>Vyacheslav
>
>On Feb 21, 2012, at 5:29 AM, Harsh J wrote:
>
>> Hi,
>>
>> Use the appendTo feature of the DataFileWriter. See
>>
>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>>
>> For a quick setup example, read also:
>>
>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
>>
>> On Tue, Feb 21, 2012 at 3:15 AM, Vyacheslav Zholudev
>> <[EMAIL PROTECTED]> wrote:
>>> Hi,
>>>
>>> is it possible to append to an already existing avro file when it was
>>> written and closed before?
>>>
>>> If I use
>>> outputStream = fs.append(avroFilePath);
>>>
>>> then later on I get: java.io.IOException: Invalid sync!
>>>
>>> Probably because the schema is written twice and some other issues.
>>>
>>> If I use outputStream = fs.create(avroFilePath); then the avro file
>>> gets overwritten.
>>>
>>> Thanks,
>>> Vyacheslav
>>
>>
>>
>> --
>> Harsh J
>> Customer Ops. Engineer
>> Cloudera | http://tiny.cloudera.com/about
>