Avro, mail # user - Is it possible to append to an already existing avro file


Vyacheslav Zholudev 2012-02-20, 21:45
Harsh J 2012-02-21, 04:29
Vyacheslav Zholudev 2012-02-21, 15:29
Scott Carey 2012-02-21, 17:02
Re: Is it possible to append to an already existing avro file
Vyacheslav Zholudev 2012-02-22, 09:57
Thanks for your reply; I suspected as much.

I will create a JIRA ticket.

Vyacheslav

On Feb 21, 2012, at 6:02 PM, Scott Carey wrote:

>
> On 2/21/12 7:29 AM, "Vyacheslav Zholudev" <[EMAIL PROTECTED]>
> wrote:
>
>> Yep, I saw that method as well as the Stack Overflow post. However, I'm
>> interested in how to append to a file on an arbitrary file system, not
>> only on the local one.
>>
>> I want to get an OutputStream based on the Path and the FileSystem
>> implementation and then pass it to the Avro methods for appending.
>>
>> Is that possible?
>
> It is not possible without modifying DataFileWriter. Please open a JIRA
> ticket.  
>
> It could not simply append to an OutputStream, since it must either:
> * Seek to the start to validate the schemas match and find the sync
> marker, or
> * Trust that the schemas match and find the sync marker from the last block
>
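The two options above can be sketched against a toy file layout. Both recover the same 16-byte sync marker: reading from the start also exposes the stored schema, so it can be compared before appending, while reading the last 16 bytes trusts the schema and relies on the file having been closed cleanly, so that it ends with a marker. Everything here is an assumption for illustration: FindSyncMarker, build, syncFromHeader and syncFromTail are hypothetical names, and the layout (4-byte magic, a writeUTF-encoded schema string, the marker, then blocks that each end with the marker) is a simplified stand-in for Avro's real header, which stores the schema in an Avro-encoded metadata map.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Random;

// Hypothetical toy model; NOT Avro's actual byte encoding.
public class FindSyncMarker {
    static final int MAGIC_SIZE = 4;
    static final int SYNC_SIZE = 16;

    // Toy layout: magic, schema via writeUTF (2-byte length + bytes; ASCII
    // schemas assumed), the marker, then blocks that each end with the marker.
    static byte[] build(String schema, byte[] sync, int blocks) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.write(new byte[] {'O', 'b', 'j', 1});
        out.writeUTF(schema);
        out.write(sync);
        for (int i = 0; i < blocks; i++) {
            out.writeInt(1); // record count
            out.writeInt(1); // payload size in bytes
            out.write(i);    // one-byte payload
            out.write(sync); // every block ends with the header's marker
        }
        out.flush();
        return buf.toByteArray();
    }

    // Option 1: read from the start, check the stored schema, then take the marker.
    static byte[] syncFromHeader(byte[] file, String expectedSchema) {
        int schemaLen = ((file[MAGIC_SIZE] & 0xFF) << 8) | (file[MAGIC_SIZE + 1] & 0xFF);
        int schemaStart = MAGIC_SIZE + 2;
        String stored = new String(file, schemaStart, schemaLen, StandardCharsets.UTF_8);
        if (!stored.equals(expectedSchema)) {
            throw new IllegalArgumentException("Schema mismatch: " + stored);
        }
        int syncStart = schemaStart + schemaLen;
        return Arrays.copyOfRange(file, syncStart, syncStart + SYNC_SIZE);
    }

    // Option 2: trust the schema; a cleanly closed toy file ends with the marker,
    // whether the last thing written was a block or just the header.
    static byte[] syncFromTail(byte[] file) {
        return Arrays.copyOfRange(file, file.length - SYNC_SIZE, file.length);
    }

    public static void main(String[] args) throws IOException {
        byte[] sync = new byte[SYNC_SIZE];
        new Random(42).nextBytes(sync);
        String schema = "{\"type\":\"string\"}";
        byte[] file = build(schema, sync, 3);
        System.out.println("header and tail agree: "
            + Arrays.equals(syncFromHeader(file, schema), syncFromTail(file)));
    }
}
```

The trade-off mirrors the thread: the header route costs a seek to the beginning but can reject a mismatched schema, whereas the tail route only needs the end of the file.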
> DataFileWriter cannot refer to Hadoop classes such as FileSystem, but we
> could add something to the mapred module that takes a Path and FileSystem
> and returns something that implements an interface that DataFileWriter can
> append to.  This would be something that is both a
> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
> and an OutputStream, or has both an InputStream from the start of the
> existing file and an OutputStream at the end.
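A minimal sketch of the kind of interface proposed above, assuming nothing about Hadoop: SeekableAppendable, InMemoryAppendable and appendBlock are hypothetical names, the read-side methods copy the method names of Avro's SeekableInput (seek, tell, length, read) without depending on Avro, and an in-memory buffer stands in for an implementation built from a Path and a FileSystem. The appendBlock helper follows the second option listed above (trust the schema, take the marker from the last 16 bytes) and writes one block in a simplified toy layout, not Avro's real block encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;

public class AppendTargetSketch {
    static final int SYNC_SIZE = 16;

    // Hypothetical interface: random-access reads over the existing bytes
    // (method names copied from Avro's SeekableInput) plus an OutputStream
    // positioned at the end of the file.
    interface SeekableAppendable {
        void seek(long p) throws IOException;
        long tell() throws IOException;
        long length() throws IOException;
        int read(byte[] b, int off, int len) throws IOException;
        OutputStream appendStream() throws IOException;
    }

    // In-memory stand-in for an implementation built from a Path and a FileSystem.
    static class InMemoryAppendable implements SeekableAppendable {
        private final ByteArrayOutputStream data = new ByteArrayOutputStream();
        private long pos = 0;

        InMemoryAppendable(byte[] existing) {
            data.write(existing, 0, existing.length);
        }

        public void seek(long p) { pos = p; }
        public long tell() { return pos; }
        public long length() { return data.size(); }

        public int read(byte[] b, int off, int len) {
            byte[] all = data.toByteArray();
            if (pos >= all.length) return -1; // nothing left to read
            int n = (int) Math.min(len, all.length - pos);
            System.arraycopy(all, (int) pos, b, off, n);
            pos += n;
            return n;
        }

        // Bytes written here land after the existing content.
        public OutputStream appendStream() { return data; }

        byte[] bytes() { return data.toByteArray(); }
    }

    // Toy append: reuse the marker that ends the file, then write
    // record count, payload size, payload and that same marker.
    static void appendBlock(SeekableAppendable target, byte[] payload) throws IOException {
        long length = target.length();
        if (length < SYNC_SIZE) throw new IOException("File too short to hold a sync marker");
        byte[] sync = new byte[SYNC_SIZE];
        target.seek(length - SYNC_SIZE);
        int got = 0;
        while (got < SYNC_SIZE) {
            int n = target.read(sync, got, SYNC_SIZE - got);
            if (n < 0) throw new IOException("Unexpected end of file");
            got += n;
        }
        DataOutputStream out = new DataOutputStream(target.appendStream());
        out.writeInt(1);
        out.writeInt(payload.length);
        out.write(payload);
        out.write(sync);
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        byte[] sync = new byte[SYNC_SIZE];
        Arrays.fill(sync, (byte) 0x5A);          // stand-in marker
        byte[] header = new byte[4 + SYNC_SIZE]; // toy header: magic + marker
        header[0] = 'O'; header[1] = 'b'; header[2] = 'j'; header[3] = 1;
        System.arraycopy(sync, 0, header, 4, SYNC_SIZE);

        InMemoryAppendable file = new InMemoryAppendable(header);
        appendBlock(file, new byte[] {42});
        byte[] after = file.bytes();
        byte[] tail = Arrays.copyOfRange(after, after.length - SYNC_SIZE, after.length);
        System.out.println("length " + after.length + ", marker preserved: " + Arrays.equals(tail, sync));
    }
}
```

Keeping read access and the append stream on one object means appendBlock needs no knowledge of where the bytes live, which is the point of keeping Hadoop types such as FileSystem out of DataFileWriter itself.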
>
>
>
>
>>
>> Thanks,
>> Vyacheslav
>>
>> On Feb 21, 2012, at 5:29 AM, Harsh J wrote:
>>
>>> Hi,
>>>
>>> Use the appendTo feature of the DataFileWriter. See
>>>
>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>>>
>>> For a quick setup example, read also:
>>>
>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
>>>
>>> On Tue, Feb 21, 2012 at 3:15 AM, Vyacheslav Zholudev
>>> <[EMAIL PROTECTED]> wrote:
>>>> Hi,
>>>>
>>>> Is it possible to append to an existing Avro file that was already
>>>> written and closed?
>>>>
>>>> If I use
>>>> outputStream = fs.append(avroFilePath);
>>>>
>>>> then later on I get: java.io.IOException: Invalid sync!
>>>>
>>>> Probably because the schema gets written twice, among other issues.
>>>>
>>>> If I use outputStream = fs.create(avroFilePath); then the Avro file
>>>> gets overwritten.
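A toy model may help explain the Invalid sync! error from the fs.append attempt above. In Avro's container format the header ends with a 16-byte sync marker chosen by the writer, and the reader expects every data block to end with that same marker. A second DataFileWriter opened on the appended stream writes its own header and picks its own marker, so its blocks cannot match the first file's marker. The sketch is a simplification under stated assumptions: the class NaiveAppendDemo and its byte layout are hypothetical (fixed-width ints instead of Avro's variable-length longs, no metadata map), it leaves out the second header entirely, and it derives the markers from seeded Random instances only so the run is repeatable. It shows just the marker mismatch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Random;

// Hypothetical toy model; NOT Avro's actual byte encoding.
public class NaiveAppendDemo {
    static final int SYNC_SIZE = 16;
    static final byte[] MAGIC = {'O', 'b', 'j', 1};

    // Each writer invents its own marker; seeded here only to keep the demo repeatable.
    static byte[] newSync(long seed) {
        byte[] sync = new byte[SYNC_SIZE];
        new Random(seed).nextBytes(sync);
        return sync;
    }

    static void writeHeader(DataOutputStream out, byte[] sync) throws IOException {
        out.write(MAGIC);
        out.write(sync);
    }

    // Toy block: record count, payload length, payload, sync marker.
    static void writeBlock(DataOutputStream out, byte[] payload, byte[] sync) throws IOException {
        out.writeInt(1);
        out.writeInt(payload.length);
        out.write(payload);
        out.write(sync);
    }

    // Returns the number of blocks read; throws if a block ends with a foreign marker.
    static int readAll(byte[] file) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        byte[] magic = new byte[MAGIC.length];
        in.readFully(magic);
        if (!Arrays.equals(magic, MAGIC)) throw new IOException("Not a toy container file");
        byte[] sync = new byte[SYNC_SIZE];
        in.readFully(sync); // the marker every block must repeat
        int blocks = 0;
        while (in.available() > 0) {
            in.readInt(); // record count (unused here)
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            byte[] blockSync = new byte[SYNC_SIZE];
            in.readFully(blockSync);
            if (!Arrays.equals(sync, blockSync)) throw new IOException("Invalid sync!");
            blocks++;
        }
        return blocks;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(file);
        byte[] firstSync = newSync(1);
        writeHeader(out, firstSync);
        writeBlock(out, "record-1".getBytes(StandardCharsets.UTF_8), firstSync);
        System.out.println("blocks after first write: " + readAll(file.toByteArray()));

        // Naive "append" by a second writer: its block ends with a different marker.
        writeBlock(out, "record-2".getBytes(StandardCharsets.UTF_8), newSync(2));
        try {
            readAll(file.toByteArray());
        } catch (IOException e) {
            System.out.println("after naive append: " + e.getMessage());
        }
    }
}
```

Running main prints a block count of 1 after the first write, then the Invalid sync! message once the block carrying the second writer's marker has been added.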
>>>>
>>>> Thanks,
>>>> Vyacheslav
>>>
>>>
>>>
>>> --
>>> Harsh J
>>> Customer Ops. Engineer
>>> Cloudera | http://tiny.cloudera.com/about
>>
>
>
Michael Malak 2013-02-01, 19:32