Avro, mail # user - Re: Is it possible to append to an already existing avro file


Re: Is it possible to append to an already existing avro file
Michael Malak 2013-02-01, 19:32
Was a JIRA ticket ever created regarding appending to an existing Avro file on HDFS?

What is the status of such a capability, a year out from when the issue below was raised?

On Wed, 22 Feb 2012 10:57:48 +0100, "Vyacheslav Zholudev" <[EMAIL PROTECTED]> wrote:

> Thanks for your reply, I suspected this.
>
> I will create a JIRA ticket.
>
> Vyacheslav
>
> On Feb 21, 2012, at 6:02 PM, Scott Carey wrote:
>
>>
>> On 2/21/12 7:29 AM, "Vyacheslav Zholudev" <[EMAIL PROTECTED]>
>> wrote:
>>
>>> Yep, I saw that method as well as the Stack Overflow post. However, I'm
>>> interested in how to append to a file on an arbitrary file system, not
>>> only on the local one.
>>>
>>> I want to get an OutputStream based on the Path and the FileSystem
>>> implementation and then pass it to the Avro methods for appending.
>>>
>>> Is that possible?
>>
>> It is not possible without modifying DataFileWriter. Please open a JIRA
>> ticket.  
>>
>> It could not simply append to an OutputStream, since it must either:
>> * Seek to the start to validate the schemas match and find the sync
>> marker, or
>> * Trust that the schemas match and find the sync marker from the last
>> block
>>
>> DataFileWriter cannot refer to Hadoop classes such as FileSystem, but we
>> could add something to the mapred module that takes a Path and
>> FileSystem and returns something that implements an interface that
>> DataFileWriter can append to. This would be something that is both a
>> SeekableInput
>> (http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html)
>> and an OutputStream, or has both an InputStream from the start of the
>> existing file and an OutputStream at the end.
>>
>>> Thanks,
>>> Vyacheslav
>>>
>>> On Feb 21, 2012, at 5:29 AM, Harsh J wrote:
>>>
>>>> Hi,
>>>>
>>>> Use the appendTo feature of the DataFileWriter. See
>>>>
>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>>>>
>>>> For a quick setup example, read also:
>>>>
>>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
>>>>
>>>> On Tue, Feb 21, 2012 at 3:15 AM, Vyacheslav Zholudev
>>>> <[EMAIL PROTECTED]> wrote:
>>>>> Hi,
>>>>>
>>>>> Is it possible to append to an existing Avro file that has already
>>>>> been written and closed?
>>>>>
>>>>> If I use
>>>>> outputStream = fs.append(avroFilePath);
>>>>>
>>>>> then later on I get: java.io.IOException: Invalid sync!
>>>>>
>>>>> Probably because the schema is written twice, among other issues.
>>>>>
>>>>> If I use outputStream = fs.create(avroFilePath); then the avro file
>>>>> gets overwritten.
>>>>>
>>>>> Thanks,
>>>>> Vyacheslav
>>>>
>>>> --
>>>> Harsh J
>>>> Customer Ops. Engineer
>>>> Cloudera | http://tiny.cloudera.com/about
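
For anyone reading this thread later: Scott's point above, that a writer cannot simply append to an OutputStream because it first has to check the schema and find the sync marker of the existing file, can be illustrated with a toy container format. The sketch below is plain Java and does not use Avro at all. The class ToyContainer, its method names, and the simplified layout (a schema string plus a 16-byte sync marker as the header, then one record per block, each block followed by the marker) are assumptions made up for illustration. They are not Avro's actual file format or API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy container (hypothetical, NOT Avro's real layout): header, then sync-terminated blocks. */
class ToyContainer {
    static final int SYNC_SIZE = 16;

    /** Writes a fresh "file": header (schema + sync marker), then one block per record. */
    static byte[] create(String schema, byte[] sync, String... records) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeUTF(schema);
        out.write(sync);
        writeBlocks(out, sync, records);
        return bytes.toByteArray();
    }

    /** Writes blocks only: each record is followed by the file's sync marker. */
    static void writeBlocks(DataOutputStream out, byte[] sync, String... records) throws IOException {
        for (String record : records) {
            out.writeUTF(record);
            out.write(sync);
        }
    }

    /** Reads every record, checking the marker after each block against the header's marker. */
    static List<String> readAll(byte[] file) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        in.readUTF();                          // schema, unused by this reader
        byte[] sync = new byte[SYNC_SIZE];
        in.readFully(sync);
        List<String> records = new ArrayList<>();
        byte[] marker = new byte[SYNC_SIZE];
        while (in.available() > 0) {
            String record = in.readUTF();
            in.readFully(marker);
            if (!Arrays.equals(sync, marker)) {
                throw new IOException("Invalid sync!");
            }
            records.add(record);
        }
        return records;
    }

    /** Blind append: a new writer emits its own header and marker after the old data. */
    static byte[] blindAppend(byte[] existing, String schema, byte[] newSync, String... records)
            throws IOException {
        return concat(existing, create(schema, newSync, records));
    }

    /** Header-aware append: re-read the header, check the schema, reuse the existing marker. */
    static byte[] appendTo(byte[] existing, String schema, String... records) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(existing));
        if (!in.readUTF().equals(schema)) {
            throw new IOException("Schema mismatch");
        }
        byte[] sync = new byte[SYNC_SIZE];
        in.readFully(sync);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        writeBlocks(new DataOutputStream(bytes), sync, records);
        return concat(existing, bytes.toByteArray());
    }

    static byte[] concat(byte[] a, byte[] b) {
        byte[] joined = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, joined, a.length, b.length);
        return joined;
    }

    /** Builds a fixed marker for the demo; a real writer would pick random bytes. */
    static byte[] marker(int fill) {
        byte[] m = new byte[SYNC_SIZE];
        Arrays.fill(m, (byte) fill);
        return m;
    }

    public static void main(String[] args) throws IOException {
        byte[] base = create("schema-v1", marker(1), "a", "b");
        System.out.println(readAll(appendTo(base, "schema-v1", "c")));   // prints [a, b, c]
        try {
            readAll(blindAppend(base, "schema-v1", marker(2), "c"));
        } catch (IOException e) {
            System.out.println(e.getMessage());                          // prints Invalid sync!
        }
    }
}
```

In this toy model, blindAppend fails for the reason guessed in the original question: the second header is read as if it were data, and its marker does not match the one from the first header. appendTo works only because it can read the start of the existing data before writing at the end, which is why Scott asks for something that is both a SeekableInput and an OutputStream.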