Avro >> mail # user >> Re: Is it possible to append to an already existing avro file


Re: Is it possible to append to an already existing avro file
Was a JIRA ticket ever created regarding appending to an existing Avro file on HDFS?

What is the status of such a capability, a year out from when the issue below was raised?

On Wed, 22 Feb 2012 10:57:48 +0100, "Vyacheslav Zholudev" <[EMAIL PROTECTED]> wrote:

> Thanks for your reply, I suspected this.
>
> I will create a JIRA ticket.
>
> Vyacheslav
>
> On Feb 21, 2012, at 6:02 PM, Scott Carey wrote:
>
>>
>> On 2/21/12 7:29 AM, "Vyacheslav Zholudev" <[EMAIL PROTECTED]>
>> wrote:
>>
>>> Yep, I saw that method as well as the stackoverflow post. However, I'm
>>> interested in how to append to a file on an arbitrary file system, not
>>> only on the local one.
>>>
>>> I want to get an OutputStream based on the Path and the FileSystem
>>> implementation and then pass it for appending to avro methods.
>>>
>>> Is that possible?
>>
>> It is not possible without modifying DataFileWriter. Please open a JIRA
>> ticket.  
>>
>> It could not simply append to an OutputStream, since it must either:
>> * Seek to the start to validate the schemas match and find the sync
>> marker, or
>> * Trust that the schemas match and find the sync marker from the last
>> block
>>
>> DataFileWriter cannot refer to Hadoop classes such as FileSystem, but we
>> could add something to the mapred module that takes a Path and
>> FileSystem and returns something that implements an interface that
>> DataFileWriter can append to.  This would be something that is both a
>> SeekableInput
>> (http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html)
>> and an OutputStream, or has both an InputStream from the start of the
>> existing file and an OutputStream at the end.
>>
>>> Thanks,
>>> Vyacheslav
>>>
>>> On Feb 21, 2012, at 5:29 AM, Harsh J wrote:
>>>
>>>> Hi,
>>>>
>>>> Use the appendTo feature of the DataFileWriter. See
>>>>
>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>>>>
>>>> For a quick setup example, read also:
>>>>
>>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
>>>>
>>>> On Tue, Feb 21, 2012 at 3:15 AM, Vyacheslav Zholudev
>>>> <[EMAIL PROTECTED]> wrote:
>>>>> Hi,
>>>>>
>>>>> is it possible to append to an already existing avro file when it was
>>>>> written and closed before?
>>>>>
>>>>> If I use
>>>>> outputStream = fs.append(avroFilePath);
>>>>>
>>>>> then later on I get: java.io.IOException: Invalid sync!
>>>>>
>>>>> Probably because the schema is written twice and some other issues.
>>>>>
>>>>> If I use outputStream = fs.create(avroFilePath); then the avro file
>>>>> gets overwritten.
>>>>>
>>>>> Thanks,
>>>>> Vyacheslav
>>>>
>>>> --
>>>> Harsh J
>>>> Customer Ops. Engineer
>>>> Cloudera | http://tiny.cloudera.com/about
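
Scott Carey's point quoted above (an appender must read the existing header to check the schema and recover the sync marker before it writes at the end of the file) can be illustrated with a small sketch. This is a toy container format written with only the JDK, not the real Avro file layout and not Avro's implementation: the class name ToyContainer, its methods, and the byte layout are all hypothetical. It only mirrors the general shape of a header holding a schema and a 16-byte sync marker, followed by blocks that each end with that marker.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.OpenOption;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical toy format: header = schema string + 16-byte sync marker; blocks end with the marker. */
class ToyContainer {
    static final int SYNC_SIZE = 16;

    /** Writes a new file (truncating any old one): header, then one block. */
    static void create(Path file, String schema, List<String> records) throws IOException {
        writeHeaderAndBlock(file, schema, records);
    }

    /** Mimics writing a fresh header onto an append stream: a second header with a new marker. */
    static void naiveAppend(Path file, String schema, List<String> records) throws IOException {
        writeHeaderAndBlock(file, schema, records, StandardOpenOption.APPEND);
    }

    /** Correct append: read the header from the start, validate, then write a block at the end. */
    static void append(Path file, String schema, List<String> records) throws IOException {
        byte[] sync = new byte[SYNC_SIZE];
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            String existing = in.readUTF();
            if (!existing.equals(schema)) {
                throw new IOException("Schema mismatch: file has " + existing);
            }
            in.readFully(sync); // reuse the file's own sync marker
        }
        try (DataOutputStream out = new DataOutputStream(
                Files.newOutputStream(file, StandardOpenOption.APPEND))) {
            writeBlock(out, records, sync);
        }
    }

    /** Reads every record, checking the sync marker after each block. */
    static List<String> readAll(Path file) throws IOException {
        List<String> result = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(Files.readAllBytes(file)))) {
            in.readUTF(); // schema (not interpreted in this toy)
            byte[] sync = new byte[SYNC_SIZE];
            in.readFully(sync);
            byte[] marker = new byte[SYNC_SIZE];
            while (in.available() > 0) {
                int count = in.readInt();
                for (int i = 0; i < count; i++) {
                    result.add(in.readUTF());
                }
                in.readFully(marker);
                if (!Arrays.equals(sync, marker)) {
                    throw new IOException("Invalid sync!");
                }
            }
        }
        return result;
    }

    private static void writeHeaderAndBlock(Path file, String schema, List<String> records,
                                            OpenOption... options) throws IOException {
        byte[] sync = new byte[SYNC_SIZE];
        new SecureRandom().nextBytes(sync); // every fresh header gets a new random marker
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file, options))) {
            out.writeUTF(schema);
            out.write(sync);
            writeBlock(out, records, sync);
        }
    }

    private static void writeBlock(DataOutputStream out, List<String> records, byte[] sync)
            throws IOException {
        out.writeInt(records.size());
        for (String record : records) {
            out.writeUTF(record);
        }
        out.write(sync);
    }
}
```

In this sketch, create followed by append leaves a file that readAll accepts, because the appended block ends with the marker taken from the existing header. After naiveAppend, readAll runs into the second header where it expects a block and fails with an IOException, which is roughly the situation behind the "Invalid sync!" error reported at the bottom of the thread. The append method corresponds to the first option in Scott's list (seek to the start and validate); it needs read access to the beginning of the file as well as a write position at the end, which is why a bare OutputStream is not enough.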