Avro, mail # user - Re: Is it possible to append to an already existing avro file


Re: Is it possible to append to an already existing avro file
Ken Krugler 2013-02-06, 18:03

On Feb 5, 2013, at 7:30pm, Michael Malak wrote:

> I don't believe a Hadoop FileSystem is a Java OutputStream?

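The Hadoop FileSystem.append() method returns an FSDataOutputStream, which is a subclass of java.io.OutputStream, so it can be passed wherever DataFileWriter.appendTo(SeekableInput, OutputStream) expects an OutputStream.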
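
To make the type relationship concrete, here is a minimal sketch that uses only java.io, so it runs without Hadoop or Avro on the classpath. StandInFsOutputStream and append(File) are hypothetical stand-ins for FSDataOutputStream and FileSystem.append(Path); writeTo() stands in for any API declared against java.io.OutputStream. With the real libraries the call would presumably look like writer.appendTo(input, fs.append(path)), where input is some SeekableInput over the existing file; that pairing is not exercised here.

```java
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class AppendStreamSketch {
    // Hypothetical stand-in for Hadoop's FSDataOutputStream, which likewise
    // extends java.io.DataOutputStream and is therefore an OutputStream.
    static class StandInFsOutputStream extends DataOutputStream {
        StandInFsOutputStream(OutputStream out) {
            super(out);
        }
    }

    // Hypothetical stand-in for FileSystem.append(path): opens the file in
    // append mode, so writes land after the existing bytes.
    static StandInFsOutputStream append(File file) throws IOException {
        return new StandInFsOutputStream(new FileOutputStream(file, true));
    }

    // Any method declared against java.io.OutputStream accepts the subclass.
    static void writeTo(OutputStream out, byte[] bytes) throws IOException {
        out.write(bytes);
        out.flush();
    }
}
```

Passing the result of append(file) straight to writeTo() compiles without a cast, which is the whole point: the append stream is already an OutputStream.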

-- Ken

>
> --- On Tue, 2/5/13, Doug Cutting <[EMAIL PROTECTED]> wrote:
>
>> From: Doug Cutting <[EMAIL PROTECTED]>
>> Subject: Re: Is it possible to append to an already existing avro file
>> To: [EMAIL PROTECTED]
>> Date: Tuesday, February 5, 2013, 5:27 PM
>> It will work on an OutputStream that supports append.
>>
>> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(org.apache.avro.file.SeekableInput, java.io.OutputStream)
>>
>> So it depends on how well HDFS implements FileSystem#append(), not on
>> any changes in Avro.
>>
>> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#append(org.apache.hadoop.fs.Path)
>>
>> I have no recent personal experience with append in HDFS.  Does anyone
>> else here?
>>
>> Doug
>>
>> On Tue, Feb 5, 2013 at 4:10 PM, Michael Malak <[EMAIL PROTECTED]> wrote:
>>> My understanding is that will append to a file on the local filesystem,
>>> but not to a file on HDFS.
>>>
>>> --- On Tue, 2/5/13, Doug Cutting <[EMAIL PROTECTED]> wrote:
>>>
>>>> From: Doug Cutting <[EMAIL PROTECTED]>
>>>> Subject: Re: Is it possible to append to an already existing avro file
>>>> To: [EMAIL PROTECTED]
>>>> Date: Tuesday, February 5, 2013, 5:08 PM
>>>> The Jira is:
>>>>
>>>> https://issues.apache.org/jira/browse/AVRO-1035
>>>>
>>>> It is possible to append to an existing Avro file:
>>>>
>>>> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>>>>
>>>> Should we close that issue as "fixed"?
>>>>
>>>> Doug
>>>>
>>>> On Fri, Feb 1, 2013 at 11:32 AM, Michael Malak <[EMAIL PROTECTED]> wrote:
>>>>> Was a JIRA ticket ever created regarding appending to an existing
>>>>> Avro file on HDFS?
>>>>>
>>>>> What is the status of such a capability, a year out from when the
>>>>> issue below was raised?
>>>>>
>>>>> On Wed, 22 Feb 2012 10:57:48 +0100, "Vyacheslav Zholudev"
>>>>> <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Thanks for your reply, I suspected this.
>>>>>>
>>>>>> I will create a JIRA ticket.
>>>>>>
>>>>>> Vyacheslav
>>>>>>
>>>>>> On Feb 21, 2012, at 6:02 PM, Scott Carey wrote:
>>>>>>
>>>>>>>
>>>>>>> On 2/21/12 7:29 AM, "Vyacheslav Zholudev" <[EMAIL PROTECTED]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Yep, I saw that method as well as the Stack Overflow post. However, I'm
>>>>>>>> interested in how to append to a file on an arbitrary file system, not
>>>>>>>> only on the local one.
>>>>>>>>
>>>>>>>> I want to get an OutputStream based on the Path and the FileSystem
>>>>>>>> implementation and then pass it to the Avro append methods.
>>>>>>>>
>>>>>>>> Is that possible?
>>>>>>>
>>>>>>> It is not possible without modifying DataFileWriter. Please open a JIRA
>>>>>>> ticket.
>>>>>>>
>>>>>>> It could not simply append to an OutputStream, since it must either:
>>>>>>> * Seek to the start to validate the schemas match and find the sync
>>>>>>> marker, or
>>>>>>> * Trust that the schemas match and find the sync marker from the last
>>>>>>> block
>>>>>>>
>>>>>>> DataFileWriter cannot refer to Hadoop classes such as FileSystem, but we
>>>>>>> could add something to the mapred module that takes a Path and
>>>>>>> FileSystem and returns something that implements an interface that
>>>>>>> DataFileWriter can append to.  This would be something that is both a
>>>>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
>>>>>>> and an OutputStream, or has both an InputStream from the start of the
>>>>>>> existing file and an OutputStream at the end.

Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr