Avro, mail # user - Re: Is it possible to append to an already existing avro file


Re: Is it possible to append to an already existing avro file
Doug Cutting 2013-02-07, 16:51
The avro-mapred module includes a Seekable implementation that works
with HDFS called FsInput:

http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/FsInput.html

With this, your example can be made considerably smaller.

Doug
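
For readers following along, here is a minimal sketch of what the shorter version might look like when FsInput is combined with DataFileWriter#appendTo(SeekableInput, OutputStream) and FileSystem#append(Path), the calls linked in this thread. It is not the code from the gist. The class name HdfsAvroAppend and the method appendRecords are hypothetical, and the sketch assumes the target filesystem actually supports append and that the records use the same schema as the existing file.

```java
import java.io.IOException;
import java.io.OutputStream;

import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.FsInput;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper class, named here for illustration only.
public class HdfsAvroAppend {

    // Appends records to an existing Avro data file at `path`.
    // Assumption: the filesystem behind `path` supports FileSystem#append().
    public static void appendRecords(Configuration conf, Path path,
                                     Iterable<GenericRecord> records) throws IOException {
        FileSystem fs = path.getFileSystem(conf);
        try (FsInput in = new FsInput(path, conf);       // seekable view of the existing file
             OutputStream out = fs.append(path);         // stream positioned at the end of the file
             DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>())) {
            // Reads the schema, sync marker and codec from the existing header,
            // then continues writing blocks to `out`.
            writer.appendTo(in, out);
            for (GenericRecord record : records) {
                writer.append(record);
            }
        }
    }
}
```

The writer takes its schema from the existing file header, so the caller only has to supply records that match it. Closing the writer flushes the last block and closes the underlying output stream.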

On Thu, Feb 7, 2013 at 8:28 AM, Harsh J <[EMAIL PROTECTED]> wrote:
> I assume by non-trivial you meant the extra Seekable stuff I needed to
> wrap around the DFS output streams to let Avro take it as append-able?
> I don't think it's possible for Avro to carry it since Avro (core) does
> not reverse-depend on Hadoop. Should we document it somewhere though?
> Do you have any ideas on the best place to do that?
>
> On Thu, Feb 7, 2013 at 6:12 AM, Michael Malak <[EMAIL PROTECTED]> wrote:
>> Thanks so much for the code -- it works great!
>>
>> Since it is a non-trivial amount of code required to achieve append, I suggest attaching that code to AVRO-1035, in the hopes that someone will come up with an interface that requires just one line of user code to achieve append.
>>
>> --- On Wed, 2/6/13, Harsh J <[EMAIL PROTECTED]> wrote:
>>
>>> From: Harsh J <[EMAIL PROTECTED]>
>>> Subject: Re: Is it possible to append to an already existing avro file
>>> To: [EMAIL PROTECTED]
>>> Date: Wednesday, February 6, 2013, 11:17 AM
>>> Hey Michael,
>>>
>>> It does implement the regular Java OutputStream interface, as seen in
>>> the API: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FSDataOutputStream.html.
>>>
>>> Here's a sample program that works on Hadoop 2.x in my tests:
>>> https://gist.github.com/QwertyManiac/4724582
>>>
>>> On Wed, Feb 6, 2013 at 9:00 AM, Michael Malak <[EMAIL PROTECTED]> wrote:
>>> > I don't believe a Hadoop FileSystem is a Java OutputStream?
>>> >
>>> > --- On Tue, 2/5/13, Doug Cutting <[EMAIL PROTECTED]> wrote:
>>> >
>>> >> From: Doug Cutting <[EMAIL PROTECTED]>
>>> >> Subject: Re: Is it possible to append to an already existing avro file
>>> >> To: [EMAIL PROTECTED]
>>> >> Date: Tuesday, February 5, 2013, 5:27 PM
>>> >> It will work on an OutputStream that supports append.
>>> >>
>>> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(org.apache.avro.file.SeekableInput, java.io.OutputStream)
>>> >>
>>> >> So it depends on how well HDFS implements FileSystem#append(), not on
>>> >> any changes in Avro.
>>> >>
>>> >> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#append(org.apache.hadoop.fs.Path)
>>> >>
>>> >> I have no recent personal experience with append in HDFS.  Does anyone
>>> >> else here?
>>> >>
>>> >> Doug
>>> >>
>>> >> On Tue, Feb 5, 2013 at 4:10 PM, Michael Malak <[EMAIL PROTECTED]> wrote:
>>> >> > My understanding is that will append to a file on the local filesystem, but not to a file on HDFS.
>>> >> >
>>> >> > --- On Tue, 2/5/13, Doug Cutting <[EMAIL PROTECTED]> wrote:
>>> >> >
>>> >> >> From: Doug Cutting <[EMAIL PROTECTED]>
>>> >> >> Subject: Re: Is it possible to append to an already existing avro file
>>> >> >> To: [EMAIL PROTECTED]
>>> >> >> Date: Tuesday, February 5, 2013, 5:08 PM
>>> >> >> The Jira is:
>>> >> >>
>>> >> >> https://issues.apache.org/jira/browse/AVRO-1035
>>> >> >>
>>> >> >> It is possible to append to an existing Avro file:
>>> >> >>
>>> >> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>>> >> >>
>>> >> >> Should we close that issue as "fixed"?
>>> >> >>
>>> >> >> Doug
>>> >> >>
>>> >> >> On Fri, Feb 1, 2013 at 11:32 AM, Michael Malak <[EMAIL PROTECTED]> wrote:
>>> >> >> > Was a JIRA ticket ever created regarding appending to an existing Avro file on HDFS?
>>> >> >> >
>>> >> >> > What is the status of such a capability, a year out from when the issue below was raised?
>>> >> >> >
>>> >> >> > On Wed, 22 Feb 2012 10:57:48 +0100,