Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Avro >> mail # user >> Re: Is it possible to append to an already existing avro file


Re: Is it possible to append to an already existing avro file
I assume by non-trivial you meant the extra Seekable stuff I needed to
wrap around the DFS output streams to let Avro take it as append-able?
I don't think it's possible for Avro to carry it, since Avro (core) does
not reverse-depend on Hadoop. Should we document it somewhere, though?
Do you have any ideas on the best place to do that?
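
For readers without access to the gist: DataFileWriter#appendTo(SeekableInput, OutputStream) needs random access to the existing file as well as an output stream positioned at its end, and a plain HDFS stream is not handed to Avro in that shape. The sketch below is a minimal illustration of such an adapter, not the code from the gist. SeekableSource is a hypothetical local interface that mirrors the methods of org.apache.avro.file.SeekableInput (seek, tell, length, read, close) so the example compiles without Avro on the classpath, and ByteArraySeekableSource is likewise a hypothetical name, backed by an in-memory byte array. An HDFS-backed version would presumably delegate to FSDataInputStream#seek and #getPos and take the length from FileStatus#getLen; avro-mapred's FsInput class may already cover this, depending on your version.

```java
import java.io.Closeable;
import java.io.IOException;

/**
 * Hypothetical stand-in mirroring the methods of Avro's
 * org.apache.avro.file.SeekableInput, declared locally so this
 * sketch compiles without Avro.
 */
interface SeekableSource extends Closeable {
    void seek(long p) throws IOException;
    long tell() throws IOException;
    long length() throws IOException;
    int read(byte[] b, int off, int len) throws IOException;
}

/**
 * Hypothetical adapter exposing an in-memory copy of an existing file
 * as a seekable source. An HDFS version would delegate to an
 * FSDataInputStream instead of a byte array.
 */
final class ByteArraySeekableSource implements SeekableSource {
    private final byte[] data;
    private int pos = 0;

    ByteArraySeekableSource(byte[] data) {
        this.data = data;
    }

    @Override
    public void seek(long p) throws IOException {
        // Reject positions outside [0, length], as a file-backed source would.
        if (p < 0 || p > data.length) {
            throw new IOException("seek out of range: " + p);
        }
        pos = (int) p;
    }

    @Override
    public long tell() {
        return pos;
    }

    @Override
    public long length() {
        return data.length;
    }

    @Override
    public int read(byte[] b, int off, int len) {
        if (pos >= data.length) {
            return -1; // end of input, same convention as InputStream#read
        }
        int n = Math.min(len, data.length - pos);
        System.arraycopy(data, pos, b, off, n);
        pos += n;
        return n;
    }

    @Override
    public void close() {
        // Nothing to release for an in-memory buffer.
    }
}
```

Against real Avro you would implement SeekableInput itself, then pass an instance of it, together with the stream returned by FileSystem#append(Path), to DataFileWriter#appendTo on a writer built from a DatumWriter.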

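Why does appendTo need the seekable input at all? My reading of the Avro object container file spec (not of Harsh's gist) is that the writer has to reuse the existing file's header: the schema and codec from its metadata map and, above all, its 16-byte sync marker, because every block appended at the end must be terminated by that same marker for readers to see one continuous file. The sketch below parses just that header with the standard library: the magic bytes 'O', 'b', 'j', 1, a metadata map of string keys to bytes values (written as blocks of zig-zag varint-encoded counts and lengths), then the sync marker. AvroHeaderSketch is a hypothetical class for illustration only; real code should rely on Avro's own DataFileReader.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical minimal reader for the header of an Avro object container
 * file: magic "Obj" + 0x01, a metadata map<bytes>, then a 16-byte sync
 * marker. It reads nothing past the header.
 */
final class AvroHeaderSketch {
    static final byte[] MAGIC = {'O', 'b', 'j', 1};

    final Map<String, byte[]> meta = new LinkedHashMap<>();
    final byte[] sync = new byte[16];

    static AvroHeaderSketch read(InputStream in) throws IOException {
        AvroHeaderSketch h = new AvroHeaderSketch();
        if (!Arrays.equals(readFully(in, 4), MAGIC)) {
            throw new IOException("not an Avro data file");
        }
        // The map is a series of blocks: an entry count, then that many
        // key/value pairs. A count of zero ends the map.
        long count;
        while ((count = readLong(in)) != 0) {
            if (count < 0) {
                count = -count;
                readLong(in); // block size in bytes, not needed here
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(in), StandardCharsets.UTF_8);
                h.meta.put(key, readBytes(in));
            }
        }
        System.arraycopy(readFully(in, 16), 0, h.sync, 0, 16);
        return h;
    }

    // Avro longs are zig-zag encoded variable-length integers.
    static long readLong(InputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        int b;
        do {
            b = in.read();
            if (b < 0) throw new EOFException();
            raw |= (long) (b & 0x7f) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (raw >>> 1) ^ -(raw & 1);
    }

    // Avro bytes and strings: a long length followed by that many bytes.
    static byte[] readBytes(InputStream in) throws IOException {
        return readFully(in, (int) readLong(in));
    }

    static byte[] readFully(InputStream in, int n) throws IOException {
        byte[] buf = new byte[n];
        int off = 0;
        while (off < n) {
            int r = in.read(buf, off, n - off);
            if (r < 0) throw new EOFException();
            off += r;
        }
        return buf;
    }
}
```

Read against the start of an existing .avro file, meta.get("avro.schema") should hold the writer schema as JSON, and sync holds the marker that appended blocks must repeat. Per the spec, each appended block is an object count and a byte size (both Avro longs), the serialized and possibly compressed objects, and then that sync marker.
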
On Thu, Feb 7, 2013 at 6:12 AM, Michael Malak <[EMAIL PROTECTED]> wrote:
> Thanks so much for the code -- it works great!
>
> Since a non-trivial amount of code is required to achieve append, I suggest attaching that code to AVRO-1035, in the hope that someone will come up with an interface that requires just one line of user code to achieve append.
>
> --- On Wed, 2/6/13, Harsh J <[EMAIL PROTECTED]> wrote:
>
>> From: Harsh J <[EMAIL PROTECTED]>
>> Subject: Re: Is it possible to append to an already existing avro file
>> To: [EMAIL PROTECTED]
>> Date: Wednesday, February 6, 2013, 11:17 AM
>> Hey Michael,
>>
>> It does implement the regular Java OutputStream interface, as seen in
>> the API: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FSDataOutputStream.html.
>>
>> Here's a sample program that works on Hadoop 2.x in my tests:
>> https://gist.github.com/QwertyManiac/4724582
>>
>> On Wed, Feb 6, 2013 at 9:00 AM, Michael Malak <[EMAIL PROTECTED]> wrote:
>> > I don't believe a Hadoop FileSystem is a Java OutputStream?
>> >
>> > --- On Tue, 2/5/13, Doug Cutting <[EMAIL PROTECTED]> wrote:
>> >
>> >> From: Doug Cutting <[EMAIL PROTECTED]>
>> >> Subject: Re: Is it possible to append to an already existing avro file
>> >> To: [EMAIL PROTECTED]
>> >> Date: Tuesday, February 5, 2013, 5:27 PM
>> >> It will work on an OutputStream that supports append.
>> >>
>> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(org.apache.avro.file.SeekableInput, java.io.OutputStream)
>> >>
>> >> So it depends on how well HDFS implements FileSystem#append(), not on
>> >> any changes in Avro.
>> >>
>> >> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#append(org.apache.hadoop.fs.Path)
>> >>
>> >> I have no recent personal experience with append in HDFS. Does anyone
>> >> else here?
>> >>
>> >> Doug
>> >>
>> >> On Tue, Feb 5, 2013 at 4:10 PM, Michael Malak <[EMAIL PROTECTED]> wrote:
>> >> > My understanding is that it will append to a file on the local
>> >> > filesystem, but not to a file on HDFS.
>> >> >
>> >> > --- On Tue, 2/5/13, Doug Cutting <[EMAIL PROTECTED]> wrote:
>> >> >
>> >> >> From: Doug Cutting <[EMAIL PROTECTED]>
>> >> >> Subject: Re: Is it possible to append to an already existing avro file
>> >> >> To: [EMAIL PROTECTED]
>> >> >> Date: Tuesday, February 5, 2013, 5:08 PM
>> >> >> The Jira is:
>> >> >>
>> >> >> https://issues.apache.org/jira/browse/AVRO-1035
>> >> >>
>> >> >> It is possible to append to an existing Avro file:
>> >> >>
>> >> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>> >> >>
>> >> >> Should we close that issue as "fixed"?
>> >> >>
>> >> >> Doug
>> >> >>
>> >> >> On Fri, Feb 1, 2013 at 11:32 AM, Michael Malak <[EMAIL PROTECTED]> wrote:
>> >> >> > Was a JIRA ticket ever created regarding appending to an existing
>> >> >> > Avro file on HDFS?
>> >> >> >
>> >> >> > What is the status of such a capability, a year out from when the
>> >> >> > issue below was raised?
>> >> >> >
>> >> >> > On Wed, 22 Feb 2012 10:57:48 +0100, "Vyacheslav Zholudev" <[EMAIL PROTECTED]> wrote:
>> >> >> >
>> >> >> >> Thanks for your reply; I suspected this.
>> >> >> >>
>> >> >> >> I will create a JIRA ticket.
>> >> >> >>
>> >> >> >> Vyacheslav
>> >> >> >>
>> >> >> >> On Feb 21, 2012, at 6:02 PM, Scott Carey wrote:
>> >> >> >>
>> >> >> >>>
>> >> >> >>> On 2/21/12 7:29 AM, "Vyacheslav

Harsh J