Avro user mailing list
Re: Is it possible to append to an already existing avro file
I confess to being a user of rather than a developer of open source, but perhaps you could elaborate on what "depends on" means and what the consequences are?

Isn't it -- or couldn't it be made -- a run-time binding, so that only those who try to use the HDFS append functionality would be required to also include the HDFS Jars in their classpath?
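
One way to picture the run-time binding suggested above: the core library never references Hadoop types at compile time, and it only probes the classpath when a caller actually asks for HDFS append. Below is a minimal, stdlib-only sketch of that idea using reflection. The class `OptionalHadoopBinding` and its methods are hypothetical illustrations, not part of Avro. `org.apache.hadoop.fs.FileSystem` is used only as a marker class meaning "the Hadoop jars are present".

```java
/**
 * Hypothetical sketch of an optional, run-time binding to Hadoop.
 * Nothing here is Avro API; it only shows that the check can be deferred
 * until someone requests HDFS-specific behaviour.
 */
public class OptionalHadoopBinding {

    // Marker class: if it can be loaded, the Hadoop/HDFS jars are on the classpath.
    static final String HADOOP_FS_CLASS = "org.apache.hadoop.fs.FileSystem";

    /** Returns true if the named class can be loaded (without initializing it). */
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false, OptionalHadoopBinding.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing jar (or a broken one): treat the optional feature as unavailable.
            return false;
        }
    }

    /** Hypothetical entry point: only callers that want HDFS append trigger the probe. */
    static String describeAppendSupport() {
        return isClassAvailable(HADOOP_FS_CLASS)
                ? "HDFS append available"
                : "HDFS jars not on classpath; local-file append only";
    }

    public static void main(String[] args) {
        System.out.println(describeAppendSupport());
    }
}
```

Users who never touch HDFS never need the Hadoop jars with this pattern. The trade-off is that the Hadoop-facing code still has to be compiled and tested against some Hadoop version, which is the bookkeeping concern raised in the next paragraph.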

Or is the issue more of a bookkeeping one, whereby every update to HDFS will require an Avro regression test?

Now that Hive supports Avro as of the Jan. 11 release of Hive 0.10, the use case of ingesting data into Avro on HDFS is only going to get more popular, and appending is very handy for ingestion, especially for real-time or near-real-time data. So it seems to me that, if the inconveniences are minor or can be worked around, Avro should perhaps indeed "depend on" HDFS.

--- On Thu, 2/7/13, Harsh J <[EMAIL PROTECTED]> wrote:

> From: Harsh J <[EMAIL PROTECTED]>
> Subject: Re: Is it possible to append to an already existing avro file
> To: [EMAIL PROTECTED]
> Date: Thursday, February 7, 2013, 9:28 AM
> I assume by non-trivial you meant the extra Seekable stuff I needed
> to wrap around the DFS output streams to let Avro treat them as
> appendable?
> I don't think it's possible for Avro to carry it, since Avro (core)
> does not reverse-depend on Hadoop. Should we document it somewhere,
> though? Do you have any ideas on the best place to do that?
>
> On Thu, Feb 7, 2013 at 6:12 AM, Michael Malak <[EMAIL PROTECTED]> wrote:
> > Thanks so much for the code -- it works great!
> >
> > Since achieving append requires a non-trivial amount of code, I
> > suggest attaching that code to AVRO-1035, in the hope that someone
> > will come up with an interface that requires just one line of user
> > code to achieve append.
> >
> --- On Wed, 2/6/13, Harsh J <[EMAIL PROTECTED]> wrote:
> >
> >> From: Harsh J <[EMAIL PROTECTED]>
> >> Subject: Re: Is it possible to append to an already existing avro file
> >> To: [EMAIL PROTECTED]
> >> Date: Wednesday, February 6, 2013, 11:17 AM
> >> Hey Michael,
> >>
> >> It does implement the regular Java OutputStream interface, as seen
> >> in the API: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FSDataOutputStream.html.
> >>
> >> Here's a sample program that works on Hadoop 2.x in my tests:
> >> https://gist.github.com/QwertyManiac/4724582
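
The gist itself isn't reproduced in this thread, so the "extra Seekable stuff" is easiest to see in a toy analogue. To append, a writer needs two things. The first is random access to the existing file, so it can read back header state (for Avro, the schema and sync marker). The second is an output stream positioned at the end of the file. The stdlib-only sketch below shows that shape. `SeekableSource` is a hypothetical stand-in modeled on Avro's `SeekableInput` (seek/tell/length/read). `FileSeekableSource` adapts a local file to it. The "TOY1" header layout is invented for illustration and is not Avro's container format.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SeekableAppendSketch {

    /** Hypothetical stand-in modeled on the shape of Avro's SeekableInput. */
    interface SeekableSource extends Closeable {
        void seek(long pos) throws IOException;
        long tell() throws IOException;
        long length() throws IOException;
        int read(byte[] b, int off, int len) throws IOException;
    }

    /** Adapter exposing a local file through the seekable interface. */
    static final class FileSeekableSource implements SeekableSource {
        private final RandomAccessFile raf;
        FileSeekableSource(File f) throws IOException { raf = new RandomAccessFile(f, "r"); }
        public void seek(long pos) throws IOException { raf.seek(pos); }
        public long tell() throws IOException { return raf.getFilePointer(); }
        public long length() throws IOException { return raf.length(); }
        public int read(byte[] b, int off, int len) throws IOException { return raf.read(b, off, len); }
        public void close() throws IOException { raf.close(); }
    }

    // Toy header layout (NOT Avro's format): 4 magic bytes, then a 16-byte sync marker.
    static final byte[] MAGIC = "TOY1".getBytes(StandardCharsets.US_ASCII);
    static final int SYNC_LEN = 16;

    /** Creates (or truncates) a toy container holding only the header. */
    static void create(File f, byte[] sync) throws IOException {
        try (OutputStream out = new FileOutputStream(f)) {
            out.write(MAGIC);
            out.write(sync);
        }
    }

    /** Seeks to the start and reads the sync marker back out of the header. */
    static byte[] readSync(SeekableSource in) throws IOException {
        byte[] header = new byte[MAGIC.length + SYNC_LEN];
        in.seek(0);
        int n = 0;
        while (n < header.length) {
            int r = in.read(header, n, header.length - n);
            if (r < 0) throw new EOFException("truncated header");
            n += r;
        }
        if (!Arrays.equals(Arrays.copyOf(header, MAGIC.length), MAGIC)) {
            throw new IOException("not a toy container");
        }
        return Arrays.copyOfRange(header, MAGIC.length, header.length);
    }

    /** Appends one length-prefixed block, terminated by the file's own sync marker. */
    static void appendBlock(File f, byte[] payload) throws IOException {
        byte[] sync;
        try (SeekableSource in = new FileSeekableSource(f)) {
            sync = readSync(in); // recover state from the existing file
        }
        // Append mode: every write lands at the current end of the file.
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(f, true))) {
            out.writeInt(payload.length);
            out.write(payload);
            out.write(sync);
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("toy", ".bin");
        f.deleteOnExit();
        byte[] sync = new byte[SYNC_LEN];
        Arrays.fill(sync, (byte) 0x5A);
        create(f, sync);
        appendBlock(f, "first".getBytes(StandardCharsets.UTF_8));
        appendBlock(f, "second".getBytes(StandardCharsets.UTF_8));
        System.out.println("file length = " + f.length());
    }
}
```

In the real HDFS setting, our understanding is that the seekable side would be Avro's `FsInput` (shipped in avro-mapred, which does depend on Hadoop) over the existing file. The output side would be the `FSDataOutputStream` returned by `FileSystem.append(Path)`. Both would then be handed to `DataFileWriter.appendTo(SeekableInput, OutputStream)` in Avro versions that provide that overload. Treat this as an assumption rather than a description of the gist.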