Avro >> mail # user >> Re: Is it possible to append to an already existing avro file


Re: Is it possible to append to an already existing avro file
I don't believe a Hadoop FileSystem is a Java OutputStream?
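
For reference, FileSystem#append(Path) is documented to return an FSDataOutputStream, which is a java.io.OutputStream, so the FileSystem is not itself a stream but can hand one out. The other constraint, raised by Scott Carey in the quoted thread below, is that an appender needs two handles on the file: an input at the start (to recover the sync marker) and an output at the end. The following is a minimal, stdlib-only Java sketch of that idea. It uses a made-up toy container layout, not Avro's real file format, and ToyAppend with all of its methods are hypothetical names chosen for illustration; it runs against the local filesystem only and says nothing about how well HDFS append behaves.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Toy container (an assumption for illustration, NOT Avro's real layout):
 * [16-byte sync marker] followed by blocks of [int length][payload][sync marker].
 */
final class ToyAppend {
    static final int SYNC_SIZE = 16;

    /** Creates a new file whose header is a fresh random sync marker. */
    static void create(Path file) throws IOException {
        byte[] sync = new byte[SYNC_SIZE];
        new SecureRandom().nextBytes(sync);
        try (OutputStream out = Files.newOutputStream(file)) {
            out.write(sync);
        }
    }

    /** Reads the header from a stream positioned at the start of the file. */
    static byte[] readSync(InputStream fromStart) throws IOException {
        byte[] sync = new byte[SYNC_SIZE];
        new DataInputStream(fromStart).readFully(sync);
        return sync;
    }

    /** Appends one block; needs an input at the start AND an output at the end. */
    static void appendTo(InputStream fromStart, OutputStream atEnd, String record)
            throws IOException {
        byte[] sync = readSync(fromStart);          // recover the existing marker
        byte[] payload = record.getBytes(StandardCharsets.UTF_8);
        DataOutputStream out = new DataOutputStream(atEnd);
        out.writeInt(payload.length);
        out.write(payload);
        out.write(sync);                            // new block reuses the marker
        out.flush();
    }

    /** Reads every record back, checking each block's marker against the header. */
    static List<String> readAll(Path file) throws IOException {
        List<String> records = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            byte[] sync = readSync(in);
            while (true) {
                int length;
                try {
                    length = in.readInt();
                } catch (EOFException endOfFile) {
                    break;                          // no more blocks
                }
                byte[] payload = new byte[length];
                in.readFully(payload);
                byte[] marker = new byte[SYNC_SIZE];
                in.readFully(marker);
                if (!Arrays.equals(sync, marker)) {
                    throw new IOException("sync marker mismatch");
                }
                records.add(new String(payload, StandardCharsets.UTF_8));
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("toy", ".bin");
        create(file);
        for (String record : new String[] {"first", "second"}) {
            // Each append opens the file twice: once to read, once in append mode.
            try (InputStream in = Files.newInputStream(file);
                 OutputStream out = Files.newOutputStream(file, StandardOpenOption.APPEND)) {
                appendTo(in, out, record);
            }
        }
        System.out.println(readAll(file)); // prints [first, second]
        Files.delete(file);
    }
}
```

The shape of appendTo here mirrors the two-argument form linked in the thread (a readable handle plus an OutputStream), which is why a plain append-only stream is not enough on its own.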

--- On Tue, 2/5/13, Doug Cutting <[EMAIL PROTECTED]> wrote:

> From: Doug Cutting <[EMAIL PROTECTED]>
> Subject: Re: Is it possible to append to an already existing avro file
> To: [EMAIL PROTECTED]
> Date: Tuesday, February 5, 2013, 5:27 PM
> It will work on an OutputStream that supports append.
>
> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(org.apache.avro.file.SeekableInput, java.io.OutputStream)
>
> So it depends on how well HDFS implements FileSystem#append(), not on
> any changes in Avro.
>
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#append(org.apache.hadoop.fs.Path)
>
> I have no recent personal experience with append in HDFS.  Does anyone
> else here?
>
> Doug
>
> On Tue, Feb 5, 2013 at 4:10 PM, Michael Malak <[EMAIL PROTECTED]> wrote:
> > My understanding is that will append to a file on the local filesystem, but not to a file on HDFS.
> >
> > --- On Tue, 2/5/13, Doug Cutting <[EMAIL PROTECTED]> wrote:
> >
> >> From: Doug Cutting <[EMAIL PROTECTED]>
> >> Subject: Re: Is it possible to append to an already existing avro file
> >> To: [EMAIL PROTECTED]
> >> Date: Tuesday, February 5, 2013, 5:08 PM
> >> The Jira is:
> >>
> >> https://issues.apache.org/jira/browse/AVRO-1035
> >>
> >> It is possible to append to an existing Avro file:
> >>
> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
> >>
> >> Should we close that issue as "fixed"?
> >>
> >> Doug
> >>
> >> On Fri, Feb 1, 2013 at 11:32 AM, Michael Malak <[EMAIL PROTECTED]> wrote:
> >> > Was a JIRA ticket ever created regarding appending to an existing Avro file on HDFS?
> >> >
> >> > What is the status of such a capability, a year out from when the issue below was raised?
> >> >
> >> > On Wed, 22 Feb 2012 10:57:48 +0100, "Vyacheslav Zholudev" <[EMAIL PROTECTED]> wrote:
> >> >
> >> >> Thanks for your reply, I suspected this.
> >> >>
> >> >> I will create a JIRA ticket.
> >> >>
> >> >> Vyacheslav
> >> >>
> >> >> On Feb 21, 2012, at 6:02 PM, Scott Carey wrote:
> >> >>
> >> >>>
> >> >>> On 2/21/12 7:29 AM, "Vyacheslav Zholudev" <[EMAIL PROTECTED]> wrote:
> >> >>>
> >> >>>> Yep, I saw that method as well as the stackoverflow post. However, I'm
> >> >>>> interested in how to append to a file on an arbitrary file system, not
> >> >>>> only on the local one.
> >> >>>>
> >> >>>> I want to get an OutputStream based on the Path and the FileSystem
> >> >>>> implementation and then pass it to the Avro methods for appending.
> >> >>>>
> >> >>>> Is that possible?
> >> >>>
> >> >>> It is not possible without modifying DataFileWriter. Please open a JIRA
> >> >>> ticket.
> >> >>>
> >> >>> It could not simply append to an OutputStream, since it must either:
> >> >>> * Seek to the start to validate that the schemas match and find the sync
> >> >>> marker, or
> >> >>> * Trust that the schemas match and find the sync marker from the last
> >> >>> block
> >> >>>
> >> >>> DataFileWriter cannot refer to Hadoop classes such as FileSystem, but we
> >> >>> could add something to the mapred module that takes a Path and
> >> >>> FileSystem and returns something that implements an interface that
> >> >>> DataFileWriter can append to. This would be something that is both a
> >> >>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
> >> >>> and an OutputStream, or has both an InputStream from the start of the
> >> >>> existing file and an OutputStream at the end.
> >> >>>
> >> >>>> Thanks,
> >> >>>> Vyacheslav
> >> >>>>
> >> >>>> On Feb 21, 2012, at 5:29 AM, Harsh J wrote:
> >> >>>>
> >> >>>>> Hi,
> >> >>>>>
> >> >>>>> Use the appendTo feature of the DataFileWriter. See