Avro user mailing list


Michael Malak 2013-02-01, 19:32
Doug Cutting 2013-02-06, 00:08
Re: Is it possible to append to an already existing avro file
My understanding is that appendTo(File) will append to a file on the local filesystem, but not to a file on HDFS.
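An output stream positioned at the end of the file is not enough on its own. As Scott Carey explains further down in the quoted thread, the problem is the sync marker. Every block in an Avro data file is terminated by the 16-byte marker chosen when the header was written, and a reader rejects a block whose marker does not match.

The stdlib-only Java sketch below illustrates this with a deliberately simplified, hypothetical layout. `SyncDemo` and its methods are made-up names, and the byte layout is not Avro's real encoding. It shows two cases:

- Blocks appended with a freshly generated marker, which is what a writer that never read the existing header would produce, fail with "Invalid sync!".
- Blocks that reuse the header's marker read back cleanly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical toy container format (NOT Avro's real binary encoding), used only to
 * show the role of the sync marker:
 *   header = 4-byte magic + 16-byte sync marker
 *   block  = 4-byte payload length + payload + the header's 16-byte sync marker
 */
class SyncDemo {
    static final byte[] MAGIC = {'T', 'O', 'Y', 1};
    static final int SYNC_SIZE = 16;
    private static final SecureRandom RANDOM = new SecureRandom();

    // A fresh marker, as a writer that never looked at the existing header would produce.
    static byte[] randomSync() {
        byte[] sync = new byte[SYNC_SIZE];
        RANDOM.nextBytes(sync);
        return sync;
    }

    // Creates a new file image: header with a random marker, then one block per payload.
    static byte[] newFile(String... payloads) throws IOException {
        byte[] sync = randomSync();
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        buf.write(MAGIC);
        buf.write(sync);
        return appendBlocks(buf.toByteArray(), sync, payloads);
    }

    // Appends blocks to an existing file image, terminating each one with `sync`.
    static byte[] appendBlocks(byte[] existing, byte[] sync, String... payloads) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.write(existing);
        for (String p : payloads) {
            byte[] bytes = p.getBytes(StandardCharsets.UTF_8);
            out.writeInt(bytes.length);
            out.write(bytes);
            out.write(sync);
        }
        out.flush();
        return buf.toByteArray();
    }

    // What a correct appender must do first: recover the marker from the existing header.
    static byte[] readSync(byte[] file) {
        return Arrays.copyOfRange(file, MAGIC.length, MAGIC.length + SYNC_SIZE);
    }

    // Reads all blocks, rejecting any block whose trailing marker differs from the header's.
    static List<String> readAll(byte[] file) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        byte[] magic = new byte[MAGIC.length];
        in.readFully(magic);
        if (!Arrays.equals(magic, MAGIC)) throw new IOException("Not a toy container file");
        byte[] sync = new byte[SYNC_SIZE];
        in.readFully(sync);
        List<String> blocks = new ArrayList<>();
        byte[] seen = new byte[SYNC_SIZE];
        while (in.available() > 0) {
            int len = in.readInt();
            if (len < 0 || len > in.available()) throw new IOException("Corrupt block length: " + len);
            byte[] payload = new byte[len];
            in.readFully(payload);
            in.readFully(seen);
            if (!Arrays.equals(seen, sync)) throw new IOException("Invalid sync!");
            blocks.add(new String(payload, StandardCharsets.UTF_8));
        }
        return blocks;
    }
}
```

In this toy, the mismatch surfaces at the first appended block. Writing a whole new header into the middle of the file, as in the original fs.append attempt, would also break a real Avro reader. The toy does not model exactly where that failure occurs.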

--- On Tue, 2/5/13, Doug Cutting <[EMAIL PROTECTED]> wrote:

> From: Doug Cutting <[EMAIL PROTECTED]>
> Subject: Re: Is it possible to append to an already existing avro file
> To: [EMAIL PROTECTED]
> Date: Tuesday, February 5, 2013, 5:08 PM
> The Jira is:
>
> https://issues.apache.org/jira/browse/AVRO-1035
>
> It is possible to append to an existing Avro file:
>
> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>
> Should we close that issue as "fixed"?
>
> Doug
>
> On Fri, Feb 1, 2013 at 11:32 AM, Michael Malak <[EMAIL PROTECTED]> wrote:
> > Was a JIRA ticket ever created regarding appending to an existing Avro
> > file on HDFS?
> >
> > What is the status of such a capability, a year after the issue below
> > was raised?
> >
> > On Wed, 22 Feb 2012 10:57:48 +0100, "Vyacheslav Zholudev"
> > <[EMAIL PROTECTED]> wrote:
> >
> >> Thanks for your reply, I suspected this.
> >>
> >> I will create a JIRA ticket.
> >>
> >> Vyacheslav
> >>
> >> On Feb 21, 2012, at 6:02 PM, Scott Carey wrote:
> >>
> >>>
> >>> On 2/21/12 7:29 AM, "Vyacheslav Zholudev" <[EMAIL PROTECTED]> wrote:
> >>>
> >>>> Yep, I saw that method as well as the Stack Overflow post. However, I'm
> >>>> interested in how to append to a file on an arbitrary file system, not
> >>>> only on the local one.
> >>>>
> >>>> I want to get an OutputStream based on the Path and the FileSystem
> >>>> implementation and then pass it to the Avro methods for appending.
> >>>>
> >>>> Is that possible?
> >>>
> >>> It is not possible without modifying DataFileWriter. Please open a JIRA
> >>> ticket.
> >>>
> >>> It could not simply append to an OutputStream, since it must either:
> >>> * Seek to the start to validate that the schemas match and find the
> >>>   sync marker, or
> >>> * Trust that the schemas match and find the sync marker from the last
> >>>   block.
> >>>
> >>> DataFileWriter cannot refer to Hadoop classes such as FileSystem, but we
> >>> could add something to the mapred module that takes a Path and
> >>> FileSystem and returns something that implements an interface that
> >>> DataFileWriter can append to. This would be something that is both a
> >>> SeekableInput
> >>> (http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html)
> >>> and an OutputStream, or has both an InputStream from the start of the
> >>> existing file and an OutputStream at the end.
> >>>
> >>>> Thanks,
> >>>> Vyacheslav
> >>>>
> >>>> On Feb 21, 2012, at 5:29 AM, Harsh J wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> Use the appendTo feature of the DataFileWriter. See
> >>>>>
> >>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
> >>>>>
> >>>>> For a quick setup example, read also:
> >>>>>
> >>>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
> >>>>>
> >>>>> On Tue, Feb 21, 2012 at 3:15 AM, Vyacheslav Zholudev
> >>>>> <[EMAIL PROTECTED]> wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> Is it possible to append to an already existing Avro file that was
> >>>>>> written and closed before?
> >>>>>>
> >>>>>> If I use
> >>>>>> outputStream = fs.append(avroFilePath);
> >>>>>>
> >>>>>> then later on I get: java.io.IOException: Invalid sync!
> >>>>>>
> >>>>>> Probably because the schema is written twice, among other issues.
> >>>>>>
> >>>>>> If I use outputStream = fs.create(avroFilePath); then the Avro file
> >>>>>> gets overwritten.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Vyacheslav
> >>>>>
> >>>>> --
> >>>>> Harsh J
> >>>>> Customer Ops. Engineer
> >>>>> Cloudera | http://tiny.cloudera.com/about
> >
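Both of Scott Carey's options come down to recovering the existing sync marker. The first option also recovers the schema.

The Avro object container file specification fixes the layout these options rely on:

- A file starts with the four bytes O, b, j, 1.
- Next comes a metadata map (string keys, bytes values, including avro.schema), encoded as Avro map blocks.
- The header ends with the 16-byte sync marker.
- Every data block is followed by that same marker, so a cleanly closed file ends with it.

The Java sketch below reads the marker both ways using only the standard library. `AvroHeaderReader`, `readHeader`, and `syncFromTail` are hypothetical names, not Avro API. Wiring them to HDFS (for example through FileSystem.open and a seekable stream) is an assumption and is left out here. Later Avro releases add an appendTo(SeekableInput, OutputStream) overload to DataFileWriter along these lines; check the Javadoc for the version you use.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of the Avro API): reads the parts of an Avro object
 * container file that an appender needs, following the layout in the Avro spec:
 *   magic 'O' 'b' 'j' 1, metadata map (string -> bytes), 16-byte sync marker.
 */
class AvroHeaderReader {
    static final byte[] MAGIC = {'O', 'b', 'j', 1};
    static final int SYNC_SIZE = 16;

    static final class Header {
        final Map<String, byte[]> meta;
        final byte[] sync;

        Header(Map<String, byte[]> meta, byte[] sync) {
            this.meta = meta;
            this.sync = sync;
        }

        // The writer schema as JSON, or null if the metadata lacks "avro.schema".
        String schemaJson() {
            byte[] s = meta.get("avro.schema");
            return s == null ? null : new String(s, StandardCharsets.UTF_8);
        }
    }

    // Option 1: read from the start of the file to get the schema and the sync marker.
    static Header readHeader(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[MAGIC.length];
        in.readFully(magic);
        if (!Arrays.equals(magic, MAGIC)) throw new IOException("Not an Avro data file");
        Map<String, byte[]> meta = new LinkedHashMap<>();
        // Avro maps are written as blocks: an item count, then the items; a count of 0 ends the map.
        for (long count = readLong(in); count != 0; count = readLong(in)) {
            if (count < 0) {   // negative count: its absolute value is the count, and a byte size follows
                count = -count;
                readLong(in);  // block size in bytes, not needed here
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(in), StandardCharsets.UTF_8);
                meta.put(key, readBytes(in));
            }
        }
        byte[] sync = new byte[SYNC_SIZE];
        in.readFully(sync);
        return new Header(meta, sync);
    }

    // Option 2: trust the schema; a cleanly closed file ends with a sync marker
    // (the header's if there are no blocks, otherwise the last block's).
    static byte[] syncFromTail(RandomAccessFile file) throws IOException {
        long minHeader = MAGIC.length + 1 + SYNC_SIZE; // magic + empty map + marker
        if (file.length() < minHeader) throw new IOException("File too short");
        file.seek(file.length() - SYNC_SIZE);
        byte[] sync = new byte[SYNC_SIZE];
        file.readFully(sync);
        return sync;
    }

    // Avro "long": variable-length zig-zag encoding, 7 bits per byte, low bits first.
    static long readLong(DataInputStream in) throws IOException {
        long n = 0;
        int shift = 0;
        int b;
        do {
            if (shift > 63) throw new IOException("Malformed varint");
            b = in.read();
            if (b < 0) throw new EOFException();
            n |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (n >>> 1) ^ -(n & 1);
    }

    // Avro "bytes"/"string": a long length followed by that many bytes.
    static byte[] readBytes(DataInputStream in) throws IOException {
        long len = readLong(in);
        if (len < 0 || len > Integer.MAX_VALUE) throw new IOException("Bad length: " + len);
        byte[] buf = new byte[(int) len];
        in.readFully(buf);
        return buf;
    }
}
```

The two methods map onto Scott Carey's two options:

- readHeader matches the first option: read from the start and check the schema.
- syncFromTail matches the second option: trust the schema and take the marker from the last block.

In either case, an appender would then write new blocks after the current end of the file, each followed by that same marker.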

Doug Cutting 2013-02-06, 00:27
Michael Malak 2013-02-06, 03:30
Harsh J 2013-02-06, 18:17
Michael Malak 2013-02-07, 00:42
Harsh J 2013-02-07, 16:28
Doug Cutting 2013-02-07, 16:51
Harsh J 2013-02-07, 16:56
Michael Malak 2013-02-07, 16:42
Ken Krugler 2013-02-06, 18:03
TrevniUser 2013-07-08, 16:29
Doug Cutting 2013-07-09, 16:29
TrevniUser 2013-07-09, 17:24