Re: Dealing with changing file format
On Mon, Jul 2, 2012 at 7:10 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> In addition to what Robert says, using a schema-based approach such as
> Apache Avro can also help here. The schemas in Avro can evolve over
> time if done right, while not breaking old readers.

Thanks! Is there a good example of this that I can look at?

> On Tue, Jul 3, 2012 at 2:47 AM, Robert Evans <[EMAIL PROTECTED]> wrote:
> > There are several different ways.  One is to use something like
> > HCatalog to track the format and location of the dataset.  This may
> > be overkill for your problem, but it will grow with you.  Another is to
> > store the schema with the data when it is written out.  Your code may
> > need to dynamically adjust to when the field is there and when it is not.
> >
> > --Bobby Evans
> >
> > On 7/2/12 4:09 PM, "Mohit Anchlia" <[EMAIL PROTECTED]> wrote:
> >
> >>I am wondering what's the right way to design reading input and
> >>output where the file format may change over time. For instance, we might
> >>start with "field1,field2,field3" but at some point add a new field4 to
> >>the input. What's the best way to deal with such scenarios? Keep a
> >>catalog of changes that are timestamped?
> >
> --
> Harsh J
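
For what it's worth, here is a minimal sketch of the "store the schema with the data" idea from Robert's reply, using a CSV header line as the stored schema. It is plain Python with the standard csv module, not Avro or HCatalog. The names READER_FIELDS and read_records and the "N/A" default are assumptions made up for illustration. The reader lists the fields it expects along with a default for each, so files written before field4 existed still parse.

```python
import csv
import io

# Hypothetical reader-side schema: field name -> default value used
# when a file was written before that field existed.
READER_FIELDS = {"field1": "", "field2": "", "field3": "", "field4": "N/A"}


def read_records(text):
    """Parse CSV text whose first line is a header naming its columns."""
    reader = csv.DictReader(io.StringIO(text))
    records = []
    for row in reader:
        # Use the value from the file when the column is present,
        # otherwise fall back to the reader's default for that field.
        records.append(
            {name: row.get(name, default) for name, default in READER_FIELDS.items()}
        )
    return records


# An old file (three fields) and a new file (four fields), both with headers.
old_file = "field1,field2,field3\na,b,c\n"
new_file = "field1,field2,field3,field4\na,b,c,d\n"

old_records = read_records(old_file)
new_records = read_records(new_file)
```

Avro's schema evolution follows a similar principle (a field added later carries a default that readers apply to older data), but this sketch does not use the Avro library or its file format.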