Hadoop, mail # user - Dealing with changing file format


Re: Dealing with changing file format
Mohit Anchlia 2012-07-03, 04:40
On Mon, Jul 2, 2012 at 7:10 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> In addition to what Robert says, using a schema-based approach such as
> Apache Avro can also help here. The schemas in Avro can evolve over
> time if done right, while not breaking old readers.
>

Thanks! Is there a good example of this that I can look at?
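A minimal sketch of what "evolving a schema without breaking old readers" means in Avro terms. This is plain Python, not the Avro library API. It hand-implements the record-field resolution rules from the Avro specification, with schemas written as dicts that mirror Avro's JSON schema format. A reader supplies its own schema and fields are matched to the writer's schema by name. A field present only in the reader's schema takes its default, a field present only in the writer's schema is ignored, and a reader-only field with no default is an error. `OLD_SCHEMA`, `NEW_SCHEMA` and `resolve` are hypothetical names chosen for illustration.

```python
# Plain-Python sketch of Avro's record schema-resolution rules.
# NOT the Avro library API; names here are hypothetical.

OLD_SCHEMA = {
    "type": "record",
    "name": "Row",
    "fields": [
        {"name": "field1", "type": "string"},
        {"name": "field2", "type": "string"},
        {"name": "field3", "type": "string"},
    ],
}

# The evolved schema adds field4 as an optional string. Giving it a
# default is what lets data written with OLD_SCHEMA still be read.
NEW_SCHEMA = {
    "type": "record",
    "name": "Row",
    "fields": OLD_SCHEMA["fields"] + [
        {"name": "field4", "type": ["null", "string"], "default": None},
    ],
}


def resolve(record, writer_schema, reader_schema):
    """Project a record written with writer_schema onto reader_schema."""
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            # Field exists in both schemas: take the written value.
            result[name] = record[name]
        elif "default" in field:
            # Reader-only field: fall back to the reader's default.
            result[name] = field["default"]
        else:
            raise ValueError(f"no value or default for field {name!r}")
    # Writer-only fields are never copied, so they are silently dropped.
    return result


old_row = {"field1": "a", "field2": "b", "field3": "c"}
print(resolve(old_row, OLD_SCHEMA, NEW_SCHEMA))
# {'field1': 'a', 'field2': 'b', 'field3': 'c', 'field4': None}
```

The same rules cover both directions. New readers see old rows with `field4` set to its default, and old readers given new rows simply drop `field4`. In the real Java library this resolution happens when a `GenericDatumReader` is constructed with both the writer's and the reader's schema. Avro data files also store the writer's schema in the file header.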

>
> On Tue, Jul 3, 2012 at 2:47 AM, Robert Evans <[EMAIL PROTECTED]> wrote:
> > There are several different ways. One is to use something like
> > HCatalog to track the format and location of the dataset. This may
> > be overkill for your problem, but it will grow with you. Another is to
> > store the schema with the data when it is written out. Your code may
> > need to adjust dynamically depending on whether the field is there.
> >
> > --Bobby Evans
> >
> > On 7/2/12 4:09 PM, "Mohit Anchlia" <[EMAIL PROTECTED]> wrote:
> >
> >>I am wondering what's the right way to design reading input and
> >>output where the file format may change over time. For instance, we
> >>might start with "field1,field2,field3" but at some point add a new
> >>field4 to the input. What's the best way to deal with such scenarios?
> >>Keep a timestamped catalog of changes?
> >
>
>
>
> --
> Harsh J
>
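Robert's second suggestion, storing the schema with the data, can be as light as a header line that names the fields at the top of each file. A reader can then map columns by name rather than by position. Below is a minimal sketch using Python's standard `csv` module. It assumes comma-separated text files that each begin with a header row. `EXPECTED_FIELDS`, the `"unknown"` default for `field4`, and the helpers `write_rows`/`read_rows` are hypothetical.

```python
import csv
import io

# Hypothetical: the fields the current job understands, each with the
# value to use when an older file does not contain that column.
EXPECTED_FIELDS = {
    "field1": None,
    "field2": None,
    "field3": None,
    "field4": "unknown",  # hypothetical default for the newly added field
}


def write_rows(out, fieldnames, rows):
    """Write a header line naming the fields, followed by the data rows."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fieldnames)
    writer.writerows(rows)


def read_rows(src, expected=EXPECTED_FIELDS):
    """Yield dicts keyed by the expected fields, using the file's own header.

    Columns missing from the file get their default; columns the job
    does not know about are ignored.
    """
    for row in csv.DictReader(src):
        yield {name: row.get(name, default) for name, default in expected.items()}


# Usage: a file written before field4 existed is still readable.
old_file = io.StringIO()
write_rows(old_file, ["field1", "field2", "field3"], [["a", "b", "c"]])
old_file.seek(0)
print(list(read_rows(old_file)))
# [{'field1': 'a', 'field2': 'b', 'field3': 'c', 'field4': 'unknown'}]
```

One caveat: with splittable text input in MapReduce, only the split that contains the start of a file sees the header line. That is one reason self-describing container formats such as Avro data files, which keep the schema in the file header, are often preferred.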