On Mon, Jul 2, 2012 at 7:10 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> In addition to what Robert says, using a schema-based approach such as
> Apache Avro can also help here. The schemas in Avro can evolve over
> time if done right, while not breaking old readers.
Thanks! Is there a good example of this that I can look at?
> On Tue, Jul 3, 2012 at 2:47 AM, Robert Evans <[EMAIL PROTECTED]> wrote:
> > There are several different ways. One of the ways is to use something
> > like HCatalog to track the format and location of the dataset. This may
> > be overkill for your problem, but it will grow with you. Another is to
> > store the schema with the data when it is written out. Your code may
> > need to dynamically adjust to when the field is there and when it is not.
> > --Bobby Evans
> > On 7/2/12 4:09 PM, "Mohit Anchlia" <[EMAIL PROTECTED]> wrote:
> >>I am wondering what's the right way to design code that reads input and
> >>output whose file format may change over time. For instance, we might
> >>start with "field1,field2,field3" but at some point add a new field4 to
> >>the input. What's the best way to deal with such scenarios? Keep a
> >>log of changes that are timestamped?
> Harsh J
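
For the "store the schema with the data" suggestion, here is a minimal sketch in plain Python rather than Hadoop or Avro code. It assumes each file starts with a header line naming its fields, so a reader can tell whether field4 is present. The names read_records, DEFAULTS and FIELDS, and the empty-string default for field4, are hypothetical and only for illustration.

```python
import csv
import io

# Hypothetical defaults for fields that older files do not contain.
DEFAULTS = {"field4": ""}


def read_records(stream, expected_fields, defaults=DEFAULTS):
    """Yield one dict per row with every expected field filled in."""
    # The first line of the stream is the header, i.e. the stored schema.
    reader = csv.DictReader(stream)
    for row in reader:
        # Fields absent from this file's header fall back to their default.
        yield {name: row.get(name, defaults.get(name)) for name in expected_fields}


# An old-format file and a new-format file, simulated in memory.
old_file = io.StringIO("field1,field2,field3\na,b,c\n")
new_file = io.StringIO("field1,field2,field3,field4\na,b,c,d\n")

FIELDS = ["field1", "field2", "field3", "field4"]
old_rows = list(read_records(old_file, FIELDS))
new_rows = list(read_records(new_file, FIELDS))

print(old_rows)
print(new_rows)
```

The same reader handles both files because it looks fields up by name instead of by position. Avro's schema resolution is similar in spirit: a field that the reader's schema declares with a default can be missing from data written under an older schema.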