Hadoop >> mail # user >> Dealing with changing file format

Re: Dealing with changing file format
There are several different ways.  One is to use something like HCatalog
to track the format and location of the dataset.  This may be overkill
for your problem, but it will grow with you.  Another is to store the
schema with the data when it is written out.  Either way, your code may
need to adjust dynamically depending on whether the field is present.
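
As a rough illustration of the second approach, here is a minimal sketch in
Python.  It assumes the schema is simply a CSV header line written as the
first line of each file; the helper names (write_with_header, read_records)
and the DEFAULTS table are hypothetical, not part of Hadoop or HCatalog.

```python
import csv
import io

# Hypothetical defaults for fields that older files do not contain.
DEFAULTS = {"field4": ""}


def write_with_header(rows, fields):
    """Serialize rows as CSV, writing the schema (field names) as line one."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def read_records(text, expected_fields):
    """Read CSV whose first line is its schema; fill in fields it lacks."""
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        # row only has the columns named in this file's own header, so a
        # missing field falls back to its default instead of failing.
        yield {name: row.get(name, DEFAULTS.get(name)) for name in expected_fields}


# An old-format file (three fields) and a new-format file (four fields).
old_file = write_with_header(
    [{"field1": "a", "field2": "b", "field3": "c"}],
    ["field1", "field2", "field3"],
)
new_file = write_with_header(
    [{"field1": "a", "field2": "b", "field3": "c", "field4": "d"}],
    ["field1", "field2", "field3", "field4"],
)

expected = ["field1", "field2", "field3", "field4"]
old_records = list(read_records(old_file, expected))
new_records = list(read_records(new_file, expected))
```

The reader never relies on column positions, only on the names each file
declares, so old and new files can be processed by the same job.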

--Bobby Evans

On 7/2/12 4:09 PM, "Mohit Anchlia" <[EMAIL PROTECTED]> wrote:

>I am wondering what's the right way to design reading input and output
>where the file format may change over time. For instance, we might
>start with "field1,field2,field3" but at some point add a new field4 to
>the input. What's the best way to deal with such scenarios? Keep a
>timestamped catalog of changes?