Pig, mail # user - possible to infer schema from TSV header?

Re: possible to infer schema from TSV header?
Mason 2013-01-15, 22:27
Actually, I'll probably just end up computing positions to use, rather
than pasting in a schema, but the general point is that I'd love to do
it some other way, because little hacks like these make my data
pipeline feel fragile.
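Below is a rough sketch of the "computing positions" route, written in Python. The language is an assumption; any scripting language would do. It reads the header line and maps each column name to Pig's positional reference ($0, $1, ...). It then builds a FOREACH ... GENERATE line from the names you ask for. `column_positions` and `projection` are hypothetical helpers. The sketch assumes a tab-separated header whose names are unique and are already valid Pig identifiers.

```python
from typing import Dict, List


def column_positions(header_line: str) -> Dict[str, str]:
    """Map each tab-separated header name to a Pig positional reference.

    Hypothetical helper: assumes the header is the file's first line and
    that names are unique (a duplicate name would overwrite the earlier one).
    """
    names = header_line.rstrip("\r\n").split("\t")
    return {name: f"${i}" for i, name in enumerate(names)}


def projection(out_alias: str, in_alias: str, header_line: str, wanted: List[str]) -> str:
    """Build a Pig FOREACH ... GENERATE statement that picks columns by header name."""
    pos = column_positions(header_line)
    cols = ", ".join(f"{pos[name]} AS {name}" for name in wanted)
    return f"{out_alias} = FOREACH {in_alias} GENERATE {cols};"
```

For example, `projection("picked", "raw", "user_id\tcountry\tclicks", ["clicks", "user_id"])` yields a statement that reads `$2` as `clicks` and `$0` as `user_id`. The positions are only correct for as long as the header matches the file, so this stays a hack of the kind described above.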

I'm willing to write some Java if anyone could point me in the right direction.

-Mason

On Tue, Jan 15, 2013 at 2:23 PM, Mason <[EMAIL PROTECTED]> wrote:
> I have TSVs with a lot of columns, and I would like to address them by
> name, as specified in the header line (first row), within Pig.
>
> The best I can come up with at the moment is to write a script that strips the
> header line from the file and converts it to the form (col1:string,
> col2:string, ...), then plug that schema string into the AS portion of
> my LOAD statement. Then I'll project the columns I want and manually
> typecast them.
>
> Is there a better, simpler way?
>
> -Mason
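Here is a minimal sketch of the workaround from the quoted message, again in Python. The helper names (`pig_field_name`, `header_to_schema`, `strip_header`, `load_statement`) are hypothetical. Pig's string type is called chararray, so the generated schema uses that name. The sanitizing rule is an assumption of this sketch, not something Pig does for you: non-identifier characters become underscores, and names that do not start with a letter get a `c_` prefix.

```python
import re
from typing import List


def pig_field_name(raw: str) -> str:
    """Turn a header cell into a Pig-safe field name (hypothetical sanitizing rule)."""
    name = re.sub(r"[^A-Za-z0-9_]", "_", raw.strip())
    if not name or not name[0].isalpha():
        name = "c_" + name  # Pig field names are assumed to start with a letter
    return name


def header_to_schema(header_line: str) -> str:
    """Convert a tab-separated header into a Pig AS clause, e.g. '(a:chararray, b:chararray)'."""
    fields: List[str] = [pig_field_name(c) for c in header_line.rstrip("\r\n").split("\t")]
    return "(" + ", ".join(f"{f}:chararray" for f in fields) + ")"


def strip_header(src_path: str, dst_path: str) -> str:
    """Copy src_path to dst_path without its first line and return that header line."""
    with open(src_path, "r", encoding="utf-8") as src, open(dst_path, "w", encoding="utf-8") as dst:
        header = src.readline()
        for line in src:
            dst.write(line)
    return header


def load_statement(alias: str, path: str, header_line: str) -> str:
    """Build a LOAD statement for the header-less copy, using PigStorage with a tab delimiter."""
    return f"{alias} = LOAD '{path}' USING PigStorage('\\t') AS {header_to_schema(header_line)};"
```

Every field still arrives as chararray, so the manual casts after projection remain. The script only removes the copy-and-paste step.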