Pig >> mail # user >> possible to infer schema from TSV header?


Re: possible to infer schema from TSV header?
Actually, I'll probably just end up computing positions to use, rather
than pasting in a schema, but the general point is that I'd love to do
it some other way, because little hacks like these make my data
pipeline feel fragile.

I'm willing to write some Java if anyone could point me in the right direction.

-Mason

On Tue, Jan 15, 2013 at 2:23 PM, Mason <[EMAIL PROTECTED]> wrote:
> I have TSVs with a lot of columns, and I would like to address them by
> name, as specified in the header line (first row), within Pig.
>
> The best I can come up with at the moment is to write a script that
> strips the header line from the file and converts it to the form
> (col1:chararray, col2:chararray, ...), then plug that schema string
> into the AS portion of my LOAD statement. Then I'll project the
> columns I want and manually typecast them.
>
> Is there a better, simple way?
>
> -Mason
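
For reference, the header-to-schema script described above could look roughly like the sketch below. It is written in Python only for illustration. The function names (sanitize, header_to_schema, header_positions) are hypothetical, and the clean-up rule assumes Pig field names must start with a letter and contain only letters, digits, and underscores. Every column is declared as chararray, Pig's string type, so typecasting still happens after the LOAD.

```python
import re


def sanitize(name):
    """Make a header cell usable as a field name.

    Assumption: a valid name starts with a letter and contains only
    letters, digits, and underscores.
    """
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name.strip())
    if not re.match(r"[A-Za-z]", cleaned):
        # Prefix names that are empty or start with a digit/underscore.
        cleaned = "c_" + cleaned
    return cleaned


def column_names(header_line):
    """Split the tab-separated header row into cleaned column names."""
    cells = header_line.rstrip("\r\n").split("\t")
    return [sanitize(cell) for cell in cells]


def header_to_schema(header_line, pig_type="chararray"):
    """Build a schema string of the form (col1:chararray, col2:chararray)."""
    fields = [f"{name}:{pig_type}" for name in column_names(header_line)]
    return "(" + ", ".join(fields) + ")"


def header_positions(header_line):
    """Map each column name to its positional reference ($0, $1, and so on)."""
    return {name: f"${i}" for i, name in enumerate(column_names(header_line))}


if __name__ == "__main__":
    header = "user id\tage\t2nd_visit\n"
    print(header_to_schema(header))
    print(header_positions(header))
```

The first function gives the string to paste into the AS clause; the second covers the alternative of addressing columns by computed position. Either way, the header row itself still has to be removed or filtered out of the data before it is loaded.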