Pig >> mail # user >> How do I read headers from first line into schema?


Re: How do I read headers from first line into schema?
Have you considered using a Pig schema?

On Jul 9, 2013, at 12:32 PM, "Kimmel, Chad" <[EMAIL PROTECTED]> wrote:

> Hi, what I am trying to do is read the headers from the first line of a file and use them as the field names in the schema. For instance, given the following tab-delimited file:
>
> --samplefile.txt--
> Name    Job         Age
> Chad    Engineer    23
> Mike    Stats       34
> Chris   IT          25
>
> Instead of deleting the first line and supplying the field names by hand in the AS clause:
>
> rows = LOAD 'samplefile.txt' USING PigStorage('\t') AS (Name:chararray, Job:chararray, Age:int);
>
> I would like to read the field names in as part of the Pig script itself. This matters for my project because the field names change from file to file (i.e., they are dynamic), and I need to keep a record of these unique field names.
>
> Does anyone know how to solve this problem? I think the LoadMetadata interface might be useful, but I don't know how to use it. Thanks!
>
> Chad
>
>
>
> Chad Kimmel Sr. Statistical Analyst | comScore, Inc. (NASDAQ:SCOR)
>
> o +1 (571) 306-6439 | [EMAIL PROTECTED]
>
> ...........................................................................................................
>
> comScore Media Metrix® Multi-Platform: Audience Analytics for the Brave New Digital World
> http://www.comscore.com/multiplatform
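
One possible way to approach the question above is to read the header line outside Pig and generate the LOAD statement from it. The sketch below is only an illustration under assumptions, not something proposed in the thread: the helper names header_fields and load_statement are hypothetical, every field is typed as chararray because the header carries no type information, and names are sanitized so that they are valid Pig aliases.

```python
import re


def header_fields(header_line, delimiter="\t"):
    """Split a header line into field names usable as Pig aliases."""
    names = []
    for raw in header_line.rstrip("\r\n").split(delimiter):
        # Replace anything that is not a letter, digit, or underscore.
        name = re.sub(r"[^A-Za-z0-9_]", "_", raw.strip())
        # Pig aliases must start with a letter; prefix the rest (assumed convention).
        if not name or not name[0].isalpha():
            name = "f_" + name
        names.append(name)
    return names


def load_statement(path, header_line, delimiter="\t", default_type="chararray"):
    """Build a Pig LOAD statement whose AS clause comes from the header line."""
    schema = ", ".join(
        f"{name}:{default_type}" for name in header_fields(header_line, delimiter)
    )
    # PigStorage expects the tab delimiter written as the two characters \t.
    delim_literal = "\\t" if delimiter == "\t" else delimiter
    return f"rows = LOAD '{path}' USING PigStorage('{delim_literal}') AS ({schema});"


# Demo with the header from samplefile.txt (tabs written explicitly).
print(load_statement("samplefile.txt", "Name\tJob\tAge\n"))
```

The list returned by header_fields can be stored per file to keep a record of the field names. The header row itself is still loaded as a data row, so the generated script would also need to drop it, for example with FILTER rows BY Name != 'Name'; and numeric columns such as Age would need an explicit cast afterwards.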