Pig >> mail # user >> Attaching column headers to the tuple


Re: Attaching column headers to the tuple
Hi Siddhi,

Please take a look at CSVExcelStorage in trunk:
https://issues.apache.org/jira/browse/PIG-3141.

You can write the header using the WRITE_OUTPUT_HEADER option. Despite the
storage's name, you can also specify a non-comma delimiter. Here is the syntax:

 STORE x INTO '<destFileName>'
         USING org.apache.pig.piggybank.storage.CSVExcelStorage(
              [DELIMITER[,
                  {YES_MULTILINE | NO_MULTILINE}[,
                      {UNIX | WINDOWS | NOCHANGE}[,
                          {READ_INPUT_HEADER | SKIP_INPUT_HEADER |
                           WRITE_OUTPUT_HEADER | SKIP_OUTPUT_HEADER}]]]]
         );

Since this is only in trunk, you will need to backport it yourself to the
version of Pig that you're using.
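
Applied to your script, the store step might look roughly like the sketch
below. It assumes a piggybank.jar built with the PIG-3141 patch; the jar path
and the field names in the AS clause are placeholders taken from the header
you want, and I am assuming the header row is derived from the relation's
schema, so the fields need names. Parser is your own UDF and still has to be
registered as before.

```pig
-- Sketch only: assumes piggybank.jar includes the PIG-3141 patch.
REGISTER piggybank.jar;  -- placeholder path, adjust to your build

A = LOAD 'data'
    USING org.apache.pig.piggybank.storage.XMLLoader('response')
    AS (line:chararray);

-- Name the flattened fields so the storage has column names to write.
-- Id, Form, Query are taken from the desired header; types are left unspecified.
B = FOREACH A GENERATE FLATTEN(Parser(line)) AS (Id, Form, Query);

-- Tab delimiter, no multiline fields, Unix line endings, header row written.
STORE B INTO 'my_data'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage(
        '\t', 'NO_MULTILINE', 'UNIX', 'WRITE_OUTPUT_HEADER');
```

Note that each part file written by the job would presumably get its own
header row, so you may want to check the output if the job runs with more
than one task.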

Thanks,
Cheolsoo
On Tue, Jun 4, 2013 at 8:45 PM, Siddhi Borkar <[EMAIL PROTECTED]> wrote:

> I'm writing a pig script similar to:
>
> A = load 'data' using
> org.apache.pig.piggybank.storage.XMLLoader('response') as (line:chararray);
> B = foreach A GENERATE FLATTEN(Parser(line));
> store B into 'my_data' using PigStorage('\t');
>
> This script reads a file containing a dump of XML documents. The second
> line of the script calls a Java UDF that parses the XML.
>
> The Parser UDF returns a data bag with multiple tuples.
> This outputs:
>
> (1            91705    rondo music guitar)
> (3            96629    award music guitar)
>
> I'd like to add a header row to the output file:
>
> (Id          Form     Query)
> (1            91705    rondo music guitar)
> (3            96629    award music guitar)
>
> Any ideas?