Pig, mail # user - Attaching column headers to the tuple


Re: Attaching column headers to the tuple
Cheolsoo Park 2013-06-09, 19:39
Hi Siddhi,

Please take a look at CSVExcelStorage in trunk:
https://issues.apache.org/jira/browse/PIG-3141.

You can write the header using the WRITE_OUTPUT_HEADER option. Despite the
"CSV" in the storage class name, you can also specify a non-comma delimiter.
Here is the syntax:

 STORE x INTO '<destFileName>'
         USING org.apache.pig.piggybank.storage.CSVExcelStorage(
              [DELIMITER[,
                  {YES_MULTILINE | NO_MULTILINE}[,
                      {UNIX | WINDOWS | NOCHANGE}[,
                          {READ_INPUT_HEADER | SKIP_INPUT_HEADER | WRITE_OUTPUT_HEADER | SKIP_OUTPUT_HEADER}]]]]
         );

Since this is only in trunk, you need to backport it yourself to the
version of Pig that you're using.
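
As a rough sketch of how the syntax above might be filled in for
tab-separated output with a header row: the jar path, the input and output
names, and the LOAD statement are placeholders I made up for illustration,
and I'm assuming the header names are taken from the relation's schema and
that '\t' is accepted as the delimiter argument.

```pig
-- Assumption: piggybank.jar was built with the PIG-3141 patch applied.
-- The path below is hypothetical.
REGISTER /path/to/piggybank.jar;

-- Hypothetical tab-separated input with the three fields from the question.
x = LOAD 'input_data' USING PigStorage('\t')
        AS (Id:int, Form:int, Query:chararray);

-- Arguments, in order: delimiter, multiline handling, line-ending handling,
-- header handling. WRITE_OUTPUT_HEADER asks the storer to emit a header row.
STORE x INTO 'output_with_header'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage(
        '\t', 'NO_MULTILINE', 'NOCHANGE', 'WRITE_OUTPUT_HEADER');
```

In your script the relation comes from FLATTEN(Parser(line)), so the field
names would have to come from the UDF's output schema or from an AS clause
on the FOREACH; otherwise there may be no meaningful names to write.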

Thanks,
Cheolsoo
On Tue, Jun 4, 2013 at 8:45 PM, Siddhi Borkar <[EMAIL PROTECTED]> wrote:

> I'm writing a pig script similar to:
>
> A = load 'data' using
> org.apache.pig.piggybank.storage.XMLLoader('response') as (line:chararray);
> B = foreach A GENERATE FLATTEN(Parser(line));
> store B into 'my_data' using PigStorage('\t');
>
> This script reads a file that contains XML documents dumped into it. The
> second line of the script calls a Java UDF, which parses the XML.
>
> The Parser UDF returns a data bag with multiple tuples.
> This outputs:
>
> (1            91705    rondo music guitar)
> (3            96629    award music guitar)
>
> I'd like to add a header row to the output file:
>
> (Id          Form     Query)
> (1            91705    rondo music guitar)
> (3            96629    award music guitar)
>
> Any ideas?