Pig, mail # user - Sequence File processing

Re: Sequence File processing
Cheolsoo Park 2012-12-24, 21:37
Hi Srini,

You can use STRSPLIT to split your "value" chararray and define the schema in
a FOREACH. For example, if "value" consists of 3 integers (e.g. "1|2|3"):

A = LOAD 'part-m-0000' USING SequenceFileLoader() AS (key:chararray, value:chararray);
B = FOREACH A GENERATE key, FLATTEN(STRSPLIT(value, '\\|')) AS (i:int, j:int, k:int);

This gives B the following schema:

B: {key: chararray,i: int,j: int,k: int}
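
For the 10-field case in the original question, the same pattern might be extended roughly as follows. This is only a sketch: the field names f1..f10, their types, and the filter and group conditions are hypothetical placeholders, and it assumes every "value" really contains 10 '|'-separated fields.

```pig
-- Load the sequence file with the schema from the original question.
A = LOAD 'part-m-0000' USING SequenceFileLoader() AS (key:long, value:chararray);

-- STRSPLIT takes a regex, so the pipe must be escaped to match a literal '|'.
-- f1..f10 and their types are hypothetical; replace them with the real field names.
B = FOREACH A GENERATE key, FLATTEN(STRSPLIT(value, '\\|'))
        AS (f1:chararray, f2:int, f3:chararray, f4:chararray, f5:chararray,
            f6:chararray, f7:chararray, f8:chararray, f9:chararray, f10:chararray);

-- Once the fields are named, they can be filtered and grouped like any other column.
C = FILTER B BY f2 > 100;
D = GROUP C BY f1;
E = FOREACH D GENERATE group AS f1, COUNT(C) AS n;
```

Running DESCRIBE B after the FOREACH is a quick way to confirm that the schema came out as intended before adding the FILTER and GROUP steps.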

On Sun, Dec 23, 2012 at 9:24 PM, Srini <[EMAIL PROTECTED]> wrote:

> Hi,
> I have used SequenceFileLoader to load a sequence file:
> A = load 'part-m-0000' using SequenceFileLoader() as
> (key:long,value:chararray)
> "value" is a chararray consisting of 10 fields separated by a
> delimiter ("|" here). How do I define a schema so that I can do
> further analysis on these fields (such as filter, group)?
> Any help is appreciated.
> Thanks,
> Srini