Pig >> mail # user >> Sequence File processing


Re: Sequence File processing
Hi Srini,

You can use STRSPLIT to split your "value" chararray and define the schema in
a FOREACH. For example, if "value" consists of 3 integers (e.g. "1|2|3"):

A = LOAD 'part-m-0000' USING SequenceFileLoader() AS (key:long, value:chararray);
B = FOREACH A GENERATE key, FLATTEN(STRSPLIT(value, '\\|')) AS (i:int, j:int, k:int);
DESCRIBE B;
DUMP B;

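For a record whose value is "1|2|3", this will return (the first field of the
tuple, shown here as k, is the record's key):

B: {key: long,i: int,j: int,k: int}
(k,1,2,3)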
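
The same pattern should extend to your 10-field case. What follows is only a sketch: the field names f1..f10, their types, and the filter threshold are placeholders I made up, since I don't know what your fields contain, and it assumes SequenceFileLoader is already available in your script as in your LOAD statement.

```pig
A = LOAD 'part-m-0000' USING SequenceFileLoader() AS (key:long, value:chararray);

-- Split on a literal pipe; '\\|' escapes the regex alternation character.
-- Field names and types below are hypothetical; replace them with your own.
B = FOREACH A GENERATE key, FLATTEN(STRSPLIT(value, '\\|')) AS
    (f1:chararray, f2:int, f3:chararray, f4:chararray, f5:chararray,
     f6:chararray, f7:chararray, f8:chararray, f9:chararray, f10:chararray);

-- Once the fields are named, they can be used like any other column.
C = FILTER B BY f2 > 100;          -- threshold is an arbitrary example
D = GROUP C BY f1;
E = FOREACH D GENERATE group AS f1, COUNT(C) AS n;
DUMP E;
```

The AS clause must list as many fields as the split produces, so it is worth checking with DESCRIBE B that the schema is what you expect before filtering or grouping.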

Thanks,
Cheolsoo
On Sun, Dec 23, 2012 at 9:24 PM, Srini <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I have used SequenceFileLoader for loading a sequence file.
>
> A = load 'part-m-0000' using SequenceFileLoader() as
> (key:long,value:chararray);
>
> "value" is a chararray that consists of 10 fields separated by a
> delimiter ("|" here). How do I create a schema here so that I can do
> further analysis with these fields (such as filter, group)?
>
> Any help is appreciated.
>
> Thanks,
> Srini
>