Pig >> mail # user >> Sequence File processing


Re: Sequence File processing
Hi Srini,

You can use STRSPLIT to split your "value" chararray and define the schema in
a FOREACH. STRSPLIT takes a regular expression, so the pipe has to be escaped
as '\\|'. For example, if "value" consists of 3 integers (e.g. "1|2|3"):

A = LOAD 'part-m-0000' USING SequenceFileLoader() AS (key:long, value:chararray);
B = FOREACH A GENERATE key, FLATTEN(STRSPLIT(value, '\\|')) AS (i:int, j:int, k:int);
DESCRIBE B;
DUMP B;

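This will return something like the following. (The output below comes from a
test file whose key is a chararray; with key:long as declared above, DESCRIBE
would report key: long instead.)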

B: {key: chararray,i: int,j: int,k: int}
(k,1,2,3)
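
The same approach should extend to your 10 fields, after which you can filter
and group on them. A rough sketch: the field names f1..f10, their types, and
the filter/group conditions are placeholders I made up, so adjust them to
your data.

```pig
-- Sketch only: f1..f10, the chosen types, and the FILTER/GROUP logic are
-- hypothetical. Assumes SequenceFileLoader resolves as in your own script.
A = LOAD 'part-m-0000' USING SequenceFileLoader() AS (key:long, value:chararray);

-- Split on the escaped pipe and name/type each of the 10 resulting fields.
B = FOREACH A GENERATE key, FLATTEN(STRSPLIT(value, '\\|'))
    AS (f1:chararray, f2:chararray, f3:int, f4:int, f5:chararray,
        f6:chararray, f7:chararray, f8:chararray, f9:chararray, f10:chararray);

-- The fields can now be used like any other column.
C = FILTER B BY f3 IS NOT NULL AND f3 > 0;
D = GROUP C BY f1;
E = FOREACH D GENERATE group AS f1, COUNT(C) AS n;
DUMP E;
```

It is worth running DESCRIBE B first to confirm the schema is what you
expect before adding the filter and group steps.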

Thanks,
Cheolsoo
On Sun, Dec 23, 2012 at 9:24 PM, Srini <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I have used SequenceFileLoader to load a sequence file.
>
> A= load 'part-m-0000' using SequenceFileLoader() as
> (key:long,value:chararray)
>
> "value" is a chararray consisting of 10 fields separated by a
> delimiter ("|" here). How do I create a schema here so that I can do
> further analysis on these fields (such as filter, group)?
>
> Any help is appreciated.
>
> Thanks,
> Srini
>