Pig user mailing list: Sequence File processing


Srini 2012-12-24, 05:24
Re: Sequence File processing
Hi Srini,

You can use STRSPLIT to split your "value" chararray and define the schema in a
FOREACH. Because STRSPLIT treats its delimiter as a regular expression, the
pipe has to be escaped. For example, if "value" consists of 3 integers (e.g. "1|2|3"):

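-- SequenceFileLoader ships in Piggybank; adjust the jar path to your install.
REGISTER piggybank.jar;
DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();

A = LOAD 'part-m-0000' USING SequenceFileLoader() AS (key:long, value:chararray);
-- FLATTEN turns the tuple returned by STRSPLIT into separate columns,
-- which the AS clause then names and types.
B = FOREACH A GENERATE key, FLATTEN(STRSPLIT(value, '\\|')) AS (i:int, j:int, k:int);
DESCRIBE B;
DUMP B;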

This will return (the first field is the record's key):

B: {key: long,i: int,j: int,k: int}
(k,1,2,3)
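Applied to the original question (10 pipe-delimited fields), the same pattern might look like the minimal sketch below. The field names f1..f10, their types, and the piggybank.jar path are placeholders, not taken from the thread; the optional third STRSPLIT argument caps the split at 10 pieces. Once the fields are named, FILTER and GROUP work as on any other relation.

```pig
-- Hypothetical: jar path, field names f1..f10 and their types are placeholders.
REGISTER piggybank.jar;
DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();

A = LOAD 'part-m-0000' USING SequenceFileLoader() AS (key:long, value:chararray);

-- Split on the escaped pipe into at most 10 pieces, flatten them into columns,
-- and name/type each column in the AS clause (same pattern as the 3-field example).
B = FOREACH A GENERATE key,
    FLATTEN(STRSPLIT(value, '\\|', 10))
    AS (f1:chararray, f2:int, f3:int, f4:int, f5:int,
        f6:int, f7:int, f8:int, f9:int, f10:int);

-- With named fields, filtering and grouping read like any other relation.
C = FILTER B BY f2 > 0;
D = GROUP C BY f1;
E = FOREACH D GENERATE group AS f1, COUNT(C) AS n, SUM(C.f10) AS total;
DUMP E;
```

Grouping on f1 and aggregating with COUNT and SUM is only an illustration; substitute whichever fields your analysis needs.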

Thanks,
Cheolsoo
On Sun, Dec 23, 2012 at 9:24 PM, Srini <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I have used SequenceFileLoader to load a sequence file:
>
> A = load 'part-m-0000' using SequenceFileLoader() as
> (key:long,value:chararray);
>
> "value" is a chararray that consists of 10 fields separated by a
> delimiter ("|" here). How do I create a schema so that I can do
> further analysis on these fields (such as FILTER and GROUP)?
>
> Any help is appreciated.
>
> Thanks,
> Srini
>
Srini 2012-12-25, 06:35
Mohammad Tariq 2012-12-24, 21:39
Kshiva Kps 2012-12-25, 05:42
Dmitriy Ryaboy 2013-01-11, 03:37