Sequence File processing
Srini 2012-12-24, 05:24
Hi,
I have used SequenceFileLoader to load a sequence file:
A = load 'part-m-0000' using SequenceFileLoader() as (key:long, value:chararray);
"value" is a chararray that consists of 10 fields separated by a delimiter ("|" here). How do I define a schema for these fields so that I can do further analysis on them (such as filter and group)?
Any help is appreciated.
Thanks, Srini
Re: Sequence File processing
Cheolsoo Park 2012-12-24, 21:37
Hi Srini,
You can use STRSPLIT to split your "value" chararray and define the schema in a FOREACH. For example, if "value" consists of 3 integers (e.g. "1|2|3"):
A = LOAD 'part-m-0000' USING SequenceFileLoader() AS (key:long, value:chararray);
B = FOREACH A GENERATE key, FLATTEN(STRSPLIT(value, '\\|')) AS (i:int, j:int, k:int);
DESCRIBE B;
DUMP B;
This will return:
B: {key: chararray,i: int,j: int,k: int}
(k,1,2,3)
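The same pattern can be stretched to the 10-field case from the original question. What follows is only a sketch under assumptions: the field names f1..f10, the choice of chararray for every field, and the filter and group conditions are all hypothetical, and SequenceFileLoader is assumed to be registered and defined the same way as in the original post.

```pig
-- Sketch only: f1..f10 and the conditions below are hypothetical placeholders.
A = LOAD 'part-m-0000' USING SequenceFileLoader() AS (key:long, value:chararray);

-- The third argument to STRSPLIT caps the number of pieces at 10.
B = FOREACH A GENERATE key,
        FLATTEN(STRSPLIT(value, '\\|', 10))
        AS (f1:chararray, f2:chararray, f3:chararray, f4:chararray, f5:chararray,
            f6:chararray, f7:chararray, f8:chararray, f9:chararray, f10:chararray);

-- Filter on one field; a numeric field can be cast explicitly where it is used.
C = FILTER B BY (int) f2 > 0;

-- Group on another field and count the rows in each group.
D = GROUP C BY f1;
E = FOREACH D GENERATE group AS f1, COUNT(C) AS n;

DUMP E;
```

Declaring every field as chararray and casting at the point of use is a conservative choice here; if the types are known up front, they can be declared directly in the AS clause as in the 3-integer example.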
Thanks, Cheolsoo
Re: Sequence File processing
Srini 2012-12-25, 06:35
Thanks Cheolsoo.
-- Regards, Srinivas [EMAIL PROTECTED]
Re: Sequence File processing
Mohammad Tariq 2012-12-24, 21:39
+1
Best Regards, Tariq +91-9741563634 https://mtariq.jux.com/
Re: Sequence File processing
Kshiva Kps 2012-12-25, 05:42
Hi,
Are there any editors for Pig in which we can write 100 to 150 Pig scripts? I believe that is not practical in CLI mode. I am looking for something like an IDE for Java or TOAD for SQL. Please advise.
Many thanks
Re: Sequence File processing
Dmitriy Ryaboy 2013-01-11, 03:37
Please see the list of editor plugins at https://cwiki.apache.org/confluence/display/PIG/PigTools
D