John Meek 2013-03-10, 02:57
Hi,
I'm trying to use the following statement in Pig to parse my data.
B = FOREACH A GENERATE FLATTEN( REGEX_EXTRACT_ALL(line, '^(.+?)\\-(.+?)\\s(.+?)\\-(.)(.)\\s(.+)$')) AS (Field1:CHARARRAY,Field2:CHARARRAY,Date:CHARARRAY,Field3:CHARARRAY,Field4:CHARARRAY,Field5:CHARARRAY);
The input is basically a file with values in the following format:
a02s6pq0s1t-dl 20130106-UX 32
johnm-dl 20130106-DX 32
I need the output to be 6 columns like below:
a02s6pq0s1t dl 20130106 U X 32
johnm dl 20130106 D X 32
Pig is giving me (). Please help.
John M
Hi John,
I ran these in Pig 0.9.2:
A = LOAD 'data' as line:chararray;
B = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL(line, '^(.+?)\\-(.+?)\\s(.+?)\\-(.)(.)\\s(.+)$')) AS (Field1:CHARARRAY,Field2:CHARARRAY,Date:CHARARRAY,Field3:CHARARRAY,Field4:CHARARRAY,Field5:CHARARRAY);
dump B;
This gives me the following:
(a02s6pq0s1t,dl,20130106,U,X,32)
(johnm,dl,20130106,D,X,32)
Which version of Pig are you running?
-- Harsha
John Meek 2013-03-10, 04:03
Hi Harsha,
Running release 0.11.0. Thanks.
-----Original Message-----
From: Harsha <[EMAIL PROTECTED]>
To: user <[EMAIL PROTECTED]>
Sent: Sat, Mar 9, 2013 10:40 pm
Subject: Re: Pig Regex Help
John Meek 2013-03-10, 14:38
Harsha, thanks for your response. I needed to add USING PigStorage(',') to my LOAD statement. It works now.
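
For anyone finding this thread later, the working script probably looks something like the sketch below. The file name 'data' is borrowed from Harsha's reply, and the explanation is an assumption rather than something confirmed in the thread: if the fields in the input file are separated by tabs, the default loader (PigStorage with a tab delimiter) splits each record, so `line` receives only the first field and the regex cannot match the whole record, which would explain the (). Passing a delimiter that never occurs in the data, such as a comma here, keeps the entire record in `line`.

```pig
-- Sketch only: 'data' is the path used in Harsha's reply, not necessarily John's file.
-- The ',' delimiter does not appear in the sample records, so each record
-- is loaded whole into the single field `line`.
A = LOAD 'data' USING PigStorage(',') AS (line:chararray);

-- Same regex and schema as in the original question.
B = FOREACH A GENERATE FLATTEN(
        REGEX_EXTRACT_ALL(line, '^(.+?)\\-(.+?)\\s(.+?)\\-(.)(.)\\s(.+)$'))
    AS (Field1:CHARARRAY, Field2:CHARARRAY, Date:CHARARRAY,
        Field3:CHARARRAY, Field4:CHARARRAY, Field5:CHARARRAY);

DUMP B;
```

If the data could ever contain commas, TextLoader() may be a safer way to read each record as a single chararray.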