-
Null values while loading
Arun Chandy Thomas 2011-05-25, 22:22
Hi ,
I am trying to use Pig to aggregate data from an application's log lines.
Most of the data in the input file have the following format: A B C D E F
I am aggregating the data as follows:
A = load '$in_dir' using PigStorage('\t') as (A, B, C, D, E, F);
D = group A by (A, B, C, D, E, F);
E = FOREACH D GENERATE FLATTEN(group) as (A, B, C, D, E, F), COUNT(A) as hit;
STORE E INTO '$in_dir._1' using PigStorage('\t');
In some cases the input lines contain only: A B C D (the E and F columns are missing). Would the Pig script ignore such lines?
Thanks & Regards, Arun
-
Re: Null values while loading
Jonathan Coveney 2011-05-25, 22:33
I believe it should null them out.
-
Re: Null values while loading
Alan Gates 2011-05-25, 22:35
No, but you can make it by adding:
B = filter A by E is not null;
Alan.
-
Re: Null values while loading
Arun Chandy Thomas 2011-05-25, 22:43
Thanks for the quick reply, but my question is a little different. I am sorry if I was not clear in my initial post.
I want the Pig script to consider E and F as null if the values are not present in the input line.
So basically all the lines should be loaded when running:
A = load '$in_dir' using PigStorage('\t') as (A, B, C, D, E, F);
irrespective of whether any of the fields are null or not.
How can we achieve this?
Thanks & Regards, Arun
-
Re: Null values while loading
Sven Krasser 2011-05-25, 23:24
Are the tabs for these columns still there? In that case, there should be an empty string in there. Something like this should work then:
Y = foreach X generate (A == '' ? null : A), (B == '' ? null : B), ...
Otherwise, you could load the full line using TextLoader and then use STRSPLIT on it to extract your columns. That allows you to check if E and F are present.
Best,
-Sven
-- http://sites.google.com/site/krasser/
-
RE: Null values while loading
Olga Natkovich 2011-05-25, 23:57
This will happen with Pig 0.9. You can make it happen with Pig 0.8 if you provide type information in the schema of the load statement.
Olga
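Editor's sketch of the typed-schema suggestion above, applied to the script from the original post. The relation names (logs, grouped, counts), the lowercase field names, and the choice of chararray for every column are assumptions made for illustration; the thread does not say what types the columns hold. Per Olga's reply, lines with only four columns should then load with e and f set to null, though this has not been verified here against Pig 0.8.

```pig
-- Assumption: every column is text; substitute int, long, etc. where appropriate.
logs = LOAD '$in_dir' USING PigStorage('\t')
       AS (a:chararray, b:chararray, c:chararray,
           d:chararray, e:chararray, f:chararray);

-- Group on all six fields, as in the original script.
grouped = GROUP logs BY (a, b, c, d, e, f);

-- One output row per distinct key, with the number of matching input lines.
counts = FOREACH grouped GENERATE FLATTEN(group) AS (a, b, c, d, e, f),
                                  COUNT(logs) AS hit;

STORE counts INTO '$in_dir._1' USING PigStorage('\t');
```

To drop the short lines instead of keeping them, Alan's filter can be placed right after the load (for example, FILTER logs BY e IS NOT NULL) and the grouping run on the filtered relation.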