Pig >> mail # user >> Null values while loading

Arun Chandy Thomas 2011-05-25, 22:22
Jonathan Coveney 2011-05-25, 22:33
Alan Gates 2011-05-25, 22:35
Arun Chandy Thomas 2011-05-25, 22:43
Sven Krasser 2011-05-25, 23:24
RE: Null values while loading
This will happen with Pig 0.9. You can make it happen with Pig 0.8 if you provide type information in the schema of the load statement.
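
A minimal sketch of what that typed load statement could look like. The field names come from the original script; declaring every field as chararray is an assumption, since the thread never says what the columns contain, and the alias short_lines is hypothetical.

```pig
-- Assumption: every column is treated as chararray; the thread declares no types.
A = load '$in_dir' using PigStorage('\t')
    as (A:chararray, B:chararray, C:chararray, D:chararray, E:chararray, F:chararray);

-- Optional sanity check (hypothetical alias): list the rows where E is null,
-- which should include the input lines that only carry A B C D.
short_lines = filter A by E is null;
dump short_lines;
```

If the short lines show up in short_lines, they are being loaded with null E and F rather than dropped, and the existing group and COUNT steps can stay as they are.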


-----Original Message-----
From: Arun Chandy Thomas [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, May 25, 2011 3:43 PM
Subject: Re: Null values while loading

Thanks for the quick reply, but my question is a little different.
I am sorry if I was not clear in my initial post.

I want the Pig script to consider E and F as null if the values are not present in the input line.

So basically, all the lines should be loaded when running:
>> A= load '$in_dir' using PigStorage('\t') as (A, B,C,D,E,F);

irrespective of whether any of the fields are null or not.

How can we achieve this?

Thanks & Regards,
On May 25, 2011, at 3:35 PM, Alan Gates wrote:

> No, but you can make it do so by adding:
> B = filter A by E is not null;
> Alan.
> On May 25, 2011, at 3:22 PM, Arun Chandy Thomas wrote:
>> Hi,
>> I am trying to use Pig to aggregate data from an application's log lines.
>> Most of the lines in the input file have the following format:
>> A B C D E F
>> I am aggregating the data as follows:
>> A= load '$in_dir' using PigStorage('\t') as (A, B,C,D,E,F);
>> D = group A by (A, B,C,D,E,F);
>> E = FOREACH D GENERATE FLATTEN(group) as (A, B,C,D,E,F ),COUNT(A) as hit;
>> STORE E INTO '$in_dir._1' using PigStorage('\t');
>> In some cases I see input lines that contain only: A B C D (the E and F columns are missing).
>> Would the Pig script ignore such lines?
>> Thanks & Regards,
>> Arun