NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Pig >> mail # user >> Null values while loading


Arun Chandy Thomas 2011-05-25, 22:22
Jonathan Coveney 2011-05-25, 22:33
Alan Gates 2011-05-25, 22:35
Arun Chandy Thomas 2011-05-25, 22:43
Sven Krasser 2011-05-25, 23:24
RE: Null values while loading
This will happen with Pig 0.9. You can make it happen with Pig 0.8 if you provide type information in the schema of the load statement.

Olga
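
A minimal sketch of what the suggestion above might look like, assuming every column can be treated as a chararray (the types and the lowercase field names a..f are illustrative assumptions, not taken from the original script). With type information in the schema, short lines such as "A B C D" are expected to load with e and f set to null rather than being dropped:

```pig
-- Assumption: all six columns are chararray; adjust the types to match the real data.
A = load '$in_dir' using PigStorage('\t')
    as (a:chararray, b:chararray, c:chararray, d:chararray, e:chararray, f:chararray);

-- Same aggregation as in the original script, with lowercase field names
-- so that they do not collide with the relation aliases.
G = group A by (a, b, c, d, e, f);
H = foreach G generate FLATTEN(group) as (a, b, c, d, e, f), COUNT(A) as hit;

store H into '$in_dir._1' using PigStorage('\t');
```

Running this against a small sample file that mixes four-column and six-column lines is a quick way to confirm the behaviour on a given Pig version.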

-----Original Message-----
From: Arun Chandy Thomas [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, May 25, 2011 3:43 PM
To: [EMAIL PROTECTED]
Subject: Re: Null values while loading

Thanks for the quick reply, but my question is a little different.
I am sorry if I was not clear in my initial post.

I want the Pig script to consider E and F as null if the values are not present in the input line.

So basically, all the lines should be loaded when running:
>> A= load '$in_dir' using PigStorage('\t') as (A, B,C,D,E,F);

irrespective of whether any of the fields are null or not.

How can we achieve this?

Thanks & Regards,
Arun
On May 25, 2011, at 3:35 PM, Alan Gates wrote:

> No, but you can make it do so by adding:
>
> B = filter A by E is not null;
>
> Alan.
>
> On May 25, 2011, at 3:22 PM, Arun Chandy Thomas wrote:
>
>> Hi,
>>
>> I am trying to use Pig to aggregate data from an application's log lines.
>>
>> Most of the lines in the input file have the following format:
>> A B C D E F
>>
>> I am aggregating the data as follows:
>>
>> A= load '$in_dir' using PigStorage('\t') as (A, B,C,D,E,F);
>> D = group A by (A, B,C,D,E,F);
>> E = FOREACH D GENERATE FLATTEN(group) as (A, B,C,D,E,F ),COUNT(A) as hit;
>> STORE E INTO '$in_dir._1' using PigStorage('\t');
>>
>> In some cases I see input lines that contain only: A B C D (the E and F columns are missing).
>> Would the Pig script ignore such lines?
>>
>> Thanks & Regards,
>> Arun
>