Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Pig >> mail # user >> Replace null with string


Copy link to this message
-
Re: Replace null with string
fixed = FOREACH my_relation GENERATE (my_field IS NOT NULL ? my_field
: 'other') as my_field, *;

Russell Jurney
twitter.com/rjurney
[EMAIL PROTECTED]
datasyndrome.com

On Jun 8, 2012, at 3:01 PM, Dragan Nedeljkovic <[EMAIL PROTECTED]> wrote:

> You can use an UDF like the one bellow to deal with  the NULLs.
>
> register 'mypiggybank.jar';
> define Nvl piggybank.Nvl();
>
> input_lines = LOAD 'test_Nvl.in' AS (line:chararray);
> describe input_lines;
> dump input_lines;
>
> new_list = FOREACH input_lines GENERATE Nvl(line, 'n/a');
> describe new_list;
> dump new_list;
>
> -- eof
>
> package piggybank;
>
> import java.io.IOException;
>
> import org.apache.pig.EvalFunc;
> import org.apache.pig.data.Tuple;
>
> public class Nvl
>    extends EvalFunc<String>
> {
>    public String exec(Tuple inputA)
>       throws IOException
>    {
>       try
>       {
>          String inputVal = (String)inputA.get(0);
>          String defaultVal = (String)inputA.get(1);
>
>          if(inputVal == null || inputVal.length() == 0)
>          {
>             return defaultVal;
>          }
>
>          return inputVal;
>       }
>       catch(Exception e)
>       {
>          // Throwing an exception will cause the task to fail.
>          throw new IOException("Something bad happened!", e);
>       }
>    }
> }
> // eof
>
>
>
>
>> ________________________________
>> From: Mario Lassnig <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]
>> Sent: Thursday, June 7, 2012 7:37:36 AM
>> Subject: Replace null with string
>>
>> Hello,
>>
>> I'm having a lot of null entries in my data. Due to later processing it would be very helpful if I could set a default value for null to be the string "other". I couldn't find a way to do this (version 0.8.1-cdh3u4)
>>
>> Also, I have some variables in my GENERATE statements that can potentially return null, and I would need something similar to the SQL DECODE function to get the "other" string instead of null.
>>
>> Example:
>>
>> tmp = FOREACH dump GENERATE site, REGEX_EXTRACT(name, '^(?:([^.]+)\\.?){1}', 1) AS project, ((ami MATCHES '.*datatype.*') ? REGEX_EXTRACT(name, '^(?:([^.]+)\\.?){5}', 1) : 'other') AS datatype, ami, duid, nbfiles, length, rnbfiles, rlength, name;
>>
>> Here: 'site' and 'datatype' could return an empty string (which is valid) and is interpreted as null, but should be "other" instead.
>>
>> Thanks a lot,
>> Mario
>>
>>
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB