Pig, mail # user - problem filtering null values with pig


Arian Pasquali 2012-10-31, 20:06
Cheolsoo Park 2012-10-31, 21:25
Arian Pasquali 2012-10-31, 21:50
Arian Pasquali 2012-10-31, 21:54
Cheolsoo Park 2012-10-31, 23:03
Re: problem filtering null values with pig
Arian Pasquali 2012-11-01, 19:47
You are right, Cheolsoo.
Indeed, it doesn't make sense to write a UDF just to compare data types. I
know it's possible, but it doesn't sound like the right way.
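For the record, the check itself would be small. Below is a minimal sketch in plain Java, with no Pig dependency, of the predicate such a type-checking UDF would apply. In a real Pig UDF this logic would sit in the exec(Tuple) method of a FilterFunc subclass; the class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: mirrors the test a type-checking Pig FilterFunc would make.
final class GeoTypeFilter {

    // True only for a real map value such as [longitude#...,latitude#...].
    // A real null and the chararray "null" both return false.
    static boolean isMap(Object field) {
        return field instanceof Map;
    }

    // Keeps only the fields that are maps, like FILTER with such a UDF would.
    static List<Object> keepMaps(List<Object> fields) {
        List<Object> kept = new ArrayList<>();
        for (Object f : fields) {
            if (isMap(f)) {
                kept.add(f);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Object> geo = new LinkedHashMap<>();
        geo.put("longitude", -9.15199849);
        geo.put("latitude", 38.71179122);

        List<Object> rows = new ArrayList<>();
        rows.add(geo);    // a map row
        rows.add("null"); // the suspicious chararray row
        rows.add(null);   // a real null row

        System.out.println(keepMaps(rows).size()); // prints 1
    }
}
```

Testing with instanceof keeps the check independent of how the bad rows are represented, so it drops both real nulls and the string "null".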
Maybe it is a bug in the JsonLoader I'm using:
https://github.com/mmay/PigJsonLoader/blob/master/JsonLoader.java
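One guess about the loader, which I have not confirmed against its source: if it passes maps through but stringifies every other value with String.valueOf, a missing value becomes the four-character chararray "null" instead of a real null. That would explain why DUMP shows (null) and why "is not null" keeps those rows. The sketch below, with hypothetical names, contrasts that pattern with one that keeps real nulls.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not taken from PigJsonLoader's actual source.
final class LoaderNullSketch {

    // Suspected pattern: maps pass through, everything else is stringified,
    // so a null reference turns into the String "null".
    static Object buggyField(Object jsonValue) {
        return (jsonValue instanceof Map) ? jsonValue : String.valueOf(jsonValue);
    }

    // Alternative: keep real nulls so Pig's "is not null" can drop them.
    static Object fixedField(Object jsonValue) {
        if (jsonValue == null || jsonValue instanceof Map) {
            return jsonValue;
        }
        return String.valueOf(jsonValue);
    }

    public static void main(String[] args) {
        Map<String, Object> record = new HashMap<String, Object>();
        Object missing = record.get("geoLocation"); // absent key, so null

        System.out.println(buggyField(missing) == null); // prints false
        System.out.println(fixedField(missing) == null); // prints true
    }
}
```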

I will share the script and the data with you shortly.

Thanks for the hints.

Arian Rodrigo Pasquali
FEUP, SAPO Labs
http://www.arianpasquali.com
twitter @arianpasquali

2012/10/31 Cheolsoo Park <[EMAIL PROTECTED]>

> Hi,
>
> > What's the best way to filter only the valid rows, since some of
> > them are strings and others are maps?
>
> This shouldn't happen. The data type is defined per column, so it should be
> either string or map for all rows. If that's not the case, it is likely a
> bug.
>
> > Can I create an expression to compare data types? Is it possible?
>
> Technically, you should be able to write a UDF that checks the type. But I am
> more interested in knowing why you're running into this problem. Can you
> please share your script and sample data? I'd like to reproduce it.
>
> Thanks,
> Cheolsoo
>
> On Wed, Oct 31, 2012 at 2:54 PM, Arian Pasquali <[EMAIL PROTECTED]> wrote:
>
> > Can I create an expression to compare data types?
> > Is it possible?
> >
> > ArianP
> >
> > 2012/10/31 Arian Pasquali <[EMAIL PROTECTED]>
> >
> > > You are right, it doesn't seem like a null value.
> > > It looks like a chararray. But the expression causes an error when
> > > comparing a string with ([longitude#-9.15199849,latitude#38.71179122]):
> > >
> > > geoinfo_no_nulls = FILTER geoinfo BY $0 != 'null';
> > >
> > > I get:
> > > ERROR 2997: Unable to recreate exception from backed error:
> > > org.apache.pig.backend.executionengine.ExecException: ERROR 1071: Cannot
> > > convert a map to a String
> > >
> > > What's the best way to filter only the valid rows, since some of them
> > > are strings and others are maps?
> > >
> > > Arian
> > >
> > >
> > >
> > > 2012/10/31 Cheolsoo Park <[EMAIL PROTECTED]>
> > >
> > >> Hi,
> > >>
> > >> I am not sure what the problem is because I can't reproduce it. For me,
> > >> null values are printed as an empty "( )", not "(null)", so it doesn't
> > >> seem like a null.
> > >>
> > >> I am wondering whether OpenJDK is the problem. Can you try Oracle HotSpot
> > >> JDK 1.6 and see whether that fixes it?
> > >>
> > >> Thanks,
> > >> Cheolsoo
> > >>
> > >> On Wed, Oct 31, 2012 at 1:06 PM, Arian Pasquali <[EMAIL PROTECTED]> wrote:
> > >>
> > >> > Hey people,
> > >> > I'm having some trouble with a silly task: I can't find a way to filter
> > >> > null values from my rows. This is the result when I dump the relation
> > >> > geoinfo:
> > >> >
> > >> > DUMP geoinfo;
> > >> > ([longitude#70.95853,latitude#30.9773])
> > >> > ([longitude#-9.37944507,latitude#38.91780853])
> > >> > (null)
> > >> > (null)
> > >> > (null)
> > >> > ([longitude#-92.64416,latitude#16.73326])
> > >> > (null)
> > >> > (null)
> > >> > ([longitude#-9.15199849,latitude#38.71179122])
> > >> > ([longitude#-9.15210796,latitude#38.71195131])
> > >> >
> > >> > And here is the output of DESCRIBE:
> > >> >
> > >> > DESCRIBE geoinfo;
> > >> > geoinfo: {geoLocation: bytearray}
> > >> >
> > >> > What I'm trying to do is filter out null values like this:
> > >> >
> > >> > geoinfo_no_nulls = FILTER geoinfo BY geoLocation is not null;
> > >> >
> > >> > But the result remains the same: nothing is filtered.
> > >> >
> > >> > I also tried something like this:
> > >> >
> > >> > geoinfo_no_nulls = FILTER geoinfo BY geoLocation != 'null';
> > >> >
> > >> > and I got an error:
> > >> >
> > >> > org.apache.pig.backend.executionengine.ExecException: ERROR 1071: Cannot
> > >> > convert a map to a String
> > >> >
> > >> > What am I doing wrong here?
> > >> >
> > >> > Environment details:
> > >> >
> > >> > Ubuntu 12.04.1 LTS,
> > >> > hadoop-1.0.3
> > >> > pig 0.9.3 version 0.9.3-SNAPSHOT (rexported) compiled Oct 24 2012, 19:04:03
> > >> > java version "1.6.0_24" OpenJDK Runtime Environment (IcedTea6
Arian Pasquali 2012-11-17, 05:01