Pig >> mail # user >> problem filtering null values with pig


Arian Pasquali 2012-10-31, 20:06
Cheolsoo Park 2012-10-31, 21:25
Arian Pasquali 2012-10-31, 21:50
Arian Pasquali 2012-10-31, 21:54
Re: problem filtering null values with pig
Hi,

> what would be the best way to filter only the valid rows, since some of
> them are strings and others maps?

This shouldn't happen. The data type is defined per column, so every row
should hold either a string or a map in that column, not a mix. If that's
not the case, it is likely a bug.

> can I create an expression to compare datatypes? is it possible?

Technically, you should be able to write a UDF that checks the type. But I am
more interested in knowing why you're running into this problem. Can you
please share your script and sample data? I'd like to reproduce it.
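
As a rough illustration of the type check such a UDF might perform, here is a
minimal plain-Python sketch. It assumes a map value reaches the function as a
dict, a null as None, and the literal text 'null' as a str. Those mappings, the
function names classify and is_valid_location, and the sample rows are all
assumptions made for illustration, and the wiring needed to register the
function with Pig is not shown.

```python
def classify(value):
    """Return a label for the runtime type of a single field value."""
    if value is None:
        return "null"      # a real null
    if isinstance(value, dict):
        return "map"       # assumed representation of a Pig map
    if isinstance(value, str):
        return "string"    # e.g. the literal text 'null'
    return "other"


def is_valid_location(value):
    """True only for map values that carry both coordinates."""
    return (
        isinstance(value, dict)
        and "longitude" in value
        and "latitude" in value
    )


# Hypothetical sample rows mirroring the mix seen in the DUMP output.
rows = [
    {"longitude": "-9.15199849", "latitude": "38.71179122"},
    "null",
    None,
]

# Keep only the rows that really are location maps.
kept = [row for row in rows if is_valid_location(row)]
```

With a check like this available as a UDF, the FILTER could keep only the rows
for which it succeeds, instead of comparing a map against a string.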

Thanks,
Cheolsoo

On Wed, Oct 31, 2012 at 2:54 PM, Arian Pasquali <[EMAIL PROTECTED]> wrote:

> can I create an expression to compare datatypes?
> is it possible?
>
> ArianP
>
> 2012/10/31 Arian Pasquali <[EMAIL PROTECTED]>
>
> > you are right, it doesn't seem like a null value.
> > it looks like a chararray. But the expression causes an error when comparing
> > a string with ([longitude#-9.15199849,latitude#38.71179122])
> >
> > geoinfo_no_nulls = FILTER geoinfo BY $0!='null'
> >
> > I get
> > ERROR 2997: Unable to recreate exception from backed error:
> > org.apache.pig.backend.executionengine.ExecException: ERROR 1071: Cannot
> > convert a map to a String
> >
> > what would be the best way to filter only the valid rows, since some of them
> > are strings and others maps?
> >
> > Arian
> >
> >
> >
> > 2012/10/31 Cheolsoo Park <[EMAIL PROTECTED]>
> >
> >> Hi,
> >>
> >> I am not sure what the problem is because I can't reproduce it. To me,
> >> null values are printed as an empty "( )" not "(null)", so it doesn't
> >> seem like null.
> >>
> >> I am wondering whether OpenJDK is the problem. Can you try Oracle
> >> HotSpot JDK 1.6 and see if that fixes it?
> >>
> >> Thanks,
> >> Cheolsoo
> >>
> >> On Wed, Oct 31, 2012 at 1:06 PM, Arian Pasquali <[EMAIL PROTECTED]> wrote:
> >>
> >> > hey people
> >> > I'm having some trouble with a silly task: I can't find a way to
> >> > filter null values from my rows. This is the result when I dump the
> >> > object geoinfo:
> >> >
> >> > DUMP geoinfo;
> >> > ([longitude#70.95853,latitude#30.9773])
> >> > ([longitude#-9.37944507,latitude#38.91780853])
> >> > (null)
> >> > (null)
> >> > (null)
> >> > ([longitude#-92.64416,latitude#16.73326])
> >> > (null)
> >> > (null)
> >> > ([longitude#-9.15199849,latitude#38.71179122])
> >> > ([longitude#-9.15210796,latitude#38.71195131])
> >> >
> >> > and here is the description
> >> >
> >> > DESCRIBE geoinfo;
> >> > geoinfo: {geoLocation: bytearray}
> >> >
> >> > What I'm trying to do is to filter null values like this:
> >> >
> >> > geoinfo_no_nulls = FILTER geoinfo BY geoLocation is not null;
> >> >
> >> > but the result remains the same. nothing is filtered.
> >> >
> >> > I also tried something like this
> >> >
> >> > geoinfo_no_nulls = FILTER geoinfo BY geoLocation != 'null';
> >> >
> >> > and I got an error
> >> >
> >> > org.apache.pig.backend.executionengine.ExecException: ERROR 1071: Cannot
> >> > convert a map to a String
> >> >
> >> > What am I doing wrong here?
> >> >
> >> > env details,
> >> >
> >> > Ubuntu 12.04.1 LTS,
> >> > hadoop-1.0.3
> >> > pig 0.9.3 version 0.9.3-SNAPSHOT (rexported) compiled Oct 24 2012, 19:04:03
> >> > java version "1.6.0_24" OpenJDK Runtime Environment (IcedTea6 1.11.4)
> >> > (6b24-1.11.4-1ubuntu0.12.04.1)
> >> > OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
> >> >
> >> >
> >> > ArianP
> >> >
> >>
> >
> >
>
Arian Pasquali 2012-11-01, 19:47
Arian Pasquali 2012-11-17, 05:01