Arian Pasquali
2012-10-31, 20:06
Cheolsoo Park
2012-10-31, 21:25
Arian Pasquali
2012-10-31, 21:50
Arian Pasquali
2012-10-31, 21:54
Cheolsoo Park
2012-10-31, 23:03
Arian Pasquali
2012-11-01, 19:47
Arian Pasquali
2012-11-17, 05:01
-
problem filtering null values with pig
Arian Pasquali 2012-10-31, 20:06
hey people,
I'm having some trouble with a silly task: I can't find a way to filter null values from my rows. This is the result when I dump the relation geoinfo:

DUMP geoinfo;
([longitude#70.95853,latitude#30.9773])
([longitude#-9.37944507,latitude#38.91780853])
(null)
(null)
(null)
([longitude#-92.64416,latitude#16.73326])
(null)
(null)
([longitude#-9.15199849,latitude#38.71179122])
([longitude#-9.15210796,latitude#38.71195131])

and here is the description:

DESCRIBE geoinfo;
geoinfo: {geoLocation: bytearray}

What I'm trying to do is filter out the null values like this:

geoinfo_no_nulls = FILTER geoinfo BY geoLocation is not null;

but the result remains the same; nothing is filtered.

I also tried something like this:

geoinfo_no_nulls = FILTER geoinfo BY geoLocation != 'null';

and got an error:

org.apache.pig.backend.executionengine.ExecException: ERROR 1071: Cannot convert a map to a String

What am I doing wrong here?

Environment details:

Ubuntu 12.04.1 LTS
hadoop-1.0.3
pig 0.9.3 version 0.9.3-SNAPSHOT (rexported) compiled Oct 24 2012, 19:04:03
java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.4) (6b24-1.11.4-1ubuntu0.12.04.1)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)

ArianP
-
Re: problem filtering null values with pig
Cheolsoo Park 2012-10-31, 21:25
Hi,
I am not sure what the problem is because I can't reproduce it. To me, null values are printed as an empty "( )", not "(null)", so these don't seem to be nulls.

I am wondering whether OpenJDK is the problem. Can you try the Oracle HotSpot JDK 1.6 and see if that fixes it?

Thanks,
Cheolsoo
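One way to check Cheolsoo's observation (the rows print as "(null)" rather than "( )", so they may hold the four-character string 'null' instead of a real null) is to count the two cases separately. This Pig Latin sketch is hypothetical: the file name geo_sample.tsv and its two-column, tab-separated layout are assumptions, and it only works when geoLocation holds plain text. On the original geoinfo relation, where some values are maps, comparing with 'null' would hit the same ERROR 1071.

```pig
-- Hypothetical sample: tab-separated id and geoLocation text (file name and layout are assumptions).
raw = LOAD 'geo_sample.tsv' USING PigStorage('\t') AS (id:chararray, geoLocation:chararray);

-- Count real nulls and literal 'null' strings in a single pass over all rows.
all_rows = GROUP raw ALL;
null_counts = FOREACH all_rows {
    real_nulls   = FILTER raw BY geoLocation IS NULL;
    null_strings = FILTER raw BY geoLocation == 'null';
    GENERATE COUNT_STAR(real_nulls) AS n_real_nulls,
             COUNT_STAR(null_strings) AS n_null_strings;
};

DUMP null_counts;
```

If n_null_strings is non-zero, a filter on IS NOT NULL alone keeps those rows, which would match the behaviour described in the first message.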
-
Re: problem filtering null values with pig
Arian Pasquali 2012-10-31, 21:50
You are right, it doesn't seem like a null value.
It looks like a chararray, but the expression fails when it compares the string 'null' with a map such as [longitude#-9.15199849,latitude#38.71179122]:

geoinfo_no_nulls = FILTER geoinfo BY $0 != 'null';

I get:

ERROR 2997: Unable to recreate exception from backed error: org.apache.pig.backend.executionengine.ExecException: ERROR 1071: Cannot convert a map to a String

What's the best way to filter only the valid rows, since some of them are strings and others are maps?

Arian
-
Re: problem filtering null values with pig
Arian Pasquali 2012-10-31, 21:54
Can I create an expression that compares data types?
Is that possible?

ArianP
-
Re: problem filtering null values with pig
Cheolsoo Park 2012-10-31, 23:03
Hi,
> what's the best way to filter only the valid rows, since some of them are strings and others are maps?

This shouldn't happen. The data type is defined per column, so it should be either string or map for all rows. If that's not the case, it is likely a bug.

> can I create an expression that compares data types? is it possible?

Technically, you should be able to write a UDF that checks the type. But I am more interested in knowing why you're running into this problem. Can you please share your script and sample data? I'd like to reproduce it.

Thanks,
Cheolsoo
-
Re: problem filtering null values with pig
Arian Pasquali 2012-11-01, 19:47
You are right, Cheolsoo.
Indeed, it doesn't make sense to write a UDF to compare data types. I know it's possible, but it doesn't sound like the right way.
Maybe it's a bug in the JsonLoader I'm using:
https://github.com/mmay/PigJsonLoader/blob/master/JsonLoader.java

I will share the script and the data with you shortly.

Thanks for the hints.

Arian Rodrigo Pasquali
FEUP, SAPO Labs
http://www.arianpasquali.com
twitter @arianpasquali
-
Re: problem filtering null values with pig
Arian Pasquali 2012-11-17, 05:01
Just for the record, I'm posting the solution to my problem here. Thank you for your help.

In the end, the problem seems to have been the JsonLoader I was using. I don't know exactly why, but it seems to mishandle some of my strings. I changed my code to use https://github.com/kevinweil/elephant-bird. The code now looks like this:

register 'elephant-bird-core-3.0.0.jar';
register 'elephant-bird-pig-3.0.0.jar';
register 'google-collections-1.0.jar';
register 'json-simple-1.1.jar';

-- each JSON line is loaded as a single map field ($0)
json_lines = LOAD '/twitterecho/tweets/stream/v1/json/2012_10_10/08' USING com.twitter.elephantbird.pig.load.JsonLoader();

-- pull the two keys out of the map and cast them to chararray
geo_tweets = FOREACH json_lines GENERATE (CHARARRAY) $0#'id' AS id, (CHARARRAY) $0#'geoLocation' AS geoLocation;

-- keep one tweet per id (the nested LIMIT must reference the grouped bag, geo_tweets)
tweets_grp = GROUP geo_tweets BY id;
unique_tweets = FOREACH tweets_grp {
    first_tweet = LIMIT geo_tweets 1;
    GENERATE FLATTEN(first_tweet);
};

-- drop tweets without a geoLocation
only_not_nulls = FILTER unique_tweets BY geoLocation is not null;

store only_not_nulls into '/twitter_data/results/geo_tweets';

Cheers, and thanks again for your support.

Arian P
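If the data could also contain the literal string 'null' (the symptom suspected earlier in the thread), both cases can be dropped in one FILTER once geoLocation has been cast to chararray, because comparing a chararray with 'null' no longer triggers the map-to-String conversion error. A minimal sketch, assuming the unique_tweets relation from the script above; the alias valid_geo is illustrative:

```pig
-- Assumes unique_tweets from the script above, with geoLocation already cast to chararray.
-- Keep rows whose geoLocation is neither a real null nor the text 'null'.
valid_geo = FILTER unique_tweets BY geoLocation IS NOT NULL AND geoLocation != 'null';
```

For a row where geoLocation is a real null, the IS NOT NULL test is false, so the whole condition is false and the row is dropped.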