Pig, mail # user - has this been reported? (bug)?


Corbin Hoenes 2011-03-24, 22:13
Has anyone already reported this bug in Pig 0.8? (Maybe it has been
fixed?)

data.txt (tab-separated file; the bad site has no canonical_url populated):
badsite.com        127.0.0.1
goodsite.com/1?foo=true    goodsite.com    127.0.0.1

data = LOAD 'data.txt' using PigStorage() as (referrer:chararray,
canonical_url:chararray, ip:chararray);
best_url = FOREACH data GENERATE ((canonical_url != '' and canonical_url is
not null) ? canonical_url : referrer) AS url, ip;
filtered = FILTER best_url BY url == 'badsite.com';
dump filtered;

If I run this, it returns nothing, as if url isn't being populated with
the contents of canonical_url or referrer.
But if I start Pig with -Dpig.usenewlogicalplan=false, it returns just
badsite.com, as expected.
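
To make the expected result concrete, here is a rough reference model of what the script above should produce. It is plain Python, not Pig, and it only mirrors the intended semantics. It assumes the bad row has an empty canonical_url field between two tabs and that an empty field loads as null; the names ROWS, load and best_url are made up for this sketch.

```python
# Reference model of the expected result. This is NOT Pig, only the intended semantics.
# Assumption: the bad row has an empty canonical_url field between two tabs.
ROWS = (
    "badsite.com\t\t127.0.0.1\n"
    "goodsite.com/1?foo=true\tgoodsite.com\t127.0.0.1\n"
)


def load(text):
    """Yield (referrer, canonical_url, ip) tuples; empty fields are assumed to load as None."""
    for line in text.splitlines():
        fields = [field if field != "" else None for field in line.split("\t")]
        yield tuple(fields)


def best_url(rows):
    """Prefer canonical_url; fall back to referrer when it is null or empty."""
    for referrer, canonical_url, ip in rows:
        url = canonical_url if canonical_url not in (None, "") else referrer
        yield url, ip


# Equivalent of: filtered = FILTER best_url BY url == 'badsite.com';
filtered = [row for row in best_url(load(ROWS)) if row[0] == "badsite.com"]
print(filtered)  # [('badsite.com', '127.0.0.1')]
```

Under those assumptions the filter should keep exactly one row, which matches what I get with -Dpig.usenewlogicalplan=false.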