Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Pig >> mail # user >> has this been reported? (bug)?

Copy link to this message
has this been reported? (bug)?
Wondering if someone has reported this bug in pig 0.8 (maybe it's been

data.txt (tab seperated file, bad site has no canonical_url populated):
goodsite.com/1?foo=true    goodsite.com

data = LOAD 'data.txt' using PigStorage() as (referrer:chararray,
canonical_url:chararray, ip:chararray);
best_url = FOREACH data GENERATE ((canonical_url != '' and canonical_url is
not null) ? canonical_url : referrer) AS url, ip;
filtered = FILTER best_url BY url == 'badsite.com';
dump filtered;

If I run this it will not return anything, it is as if url isn't being
populated with the contents of canonical or referrer.
But if I start pig with -Dpig.usenewlogicalplan=false it will return just
badsite.com as expected.