Wondering if someone has already reported this bug in Pig 0.8 (maybe it's been
data.txt (tab-separated file; the bad site has no canonical_url populated):
goodsite.com/1?foo=true goodsite.com 127.0.0.1
data = LOAD 'data.txt' using PigStorage() as (referrer:chararray,
best_url = FOREACH data GENERATE ((canonical_url != '' and canonical_url is
not null) ? canonical_url : referrer) AS url, ip;
filtered = FILTER best_url BY url == 'badsite.com';
If I run this, it returns nothing. It is as if url isn't being populated
with the contents of either canonical_url or referrer.
But if I start Pig with -Dpig.usenewlogicalplan=false, it returns just
badsite.com, as expected.
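
To pin down what "as expected" means here, this is a small Python model of the semantics I'd expect from the script. It is only a sketch, not Pig itself. The column order (referrer, canonical_url, ip) is inferred from the sample row, and the badsite.com row is a hypothetical stand-in for the record with an empty canonical_url.

```python
# Rows are (referrer, canonical_url, ip). The second row is an assumed
# stand-in for the "bad site" record, whose canonical_url is empty.
rows = [
    ("goodsite.com/1?foo=true", "goodsite.com", "127.0.0.1"),
    ("badsite.com", "", "127.0.0.1"),
]


def best_url(referrer, canonical_url):
    # Mirrors the conditional in the FOREACH: use canonical_url when it is
    # non-empty and non-null, otherwise fall back to referrer.
    if canonical_url is not None and canonical_url != "":
        return canonical_url
    return referrer


# FOREACH ... GENERATE (...) AS url, ip
projected = [(best_url(ref, canon), ip) for ref, canon, ip in rows]

# FILTER best_url BY url == 'badsite.com'
filtered = [row for row in projected if row[0] == "badsite.com"]

print(filtered)  # [('badsite.com', '127.0.0.1')]
```

With the new logical plan enabled, the Pig script gives an empty result instead of that single row.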