Pig user mailing list


Has this been reported? (bug)
Wondering if anyone has reported this bug in Pig 0.8 (or maybe it has already been
fixed?)

data.txt (tab-separated file; the bad site has no canonical_url populated):
badsite.com<TAB><TAB>127.0.0.1
goodsite.com/1?foo=true<TAB>goodsite.com<TAB>127.0.0.1
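Since email tends to turn tabs into spaces, here is a small Python sketch for recreating data.txt with real tab characters. It assumes the column order given by the LOAD schema (referrer, canonical_url, ip) and an empty middle field for the bad site; `ROWS`, `render` and `write_data` are hypothetical helper names.

```python
# Recreate data.txt with real tabs. Column order assumed from the LOAD schema:
# referrer, canonical_url, ip.
ROWS = [
    ("badsite.com", "", "127.0.0.1"),  # canonical_url left empty
    ("goodsite.com/1?foo=true", "goodsite.com", "127.0.0.1"),
]


def render(rows):
    """Join each row's fields with a tab and terminate every line with a newline."""
    return "".join("\t".join(row) + "\n" for row in rows)


def write_data(path="data.txt"):
    """Write the sample rows to `path` as a tab-separated file."""
    with open(path, "w") as f:
        f.write(render(ROWS))


if __name__ == "__main__":
    write_data()
```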

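-- load the three tab-separated columns
data = LOAD 'data.txt' USING PigStorage() AS (referrer:chararray, canonical_url:chararray, ip:chararray);
-- prefer canonical_url when it is populated, otherwise fall back to referrer
best_url = FOREACH data GENERATE ((canonical_url != '' AND canonical_url IS NOT NULL) ? canonical_url : referrer) AS url, ip;
-- expected to keep only the badsite.com row
filtered = FILTER best_url BY url == 'badsite.com';
DUMP filtered;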

If I run this, it returns nothing; it is as if url is never populated with
the contents of canonical_url or referrer.
But if I start Pig with -Dpig.usenewlogicalplan=false, it returns just the
badsite.com row, as expected.
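For comparison, a minimal Python model of what the script is expected to return on the two sample rows. It is not Pig and says nothing about where the bug lies; it only mirrors the LOAD, bincond and FILTER steps, assuming the empty canonical_url field arrives as an empty string (None is handled too). The names `DATA`, `load`, `best_url` and `run` are hypothetical.

```python
# Reference model (not Pig) of the expected result of the script above.
# Assumption: an empty canonical_url loads as '' (a None value is treated the same way).
DATA = "badsite.com\t\t127.0.0.1\ngoodsite.com/1?foo=true\tgoodsite.com\t127.0.0.1\n"


def load(text):
    """Split each line into (referrer, canonical_url, ip), like a tab-delimited loader."""
    rows = []
    for line in text.splitlines():
        referrer, canonical_url, ip = line.split("\t")
        rows.append((referrer, canonical_url, ip))
    return rows


def best_url(rows):
    """Prefer canonical_url when it is non-null and non-empty, else fall back to referrer."""
    out = []
    for referrer, canonical_url, ip in rows:
        if canonical_url is not None and canonical_url != "":
            url = canonical_url
        else:
            url = referrer
        out.append((url, ip))
    return out


def run(text):
    """Keep only the rows whose chosen url is badsite.com."""
    return [row for row in best_url(load(text)) if row[0] == "badsite.com"]


if __name__ == "__main__":
    print(run(DATA))  # [('badsite.com', '127.0.0.1')]
```

The old logical plan matches this model; the new one returns an empty result instead.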