Pig >> mail # user >> Yet more NPEs with TOP


Yet more NPEs with TOP
Hello all,

I have a bit of a maddening issue with the builtin TOP. Consider the
following script:
A = LOAD '$DATA' AS (a_bag:bag {t:tuple (value:double)});
B = FOREACH A {
        top_one = TOP(1,0,a_bag);
        GENERATE FLATTEN(top_one) AS (value);
    };
DUMP B;

Most of the time this script works just fine. However, for the following
piece of data it fails with a NullPointerException:
{(),(0.003941758108429772),(0.001153003956601468),(0.0028615543290043278),(0.004615471695397157),(0.0022558316773660025),(0.004126815965517207),(0.004625862298574576),(8.775460453796978E-4),(0.007921091717051021),(0.0014439247313395083),(0.004406076903740503),(0.0037146510632058063),(0.004849754737999144),(0.002838420238216161),(0.010496435435859057),(0.001325750201719551),(0.04320669798154559),(0.11254304985709146),(0.045453699949905196)}

Now, I understand that the bag contains a null field (the empty tuple), but
I've confirmed that TOP handles this fine by passing in inputs such as {(),()}.
Stranger still, deleting any single _arbitrary_ entry from the line above
gets rid of the error.
I really have no idea what's going on.
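
One way to check whether the empty tuple is involved at all is to filter out nulls inside the nested block before calling TOP. This is only a diagnostic sketch: it assumes the empty tuple () surfaces as a null value under the declared schema, the alias non_null is made up for illustration, and it has not been run against the data above.

```pig
A = LOAD '$DATA' AS (a_bag:bag {t:tuple (value:double)});
B = FOREACH A {
        -- drop tuples whose value is null before ranking (assumed trigger)
        non_null = FILTER a_bag BY value IS NOT NULL;
        -- same TOP call as the original script, on the filtered bag
        top_one = TOP(1, 0, non_null);
        GENERATE FLATTEN(top_one) AS (value);
    };
DUMP B;
```

If the error disappears with the filter in place, that points at how TOP compares the null entry rather than at the other values in the bag.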

Any ideas?

--jacob
@thedatachef