Pig, mail # user - Yet more NPEs with TOP


Jacob Perkins 2011-05-07, 05:30
Hello all,

I have a bit of a maddening issue with the built-in TOP function.
Consider the following script:
A = LOAD '$DATA' AS (a_bag:bag {t:tuple (value:double)});
B = FOREACH A {
    top_one = TOP(1, 0, a_bag);
    GENERATE FLATTEN(top_one) AS (value);
};
DUMP B;

Most of the time this script works just fine. However, it fails with a
NullPointerException on the following line of input:
{(),(0.003941758108429772),(0.001153003956601468),(0.0028615543290043278),(0.004615471695397157),(0.0022558316773660025),(0.004126815965517207),(0.004625862298574576),(8.775460453796978E-4),(0.007921091717051021),(0.0014439247313395083),(0.004406076903740503),(0.0037146510632058063),(0.004849754737999144),(0.002838420238216161),(0.010496435435859057),(0.001325750201719551),(0.04320669798154559),(0.11254304985709146),(0.045453699949905196)}

Now, I understand that the first tuple in the bag, (), has a null field,
but I've confirmed that TOP handles this fine by passing in inputs such
as {(),()}. What makes it even stranger is that deleting any single
_arbitrary_ entry from the line above gets rid of the error.
I really have no idea what's going on.
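
One way to check whether the null field is involved at all would be to
filter nulls out of the bag before calling TOP. The following is only a
sketch under that assumption: the alias non_null is illustrative and not
part of the script above, and whether it actually avoids the NPE has not
been verified.

A = LOAD '$DATA' AS (a_bag:bag {t:tuple (value:double)});
B = FOREACH A {
    -- drop tuples whose value is null before ranking (assumed workaround)
    non_null = FILTER a_bag BY value IS NOT NULL;
    top_one = TOP(1, 0, non_null);
    GENERATE FLATTEN(top_one) AS (value);
};
DUMP B;

If the error disappears with the filter in place, that would point at how
TOP compares null fields rather than at the size of the bag.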

Any ideas?

--jacob
@thedatachef