Pig >> mail # user >> wrong sort order (lexical vs numeric) in a nested foreach


Lauren Blau 2012-08-30, 21:59
Lauren Blau 2012-08-30, 22:10
Re: wrong sort order (lexical vs numeric) in a nested foreach
Could this be a problem with the original read of the data? It is stored in
JSON format and read with a custom JSON loader.
If I save the results of the loader to a file using PigStorage and then run
the same script reading from that file, the sort is done numerically.

I've had other Pig script problems that were solved by explicitly
storing and re-reading the data using PigStorage.
I'm not sure what I can check in the loader (I didn't write it) to see what
might be causing this.
Any hints on how to debug this?
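
One possible explanation, which this thread does not confirm, is that the loader declares orderkey1 as an int in its schema but puts String objects into the tuples, so a sort that compares the actual objects ends up comparing strings. Re-reading through PigStorage would then force the field to be parsed as a real integer. The sketch below is plain Java, not Pig code; the class and method names (SortOrderDemo, sorted, toInts) are hypothetical and only illustrate how the same values order differently depending on their runtime type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SortOrderDemo {

    // Sorts a copy of the values by their natural ordering.
    // Strings compare character by character; Integers compare by value.
    public static <T extends Comparable<T>> List<T> sorted(List<T> values) {
        List<T> copy = new ArrayList<>(values);
        Collections.sort(copy);
        return copy;
    }

    // Parses string-typed fields into Integer objects, which is roughly
    // what happens when a field is re-read from text with an int schema.
    public static List<Integer> toInts(List<String> raw) {
        List<Integer> out = new ArrayList<>();
        for (String s : raw) {
            out.add(Integer.valueOf(s.trim()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed situation: the values are Strings even though the schema says int.
        List<String> asLoaded = Arrays.asList("10", "9", "100", "2");

        System.out.println(sorted(asLoaded));          // [10, 100, 2, 9]
        System.out.println(sorted(toInts(asLoaded)));  // [2, 9, 10, 100]
    }
}
```

If this is what is happening, logging the runtime class of the orderkey1 field inside the UDF (for example via getClass() on the object returned by Tuple.get) should show java.lang.String rather than java.lang.Integer.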

Thanks,
Lauren

On Thu, Aug 30, 2012 at 6:10 PM, Lauren Blau <
[EMAIL PROTECTED]> wrote:

> sorry, premature email :-).
>
> -- relation has the schema
> -- (key1:chararray, key2:int, orderkey1:int, val:chararray)
>
> groupbykey = group relation by (key1, key2);
> result = foreach groupbykey {
>     sorted = order relation by orderkey1;
>     generate flatten($0), MyUDF(sorted);
> }
>
> I notice that when the 'sorted' values arrive in my UDF, they are sorted
> lexically, not numerically. I checked the schema on the way in and
> orderkey1 is definitely an int.
>
> Is there any way to force the order by into a numeric sort?
>
> Thanks,
> Lauren
>
>
> On Thu, Aug 30, 2012 at 5:59 PM, Lauren Blau <
> [EMAIL PROTECTED]> wrote:
>
>> I have the following foreach:
>>
>> foo := foreach bar {
>>
>>
>
Віталій Ти... 2012-08-31, 20:55
Dmitriy Ryaboy 2012-09-01, 04:42
Lauren Blau 2012-09-04, 18:05
Lauren Blau 2012-09-04, 20:27