Re: wrong sort order (lexical vs numeric) in a nested foreach
I think I finally found the culprit. There is a load like this:

a = load '/foobar' using CustomJsonLoader('baz') as (m:map[]);  -- loading an untyped map

then there is a flatten,
a1 = foreach a generate m#'id' as id: chararray,
     flatten(m#'listvals') as (listvals: map[]); -- another untyped map

and then

a2 = foreach a1 generate id as id: chararray,
     listvals#'intval1' as intval1: int,
     listvals#'intval2' as intval2: int;

By adding explicit casts, like this:
a2 = foreach a1 generate id as id: chararray,
     (int) listvals#'intval1' as intval1: int,
     (int) listvals#'intval2' as intval2: int;

I've finally got the results I was looking for, without having to store and
reload.
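
To illustrate the symptom outside Pig: the thread does not confirm what the loader actually emits, but one plausible reading is that the map values arrive as strings of digits even though the schema declares them as int. The Python sketch below uses made-up values (not data from this thread) to show how the same values order differently before and after conversion, which is the effect the explicit (int) cast has.

```python
# Hypothetical values as a JSON loader might hand them over:
# digits stored as strings rather than as real integers.
raw_values = ["10", "9", "100", "2"]

# Sorting strings compares character by character (lexical order).
lexical = sorted(raw_values)

# Converting first, the analogue of the explicit (int) cast,
# yields numeric order.
numeric = sorted(int(v) for v in raw_values)

print(lexical)
print(numeric)
```

The declared schema plays no role in the comparison here; only the runtime type of each value does, which would match the behaviour described above.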

Thanks

On Tue, Sep 4, 2012 at 2:05 PM, Lauren Blau <[EMAIL PROTECTED]> wrote:

> unfortunately, I can't put together an example without sharing the custom
> jsonloader and data. But I've worked around this by explicitly storing and
> reloading the data.
> But it sounds like you have it backwards in your attempt to be sneaky. The
> data actually is an int and should be sorted numerically.
> I just read something in another email that leads me to believe I need to
> cast the values on the left-hand side in a foreach,
> like foreach x generate (int)field as fieldname: int;
>
> so I'm going to try explicit casts like that.
>
>
> Thanks,
> lauren
>
>
> On Sat, Sep 1, 2012 at 12:42 AM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
>
>> I tried to reproduce this and haven't been able to -- all my devious
>> attempts to get something that is actually a string to show up as an
>> int in "describe" wind up in class cast exceptions and blown up jobs
>> (not devious enough, clearly).
>>
>> Can you put together an example that reproduces the issue, and
>> let us know which version of Pig you are running?
>>
>> Thanks,
>> Dmitriy
>>
>> On Fri, Aug 31, 2012 at 2:42 AM, Lauren Blau
>> <[EMAIL PROTECTED]> wrote:
>> > Could this be a problem with the original read of the data? It is stored in
>> > JSON format and read with a custom JSON loader.
>> > If I save the results of the loader to a file using PigStorage and then run
>> > the same script reading from that file, the sort is done numerically.
>> >
>> > I've had other pig script problems which have been solved by explicitly
>> > storing and re-reading using PigStorage.
>> > I'm not sure what I can check in the loader (I didn't write it) to see what
>> > might be causing this.
>> > Any hints on how to debug this?
>> >
>> > Thanks,
>> > Lauren
>> >
>> > On Thu, Aug 30, 2012 at 6:10 PM, Lauren Blau <[EMAIL PROTECTED]> wrote:
>> >
>> >> sorry, premature email :-).
>> >>
>> >> -- relation schema: (key1: chararray, key2: int, orderkey1: int, val: chararray)
>> >>
>> >> groupbykey = group relation by (key1, key2);
>> >> result = foreach groupbykey {
>> >>     sorted = order relation by orderkey1;
>> >>     generate flatten($0), MyUDF(sorted);
>> >> };
>> >>
>> >> I notice that when the 'sorted' values arrive in my UDF, they are sorted
>> >> lexically, not numerically. I checked the schema on the way in and
>> >> orderkey1 is definitely an int.
>> >>
>> >> Is there any way to force the order by into a numeric sort?
>> >>
>> >> Thanks,
>> >> Lauren
>> >>
>> >>
>> >> On Thu, Aug 30, 2012 at 5:59 PM, Lauren Blau <[EMAIL PROTECTED]> wrote:
>> >>
>> >>> I have the following foreach:
>> >>>
>> >>> foo := foreach bar {
>> >>>
>> >>>
>> >>
>>
>
>