Pig, mail # user - ORDER Issue (repost to avoid spam filters)


Re: ORDER Issue (repost to avoid spam filters)
Thejas M Nair 2010-08-25, 16:03
Can you check whether the initial MR jobs in the order-by query failed because
of some other error (specifically, the sampling MR job that is part of
order-by)? Maybe, for some reason (a bug?), Pig did not capture or log that error.
-Thejas

On 8/23/10 11:13 AM, "Matthew Smith" <[EMAIL PROTECTED]> wrote:

> Update:
> After downloading and installing pig-0.6.0, I ran the script again over
> the same data set. It produced the desired results. I don't know what I
> am doing wrong in 0.7.0, but will be reverting back to 0.6.0 until I can
> sort out what went wrong in 0.7.0. Thoughts are still welcome and wanted
> :D
>
> Thanks,
> Matt
>
> -----Original Message-----
> From: Matthew Smith [mailto:[EMAIL PROTECTED]]
> Sent: Monday, August 23, 2010 11:39 AM
> To: Thejas M Nair; [EMAIL PROTECTED]
> Subject: RE: ORDER Issue (repost to avoid spam filters)
>
> Changed the script to:
> start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
> dip:chararray, sport:int, dport:int, protocol:int, packets:int,
> bytes:int, flags:chararray, startTime:long, endTime:long);
> target = FILTER start BY sip matches '51.37.8.63';
> not_null_bytes = FILTER target BY bytes is not null;
> dump not_null_bytes;
>
> and dumped the expected tuples. There were plenty of records that were
> valid. I will attempt to revert everything to pig-0.6.0 and rerun the
> scripts to determine if the issue is in pig-0.7.0.
>
> Matt
>
> -----Original Message-----
> From: Thejas M Nair [mailto:[EMAIL PROTECTED]]
> Sent: Friday, August 20, 2010 5:23 PM
> To: [EMAIL PROTECTED]; Matthew Smith
> Subject: Re: ORDER Issue (repost to avoid spam filters)
>
> I was wondering if the bytes column has all null values (probably
> because the input has formatting issues).
>
> Can you check if the following query gives any output -
>
> start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
> dip:chararray, sport:int, dport:int, protocol:int, packets:int,
> bytes:int, flags:chararray, startTime:long, endTime:long);
>
> target = FILTER start BY sip matches '51.37.8.63';
>
> non_null_bytes = FILTER target by bytes is not null;
>
> dump non_null_bytes;
>
> -Thejas
>
>
> On 8/20/10 1:56 PM, "Matthew Smith" <[EMAIL PROTECTED]> wrote:
>
>> UPDATE: I attempted my code in the amazon cloud (aws.amazon.com) and the
>> script worked as intended over the data set. This leads me to believe
>> that the issue is with pig-0.7.0 or my configuration. I would however
>> like to not pay for something that is free :D. Any other ideas would be
>> most welcome
>>
>>
>>
>> @Thejas
>>
>> I changed the script to:
>>
>> start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
>> dip:chararray, sport:int, dport:int, protocol:int, packets:int,
>> bytes:int, flags:chararray, startTime:long, endTime:long);
>>
>> target = FILTER start BY sip matches '51.37.8.63';
>>
>> just_bytes= FOREACH target GENERATE bytes;
>>
>> fail = ORDER just_bytes BY bytes DESC;
>>
>> not_reached = LIMIT fail 10;
>>
>> dump not_reached;
>>
>>
>>
>> and received the same error as before. I then changed the script to:
>>
>>
>>
>> start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
>> dip:chararray, sport:int, dport:int, protocol:int, packets:int,
>> bytes:int, flags:chararray, startTime:long, endTime:long);
>>
>> target = FILTER start BY sip matches '51.37.8.63';
>>
>> stored = STORE target INTO 'myoutput';
>>
>> second_start = LOAD 'myoutput/part-m-00000' USING PigStorage('\t') AS
>> (sip:chararray, dip:chararray, sport:int, dport:int, protocol:int,
>> packets:int, bytes:int, flags:chararray, startTime:long, endTime:long);
>>
>> fail = ORDER second_start BY bytes DESC;
>>
>> not_reached = LIMIT fail 10;
>>
>> dump not_reached;
>>
>>
>>
>> and received the same error.
>>
>>
>>
>> @Mridul
>>
>> I am using local mode at the moment. I don't understand the second
>> question.
>>
>>
>>
>> Thanks,
>>
>> Matt
>>
>>
>>
>