Pig user mailing list: ORDER Issue (repost to avoid spam filters)


Re: ORDER Issue (repost to avoid spam filters)
Can you check whether the initial MR jobs in the order-by query failed because of
some other error? (Specifically, the sampling MR job that is part of
order-by.) Maybe, for some reason (a bug?), Pig did not capture/log that error.
-Thejas
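For context on the sampling job mentioned above: a distributed ORDER BY typically runs a sampling pass first, then a sorting pass. The Python sketch below is a simplified, hypothetical model of that idea (sample the keys, pick quantile cut points, route every key to a range partition, sort each partition), not Pig's actual implementation; the function names and the default partition count, sample size, and seed are assumptions chosen for illustration.

```python
import bisect
import random
from typing import List, Sequence


def compute_cut_points(sample: Sequence[int], num_partitions: int) -> List[int]:
    """Sampling step: pick num_partitions - 1 quantile boundaries from the sample."""
    ordered = sorted(sample)
    if not ordered:
        return []
    step = len(ordered) / num_partitions
    return [ordered[int(step * i)] for i in range(1, num_partitions)]


def range_partition(keys: Sequence[int], cut_points: List[int],
                    num_partitions: int) -> List[List[int]]:
    """Sorting step: send each key to the range that contains it, then sort each range."""
    partitions: List[List[int]] = [[] for _ in range(num_partitions)]
    for key in keys:
        # bisect_right counts the cut points <= key, which is the partition index.
        partitions[bisect.bisect_right(cut_points, key)].append(key)
    return [sorted(part) for part in partitions]


def total_order_sort(keys: Sequence[int], num_partitions: int = 3,
                     sample_size: int = 20, seed: int = 0) -> List[int]:
    """Hypothetical two-phase sort: sample, then range-partition and sort."""
    rng = random.Random(seed)
    key_list = list(keys)
    sample = rng.sample(key_list, min(sample_size, len(key_list)))
    cut_points = compute_cut_points(sample, num_partitions)
    partitions = range_partition(key_list, cut_points, num_partitions)
    # Every key in partition i is <= every key in partition i + 1,
    # so concatenating the sorted partitions yields a globally sorted list.
    return [key for part in partitions for key in part]
```

For ORDER ... DESC the comparison is simply reversed. The point of the sketch is that the sorting pass depends on the cut points produced by the sampling pass, which is why a silent failure in the sampling job would be worth ruling out.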

On 8/23/10 11:13 AM, "Matthew Smith" <[EMAIL PROTECTED]> wrote:

> Update:
> After downloading and installing pig-0.6.0, I ran the script again over
> the same data set, and it produced the desired results. I don't know what I
> am doing wrong in 0.7.0, but I will revert to 0.6.0 until I can
> sort out what went wrong in 0.7.0. Thoughts are still welcome and wanted
> :D
>
> Thanks,
> Matt
>
> -----Original Message-----
> From: Matthew Smith [mailto:[EMAIL PROTECTED]]
> Sent: Monday, August 23, 2010 11:39 AM
> To: Thejas M Nair; [EMAIL PROTECTED]
> Subject: RE: ORDER Issue (repost to avoid spam filters)
>
> Changed the script to:
> start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
> dip:chararray, sport:int, dport:int, protocol:int, packets:int,
> bytes:int, flags:chararray, startTime:long, endTime:long);
> target = FILTER start BY sip matches '51.37.8.63';
> not_null_bytes = FILTER target BY bytes is not null;
> dump not_null_bytes;
>
> and it dumped the expected tuples. There were plenty of valid
> records. I will attempt to revert everything to pig-0.6.0 and rerun the
> scripts to determine whether the issue is in pig-0.7.0.
>
> Matt
>
> -----Original Message-----
> From: Thejas M Nair [mailto:[EMAIL PROTECTED]]
> Sent: Friday, August 20, 2010 5:23 PM
> To: [EMAIL PROTECTED]; Matthew Smith
> Subject: Re: ORDER Issue (repost to avoid spam filters)
>
> I was wondering whether the bytes column has all null values (probably
> because the input has formatting issues).
>
> Can you check whether the following query gives any output:
>
> start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
> dip:chararray, sport:int, dport:int, protocol:int, packets:int,
> bytes:int, flags:chararray, startTime:long, endTime:long);
>
> target = FILTER start BY sip matches '51.37.8.63';
>
> non_null_bytes = FILTER target BY bytes is not null;
>
> dump non_null_bytes;
>
> -Thejas
>
>
> On 8/20/10 1:56 PM, "Matthew Smith" <[EMAIL PROTECTED]> wrote:
>
>> UPDATE: I ran my script in the Amazon cloud (aws.amazon.com), and it
>> worked as intended over the data set. This leads me to believe
>> that the issue is with pig-0.7.0 or my configuration. I would, however,
>> prefer not to pay for something that is free :D. Any other ideas would be
>> most welcome.
>>
>>
>>
>> @Thejas
>>
>> I changed the script to:
>>
>> start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
>> dip:chararray, sport:int, dport:int, protocol:int, packets:int,
>> bytes:int, flags:chararray, startTime:long, endTime:long);
>>
>> target = FILTER start BY sip matches '51.37.8.63';
>>
>> just_bytes = FOREACH target GENERATE bytes;
>>
>> fail = ORDER just_bytes BY bytes DESC;
>>
>> not_reached = LIMIT fail 10;
>>
>> dump not_reached;
>>
>>
>>
>> and received the same error as before. I then changed the script to:
>>
>>
>>
>> start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
>> dip:chararray, sport:int, dport:int, protocol:int, packets:int,
>> bytes:int, flags:chararray, startTime:long, endTime:long);
>>
>> target = FILTER start BY sip matches '51.37.8.63';
>>
>> STORE target INTO 'myoutput';
>>
>> second_start = LOAD 'myoutput/part-m-00000' USING PigStorage('\t') AS
>> (sip:chararray, dip:chararray, sport:int, dport:int, protocol:int,
>> packets:int, bytes:int, flags:chararray, startTime:long,
>> endTime:long);
>>
>> fail = ORDER second_start BY bytes DESC;
>>
>> not_reached = LIMIT fail 10;
>>
>> dump not_reached;
>>
>>
>>
>> and received the same error.
>>
>>
>>
>> @Mridul
>>
>> I am using local mode at the moment. I don't understand the second
>> question.
>>
>>
>>
>> Thanks,
>>
>> Matt
>>
>>
>>
>
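One way to test Thejas's null-bytes hypothesis independently of any Pig version is to parse a sample of inputData outside Pig. The Python sketch below is hypothetical: the helper names `parse_line` and `top_bytes` are made up for illustration, and it assumes that a numeric field which is missing, empty, or not a plain base-10 integer ends up as null; Pig's own casting rules may differ in edge cases. It mirrors the FILTER / ORDER ... DESC / LIMIT pipeline from the scripts in this thread so the counts can be compared.

```python
import re
from typing import Dict, List, Optional, Sequence, Tuple, Union

# Field names copied from the AS (...) schema in the scripts in this thread.
FIELDS = ["sip", "dip", "sport", "dport", "protocol", "packets",
          "bytes", "flags", "startTime", "endTime"]
CHAR_FIELDS = {"sip", "dip", "flags"}       # declared as chararray
INT_PATTERN = re.compile(r"[+-]?[0-9]+")    # assumption: what counts as a clean int/long

Value = Optional[Union[str, int]]


def parse_line(line: str, delimiter: str = "|") -> Dict[str, Value]:
    """Split one record on the delimiter and cast the numeric fields.

    Assumption: a numeric field that is missing, empty, or not a plain
    base-10 integer (e.g. '1,024', ' 512', '12.5') is treated as null (None).
    """
    values = line.rstrip("\n").split(delimiter)
    record: Dict[str, Value] = {}
    for i, name in enumerate(FIELDS):
        raw = values[i] if i < len(values) else None
        if raw is None:
            record[name] = None
        elif name in CHAR_FIELDS:
            record[name] = raw
        else:
            record[name] = int(raw) if INT_PATTERN.fullmatch(raw) else None
    return record


def top_bytes(lines: Sequence[str], sip_regex: str = "51.37.8.63",
              n: int = 10) -> Tuple[int, int, List[int]]:
    """Mimic FILTER ... matches, FILTER bytes is not null, ORDER ... DESC, LIMIT n.

    Returns (rows matching sip, rows among those with non-null bytes, top-n bytes).
    """
    # Like Pig's `matches`, the whole string must match the regex, so each '.'
    # in '51.37.8.63' matches any character, not only a literal dot.
    pattern = re.compile(sip_regex)
    records = [parse_line(line) for line in lines]
    target = [r for r in records
              if isinstance(r["sip"], str) and pattern.fullmatch(r["sip"])]
    non_null = [r for r in target if r["bytes"] is not None]
    ranked = sorted(non_null, key=lambda r: r["bytes"], reverse=True)
    return len(target), len(non_null), [r["bytes"] for r in ranked[:n]]
```

If the first count is non-zero but the second is 0, every bytes value for that address failed to parse, which is the situation Thejas suspected; if both counts look healthy, the input is probably fine and the problem lies elsewhere.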