Pig >> mail # user >> ORDER Issue (repost to avoid spam filters)


Re: ORDER Issue (repost to avoid spam filters)
I wonder whether the bytes column contains only null values (probably
because the input has formatting issues).

Can you check whether the following query gives any output:

start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
dip:chararray, sport:int, dport:int, protocol:int, packets:int,
bytes:int, flags:chararray, startTime:long, endTime:long);

target = FILTER start BY sip matches '51.37.8.63';

non_null_bytes = FILTER target BY bytes IS NOT NULL;

dump non_null_bytes;
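
If that comes back empty, a rough way to test the parsing theory is to load
every field as chararray, so that Pig does no int conversion on load, and
look at the raw text of the bytes field. This is only a sketch, untested
against your data; the aliases raw, raw_target, raw_bytes and raw_sample are
names made up for illustration.

```pig
-- Same input and delimiter as above, but every field is kept as text.
raw = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
dip:chararray, sport:chararray, dport:chararray, protocol:chararray,
packets:chararray, bytes:chararray, flags:chararray, startTime:chararray,
endTime:chararray);

-- Same filter as in the original script.
raw_target = FILTER raw BY sip matches '51.37.8.63';

-- Keep only the unparsed bytes field and print a small sample of it.
raw_bytes = FOREACH raw_target GENERATE bytes;

raw_sample = LIMIT raw_bytes 10;

dump raw_sample;
```

If this prints values while the bytes:int version prints empty tuples, the
field text is probably not a plain integer (stray whitespace, a different
delimiter, and so on).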

-Thejas
On 8/20/10 1:56 PM, "Matthew Smith" <[EMAIL PROTECTED]> wrote:

> UPDATE: I ran my code in the Amazon cloud (aws.amazon.com) and the
> script worked as intended over the data set. This leads me to believe
> that the issue is with pig-0.7.0 or my configuration. However, I would
> rather not pay for something that is free :D. Any other ideas would be
> most welcome.
>
>
>
> @Thejas
>
> I changed the script to:
>
> start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
> dip:chararray, sport:int, dport:int, protocol:int, packets:int,
> bytes:int, flags:chararray, startTime:long, endTime:long);
>
> target = FILTER start BY sip matches '51.37.8.63';
>
> just_bytes= FOREACH target GENERATE bytes;
>
> fail = ORDER just_bytes BY bytes DESC;
>
> not_reached = LIMIT fail 10;
>
> dump not_reached;
>
>
>
> and received the same error as before. I then changed the script to:
>
>
>
> start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
> dip:chararray, sport:int, dport:int, protocol:int, packets:int,
> bytes:int, flags:chararray, startTime:long, endTime:long);
>
> target = FILTER start BY sip matches '51.37.8.63';
>
> STORE target INTO 'myoutput';
>
> second_start = LOAD 'myoutput/part-m-00000' USING PigStorage('\t') AS
> (sip:chararray, dip:chararray, sport:int, dport:int, protocol:int,
> packets:int, bytes:int, flags:chararray, startTime:long, endTime:long);
>
> fail = ORDER second_start BY bytes DESC;
>
> not_reached = LIMIT fail 10;
>
> dump not_reached;
>
>
>
> and received the same error.
>
>
>
> @Mridul
>
> I am using local mode at the moment. I don't understand the second
> question.
>
>
>
> Thanks,
>
> Matt
>
> From: Thejas M Nair [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, August 19, 2010 5:34 PM
> To: [EMAIL PROTECTED]; Matthew Smith
> Subject: Re: ORDER Issue (repost to avoid spam filters)
>
>
>
> I think 0.7 had an issue where order-by used to fail if the input was
> empty, but that does not seem to be the case here.
> I am wondering whether a parsing/data-format issue is causing the
> bytes column to be empty, though I am not aware of empty/null values in
> the sort column causing issues.
> Can you try dumping just the bytes column?
> Another thing you can try is to store the output of the filter and load
> the data again before doing the order-by.
>
> Please let us know what you find.
>
> Thanks,
> Thejas
>
>
>
>
> On 8/19/10 11:35 AM, "Matthew Smith" <[EMAIL PROTECTED]> wrote:
>
> All,
>
>
>
> I am running pig-0.7.0 and keep hitting an issue with the ORDER
> command. I have tried running Pig out of the box on two separate Linux
> distributions (Ubuntu 10.04 and openSUSE 11.2), and the same issue
> occurs on both. I run these commands in a script file:
>
>
>
> start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
> dip:chararray, sport:int, dport:int, protocol:int, packets:int,
> bytes:int, flags:chararray, startTime:long, endTime:long);
>
>
>
> target = FILTER start BY sip matches '51.37.8.63';
>
>
>
> fail = ORDER target BY bytes DESC;
>
>
>
> not_reached = LIMIT fail 10;
>
>
>
> dump not_reached;
>
>
>
>
>
> The error is listed below. I then run:
>
>
>
>
>
> start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
> dip:chararray, sport:int, dport:int, protocol:int, packets:int,
> bytes:int, flags:chararray, startTime:long, endTime:long);
>
>
>
> target = FILTER start BY sip matches '51.37.8.63';