Pig >> mail # user >> ORDER Issue (repost to avoid spam filters)


Re: ORDER Issue (repost to avoid spam filters)
I was wondering whether the bytes column contains only null values (probably
because the input has formatting issues).

Can you check whether the following query gives any output?

start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
dip:chararray, sport:int, dport:int, protocol:int, packets:int,
bytes:int, flags:chararray, startTime:long, endTime:long);

target = FILTER start BY sip matches '51.37.8.63';

non_null_bytes = FILTER target BY bytes is not null;

dump non_null_bytes;
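
The complementary check may also help. This is only a sketch, assuming the same 'inputData' file and schema as in the query above; the alias null_bytes is made up for illustration. PigStorage returns null for a field that cannot be read as the declared type, so any rows printed here would be ones whose bytes field did not parse as an int:

```pig
-- Assumption: same input file and schema as in the query above.
start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
dip:chararray, sport:int, dport:int, protocol:int, packets:int,
bytes:int, flags:chararray, startTime:long, endTime:long);

target = FILTER start BY sip matches '51.37.8.63';

-- Hypothetical alias: keep only the rows where bytes came back as null.
null_bytes = FILTER target BY bytes is null;

dump null_bytes;
```

If this prints every matching row and the previous query prints none, the bytes column is entirely null for that sip, which would point to the input format rather than to ORDER itself.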

-Thejas
On 8/20/10 1:56 PM, "Matthew Smith" <[EMAIL PROTECTED]> wrote:

> UPDATE: I attempted my code in the amazon cloud (aws.amazon.com) and the
> script worked as intended over the data set. This leads me to believe
> that the issue is with pig-0.7.0 or my configuration. I would however
> like to not pay for something that is free :D. Any other ideas would be
> most welcome.
>
>
>
> @Thejas
>
> I changed the Script to:
>
> start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
> dip:chararray, sport:int, dport:int, protocol:int, packets:int,
> bytes:int, flags:chararray, startTime:long, endTime:long);
>
> target = FILTER start BY sip matches '51.37.8.63';
>
> just_bytes= FOREACH target GENERATE bytes;
>
> fail = ORDER just_bytes BY bytes DESC;
>
> not_reached = LIMIT fail 10;
>
> dump not_reached;
>
>
>
> and received the same error as before. I then changed the script to:
>
>
>
> start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
> dip:chararray, sport:int, dport:int, protocol:int, packets:int,
> bytes:int, flags:chararray, startTime:long, endTime:long);
>
> target = FILTER start BY sip matches '51.37.8.63';
>
> stored = STORE target INTO 'myoutput';
>
> second_start = LOAD 'myoutput/part-m-00000' USING PigStorage('\t') AS
> (sip:chararray, dip:chararray, sport:int, dport:int, protocol:int,
> packets:int, bytes:int, flags:chararray, startTime:long, endTime:long);
>
> fail = ORDER second_start BY bytes DESC;
>
> not_reached = LIMIT fail 10;
>
> dump not_reached;
>
>
>
> and received the same error.
>
>
>
> @Mridul
>
> I am using local mode at the moment. I don't understand the second
> question.
>
>
>
> Thanks,
>
> Matt
>
>
>
>
>
>
>
> From: Thejas M Nair [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, August 19, 2010 5:34 PM
> To: [EMAIL PROTECTED]; Matthew Smith
> Subject: Re: ORDER Issue (repost to avoid spam filters)
>
>
>
> I think 0.7 had an issue where order-by used to fail if the input was
> empty. But that does not seem to be the case here.
> I am wondering if there is a parsing/data-format issue that is causing the
> bytes column to be empty, though I am not aware of an empty/null value in
> the sort column causing issues.
> Can you try dumping just the bytes column?
> Another thing you can try is to store the output of the filter and load the
> data again before doing the order-by.
>
> Please let us know what you find.
>
> Thanks,
> Thejas
>
>
>
>
> On 8/19/10 11:35 AM, "Matthew Smith" <[EMAIL PROTECTED]> wrote:
>
> All,
>
>
>
> I am running pig-0.7.0 and I have been running into an issue running the
> ORDER command. I have attempted to run pig out of the box on 2 separate
> LINUX OS (Ubuntu 10.4 and OpenSuse 11.2) and the same issue has
> occurred. I run these commands in a script file:
>
>
>
> start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
> dip:chararray, sport:int, dport:int, protocol:int, packets:int,
> bytes:int, flags:chararray, startTime:long, endTime:long);
>
>
>
> target = FILTER start BY sip matches '51.37.8.63';
>
>
>
> fail = ORDER target BY bytes DESC;
>
>
>
> not_reached = LIMIT fail 10;
>
>
>
> dump not_reached;
>
>
>
>
>
> The error is listed below. I then run:
>
>
>
>
>
> start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
> dip:chararray, sport:int, dport:int, protocol:int, packets:int,
> bytes:int, flags:chararray, startTime:long, endTime:long);
>
>
>
> target = FILTER start BY sip matches '51.37.8.63';