Pig, mail # user - ORDER Issue (repost to avoid spam filters)


Re: ORDER Issue (repost to avoid spam filters)
Thejas M Nair 2010-08-19, 21:34
I think 0.7 had an issue where order-by used to fail if the input was empty, but that does not seem to be the case here.
I am wondering if there is a parsing/data-format issue that is causing the bytes column to be empty, though I am not aware of an empty/null value in the sort column causing issues.
Can you try dumping just the bytes column?
Another thing you can try is to store the output of the filter and load the data again before doing the order-by.
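
Something along these lines (only a rough sketch reusing the relation names from your script; just_bytes, reloaded, ordered, top10 and the 'target_out' path are placeholders you would adjust):

-- 1) Dump only the sort column to check for empty/null values:
just_bytes = FOREACH target GENERATE bytes;
dump just_bytes;

-- 2) Store the filter output, reload it, then do the order-by:
STORE target INTO 'target_out' USING PigStorage('|');
reloaded = LOAD 'target_out' USING PigStorage('|') AS (sip:chararray,
    dip:chararray, sport:int, dport:int, protocol:int, packets:int,
    bytes:int, flags:chararray, startTime:long, endTime:long);
ordered = ORDER reloaded BY bytes DESC;
top10 = LIMIT ordered 10;
dump top10;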

Please let us know what you find.

Thanks,
Thejas
On 8/19/10 11:35 AM, "Matthew Smith" <[EMAIL PROTECTED]> wrote:

All,

I am running pig-0.7.0 and I have been running into an issue with the
ORDER command. I have tried running Pig out of the box on two separate
Linux OSes (Ubuntu 10.04 and openSUSE 11.2), and the same issue
occurred on both. I run these commands in a script file:

start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
    dip:chararray, sport:int, dport:int, protocol:int, packets:int,
    bytes:int, flags:chararray, startTime:long, endTime:long);

target = FILTER start BY sip matches '51.37.8.63';

fail = ORDER target BY bytes DESC;

not_reached = LIMIT fail 10;

dump not_reached;

The error is listed below. I then run:

start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
    dip:chararray, sport:int, dport:int, protocol:int, packets:int,
    bytes:int, flags:chararray, startTime:long, endTime:long);

target = FILTER start BY sip matches '51.37.8.63';

dump target;

This script produces a large list of sips matching the filter. What am
I doing wrong that causes Pig not to ORDER these properly? I have been
wrestling with this issue for a week now. Any help would be greatly
appreciated.

Best,

Matthew

/ERROR

java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_24118161_1282155871461
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:135)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:527)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_24118161_1282155871461
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
        at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:153)
        at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:115)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:108)
        ... 6 more