Pig >> mail # user >> ORDER Issue (repost to avoid spam filters)


Re: ORDER Issue (repost to avoid spam filters)
I think 0.7 had an issue where order-by used to fail if the input was empty, but that does not seem to be the case here.
I wonder whether a parsing or data-format issue is causing the bytes column to be empty, though I am not aware of an empty/null value in the sort column causing issues.
Can you try dumping just the bytes column?
Another thing you can try is to store the output of the filter and load the data again before doing the order-by.
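
A rough sketch of those two checks, reusing the schema from your script. The relation names (bytes_only, reloaded, sorted, top10) and the output path 'filtered_out' are just placeholders I made up, and I have not run this against your data:

```pig
-- Check 1: dump only the sort column to see whether bytes parsed as an int.
start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
    dip:chararray, sport:int, dport:int, protocol:int, packets:int,
    bytes:int, flags:chararray, startTime:long, endTime:long);

target = FILTER start BY sip matches '51.37.8.63';

bytes_only = FOREACH target GENERATE bytes;
DUMP bytes_only;

-- Check 2: materialize the filter output ('filtered_out' is a hypothetical path).
STORE target INTO 'filtered_out' USING PigStorage('|');

-- Then, ideally in a second script run after the STORE has finished,
-- reload the stored data and do the ORDER on it.
reloaded = LOAD 'filtered_out' USING PigStorage('|') AS (sip:chararray,
    dip:chararray, sport:int, dport:int, protocol:int, packets:int,
    bytes:int, flags:chararray, startTime:long, endTime:long);

sorted = ORDER reloaded BY bytes DESC;
top10 = LIMIT sorted 10;
DUMP top10;
```

If the first DUMP shows empty values for bytes, the problem is probably in parsing rather than in ORDER itself.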

Please let us know what you find.

Thanks,
Thejas
On 8/19/10 11:35 AM, "Matthew Smith" <[EMAIL PROTECTED]> wrote:

All,

I am running pig-0.7.0 and I have been running into an issue running the
ORDER command. I have attempted to run pig out of the box on 2 separate
LINUX OS (Ubuntu 10.4 and OpenSuse 11.2) and the same issue has
occurred. I run these commands in a script file:

start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
dip:chararray, sport:int, dport:int, protocol:int, packets:int,
bytes:int, flags:chararray, startTime:long, endTime:long);

target = FILTER start BY sip matches '51.37.8.63';

fail = ORDER target BY bytes DESC;

not_reached = LIMIT fail 10;

dump not_reached;

The error is listed below. I then run:

start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
dip:chararray, sport:int, dport:int, protocol:int, packets:int,
bytes:int, flags:chararray, startTime:long, endTime:long);

target = FILTER start BY sip matches '51.37.8.63';

dump target;

This script produces a large list of sips matching the filter.  What am
I doing wrong that causes pig to not want to ORDER these properly? I
have been wrestling with this issue for a week now. Any help would be
greatly appreciated.

Best,

Matthew

/ERROR

java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_24118161_1282155871461
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:135)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:527)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_24118161_1282155871461
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
        at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:153)
        at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:115)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:108)
        ... 6 more
