-
order by throwing exception in cluster
Irooniam 2011-05-13, 19:37
Hello,
I'm running into a weird problem that I'm hoping you can help me with.
I'm basically just loading an access log, grouping, ordering and then dumping the data.
I can load the log, group and order when I'm in local mode, but when I try to do the same in the hadoop cluster I always get an error with the 'order by' clause.
Here's the relevant bits:
REGISTER /usr/lib/pig/contrib/piggybank/java/piggybank.jar
define logloader org.apache.pig.piggybank.storage.apachelog.CombinedLogLoader();
logs = LOAD '/logs/2011/05/12/16/localhost.access.log_hadoop01_2011-05-12_16-18-30.log' using logloader as (remoteHost:CHARARRAY, hyphen:CHARARRAY, hyphen2:CHARARRAY, time:CHARARRAY, method:CHARARRAY, uri:CHARARRAY, protocol:CHARARRAY, statusCode:CHARARRAY, responseSize:CHARARRAY, treferer:CHARARRAY, agent:CHARARRAY);
grp = GROUP logs BY treferer;
out = FOREACH grp GENERATE group, COUNT($1) as ref_cnt;
out2 = ORDER out BY ref_cnt;
dump out2;
In the cluster I get the following:
java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/hdfs/pigsample_1861447257_1305315373876
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:139)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:559)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:638)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/hdfs/pigsample_1861447257_1305315373876
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248)
    at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:153)
    at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:115)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:112)
    ... 6 more
....
2011-05-13 12:36:19,386 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
2011-05-13 12:36:19,429 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias out2
Any help on this would be appreciated.
-
Re: order by throwing exception in cluster
Thejas M Nair 2011-05-13, 23:13
The exception stack contains LocalJobRunner, which is strange. Have you specified the command-line option "-x mapreduce"? Is the hadoop conf dir in the classpath?
-Thejas
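For anyone following along, the two checks above could be tried roughly as below. This is only a sketch under assumptions: /etc/hadoop/conf is a guessed location for the cluster's client configuration, and myscript.pig is a placeholder script name; neither comes from the original post. The LocalJobRunner frame and the file: prefix on the missing path both suggest the job ran locally rather than on the cluster, which is what these settings are meant to rule out.

```shell
# Assumed location of the Hadoop client configuration; adjust for your install.
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"
export HADOOP_CONF_DIR

# Put the conf dir first on Pig's classpath so Pig can pick up the cluster settings.
export PIG_CLASSPATH="$HADOOP_CONF_DIR${PIG_CLASSPATH:+:$PIG_CLASSPATH}"

# Request mapreduce mode explicitly; myscript.pig is a placeholder name.
if command -v pig >/dev/null 2>&1; then
    pig -x mapreduce myscript.pig
else
    echo "pig not found on PATH" >&2
fi
```

If the job still reports LocalJobRunner after this, the conf dir being picked up probably does not point at the cluster's JobTracker.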