Pig, mail # user - Misplaced pigsample_123456.... file fails the pig job !


Misplaced pigsample_123456.... file fails the pig job !
Darpan R 2013-08-28, 10:36
Hi folks,
I am facing a weird issue.
I am running Pig 0.11 on a 64-bit Windows 7 machine with the latest version of
Cygwin.

I have a weblog that I want to order by username, so that all activities for
the same user are grouped together and can be fed to the next stage of processing.

I start a command prompt, run cygwin.bat, change to D:/ in the Cygwin console,
start pig, and type the following script in the Grunt shell (local mode).
(Note: I have set PIG_HOME and PIG_CLASSPATH correctly.)

The script is:
USERACTIVITIES = LOAD '/D:/path/of/logs/useractivities' USING
org.apache.pig.piggybank.storage.CSVExcelStorage(',') AS
(datetimeUnProcessed:chararray, username:chararray, request:chararray);
USERACTIVITIES_ORDERED = ORDER USERACTIVITIES by username;
STORE USERACTIVITIES_ORDERED INTO '/D:/readyfornextinput/useractivities'
USING org.apache.pig.piggybank.storage.CSVExcelStorage(',');

When I run ILLUSTRATE on USERACTIVITIES_ORDERED, it goes smoothly.
But when I run STORE or DUMP, the job fails.

It fails with:
java.lang.RuntimeException:
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
does not exist: file:/D:/pigsample_1749383998_1377684507424

When I searched for this pigsample_<number> file, I found it in:
D:/tmp/<username>/mapred/local/localRunner
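For anyone trying to reproduce this, a small check along the following lines can confirm which directory actually holds the pigsample_* file. It is only a sketch: the helper name find_sample_files is hypothetical, the two directories are the ones from the error message and from my search above, and <username> is left as a placeholder.

```python
import glob
import os


def find_sample_files(directories, prefix="pigsample_"):
    """Map each directory to the sorted entries whose names start with prefix."""
    found = {}
    for directory in directories:
        # Escape the directory so only the trailing '*' acts as a wildcard.
        pattern = os.path.join(glob.escape(directory), prefix + "*")
        found[directory] = sorted(glob.glob(pattern))
    return found


if __name__ == "__main__":
    # Assumed locations: where the job looks (D:/) and where the file was found.
    candidates = ["D:/", "D:/tmp/<username>/mapred/local/localRunner"]
    for directory, matches in find_sample_files(candidates).items():
        print(directory, "->", matches if matches else "no pigsample_* entries")
```

If the file shows up only under the localRunner directory, that matches what I see: the job looks for it in D:/ while it was written elsewhere.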

I am not sure how this is happening.
I am also not sure whether it is a Windows/Cygwin-related issue or whether
someone has seen this on Linux as well.

For reference, here is the stack trace:
2013-08-28 15:38:28,863 [Thread-46] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0004
java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/D:/pigsample_1749383998_1377684507424
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:157)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:677)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:214)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/D:/pigsample_1288777582_1377684802262
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
        at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:190)
        at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:126)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:131)
        ... 6 more

Any help with this would be appreciated.

Regards,
Darpan