Re: Misplaced pigsample_123456.... file fails the pig job !
Which Hadoop distro are you using? I've heard Hortonworks has a
Windows-compatible Hadoop.
On Wed, Aug 28, 2013 at 2:36 PM, Darpan R <[EMAIL PROTECTED]> wrote:

> Hi folks,
> I am facing a weird issue.
> I am running Pig 0.11 on a 64-bit Windows 7 machine with the latest
> version of Cygwin.
>
> I have a weblog that I want to order by username, so that all activities
> for the same user are grouped together for the next stage of processing.
>
> I start a command prompt, run cygwin.bat, go to D:/ in the Cygwin console,
> start pig, and type the following script in the grunt shell (local mode).
> (Note: I've set PIG_HOME and PIG_CLASSPATH correctly.)
>
> The script is:
> USERACTIVITIES = LOAD '/D:/path/of/logs/useractivities' USING
> org.apache.pig.piggybank.storage.CSVExcelStorage(',') AS
> (datetimeUnProcessed:chararray, username:chararray, request:chararray);
> USERACTIVITIES_ORDERED = ORDER USERACTIVITIES by username;
> STORE USERACTIVITIES_ORDERED INTO '/D:/readyfornextinput/useractivities'
> USING org.apache.pig.piggybank.storage.CSVExcelStorage(',');
>
> When I run illustrate USERACTIVITIES_ORDERED, it goes smoothly.
> But when I run store/dump, I hit a weird issue.
>
> It fails with:
> java.lang.RuntimeException:
> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
> does not exist: file:/D:/pigsample_1749383998_1377684507424
>
> When I searched for this pigsample_<number> file, I found it in:
> D:/tmp/<username>/mapred/local/localRunner
>
> I am not sure how this happens.
> I am not sure whether it is a Windows/Cygwin-related issue or whether
> someone has seen this on Linux as well.
>
> For reference, here is the stack trace:
>  2013-08-28 15:38:28,863 [Thread-46] WARN
> org.apache.hadoop.mapred.LocalJobRunner - job_local_0004
> java.lang.RuntimeException:
> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
> does not exist: file:/D:/pigsample_1749383998_1377684507424
>         at
>
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:157)
>         at
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
>         at
>
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>         at
>
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:677)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>         at
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:214)
> Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
> Input path does not exist: file:/D:/pigsample_1288777582_1377684802262
>         at
>
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
>         at
>
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
>         at
>
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
>         at
> org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:190)
>         at
> org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:126)
>         at
>
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:131)
>         ... 6 more
>
> Any help on this will be useful.
>
> Regards,
> Darpan
>
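
The manual search described in the post (finding where the pigsample file actually ends up) can be repeated with a small script. This is only a sketch for illustration: the function names `missing_sample_name` and `find_sample` are made up for this example, and `D:/tmp` is simply the directory mentioned in the post. It does not fix the failure; it only makes the mismatch between the path in the exception and the location on disk easy to see.

```python
import re
from pathlib import Path

# Matches the sample file name as it appears in the exception message,
# e.g. pigsample_1749383998_1377684507424.
SAMPLE_NAME = re.compile(r"pigsample_\d+_\d+")


def missing_sample_name(message):
    """Return the pigsample_* name found in an error message, or None."""
    match = SAMPLE_NAME.search(message)
    return match.group(0) if match else None


def find_sample(root, name):
    """Return all paths under root whose final component equals name."""
    root_path = Path(root)
    if not root_path.is_dir():
        # Nothing to search if the root directory does not exist.
        return []
    return sorted(root_path.rglob(name))


if __name__ == "__main__":
    error = ("Input path does not exist: "
             "file:/D:/pigsample_1749383998_1377684507424")
    name = missing_sample_name(error)
    # Assumption: D:/tmp is the temp root from the post; adjust as needed.
    for hit in find_sample("D:/tmp", name):
        print(hit)
```

If the script prints a path under `D:/tmp/<username>/mapred/local/localRunner` while the exception names `file:/D:/`, the sample file is being written to one location and read from another.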