Pig >> mail # user >> Misplaced pigsample_123456.... file fails the pig job !


Re: Misplaced pigsample_123456.... file fails the pig job !
Which Hadoop distro are you using? I've heard Hortonworks has a
Windows-compatible Hadoop distribution.
On Wed, Aug 28, 2013 at 2:36 PM, Darpan R <[EMAIL PROTECTED]> wrote:

> Hi folks,
> I am facing a weird issue.
> I am running Pig 0.11 on a 64-bit Windows 7 machine with the latest version of
> Cygwin.
>
> I have a weblog that I want to order by userName, so that all the activities
> of the same user are grouped together to feed the next stage of processing.
>
> I start a command prompt -> cygwin.bat -> on the Cygwin console go to
> D:/ -> pig, and type the following script in the Grunt shell (local mode).
> (Note: I've set PIG_HOME and PIG_CLASSPATH correctly.)
>
> The script is:
> USERACTIVITIES = LOAD '/D:/path/of/logs/useractivities' USING
> org.apache.pig.piggybank.storage.CSVExcelStorage(',') AS
> (datetimeUnProcessed:chararray, username:chararray, request:chararray);
> USERACTIVITIES_ORDERED = ORDER USERACTIVITIES by username;
> STORE USERACTIVITIES_ORDERED INTO '/D:/readyfornextinput/useractivities'
> USING org.apache.pig.piggybank.storage.CSVExcelStorage(',');
>
> When I run ILLUSTRATE on USERACTIVITIES_ORDERED, it goes smoothly.
> But when I run STORE or DUMP, I hit a weird issue.
>
> It fails with:
> java.lang.RuntimeException:
> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
> does not exist: file:/D:/pigsample_1749383998_1377684507424
>
> When I searched for this pigsample_number file, I found it in:
> D:/tmp/<username>/mapred/local/localRunner
>
> I am not sure how this is happening.
> I am not sure whether it's a Windows/Cygwin-related issue or whether someone
> has seen this on Linux as well.
>
> For reference, you can find the stacktrace attached here:
> 2013-08-28 15:38:28,863 [Thread-46] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0004
> java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/D:/pigsample_1749383998_1377684507424
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:157)
>         at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
>         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>         at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:677)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:214)
> Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/D:/pigsample_1288777582_1377684802262
>         at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
>         at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
>         at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:190)
>         at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:126)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:131)
>         ... 6 more
>
> Any help on this will be useful.
>
> Regards,
> Darpan
>