Pig >> mail # user >> Misplaced pigsample_123456.... file fails the pig job !


Hi folks,
I am facing a weird issue.
I am running Pig 0.11 on a 64-bit Windows 7 machine with the latest version
of Cygwin.

I have a web log that I want to order by username, so that all activities
of the same user are together for the next stage of processing.

I open a command prompt, run cygwin.bat, change to D:/ in the Cygwin
console, start pig, and type the following script in the Grunt shell
(local mode). (Note: I've set PIG_HOME and PIG_CLASSPATH correctly.)

The script is:
USERACTIVITIES = LOAD '/D:/path/of/logs/useractivities'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage(',')
    AS (datetimeUnProcessed:chararray, username:chararray, request:chararray);
USERACTIVITIES_ORDERED = ORDER USERACTIVITIES BY username;
STORE USERACTIVITIES_ORDERED INTO '/D:/readyfornextinput/useractivities'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage(',');

When I run ILLUSTRATE USERACTIVITIES_ORDERED, it completes smoothly.
But when I run STORE or DUMP, I hit this weird issue.

It fails with:
java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/D:/pigsample_1749383998_1377684507424

When I searched for this pigsample_<number> file, I found it in:
D:/tmp/<username>/mapred/local/localRunner
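To confirm where the local runner actually writes these sampler files (as opposed to the D:/ path the job looks in), a small helper can list every entry named like the one in the error under a directory tree. This is a hypothetical Python sketch, not part of Pig: the function name find_pigsample_files and the pigsample_<number>_<number> pattern are assumptions based only on the names in the error above. It matches both files and directories, since it is not certain which form the sample output takes on disk.

```python
import os
import re
import sys

# Assumed naming pattern, taken from the error message: pigsample_<number>_<number>
PIGSAMPLE_RE = re.compile(r"^pigsample_\d+_\d+$")


def find_pigsample_files(root):
    """Hypothetical helper: recursively walk `root` and return the sorted paths
    of all files or directories whose name matches PIGSAMPLE_RE."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Check directory names as well as file names, since the sample
        # output could be either.
        for name in dirnames + filenames:
            if PIGSAMPLE_RE.match(name):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


if __name__ == "__main__":
    # Pass the directory to search (e.g. D:/tmp) as the first argument;
    # defaults to the current directory.
    search_root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in find_pigsample_files(search_root):
        print(path)
```

Running it against both D:/ and D:/tmp would show whether the sample lands only under the local runner directory while the partitioner reads from the drive root.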

I am not sure why this is happening, or whether it is a Windows/Cygwin-specific
issue or something others have also seen on Linux.

For reference, here is the stack trace:
2013-08-28 15:38:28,863 [Thread-46] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0004
java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/D:/pigsample_1749383998_1377684507424
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:157)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:677)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:214)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/D:/pigsample_1288777582_1377684802262
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
        at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:190)
        at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:126)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:131)
        ... 6 more

Any help on this would be appreciated.

Regards,
Darpan