PigStorageSchema and S3 bug
meghana narasimhan 2012-10-12, 21:43
Hello,
We are using PigStorageSchema to store our results on S3, with HDFS still as the file system, and we are running into issues writing the schema file out to S3. We are just loading a CSV file using PigStorage, applying a few basic Pig operations, and then storing the result on S3 using PigStorageSchema. We are on Hadoop 2.0.0-cdh4.1.0 and Apache Pig version 0.10.0-cdh4.1.0.

{code}
A = LOAD 'input' USING PigStorage(',');
B = FOREACH A GENERATE $0 AS A1, $1 AS A2, $2 AS A3;
C = LIMIT B 3;
STORE C INTO 's3n://XXX:XXX@bucket/outPigStorageSchema1' USING org.apache.pig.piggybank.storage.PigStorageSchema();
{code}

Pig logs:

{code}
2012-10-11 21:00:56,193 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: LIMIT
2012-10-11 21:00:56,209 [main] INFO org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned for A: $3, $4, $5, $6
2012-10-11 21:00:56,250 [main] WARN org.jets3t.service.impl.rest.httpclient.RestS3Service - Response '/Meg/outPigStorageSchema1' - Unexpected response code 404, expected 200
2012-10-11 21:00:57,174 [main] WARN org.jets3t.service.impl.rest.httpclient.RestS3Service - Response '/Meg/outPigStorageSchema1_%24folder%24' - Unexpected response code 404, expected 200
2012-10-11 21:00:57,212 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-10-11 21:00:57,218 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-10-11 21:00:57,218 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-10-11 21:00:57,221 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-10-11 21:00:57,221 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-10-11 21:00:57,222 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job7469072732967367765.jar
2012-10-11 21:01:02,810 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job7469072732967367765.jar created
2012-10-11 21:01:02,815 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2012-10-11 21:01:02,830 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2012-10-11 21:01:02,884 [Thread-64] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2012-10-11 21:01:03,256 [Thread-64] WARN org.jets3t.service.impl.rest.httpclient.RestS3Service - Response '/Meg/outPigStorageSchema1' - Unexpected response code 404, expected 200
2012-10-11 21:01:03,332 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2012-10-11 21:01:03,502 [Thread-64] WARN org.jets3t.service.impl.rest.httpclient.RestS3Service - Response '/Meg/outPigStorageSchema1_%24folder%24' - Unexpected response code 404, expected 200
2012-10-11 21:01:03,563 [Thread-64] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-10-11 21:01:03,563 [Thread-64] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total inputpaths to process : 1
2012-10-11 21:01:03,565 [Thread-64] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2012-10-11 21:01:04,488 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201210052302_0065
2012-10-11 21:01:04,489 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://ec2-184-72-197-101.compute-1.amazonaws.com:50030/jobdetails.jsp?jobid=job_201210052302_0065
2012-10-11 21:01:17,679 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2012-10-11 21:02:24,236 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201210052302_0065 has failed! Stop running all dependent jobs
2012-10-11 21:02:24,237 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2012-10-11 21:02:24,244 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2012-10-11 21:02:24,244 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.0.0-cdh4.0.1 0.10.0-cdh4.1.0 whirr 2012-10-11 21:00:57 2012-10-11 21:02:24 LIMIT

Failed!

Failed Jobs:
JobId Alias Feature Message Outputs
job_201210052302_0065 A,B,C Message: Job failed! s3n://XXX:XXX@Meg/outPigStorageSchema1,

Input(s):
Failed to read data from "hdfs://ec2-184-72-197-101.compute-1.amazonaws.com/user/whirr/incite/site_clicks_spend_by_hour/2012/07/10/20121008-205839320852/part-r-00000"

Output(s):
Failed to produce result in "s3n://XXX:XXX@Meg/outPigStorageSchema1"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201210052302_0065

2012-10-11 21:02:24,244 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2012-10-11 21:02:24,321 [main] WARN org.jets3t.service.impl.rest.httpclient.RestS3Service - Response '/Meg/outPigStorageSchema1' - Unexpected response code 404, expected 200