Pig >> mail # user >> PigStorageSchema and S3 bug
PigStorageSchema and S3 bug
Hello,

We are using PigStorageSchema to store our results on S3, with HDFS still as
the default file system, and we are running into problems writing the schema
file to S3.
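
Some background on that setup: in Hadoop, a path that carries its own scheme (such as `s3n://...`) is handled by that scheme's file system, while a scheme-less path such as `'input'` is qualified against the default file system (`fs.defaultFS`, here HDFS). The plain-Java sketch below mimics only the scheme part of that rule using `java.net.URI`. `effectiveScheme` is a hypothetical helper, not a Hadoop or Pig API, and the HDFS address is a placeholder.

```java
import java.net.URI;

public class SchemeResolution {

    // Hypothetical helper: returns the scheme a path would be resolved against.
    // Mimics the idea behind Hadoop's path qualification: an explicit scheme wins,
    // otherwise the scheme of the default file system (fs.defaultFS) is used.
    static String effectiveScheme(String path, String defaultFs) {
        String scheme = URI.create(path).getScheme();
        if (scheme != null) {
            return scheme;
        }
        return URI.create(defaultFs).getScheme();
    }

    public static void main(String[] args) {
        // Placeholder default file system (HDFS), matching the setup described above
        String defaultFs = "hdfs://namenode:8020";

        // The LOAD path has no scheme, so it is resolved against HDFS
        System.out.println(effectiveScheme("input", defaultFs));
        // The STORE location carries its own scheme, so it is resolved against S3 (s3n)
        System.out.println(effectiveScheme("s3n://XXX:XXX@bucket/outPigStorageSchema1", defaultFs));
    }
}
```

Any code that writes extra files for an `s3n://` output therefore has to resolve the file system from that output path, not from the default configuration.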

We are simply loading a CSV file with PigStorage, running it through some
basic Pig transformations, and then storing the result on S3 with
PigStorageSchema. We are on Hadoop 2.0.0-cdh4.1.0 and Apache Pig version
0.10.0-cdh4.1.0.

{code}

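-- Load the raw comma-separated input (a path on the default file system, HDFS)
A = LOAD 'input' USING PigStorage(',');
-- Keep the first three fields and name them
B = FOREACH A GENERATE $0 AS A1, $1 AS A2, $2 AS A3;
-- Keep at most three records
C = LIMIT B 3;
-- Store on S3; PigStorageSchema also writes its schema file into the output location
STORE C INTO 's3n://XXX:XXX@bucket/outPigStorageSchema1' USING
org.apache.pig.piggybank.storage.PigStorageSchema();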

{code}
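
Piggybank's PigStorageSchema is expected to write its schema as hidden side files inside the output directory, next to the part files; the names `.pig_schema` and `.pig_header` used below are an assumption based on piggybank's metadata handling. The sketch only computes where those files would land for the STORE location in the script above. `sideFileLocations` is a hypothetical helper, not part of Pig.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class SchemaSideFiles {

    // Assumed side-file names written next to the data by PigStorageSchema
    static final String[] SIDE_FILES = {".pig_schema", ".pig_header"};

    // Hypothetical helper: resolves each side-file name against the output directory
    static List<URI> sideFileLocations(String storeLocation) {
        // A trailing slash makes URI.resolve treat the location as a directory
        String dir = storeLocation.endsWith("/") ? storeLocation : storeLocation + "/";
        URI base = URI.create(dir);
        List<URI> result = new ArrayList<URI>();
        for (String name : SIDE_FILES) {
            result.add(base.resolve(name));
        }
        return result;
    }

    public static void main(String[] args) {
        for (URI uri : sideFileLocations("s3n://XXX:XXX@bucket/outPigStorageSchema1")) {
            // Each side file keeps the s3n scheme and bucket of the STORE location
            System.out.println(uri.getScheme() + " -> " + uri);
        }
    }
}
```

Because the side files share the output's `s3n` scheme, one would expect them to be written through the S3 file system rather than through HDFS.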

Pig logs:

{code}

2012-10-11 21:00:56,193 [main] INFO
org.apache.pig.tools.pigstats.ScriptState - Pig features used in the
script: LIMIT

2012-10-11 21:00:56,209 [main] INFO
org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned
for A: $3, $4, $5, $6

2012-10-11 21:00:56,250 [main] WARN
org.jets3t.service.impl.rest.httpclient.RestS3Service - Response
'/Meg/outPigStorageSchema1' - Unexpected response code 404, expected 200

2012-10-11 21:00:57,174 [main] WARN
org.jets3t.service.impl.rest.httpclient.RestS3Service - Response
'/Meg/outPigStorageSchema1_%24folder%24' - Unexpected response code 404,
expected 200

2012-10-11 21:00:57,212 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler -
File concatenation threshold: 100 optimistic? false

2012-10-11 21:00:57,218 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size before optimization: 1

2012-10-11 21:00:57,218 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size after optimization: 1

2012-10-11 21:00:57,221 [main] INFO
org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added
to the job

2012-10-11 21:00:57,221 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3

2012-10-11 21:00:57,222 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- creating jar file Job7469072732967367765.jar

2012-10-11 21:01:02,810 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- jar file Job7469072732967367765.jar created

2012-10-11 21:01:02,815 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up single store job

2012-10-11 21:01:02,830 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map-reduce job(s) waiting for submission.

2012-10-11 21:01:02,884 [Thread-64] WARN
org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing
the arguments. Applications should implement Tool for the same.

2012-10-11 21:01:03,256 [Thread-64] WARN
org.jets3t.service.impl.rest.httpclient.RestS3Service - Response
'/Meg/outPigStorageSchema1' - Unexpected response code 404, expected 200

2012-10-11 21:01:03,332 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete

2012-10-11 21:01:03,502 [Thread-64] WARN
org.jets3t.service.impl.rest.httpclient.RestS3Service - Response
'/Meg/outPigStorageSchema1_%24folder%24' - Unexpected response code 404,
expected 200

2012-10-11 21:01:03,563 [Thread-64] INFO
org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths
to process : 1

2012-10-11 21:01:03,563 [Thread-64] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total
inputpaths to process : 1

2012-10-11 21:01:03,565 [Thread-64] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
paths (combined) to process : 1

2012-10-11 21:01:04,488 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- HadoopJobId: job_201210052302_0065

2012-10-11 21:01:04,489 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- More information at:
http://ec2-184-72-197-101.compute-1.amazonaws.com:50030/jobdetails.jsp?jobid=job_201210052302_0065

2012-10-11 21:01:17,679 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 50% complete

2012-10-11 21:02:24,236 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- job job_201210052302_0065 has failed! Stop running all dependent jobs

2012-10-11 21:02:24,237 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 100% complete

2012-10-11 21:02:24,244 [main] ERROR
org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!

2012-10-11 21:02:24,244 [main] INFO
org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

HadoopVersion   PigVersion       UserId  StartedAt            FinishedAt           Features
2.0.0-cdh4.0.1  0.10.0-cdh4.1.0  whirr   2012-10-11 21:00:57  2012-10-11 21:02:24  LIMIT

Failed!

Failed Jobs:
JobId                  Alias  Feature  Message               Outputs
job_201210052302_0065  A,B,C           Message: Job failed!  s3n://XXX:XXX@Meg/outPigStorageSchema1,

Input(s):

Failed to read data from "hdfs://ec2-184-72-197-101.compute-1.amazonaws.com/user/whirr/incite/site_clicks_spend_by_hour/2012/07/10/20121008-205839320852/part-r-00000"

Output(s):

Failed to produce result in "s3n://XXX:XXX@Meg/outPigStorageSchema1"

Counters:

Total records written : 0

Total bytes written : 0

Spillable Memory Manager spill count : 0

Total bags proactively spilled: 0

Total records proactively spilled: 0

Job DAG:

job_201210052302_0065

2012-10-11 21:02:24,244 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Failed!

2012-10-11 21:02:24,321 [main] WARN
org.jets3t.service.impl.rest.httpclient.RestS3Service - Response
'/Meg/outPigStorageSchema1' - Unexpected response code 404, expected 200