Pig >> mail # user >> PigStorageSchema and S3 bug


Meghana Narasimhan 2012-10-12, 20:50
Re: PigStorageSchema and S3 bug
Hi Meghana,

Are you sure that you're using Apache Pig version 0.10.0-cdh4.1.0?
PigStorageSchema was changed in Pig 0.10 (
https://issues.apache.org/jira/browse/PIG-2143), so that version cannot
produce this frame from your call stack:

            at
org.apache.pig.piggybank.storage.PigStorageSchema.storeSchema(PigStorageSchema.java:152)

In Pig 0.10, PigStorageSchema.java is only 45 lines long, so a frame at
line 152 cannot come from it. I double-checked that Pig version
0.10.0-cdh4.1.0 includes PIG-2143. It looks like you're using an older
version of Pig, something like 0.9.2-cdh4.0.1.
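
The line-number check can be sketched as a small script. This is only an illustration of the reasoning, not part of Pig: the helper name `frame_fits_source` and the regular expression are hypothetical, and the length of 45 lines is the figure quoted in this message for Pig 0.10.

```python
import re

# Matches a Java stack frame such as:
#   org.apache.pig.piggybank.storage.PigStorageSchema.storeSchema(PigStorageSchema.java:152)
# Groups: 1 = class, 2 = method, 3 = source file, 4 = line number.
FRAME_RE = re.compile(r"([\w.$]+)\.([\w$<>]+)\(([\w$]+\.java):(\d+)\)")


def frame_fits_source(frame: str, source_file: str, source_length: int) -> bool:
    """Return True if the frame's line number can exist in a file of source_length lines.

    Hypothetical helper for illustration; raises ValueError if the frame
    does not refer to source_file.
    """
    match = FRAME_RE.search(frame)
    if match is None or match.group(3) != source_file:
        raise ValueError(f"no frame for {source_file} found in: {frame!r}")
    return int(match.group(4)) <= source_length


frame = (
    "org.apache.pig.piggybank.storage.PigStorageSchema"
    ".storeSchema(PigStorageSchema.java:152)"
)

# 45 is the file length quoted in this message for Pig 0.10.
print(frame_fits_source(frame, "PigStorageSchema.java", 45))
```

If the check reports that the frame does not fit, the class on the classpath was most likely built from a different source version than the one assumed.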

Thanks,
Cheolsoo

On Fri, Oct 12, 2012 at 1:50 PM, Meghana Narasimhan <
[EMAIL PROTECTED]> wrote:

> Hello,
>
> We are using PigStorageSchema to store our results on S3 with HDFS still
> as the file system and we are running into issues writing out the schema
> file to S3.
>
> We are just loading a CSV file using PigStorage, running through some
> basic Pig stuff and then storing it on S3 using PigStorageSchema. We are on
> Hadoop 2.0.0-cdh4.1.0 and Apache Pig version 0.10.0-cdh4.1.0.
>
>
> {code}
> A = LOAD 'input' USING PigStorage(',');
> B = FOREACH A GENERATE $0 AS A1, $1 AS A2, $2 AS A3;
> C = LIMIT B 3;
> STORE C INTO 's3n://XXX:XXX@bucket/outPigStorageSchema1' USING
> org.apache.pig.piggybank.storage.PigStorageSchema();
> {code}
>
> Pig logs :
>
> {code}
> 2012-10-11 21:00:56,193 [main] INFO
>  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the
> script: LIMIT
> 2012-10-11 21:00:56,209 [main] INFO
>  org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned
> for A: $3, $4, $5, $6
> 2012-10-11 21:00:56,250 [main] WARN
>  org.jets3t.service.impl.rest.httpclient.RestS3Service - Response
> '/Meg/outPigStorageSchema1' - Unexpected response code 404, expected 200
> 2012-10-11 21:00:57,174 [main] WARN
>  org.jets3t.service.impl.rest.httpclient.RestS3Service - Response
> '/Meg/outPigStorageSchema1_%24folder%24' - Unexpected response code 404,
> expected 200
> 2012-10-11 21:00:57,212 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler -
> File concatenation threshold: 100 optimistic? false
> 2012-10-11 21:00:57,218 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - MR plan size before optimization: 1
> 2012-10-11 21:00:57,218 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - MR plan size after optimization: 1
> 2012-10-11 21:00:57,221 [main] INFO
>  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added
> to the job
> 2012-10-11 21:00:57,221 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
> 2012-10-11 21:00:57,222 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - creating jar file Job7469072732967367765.jar
> 2012-10-11 21:01:02,810 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - jar file Job7469072732967367765.jar created
> 2012-10-11 21:01:02,815 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - Setting up single store job
> 2012-10-11 21:01:02,830 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 1 map-reduce job(s) waiting for submission.
> 2012-10-11 21:01:02,884 [Thread-64] WARN
>  org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing
> the arguments. Applications should implement Tool for the same.
> 2012-10-11 21:01:03,256 [Thread-64] WARN
>  org.jets3t.service.impl.rest.httpclient.RestS3Service - Response
> '/Meg/outPigStorageSchema1' - Unexpected response code 404, expected 200
> 2012-10-11 21:01:03,332 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 0% complete
> 2012-10-11 21:01:03,502 [Thread-64] WARN
meghana narasimhan 2012-10-12, 21:43