Pig >> mail # user >> Pig 0.9.2 and avro on S3


Re: Pig 0.9.2 and avro on S3
A couple of weeks ago I spent a bunch of time trying to get EMR + S3 + Avro
working:
https://forums.aws.amazon.com/thread.jspa?messageID=398194

Short story: yes, I think PIG-2540 is the issue.  I'm currently trying to
get Pig 0.10 running on EMR with help from AWS support.  To install it you
have to pass a bootstrap action:
--bootstrap-action s3://elasticmapreduce/bootstrap-actions/run-if --args
"instance.isMaster=true,s3://yourbucket/path/install_pig_0.10.0.sh"

install_pig_0.10.0.sh contents:
---------------------
#!/usr/bin/env bash

# Download and unpack the Pig 0.10.0 release into /home/hadoop/pig.
cd /home/hadoop
wget http://apache.mirrors.hoobly.com/pig/pig-0.10.0/pig-0.10.0.tar.gz
tar zxf pig-0.10.0.tar.gz
mv pig-0.10.0 pig

# Make Pig available in future login shells.
echo "export HADOOP_HOME=/home/hadoop" >> ~/.bashrc
echo "export PATH=/home/hadoop/pig/bin/:\$PATH" >> ~/.bashrc

# Build Pig, then the piggybank contrib jar.
cd pig
ant
cd contrib/piggybank/java
ant

# Copy piggybank and json-simple into /home/hadoop/lib.
cp piggybank.jar /home/hadoop/lib/
cd /home/hadoop/lib
wget "http://json-simple.googlecode.com/files/json_simple-1.1.jar"
------------------

But note, I have NOT got around to testing this yet!  If you do, and it
works, let me know :-)
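
If you do try it, a quick sanity check along these lines might help confirm
that the bootstrap script left the expected files in place. This is only a
sketch: the function name check_install is made up for illustration, and the
paths are simply the ones the script above writes to (with bin/pig assumed
from the PATH line). It checks that the files exist, not that Pig runs.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the original script.
# Usage: check_install [BASE_DIR]   (BASE_DIR defaults to /home/hadoop)
# Prints each expected file that is missing; returns non-zero if any is.
check_install() {
  local base="${1:-/home/hadoop}"
  local missing=0
  local f
  for f in \
    "$base/pig/bin/pig" \
    "$base/lib/piggybank.jar" \
    "$base/lib/json_simple-1.1.jar"; do
    if [ ! -e "$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}
```

On the master node you could run it as: check_install /home/hadoop && echo "install looks complete"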

will

On Fri, Nov 30, 2012 at 4:05 PM, meghana narasimhan <
[EMAIL PROTECTED]> wrote:

> Oh I should also mention piggybank : 0.9.2-cdh4.0.1
>
>
> On Fri, Nov 30, 2012 at 12:59 PM, meghana narasimhan <
> [EMAIL PROTECTED]> wrote:
>
> > Hi all,
> >
> > Is this bug https://issues.apache.org/jira/browse/PIG-2540 applicable to
> > plain EC2 instances as well? I seem to have hit a snag with Apache Pig
> > version 0.9.2-cdh4.0.1 (rexported) and Avro files on S3. My Hadoop
> > cluster is made of Amazon EC2 instances.
> >
> > Here is my load statement :
> >
> > dimRad = LOAD 's3n://credentials@bucket/dimensions/2012/11/29/20121129-000159123456/dim'
> > USING
> >   AVRO_STORAGE AS
> >    (a:int
> >   , b:chararray
> >   );
> >
> > and it gives me a :
> >
> > 2012-11-30 20:42:44,205 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> > ERROR 1200: Wrong FS: s3n://credentials@bucket/dimensions/2012/11/29/20121129-000159123456/dim,
> > expected: hdfs://ec2-1xxxx.compute-1.amazonaws.com:8020
> >
> >
> > Thanks,
> > Meg
> >
>