Pig >> mail # user >> running pig on amazon ec2


running pig on amazon ec2
Hi,

This is probably not directly a Pig question.

Is anyone running Pig on Amazon EC2 instances? Something's not making sense to
me. I ran a Pig script that has about 10 MapReduce jobs in it on a 16-node
cluster using m1.small. It took *13 minutes*. The job reads input from S3
and writes output to S3, but from the logs the reading from and writing
to S3 is pretty fast, and all the intermediate steps should happen on
HDFS.

Running the same job on my MacBook Pro laptop took only *3 minutes*.
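
For a rough sense of scale, the two wall-clock figures above can be turned into a per-job average. This is only a back-of-the-envelope sketch: it assumes the ten jobs run one after another and take equal time, which nothing here confirms, and the function name is made up for illustration.

```python
# Figures quoted above: about 10 MapReduce jobs, 13 min on EC2, 3 min on the laptop.
NUM_JOBS = 10
EC2_MINUTES = 13
LAPTOP_MINUTES = 3


def avg_seconds_per_job(total_minutes, num_jobs):
    """Average wall-clock seconds per job (assumes sequential, equal-length jobs)."""
    return total_minutes * 60 / num_jobs


ec2 = avg_seconds_per_job(EC2_MINUTES, NUM_JOBS)
laptop = avg_seconds_per_job(LAPTOP_MINUTES, NUM_JOBS)
print(f"EC2: {ec2:.0f} s/job, laptop: {laptop:.0f} s/job, gap: {ec2 - laptop:.0f} s/job")
```

Under those assumptions the gap works out to about a minute per job. A gap that stays roughly constant per job regardless of instance size would be consistent with fixed per-job overhead rather than compute, though these numbers alone cannot confirm that.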

Amazon is using Pig 0.6 while I'm using Pig 0.8 on my laptop. I'll try Pig 0.6
on my laptop. Some Hadoop config is probably also not ideal. I tried
m1.large instead of m1.small, but it doesn't seem to make a huge difference.
Is there anything you would suggest looking at to find the cause of the
slowness on EC2?

Dexin