Pig, mail # user - running pig on amazon ec2


running pig on amazon ec2
Dexin Wang 2011-06-13, 18:54
Hi,

This is probably not directly a Pig question.

Is anyone running Pig on Amazon EC2 instances? Something isn't making sense to
me. I ran a Pig script containing about 10 MapReduce jobs on a 16-node
cluster of m1.small instances. It took *13 minutes*. The job reads its input
from S3 and writes its output to S3, but according to the logs the S3 reads
and writes are pretty fast, and all the intermediate steps should happen on
HDFS.

Running the same job on my MacBook Pro laptop took only *3 minutes*.

Amazon is using Pig 0.6, while I'm using Pig 0.8 on my laptop. I'll try Pig 0.6
on my laptop. Some of the Hadoop configuration is probably also not ideal. I
tried m1.large instead of m1.small, but it doesn't seem to make a huge
difference. What would you suggest I look at to track down the slowness on EC2?

Dexin
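
For a rough sense of scale, the timings above can be turned into an average per-job figure. This is only a back-of-the-envelope sketch: it assumes the roughly 10 jobs run one after another and each takes an equal share of the time, neither of which is stated in the message. The function and variable names are illustrative.

```python
def seconds_per_job(total_minutes: float, job_count: int) -> float:
    """Average wall-clock seconds per job, assuming the jobs run sequentially."""
    return total_minutes * 60 / job_count


JOB_COUNT = 10  # "about 10" in the message, so this is an approximation

ec2 = seconds_per_job(13, JOB_COUNT)    # 16-node m1.small cluster
laptop = seconds_per_job(3, JOB_COUNT)  # local laptop run

print(f"EC2: {ec2:.0f} s/job, laptop: {laptop:.0f} s/job, ratio: {ec2 / laptop:.1f}x")
# prints "EC2: 78 s/job, laptop: 18 s/job, ratio: 4.3x"
```

Under those assumptions the difference is about a minute per job, which may point at something paid once per job rather than at S3 I/O. The per-job timings in the Hadoop logs would be needed to confirm or rule that out.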