-
AWS MapReduce
Mohit Anchlia 2012-03-03, 23:54
I'd like to find out how many people here are using AWS MapReduce, and to understand the pros and cons of Amazon's MapReduce machines. Is it true that these machines read from and write to S3 rather than local disks? Has anyone run into issues with Amazon MapReduce, and how does running MapReduce on locally attached disks compare with running it against S3?
-
Re: AWS MapReduce
Mohit Anchlia 2012-03-04, 01:31
I think I found the answer to this question. However, it's still not clear whether HDFS is on local disk or on EBS volumes. Does anyone know?
-
Re: AWS MapReduce
Hannes Carl Meyer 2012-03-04, 10:27
Hi,

Yes, it's loaded from S3. IMHO Amazon AWS MapReduce is pretty slow. Setup is quick, and there are some configuration parameters you can bypass (block sizes, for example), but in the end, IMHO, setting up EC2 instances by copying images is the better alternative.

Kind regards,
Hannes

---
www.informera.de
Hadoop & Big Data Services
-
Re: AWS MapReduce
Mohit Anchlia 2012-03-04, 16:51
As far as I can see in the docs, it looks like you could also use HDFS instead of S3. What I am not sure about is whether these are local disks or EBS.
-
Re: AWS MapReduce
John Conwell 2012-03-05, 15:40
AWS MapReduce (EMR) does not use S3 for its HDFS persistence. If it did, your S3 billing would be massive :) EMR reads all input jar files and input data from S3, but it copies these files down to its local disk. It then starts the MR process, doing all HDFS reads and writes on the local disks. At the end of the MR job, it copies the MR job output and all process logs to S3, and then tears down the VM instances.

You can see this for yourself if you spin up a small EMR cluster but turn off the configuration flag that kills the VMs at the end of the MR job. Then look at the Hadoop configuration files to see how Hadoop is configured.

I really like EMR. Amazon has done a lot of work to optimize the Hadoop configurations and VM instance AMIs to execute MR jobs fairly efficiently on a VM cluster. I had to do a lot of (expensive) trial-and-error work to figure out an optimal Hadoop / VM configuration that would run our MR jobs without crashing or timing out. The only reason we didn't standardize on EMR was that it strongly bound your code base / process to using EMR for Hadoop processing, vs. a flexible infrastructure that could use a local cluster or a cluster on a different cloud provider.

--
Thanks,
John C
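For anyone who wants to try the "look at the Hadoop configuration files" step, here is a minimal sketch of pulling the HDFS storage directories out of an hdfs-site.xml. It is not taken from EMR itself: the sample XML and the /mnt paths in it are made up for illustration, and the helper names read_properties and hdfs_data_dirs are hypothetical. The only things assumed about Hadoop are the standard *-site.xml property layout and the DataNode storage property, which is dfs.data.dir in Hadoop 1.x and dfs.datanode.data.dir in later releases.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; on a real node you would read the cluster's own
# hdfs-site.xml instead (its location varies by Hadoop distribution).
SAMPLE_HDFS_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/mnt/var/lib/hadoop/dfs,/mnt1/var/lib/hadoop/dfs</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
"""

# Hadoop 1.x calls the DataNode storage setting dfs.data.dir;
# later releases renamed it to dfs.datanode.data.dir.
DATA_DIR_KEYS = ("dfs.data.dir", "dfs.datanode.data.dir")


def read_properties(xml_text):
    """Return the name -> value pairs of a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def hdfs_data_dirs(props):
    """Return the DataNode storage directories listed in the properties."""
    for key in DATA_DIR_KEYS:
        if key in props:
            return [d.strip() for d in props[key].split(",") if d.strip()]
    return []


if __name__ == "__main__":
    # Print one storage directory per line.
    for directory in hdfs_data_dirs(read_properties(SAMPLE_HDFS_SITE)):
        print(directory)
```

On a running node, the directories this prints can be compared against the output of df or mount to see which block device backs each one. Whether that device is instance storage or an EBS volume depends on how the instance was launched, so the script alone does not settle that question.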
-
Re: AWS MapReduce
Mohit Anchlia 2012-03-05, 17:29
Thanks for your input. I am assuming HDFS is created on ephemeral disks and not EBS. Also, is it possible to share some of your findings?