Re: AWS MapReduce
On Mon, Mar 5, 2012 at 7:40 AM, John Conwell <[EMAIL PROTECTED]> wrote:

> AWS MapReduce (EMR) does not use S3 for its HDFS persistence.  If it did
> your S3 billing would be massive :)  EMR reads all input jar files and
> input data from S3, but it copies these files down to its local disk.  It
> then starts the MR process, doing all HDFS reads and writes to the
> local disks.  At the end of the MR job, it copies the MR job output and all
> process logs to S3, and then tears down the VM instances.
>
> You can see this for yourself if you spin up a small EMR cluster, but turn
> off the configuration flag that kills the VMs at the end of the MR job.
>  Then look at the hadoop configuration files to see how hadoop is
> configured.
>
> I really like EMR.  Amazon has done a lot of work to optimize the hadoop
> configurations and VM instance AMIs to execute MR jobs fairly efficiently
> on a VM cluster.  I had to do a lot of (expensive) trial and error work to
> figure out an optimal hadoop / VM configuration to run our MR jobs without
> crashing / timing out the jobs.  The only reason we didn't standardize on
> EMR was that it strongly bound your code base / process to using EMR for
> hadoop processing, vs a flexible infrastructure that could use a local
> cluster or cluster on a different cloud provider.
>
Thanks for your input. I am assuming HDFS is created on ephemeral disks
and not EBS. Also, is it possible to share some of your findings?

>
> On Sun, Mar 4, 2012 at 8:51 AM, Mohit Anchlia <[EMAIL PROTECTED]
> >wrote:
>
> > As far as I see in the docs it looks like you could also use hdfs instead
> > of s3. But what I am not sure is if these are local disks or EBS.
> >
> > On Sun, Mar 4, 2012 at 2:27 AM, Hannes Carl Meyer <
> > [EMAIL PROTECTED]
> > > wrote:
> >
> > > Hi,
> > >
> > > yes, it's loaded from S3. Imho Amazon AWS Map-Reduce is pretty slow.
> > > The setup is done pretty fast and there are some configuration parameters
> > > you can bypass - for example blocksizes etc. - but in the end imho setting
> > > up ec2 instances by copying images is the better alternative.
> > >
> > > Kind Regards
> > >
> > > Hannes
> > >
> > > On Sun, Mar 4, 2012 at 2:31 AM, Mohit Anchlia <[EMAIL PROTECTED]
> > > >wrote:
> > >
> > > > I think I found the answer to this question. However, it's still not
> > > > clear if HDFS is on local disk or EBS volumes. Does anyone know?
> > > >
> > > > On Sat, Mar 3, 2012 at 3:54 PM, Mohit Anchlia <
> [EMAIL PROTECTED]
> > > > >wrote:
> > > >
> > > > > Just want to check how many are using AWS mapreduce and understand the
> > > > > pros and cons of Amazon's MapReduce machines? Is it true that these map
> > > > > reduce machines are really reading and writing from S3 instead of local
> > > > > disks? Has anyone found issues with Amazon MapReduce and how does it
> > > > > compare with using MapReduce on local attached disks compared to using
> > > > > S3?
> > > >
> > >
> > > ---
> > > www.informera.de
> > > Hadoop & Big Data Services
> > >
> >
>
>
>
> --
>
> Thanks,
> John C
>
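
For anyone following John's suggestion to keep the cluster alive and look at the hadoop configuration files, here is a minimal sketch of what that check might look like. It assumes the Hadoop 1.x-era property names fs.default.name (core-site.xml) and dfs.data.dir (hdfs-site.xml). The sample XML, the hostname and the /mnt paths are made up for illustration and are not taken from a real EMR node; on a live cluster you would pass in the contents of the actual config files instead.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text):
    """Return a dict of name -> value from a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text.strip())
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def describe_storage(props):
    """Summarize which filesystem is the default and where DataNode blocks live."""
    default_fs = props.get("fs.default.name", "")
    scheme = default_fs.split("://", 1)[0] if "://" in default_fs else ""
    data_dirs = [d.strip() for d in props.get("dfs.data.dir", "").split(",") if d.strip()]
    return {"default_fs_scheme": scheme, "data_dirs": data_dirs}


# Hypothetical sample: values are illustrative, not copied from an EMR instance.
SAMPLE = """
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example:9000</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/mnt/var/lib/hadoop/dfs,/mnt1/var/lib/hadoop/dfs</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    info = describe_storage(read_hadoop_properties(SAMPLE))
    print("default filesystem scheme:", info["default_fs_scheme"])
    for d in info["data_dirs"]:
        print("HDFS data dir:", d)
```

An "hdfs" scheme would indicate that HDFS, not S3, is the default filesystem for the job. The data directories only say where the blocks are written. Whether those mount points sit on ephemeral instance storage or on EBS still has to be checked on the node itself, for example by looking at what is mounted there.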