Re: Hadoop on physical machines compared to Amazon EC2 / virtual machines
How are you guys moving 100 TB into the AWS cloud? Are you using S3 or
EBS? If you are using S3, it does not work like HDFS. Although data is
replicated in S3 (I believe across facilities within a region), it is not
the same as HDFS replication. You lose the data locality optimization
feature of Hadoop when you use S3, which runs counter to the "sending
code to data" paradigm of MapReduce. Mind you, traffic in/out of S3
also incurs costs (when you lose data locality).
I hear that to get PBs worth of data into AWS, it is not uncommon to
drive a truck with your data on some physical storage device (in fact,
Amazon will help you do this).
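To make the case for shipping disks concrete, here is a rough back-of-the-envelope sketch of how long a network upload would take. The link speeds and the `efficiency` parameter are assumptions chosen for illustration, not measurements from any real uplink, and `transfer_days` is a hypothetical helper name.

```python
def transfer_days(data_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Days needed to push data_tb (decimal terabytes) over a link of link_gbps.

    efficiency is the assumed fraction of the nominal link rate actually
    achieved (1.0 = ideal, which real transfers will not reach).
    """
    bits = data_tb * 1e12 * 8                      # terabytes -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 86400                         # seconds -> days


if __name__ == "__main__":
    # Assumed uplink speeds; substitute your own.
    for gbps in (0.1, 1.0, 10.0):
        print(f"{gbps:>5} Gbps: {transfer_days(100, gbps):.1f} days for 100 TB")
```

Even on an ideal, dedicated 1 Gbps link, 100 TB takes more than a week, which is why physical shipment starts to look attractive at this scale.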
Please update us, this is an interesting problem.
On Thu, May 31, 2012 at 2:41 PM, Sandeep Reddy P
<[EMAIL PROTECTED]> wrote:
> We are getting 100 TB of data; with a replication factor of 3, this comes
> to 300 TB. We are planning to use Hadoop with 65 nodes. We want to know
> which option is better in terms of hardware: physical machines or
> deploying Hadoop on EC2. Is there any document that supports the use of
> physical machines?
> Hardware specs: 2 quad-core CPUs, 32 GB RAM, 12 x 1 TB hard drives, 10Gb
> Ethernet switches; costs $10k for each machine. Is it cheaper to use EC2?
> Will there be any performance issues?
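
The figures quoted above can be checked with a quick sizing sketch. It assumes the $10k is the all-in price per machine and ignores disk space needed for the OS, logs, and intermediate MapReduce output, so the real headroom would be smaller. `cluster_summary` is a hypothetical helper name, not part of Hadoop.

```python
def cluster_summary(data_tb, replication, nodes, disks_per_node, disk_tb, cost_per_node):
    """Rough storage and hardware-cost summary for a physical HDFS cluster."""
    stored_tb = data_tb * replication                 # bytes on disk after replication
    raw_capacity_tb = nodes * disks_per_node * disk_tb
    return {
        "stored_tb": stored_tb,
        "raw_capacity_tb": raw_capacity_tb,
        "utilization": stored_tb / raw_capacity_tb,   # fraction of raw disk used
        "hardware_cost_usd": nodes * cost_per_node,
        # Cost per TB of unreplicated data the cluster could hold at most.
        "usd_per_logical_tb": nodes * cost_per_node / (raw_capacity_tb / replication),
    }


if __name__ == "__main__":
    summary = cluster_summary(
        data_tb=100, replication=3, nodes=65,
        disks_per_node=12, disk_tb=1, cost_per_node=10_000,
    )
    for key, value in summary.items():
        print(f"{key}: {value:,.2f}")
```

With these inputs the cluster holds 780 TB raw against 300 TB stored, so about 38% of the disk would be in use, for roughly $650k of hardware.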