HBase, mail # user - Best practices for HBase in EC2?


Jim R. Wilson 2011-06-04, 18:27
Sean Bigdatafun 2011-06-04, 18:40
Jim R. Wilson 2011-06-04, 18:49

Re: Best practices for HBase in EC2?
Himanshu Vashishtha 2011-06-04, 19:02
I used EC2, but just for experiments. Here is what I did:
a) Used the ephemeral disks. My experiment datasets were persisted on S3,
and I copied them onto the cluster.
b) Used the hbase-ec2 scripts. Get this repo:
https://github.com/ekoontz/hbase-ec2.git
c) Consulted Andrew's PDF: hbase.s3.amazonaws.com/hbase/HBase-EC2-HUG9.pdf

For the AMI, there is a create-hbase-image script in the above git repo. I
created one for my own work and it's public; search for "himanshu-hbase" and
you should find it. But it's always good to have your own AMI (I learned that
the hard way).

Consult the run scripts, like bin/launch-hbase-cluster,
bin/launch-hbase-master, etc.
One catch: after running launch-hbase-cluster, the cluster was all set up, but
I still had to manually add each regionserver's internal IP to the master's
conf/regionservers list, and each datanode's entry to conf/slaves in the
Hadoop directory. This can be done by a script, though.
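
As a rough illustration of the kind of script meant here, the following is a minimal Python sketch, not part of the hbase-ec2 repo. It assumes both files are plain one-host-per-line lists; the function name add_hosts, the IP addresses, and the install paths mentioned afterwards are hypothetical placeholders.

```python
from pathlib import Path


def add_hosts(list_file, hosts):
    """Append each host to list_file (one per line) unless already listed.

    Returns the hosts that were actually added. Assumes the file is a
    plain one-entry-per-line list; blank lines are dropped on rewrite.
    """
    path = Path(list_file)
    existing = path.read_text().splitlines() if path.exists() else []
    kept = [line for line in existing if line.strip()]
    known = {line.strip() for line in kept}

    added = []
    for host in hosts:
        host = host.strip()
        if host and host not in known:
            known.add(host)
            added.append(host)

    # Only touch the file when there is something new to record.
    if added:
        path.write_text("\n".join(kept + added) + "\n")
    return added
```

On the master one might then call add_hosts("/path/to/hbase/conf/regionservers", ips) and add_hosts("/path/to/hadoop/conf/slaves", ips), where ips holds the instances' internal addresses. Running it twice with the same list is harmless, since hosts already present are skipped.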

Hope this helps.
Himanshu

On Sat, Jun 4, 2011 at 12:49 PM, Jim R. Wilson <[EMAIL PROTECTED]> wrote:

> Thanks Sean,
>
> That's helpful.  I probably should have added some contextual info.  In my
> case, I'm interested in providing instructions on how one can fire up an
> HBase cluster in EC2 in order to experiment with it.  That is, load data,
> practice administration, etc.  In that context, it's unlikely that the
> person following the instructions would start more than 5 nodes, and they
> would also be unlikely to keep them running longer than an hour.
>
> I saw archived email threads where people recommended not running on EC2
> for any length of time, since you can get better performance-per-cost
> characteristics from dedicated hardware (for example, from Rackspace).
>
> So I guess my real question is this: What is the easiest possible way to
> start a 5-node HBase 0.90.x cluster in EC2?  I'm thinking that S3 is better
> for storage, but I'm open to whatever is genuinely the easiest thing to do.
>
> Thanks again,
>
> -- Jim
>
> On Sat, Jun 4, 2011 at 2:40 PM, Sean Bigdatafun <[EMAIL PROTECTED]> wrote:
>
> > Here are my thoughts:
> >
> > If your data storage is for long-term use, you may consider putting the
> > HDFS storage on EBS rather than on local disk (and the NameNode storage
> > on EBS as well). For this setup, I think we should consider
> > dfs.replication=2 (or even 1), because EBS itself already provides
> > enough reliability.
> >
> > If your datastore is for ephemeral use (say, an EMR computation), you may
> > consider just using the ephemeral disks that EC2 provides.
> >
> >
> >
> >
> > On Sat, Jun 4, 2011 at 11:27 AM, Jim R. Wilson <[EMAIL PROTECTED]> wrote:
> >
> > > Hi HBase community,
> > >
> > > What are the current best practices for starting up an HBase cluster
> > > in EC2?  I don't see any public AMIs newer than 0.89.xxx, and on
> > > starting that one up, it's clear that it's not configured for HDFS or
> > > clustering (empty hbase-site.xml).
> > >
> > > Do people generally keep data in S3 or HDFS?  If the latter, is it
> > > persisted via EBS?  Do the Hadoop nodes have more than one EBS volume
> > > attached, to keep HDFS separate from the OS?
> > >
> > > Any help is much appreciated.  Thanks in advance!
> > >
> > > -- Jim R. Wilson (jimbojw)
> > >
> >
> >
> >
> > --
> > --Sean
> >
>
Himanshu Vashishtha 2011-06-04, 19:16
Jim R. Wilson 2011-06-04, 19:25
Andrew Purtell 2011-06-04, 20:30
Jim R. Wilson 2011-06-05, 01:48
Dave Viner 2011-06-05, 05:01
George P. Stathis 2011-06-09, 01:21
Gaurav Kohli 2011-06-09, 05:59
Himanshu Vashishtha 2011-06-23, 21:33