HBase >> mail # user >> Best practices for HBase in EC2?


Re: Best practices for HBase in EC2?
I have used EC2, but only for experiments. Here is what I did:
a) Used the ephemeral disks. My experiment datasets were persisted on S3,
and I copied them onto the cluster.
b) Used the hbase-ec2 scripts. Get this repo:
https://github.com/ekoontz/hbase-ec2.git
c) Consulted Andrew's PDF: hbase.s3.amazonaws.com/hbase/HBase-EC2-HUG9.pdf

For the AMI, there is a create-hbase-image script in the above git repo. I
created one for my own work and it's public; search for "himanshu-hbase" and
you should find it. But it's always better to have your own AMI (I learned
that the hard way).

Consult the run scripts, such as bin/launch-hbase-cluster and
bin/launch-hbase-master.
One caveat: when you run the launch-cluster script, the cluster is mostly
set up, but I needed to manually add each regionserver's internal IP to the
master's conf/regionservers list, and also each datanode's entry to
conf/slaves in the Hadoop directory. This can be done by a script, though.
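
The manual step above could be scripted roughly as follows. This is only a
minimal sketch, not part of the hbase-ec2 scripts: the function name
add_hosts and the example paths and IPs are hypothetical, and it assumes
both conf/regionservers and conf/slaves are plain one-host-per-line files.

```python
from pathlib import Path


def add_hosts(list_file, hosts):
    """Append hosts to a one-host-per-line file, skipping ones already listed.

    Returns the list of hosts that were actually added.
    """
    path = Path(list_file)
    existing = []
    if path.exists():
        # Keep the non-empty lines that are already in the file.
        existing = [ln.strip() for ln in path.read_text().splitlines() if ln.strip()]

    added = []
    for host in hosts:
        if host not in existing and host not in added:
            added.append(host)

    if added:
        # Rewrite the file with the old entries first, then the new ones.
        path.write_text("\n".join(existing + added) + "\n")
    return added
```

On the master you would call it once per file, for example
add_hosts("/path/to/hbase/conf/regionservers", ["10.0.0.12"]) and
add_hosts("/path/to/hadoop/conf/slaves", ["10.0.0.12"]), with your own
install paths and the nodes' internal IPs. Running it twice with the same
IP leaves the files unchanged.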

Hope this helps.
Himanshu

On Sat, Jun 4, 2011 at 12:49 PM, Jim R. Wilson <[EMAIL PROTECTED]> wrote:

> Thanks Sean,
>
> That's helpful.  I probably should have added some contextual info.  In my
> case, I'm interested in providing instructions on how one can fire up an
> HBase cluster in EC2 in order to experiment with it.  That is, load data,
> practice administration, etc.  In that context, it's unlikely that the
> person following the instructions would start more than 5 nodes, or keep
> them running longer than an hour.
>
> I saw archived email threads where people recommended not running on EC2
> for any length of time since you can get better performance-per-cost
> characteristics from dedicated hardware (for example from Rackspace).
>
> So I guess my real question is this: What is the easiest possible way to
> start a 5-node HBase 0.90.x cluster in EC2?  I'm thinking that S3 is better
> for storage, but I'm open to whatever is genuinely the easiest thing to do.
>
> Thanks again,
>
> -- Jim
>
> On Sat, Jun 4, 2011 at 2:40 PM, Sean Bigdatafun <[EMAIL PROTECTED]> wrote:
>
> > Here are my thoughts:
> >
> > If your data storage is for long-term use, then you may consider
> > backing the HDFS storage devices with EBS rather than local disk (and
> > backing the Namenode storage device with EBS as well). For this setup,
> > I think we should consider dfs.replication=2 (or even 1), because EBS
> > itself already provides enough reliability.
> >
> > If your datastore is for ephemeral use (say, EMR computation), you may
> > consider just using the EC2-provided ephemeral disks.
> >
> >
> >
> >
> > On Sat, Jun 4, 2011 at 11:27 AM, Jim R. Wilson <[EMAIL PROTECTED]> wrote:
> >
> > > Hi HBase community,
> > >
> > > What are the current best practices with respect to starting up an
> > > HBase cluster in EC2?  I don't see any public AMIs newer than
> > > 0.89.xxx, and starting up that one, it's clear that it's not
> > > configured for HDFS or clustering (empty hbase-site.xml).
> > >
> > > Do people generally keep data in S3 or HDFS?  If the latter, is it
> > > persisted via EBS?  Do the hadoop nodes have more than one EBS
> > > attached to distinguish HDFS from the OS?
> > >
> > > Any help is much appreciated.  Thanks in advance!
> > >
> > > -- Jim R. Wilson (jimbojw)
> > >
> >
> >
> >
> > --
> > --Sean
> >
>