HBase >> mail # user >> Best practices for HBase in EC2?


Re: Best practices for HBase in EC2?
I should add a disclaimer: this is not the best possible way! :))
There are some Ruby scripts too (in the same repo, look for the recipes
directory), and they get your cluster up and running with just one .rb file. I
didn't use them because Ruby is unknown territory for me and I was not entirely
clear about how they work.

Himanshu

On Sat, Jun 4, 2011 at 1:02 PM, Himanshu Vashishtha <[EMAIL PROTECTED]> wrote:

> I used EC2, but just for experiments. Here is what I did:
> a) Used the ephemeral disks. My experiment datasets were persisted on S3,
> and I copied them onto the cluster.
> b) Used the hbase-ec2 scripts. Get this repo:
> https://github.com/ekoontz/hbase-ec2.git
> c) Consulted Andrew's PDF: hbase.s3.amazonaws.com/hbase/HBase-EC2-HUG9.pdf
>
> For the AMI, there is a create-hbase-image script in the above git repo. I
> created one for my own work and it's public; search for "himanshu-hbase" and
> you should find it. But it's always good to have your own AMI (I learned that
> the hard way).
>
> Consult the run scripts, like bin/launch-hbase-cluster,
> bin/launch-hbase-master, etc.
> One catch: after the launch-cluster script runs, the cluster is all set up,
> but I still needed to manually add each regionserver's internal IP to the
> master's conf/regionservers list, and the datanode's entry to conf/slaves in
> the Hadoop directory. This can be done by a script though.
>
> Hope this helps.
> Himanshu
>
>
> On Sat, Jun 4, 2011 at 12:49 PM, Jim R. Wilson <[EMAIL PROTECTED]> wrote:
>
>> Thanks Sean,
>>
>> That's helpful.  I probably should have added some contextual info.  In my
>> case, I'm interested in providing instructions on how one can fire up an
>> HBase cluster in EC2 in order to experiment with it.  That is, load data,
>> practice administration, etc.  In that context, it's unlikely that the
>> person following the instructions would start more than 5 nodes, or keep
>> them running longer than an hour.
>>
>> I saw archived email threads where people recommended not running on EC2
>> for any length of time, since you can get better performance-per-cost
>> characteristics from dedicated hardware (for example from Rackspace).
>>
>> So I guess my real question is this: What is the easiest possible way to
>> start a 5-node HBase 0.90.x cluster in EC2?  I'm thinking that S3 is
>> better for storage, but I'm open to whatever is genuinely the easiest
>> thing to do.
>>
>> Thanks again,
>>
>> -- Jim
>>
>> On Sat, Jun 4, 2011 at 2:40 PM, Sean Bigdatafun
>> <[EMAIL PROTECTED]> wrote:
>>
>> > Here are my thoughts:
>> >
>> > If your data store is meant for long-term use, then you may consider
>> > putting the HDFS storage on EBS volumes rather than local disk (and the
>> > Namenode storage on EBS as well). For this setup, I think we should
>> > consider dfs.replication=2 (or even 1), because EBS itself already
>> > provides enough reliability.
>> >
>> > If your data store is used for ephemeral purposes (say, EMR computation),
>> > you may consider just using the ephemeral disks that EC2 provides.
>> >
>> >
>> >
>> >
>> > On Sat, Jun 4, 2011 at 11:27 AM, Jim R. Wilson <[EMAIL PROTECTED]> wrote:
>> >
>> > > Hi HBase community,
>> > >
>> > > What are the current best practices with respect to starting up an
>> > > HBase cluster in EC2?  I don't see any public AMIs newer than 0.89.xxx,
>> > > and starting up that one, it's clear that it's not configured for HDFS
>> > > or clustering (empty hbase-site.xml).
>> > >
>> > > Do people generally keep data in S3 or HDFS?  If the latter, is it
>> > > persisted via EBS?  Do the Hadoop nodes have more than one EBS volume
>> > > attached to separate HDFS from the OS?
>> > >
>> > > Any help is much appreciated.  Thanks in advance!
>> > >
>> > > -- Jim R. Wilson (jimbojw)
>> > >
>> >
>> >
>> >
>> > --
>> > --Sean
>> >
>>
>
>
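
Himanshu notes that adding each regionserver's internal IP to the master's
regionservers list, and the datanode's entry to Hadoop's slaves file, "can be
done by a script". The following is a minimal sketch of such a script, not
anything taken from the hbase-ec2 repo. The function names, the install
directories, and the IP address in the usage note are hypothetical; the only
assumption about the files is that they hold one host per line.

```python
from pathlib import Path


def register_host(list_file: Path, host: str) -> bool:
    """Append host to a one-host-per-line file unless it is already listed.

    Returns True if the file was changed, False if the host was present.
    """
    existing = []
    if list_file.exists():
        existing = [line.strip() for line in list_file.read_text().splitlines()]
    if host in existing:
        return False
    # Keep the existing entries in order, drop blank lines, add the new host.
    entries = [line for line in existing if line] + [host]
    list_file.parent.mkdir(parents=True, exist_ok=True)
    list_file.write_text("\n".join(entries) + "\n")
    return True


def register_worker(hbase_home: Path, hadoop_home: Path, internal_ip: str) -> None:
    """Record a new worker node in both host lists on the master."""
    # List of regionserver hosts used by the HBase start scripts.
    register_host(hbase_home / "conf" / "regionservers", internal_ip)
    # List of datanode hosts used by the Hadoop start scripts.
    register_host(hadoop_home / "conf" / "slaves", internal_ip)
```

On the master this might be called as
register_worker(Path("/usr/local/hbase"), Path("/usr/local/hadoop"), "10.0.0.12"),
where both directories and the address are placeholders for your own layout.
Running it twice for the same IP leaves the files unchanged, so it is safe to
re-run after every launch.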
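
Two configuration points come up in the thread: Jim found the public AMI's
hbase-site.xml empty, and Sean suggests lowering HDFS replication on EBS. As a
rough sketch of what filling those in could look like, the helper below
renders Hadoop-style site files from a dict. The helper name, the hostname
master.internal, and port 8020 are hypothetical. The property names
(hbase.rootdir, hbase.cluster.distributed, hbase.zookeeper.quorum, and
dfs.replication for the replication setting) are the standard HBase and HDFS
ones, but the values are only examples to adapt, not a recommended setup.

```python
from xml.sax.saxutils import escape


def render_site_xml(properties: dict) -> str:
    """Render name/value pairs in the <configuration> format used by *-site.xml."""
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name, value in properties.items():
        lines += [
            "  <property>",
            f"    <name>{escape(name)}</name>",
            f"    <value>{escape(str(value))}</value>",
            "  </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"


# Example values only: hostname and port are placeholders.
HBASE_SITE = {
    "hbase.rootdir": "hdfs://master.internal:8020/hbase",
    "hbase.cluster.distributed": "true",
    "hbase.zookeeper.quorum": "master.internal",
}

# Replication of 2, as suggested in the thread for EBS-backed storage.
HDFS_SITE = {"dfs.replication": "2"}
```

Writing the result out is then a matter of, for example,
open("conf/hbase-site.xml", "w").write(render_site_xml(HBASE_SITE)) from the
HBase directory, and likewise for hdfs-site.xml under the Hadoop conf
directory.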