HBase >> mail # user >> Hadoop/HBase hardware requirement


Re: Hadoop/HBase hardware requirement
Hi Lars,
I agree with every sentence you wrote (and that's why we chose HBase).
However, from a managerial point of view the question of the initial
investment is very important (especially when considering a new technology).

Lior
P.S. The price is in USD...

On Mon, Nov 22, 2010 at 2:43 PM, Lars George <[EMAIL PROTECTED]> wrote:

> Hi Lior,
>
> I can only hope you are quoting that in shekels! But 20 nodes with Hadoop
> can do quite a lot, and you cannot compare a single Oracle box with a
> 20-node Hadoop cluster, as they serve slightly different use cases. You
> need to commit to what you want to achieve with HBase, and keep in mind
> that growth is the most important factor. Scaling Oracle is really
> expensive, while scaling HBase/Hadoop is not by comparison: its costs grow
> linearly, while Oracle's grow closer to exponentially.
>
> Lars
>
> On Mon, Nov 22, 2010 at 1:27 PM, Lior Schachter <[EMAIL PROTECTED]>
> wrote:
> > Hi all, Thanks for your input and assistance.
> >
> >
> > From your answers I understand that:
> > 1. More is better, but our configuration might work.
> > 2. There are small tweaks that will improve our configuration
> > (like having 4x500GB disks).
> > 3. We should use monitoring (like Ganglia) to find the bottlenecks.
> >
> > For me, the question here is how to balance our current budget against
> > system stability (and performance).
> > I agree that more memory and more disk space will improve our
> > responsiveness, but on the other hand our system is NOT expected to be
> > real-time (rather, it is back-office analytics with a delay of a few hours).
> >
> > This is a crucial point, since the proposed configurations we found on the
> > web don't distinguish between real-time configurations and back-office
> > configurations. Building a real-time cluster with 20 nodes will cost around
> > 200-300K (in Israel). This is similar to the price of a quite strong Oracle
> > cluster... so my boss (the CTO) was partially right when he told me: "but
> > you said it would be cheap!! very cheap" :)
> >
> > I believe that more money will come when we show the viability of the
> > system... I also read that heterogeneous clusters are common.
> >
> > It would help a lot if you could share your configurations and system
> > characteristics (maybe on a wiki page).
> > It would also help to hear more of the "small tweaks" that you found helpful.
> >
> >
> > Lior Schachter
> >
> > On Mon, Nov 22, 2010 at 1:33 PM, Lars George <[EMAIL PROTECTED]> wrote:
> >
> >> Oleg,
> >>
> >> Do you have Ganglia or some other graphing tool running against the
> >> cluster? It gives you metrics that are crucial here, for example the
> >> load on Hadoop and its DataNodes as well as insertion rates etc. on
> >> HBase. The compaction queue is also worth watching, to see whether the
> >> cluster is falling behind.
> >>
> >> Did you try loading from an empty system to a loaded one? Or was it
> >> already filled and you are trying to add more? Are you spreading the
> >> load across servers or are you using sequential keys that tax only one
> >> server at a time?
> >>
> >> 16GB should work, but is not ideal. The various daemons simply need
> >> room to breathe. That said, I have personally started with as little
> >> as 12GB and it worked.
> >>
> >> Lars
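
Lars's question about sequential keys can be made concrete with a small sketch of key "salting": prefixing each row key with a bucket derived from a hash of the key, so that consecutive keys no longer sort next to each other. This is only an illustration in plain Python, not HBase client code; the function name `salted_key`, the bucket count of 16, and the `event-...` key format are assumptions made for the example and do not come from the thread.

```python
import hashlib


def salted_key(row_key: str, buckets: int = 16) -> str:
    """Prefix row_key with a hash-derived bucket number.

    Hypothetical helper for illustration only. The bucket is derived from
    the key itself, so the same key always gets the same prefix and can
    still be looked up directly.
    """
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    bucket = digest[0] % buckets
    return f"{bucket:02d}-{row_key}"


if __name__ == "__main__":
    # Sequential keys end up under different prefixes instead of all
    # sorting into one contiguous range.
    for i in range(5):
        print(salted_key(f"event-{i:06d}"))
```

The trade-off is that a range scan over the original key order then has to be issued once per bucket and the results merged, so this tends to suit write-heavy loads better than scan-heavy ones.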
> >>
> >> On Mon, Nov 22, 2010 at 12:17 PM, Oleg Ruchovets <[EMAIL PROTECTED]>
> >> wrote:
> >> > On Sun, Nov 21, 2010 at 10:39 PM, Krishna Sankar <[EMAIL PROTECTED]> wrote:
> >> >
> >> >> Oleg & Lior,
> >> >>
> >> >> Couple of questions & couple of suggestions to ponder:
> >> >> A)  When you say 20 Name Servers, I assume you are talking about 20
> >> >> Task Servers
> >> >>
> >> >
> >> > Yes
> >> >
> >> >
> >> >> B)  What type are your M/R jobs? Compute-intensive vs. storage-intensive?
> >> >>
> >> >
> >> > Most of the M/R work is parsing; 5%-10% of the M/R output is stored
> >> > in HBase.
> >> >
> >> >
> >> >> C)  What is your data growth?
> >> >>
> >> >
> >> > Currently we have 50GB per day; it could grow to ~150GB.
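
The figures mentioned in the thread (20 nodes, 4x500GB disks per node, 50GB to ~150GB of new data per day) allow a rough back-of-the-envelope estimate of how long such a cluster could absorb the ingest. The sketch below is only that. It assumes every ingested byte is kept on HDFS at the default replication factor of 3, that about 75% of raw disk is usable for data (an assumed figure, leaving room for temporary M/R output and the OS), and it ignores compression and HBase overhead. The function name `days_until_full` is hypothetical.

```python
def days_until_full(nodes: int, disks_per_node: int, disk_gb: float,
                    daily_ingest_gb: float, replication: int = 3,
                    usable_fraction: float = 0.75) -> float:
    """Rough number of days until the cluster's data disks fill up.

    Assumptions (not from the thread): all ingested data is retained,
    HDFS replication is 3, and only usable_fraction of raw disk space
    is available for data.
    """
    raw_gb = nodes * disks_per_node * disk_gb
    usable_gb = raw_gb * usable_fraction
    stored_per_day_gb = daily_ingest_gb * replication
    return usable_gb / stored_per_day_gb


if __name__ == "__main__":
    for ingest in (50, 150):
        days = days_until_full(20, 4, 500, ingest)
        print(f"{ingest} GB/day -> about {days:.0f} days")
```

Under these assumptions the cluster lasts roughly 200 days at 50GB per day and about 67 days at 150GB per day, which is why the data growth rate matters so much for sizing.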