HBase user mailing list

Re: Hadoop/HBase hardware requirement
Hi Lars,
I agree with every sentence you wrote (and that's why we chose HBase).
However, from a managerial point of view, the initial investment is a very
important question (especially when adopting a new technology).

Lior
P.S. The price is in USD...

On Mon, Nov 22, 2010 at 2:43 PM, Lars George <[EMAIL PROTECTED]> wrote:

> Hi Lior,
>
> I can only hope you are quoting that in shekels! But 20 nodes with Hadoop
> can do quite a lot, and you cannot compare a single Oracle box with a
> 20-node Hadoop cluster, as they serve slightly different use cases. You
> need to commit to what you want to achieve with HBase, and keep in mind
> that growth is the most important factor. Scaling Oracle is really
> expensive, while HBase/Hadoop is not by comparison: its costs grow
> linearly, whereas Oracle's grow more exponentially.
>
> Lars
>
> On Mon, Nov 22, 2010 at 1:27 PM, Lior Schachter <[EMAIL PROTECTED]> wrote:
> > Hi all,
> >
> > Thanks for your input and assistance.
> >
> >
> > From your answers, I understand that:
> > 1. More is better, but our configuration might work.
> > 2. There are small tweaks that will improve our configuration
> > (e.g., having 4x500GB disks).
> > 3. We should use monitoring (e.g., Ganglia) to find the bottlenecks.
> >
> > For me, the question is how to balance our current budget against
> > system stability (and performance).
> > I agree that more memory and more disk space would improve our
> > responsiveness, but on the other hand our system is NOT expected to be
> > real-time; it is back-office analytics that can run a few hours behind.
> >
> > This is a crucial point, since the configurations we found on the
> > web don't distinguish between real-time and back-office setups.
> > Building a real-time cluster with 20 nodes will cost around 200-300K
> > (in Israel), which is similar to the price of a fairly strong Oracle
> > cluster... so my boss (the CTO) was partly right when he told me: "But
> > you said it would be cheap!! Very cheap" :)
> >
> > I believe more money will come once we show that the system is
> > viable... I have also read that heterogeneous clusters are common.
> >
> > It would help a lot if you could share your configurations and system
> > characteristics (maybe on a wiki page).
> > It would also help to hear more of the "small tweaks" you have found
> > useful.
> >
> >
> > Lior Schachter
> >
> > On Mon, Nov 22, 2010 at 1:33 PM, Lars George <[EMAIL PROTECTED]> wrote:
> >
> >> Oleg,
> >>
> >> Do you have Ganglia or some other graphing tool running against the
> >> cluster? It gives you metrics that are crucial here, for example the
> >> load on Hadoop and its DataNodes, as well as insertion rates and
> >> similar metrics on HBase. The compaction queue is also worth
> >> watching, to see whether the cluster is falling behind.
> >>
> >> Did you start loading into an empty system, or was it already filled
> >> and you are trying to add more? Are you spreading the load across
> >> servers, or are you using sequential keys that tax only one server at
> >> a time?
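A common way to avoid the single-server hotspot that sequential keys cause is to "salt" the row key with a bucket prefix, so consecutive ids sort into different key ranges and therefore different regions. The sketch below is a minimal illustration under stated assumptions: the bucket count of 20 (one per node in the cluster discussed here), the modulo-based salt, and the helper names `salted_key` and `salt_prefixes` are all hypothetical, not part of any HBase API.

```python
# Hypothetical: one salt bucket per region server in a 20-node cluster.
NUM_BUCKETS = 20


def salted_key(seq_id: int, num_buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix a sequential id with a salt bucket (illustrative helper).

    Without the prefix, ids like 1000, 1001, 1002 sort next to each other,
    so every new write lands in the region holding the end of the key range.
    With the prefix, consecutive ids are spread over num_buckets key ranges.
    The two-digit prefix assumes fewer than 100 buckets.
    """
    bucket = seq_id % num_buckets
    return f"{bucket:02d}-{seq_id:012d}".encode()


def salt_prefixes(num_buckets: int = NUM_BUCKETS) -> list[bytes]:
    """Prefixes a reader must cover: a range read now needs one scan per bucket."""
    return [f"{b:02d}-".encode() for b in range(num_buckets)]


if __name__ == "__main__":
    keys = [salted_key(i) for i in range(1000, 1020)]
    print(keys[:3])
    print(len({k[:3] for k in keys}), "distinct prefixes")
```

The trade-off is on the read side: a scan over a contiguous id range has to be issued once per prefix returned by `salt_prefixes` and the results merged, so salting suits write-heavy bulk loads better than latency-sensitive range reads.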
> >>
> >> 16GB should work, but it is not ideal; the various daemons simply need
> >> room to breathe. That said, I have personally started with as little
> >> as 12GB, and it worked.
> >>
> >> Lars
> >>
> >> On Mon, Nov 22, 2010 at 12:17 PM, Oleg Ruchovets <[EMAIL PROTECTED]> wrote:
> >> > On Sun, Nov 21, 2010 at 10:39 PM, Krishna Sankar <[EMAIL PROTECTED]> wrote:
> >> >
> >> >> Oleg & Lior,
> >> >>
> >> >> A couple of questions and a couple of suggestions to ponder:
> >> >> A) When you say 20 Name Servers, I assume you are talking about 20
> >> >> Task Servers.
> >> >>
> >> >
> >> > Yes
> >> >
> >> >
> >> >> B) What type are your M/R jobs? Compute-intensive or storage-intensive?
> >> >>
> >> >
> >> > Most of the M/R work is parsing; only 5%-10% of the M/R output is
> >> > stored in HBase.
> >> >
> >> >
> >> >> C) What is your data growth?
> >> >>
> >> >
> >> > Currently we have 50GB per day; it could grow to ~150GB.
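To relate the stated growth (50GB per day, possibly ~150GB) to disk sizing, a rough back-of-the-envelope estimate helps. The sketch below assumes that all ingested data is kept uncompressed on HDFS with the default replication factor of 3, and that 25% of disk is held back for M/R temporary output and compactions. The headroom figure, the one-year retention, and the 20 nodes x 4x500GB cluster used in the example are illustrative assumptions, not numbers anyone in the thread measured.

```python
# HDFS default dfs.replication; adjust if the cluster is configured differently.
HDFS_REPLICATION = 3


def raw_disk_needed_gb(daily_ingest_gb: float, days: int,
                       replication: int = HDFS_REPLICATION,
                       headroom: float = 0.25) -> float:
    """Raw cluster disk needed to retain `days` of ingest.

    Multiplies logical data by the replication factor, then sizes the disks so
    that `headroom` (assumed 25%) stays free for temp output and compactions.
    Ignores compression, which would reduce the figure.
    """
    replicated = daily_ingest_gb * days * replication
    return replicated / (1 - headroom)


def days_until_full(raw_capacity_gb: float, daily_ingest_gb: float,
                    replication: int = HDFS_REPLICATION,
                    headroom: float = 0.25) -> float:
    """Days of ingest that fit on the cluster before headroom is consumed."""
    usable = raw_capacity_gb * (1 - headroom)
    return usable / (daily_ingest_gb * replication)


if __name__ == "__main__":
    cluster_raw_gb = 20 * 4 * 500  # hypothetical: 20 nodes x 4x500GB disks = 40,000GB raw
    for daily in (50, 150):
        print(f"{daily}GB/day: {raw_disk_needed_gb(daily, 365):,.0f}GB raw per year, "
              f"cluster full in ~{days_until_full(cluster_raw_gb, daily):.0f} days")
```

Under these assumptions, a year of retention needs about 73,000GB of raw disk at 50GB per day and about 219,000GB at 150GB per day, and a 40,000GB cluster would fill in roughly 200 or 67 days respectively. Compression or a shorter retention window would change these numbers substantially.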