HBase, mail # user - Schema design question - Hot Key concerns

Suraj Varma 2011-11-18, 17:33
Sam Seigal 2011-11-18, 18:02
Michael Segel 2011-11-18, 19:04
Suraj Varma 2011-11-18, 22:57
Re: Schema design question - Hot Key concerns
Michel Segel 2011-11-20, 09:52

First a caveat... I haven't seen your initial normalized schema, so take what I say with a grain of salt...

The problem you are trying to solve is one which can be solved better on an RDBMS platform and does not fit well in a NoSQL space.
Your scalability issue would probably be better solved with a redesign of your existing schema. Over twenty years ago Hyatt hotels moved their reservation system from the mainframe to a Unix box. Back then those Unix boxes were huge, but thanks to Moore's law the system could probably fit onto a single box of the kind we use for a datanode.

The point I'm trying to make is that your problem is very similar to the problems you would face when writing a hotel reservation system.

Your problem is an OLTP problem which requires things like ACID compliance, transactions and different isolation levels...

Sent from a remote device. Please excuse any typos...

Mike Segel

On Nov 18, 2011, at 4:57 PM, Suraj Varma <[EMAIL PROTECTED]> wrote:

> Thanks for your response.
> As always ... it is currently on an RDBMS that has scalability concerns
> .... and hence a NoSQL store is being evaluated for this. :)
> In the quora link posted, Todd mentions that "if the updates are in
> the 100 updates/sec range on the hot row" it may still be ok. I guess,
> that was one of the questions I had as in how hot is "hot"? If an app
> expects 100 updates/sec to a row ... perhaps that is not considered
> hot with HBase?
> The only reason I'm even considering this is because the updates are
> to MemStore ... so, in my mind this is basically concurrently updating
> a SortedMap of SortedMaps etc. in memory ... so, how drastic is the
> locking really?
> Also - given the sparse column updates (i.e. mutually exclusive), I
> was wondering if that reduces locking in some way.
> Michael ... on your #2 ... why would rows be locked when people are
> querying? Wouldn't they be querying by a specific timestamp ... so
> they would basically see a row as of that timestamp?
> On #1, yes - I think that's the complexity of moving this into the app.
> One other design thought we had was to have it as a tall table and
> have a "BOOKED_QTY" column that we do live rollups to get a
> SEATS_BOOKED which can then be used in lieu of SEATS_AVAILABLE (if we
> know the TOTAL ... that doesn't change for a SHOW_ID)
> With a columnar DB, column-level rollups are _supposed_ to be fast ...
> but again ... question is "how fast". Is there a general benchmark on
> this? (i.e. similar to the 100 updates/sec sort of rough ball park
> figure).
> Are metrics collected by OpenTSDB-like applications generally
> aggregated via MR jobs? Or are there live rollups happening when user
> specifies some criteria? If it is live, what sort of rollup times (say
> n ms per million rows, sort of ball park ...) can one expect?
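
To make the tall-table idea in the quoted message concrete, here is a rough in-memory sketch, not HBase client code. It assumes a hypothetical row key of the form SHOW_ID/BOOKING_ID with a single BOOKED_QTY value per row, and uses a ConcurrentSkipListMap as a stand-in for a sorted store. The class and method names (SeatRollupSketch, book, seatsBooked, seatsAvailable) are made up for illustration. Each booking writes its own row, so no single row is hot on write; the cost moves to the read, which scans every row for the show.

```java
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class SeatRollupSketch {
    // Stand-in for a tall table: row key "SHOW_ID/BOOKING_ID" -> BOOKED_QTY.
    // The key layout is an assumption made for this sketch.
    private final ConcurrentNavigableMap<String, Integer> rows =
            new ConcurrentSkipListMap<>();

    // One row per booking, so concurrent writers never touch the same key.
    public void book(String showId, String bookingId, int qty) {
        rows.put(showId + "/" + bookingId, qty);
    }

    // Live rollup: range-scan all rows whose key starts with "SHOW_ID/".
    // '0' is the character immediately after '/', so showId + "0" is an
    // exclusive upper bound for that prefix.
    public int seatsBooked(String showId) {
        int total = 0;
        for (int qty : rows.subMap(showId + "/", true, showId + "0", false).values()) {
            total += qty;
        }
        return total;
    }

    // TOTAL is fixed per show, so availability is derived, not stored.
    public int seatsAvailable(String showId, int totalSeats) {
        return totalSeats - seatsBooked(showId);
    }

    public static void main(String[] args) {
        SeatRollupSketch table = new SeatRollupSketch();
        table.book("show1", "b1", 2);
        table.book("show1", "b2", 3);
        table.book("show2", "b1", 4);
        System.out.println("show1 booked: " + table.seatsBooked("show1"));
        System.out.println("show1 available: " + table.seatsAvailable("show1", 100));
    }
}
```

Note that the scan is not an atomic snapshot: a booking written while seatsBooked is iterating may or may not be counted, so checking availability and then booking is not safe against overbooking without extra coordination. That gap is the OLTP concern raised in the reply above.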