HBase, mail # user - Schema design question - Hot Key concerns

Re: Schema design question - Hot Key concerns
Suraj Varma 2011-11-18, 22:57
Thanks for your response.

As always ... it is currently on an RDBMS that has scalability concerns
... and hence a NoSQL store is being evaluated for this. :)

In the Quora link posted, Todd mentions that "if the updates are in
the 100 updates/sec range on the hot row" it may still be ok. I guess
that was one of the questions I had: how hot is "hot"? If an app
expects 100 updates/sec to a row ... perhaps that is not considered
hot with HBase?

The only reason I'm even considering this is because the updates are
to the MemStore ... so, in my mind this is basically concurrently updating
a SortedMap of SortedMaps etc. in memory ... so, how drastic is the
locking, really?
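
To make that mental model concrete, here is a minimal Java sketch of a
"SortedMap of SortedMaps" with several writers hitting one row on
mutually exclusive columns. It is only an illustration of the in-memory
picture: the class HotRowSketch, the row key SHOW_1 and the SEAT_n
column names are made up, and it does not model whatever row locking or
write-ahead logging the region server does around the in-memory update.

```java
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical stand-in for the "SortedMap of SortedMaps" picture; not HBase code.
public class HotRowSketch {
    // row key -> (column -> value); both levels are sorted and thread-safe
    private final ConcurrentNavigableMap<String, ConcurrentNavigableMap<String, Long>> rows =
            new ConcurrentSkipListMap<>();

    public void put(String row, String column, long value) {
        // No explicit lock is taken here; the skip list handles concurrent writers.
        rows.computeIfAbsent(row, r -> new ConcurrentSkipListMap<>()).put(column, value);
    }

    public Long get(String row, String column) {
        ConcurrentNavigableMap<String, Long> cols = rows.get(row);
        return cols == null ? null : cols.get(column);
    }

    public int columnCount(String row) {
        ConcurrentNavigableMap<String, Long> cols = rows.get(row);
        return cols == null ? 0 : cols.size();
    }

    // Four writers update the same (hot) row, each on its own column.
    public static HotRowSketch demo() {
        HotRowSketch store = new HotRowSketch();
        Thread[] writers = new Thread[4];
        for (int i = 0; i < writers.length; i++) {
            final String column = "SEAT_" + i; // hypothetical column name
            writers[i] = new Thread(() -> {
                for (long v = 1; v <= 1000; v++) {
                    store.put("SHOW_1", column, v);
                }
            });
            writers[i].start();
        }
        for (Thread t : writers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return store;
    }
}
```

In this toy version the writers never block each other on a lock, which
is the behaviour I am hoping for. Whether the real write path behaves
this way on a hot row is exactly the question.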

Also - given the sparse column updates (i.e. mutually exclusive), I
was wondering if that reduces locking in some way.

Michael ... on your #2 ... why would rows be locked when people are
querying? Wouldn't they be querying by a specific timestamp ... so
they would basically see a row as of that timestamp?

On #1, yes - I think that's the complexity of moving this into the app.

One other design thought we had was to have it as a tall table with
a "BOOKED_QTY" column on which we do live rollups to get a
SEATS_BOOKED, which can then be used in lieu of SEATS_AVAILABLE (if we
know the TOTAL ... that doesn't change for a SHOW_ID).
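
The arithmetic behind that idea, as a rough Java sketch over an
in-memory list standing in for the tall table. The BookingRow class is
made up for illustration, and I am assuming a cancellation is stored as
a negative BOOKED_QTY; a real rollup would be a scan over the rows for
the SHOW_ID rather than a loop over a list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the tall-table rollup; not HBase code.
public class SeatRollupSketch {
    // One row per booking event (assumption: cancellations have negative bookedQty).
    static final class BookingRow {
        final String showId;
        final long bookedQty;

        BookingRow(String showId, long bookedQty) {
            this.showId = showId;
            this.bookedQty = bookedQty;
        }
    }

    // SEATS_BOOKED = sum of BOOKED_QTY over all rows for the show.
    static long seatsBooked(List<BookingRow> rows, String showId) {
        long sum = 0;
        for (BookingRow row : rows) {
            if (row.showId.equals(showId)) {
                sum += row.bookedQty;
            }
        }
        return sum;
    }

    // SEATS_AVAILABLE is derived: TOTAL (fixed per SHOW_ID) minus SEATS_BOOKED.
    static long seatsAvailable(long total, List<BookingRow> rows, String showId) {
        return total - seatsBooked(rows, showId);
    }

    // Small example data set used to exercise the two methods above.
    static List<BookingRow> sampleRows() {
        List<BookingRow> rows = new ArrayList<>();
        rows.add(new BookingRow("SHOW_1", 2));
        rows.add(new BookingRow("SHOW_1", 3));
        rows.add(new BookingRow("SHOW_1", -1)); // a cancellation
        rows.add(new BookingRow("SHOW_2", 10));
        return rows;
    }
}
```

The appeal is that every booking becomes an append to its own row, so
no single row is updated repeatedly; the cost moves to the read side,
which is why the rollup speed matters.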

With a columnar DB, column-level rollups are _supposed_ to be fast ...
but again ... the question is "how fast". Is there a general benchmark on
this? (i.e. similar to the 100 updates/sec sort of rough ball park)

Are metrics collected by OpenTSDB-like applications generally
aggregated via MR jobs? Or are there live rollups happening when the user
specifies some criteria? If it is live, what sort of rollup times (say
n ms per million rows, sort of ball park ...) can one expect?

A more general question: I've been reading about Increments and how
there are some optimizations to allow Increments to be 'batched' up
... is that a 0.92 feature? In this case ... for instance, if I were
to keep my TOTAL_AVAILABLE separately and fire increments/decrements
against that row ... would it alleviate the hot row scenario?
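
What I have in mind by 'batched', sketched in plain Java: coalesce the
deltas per row on the client and send one net increment per flush
instead of one call per booking. This is my own illustration and not
the HBase feature itself; the IncrementBufferSketch class and its
method names are made up, and drain() just returns the net deltas that
a caller would then apply to the store.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical client-side buffer that coalesces increments per row key.
public class IncrementBufferSketch {
    private final ConcurrentHashMap<String, AtomicLong> pending = new ConcurrentHashMap<>();

    // Record a delta locally; nothing is sent to the store here.
    public void increment(String rowKey, long delta) {
        pending.computeIfAbsent(rowKey, k -> new AtomicLong()).addAndGet(delta);
    }

    // Collect the net delta per row and reset it to zero.
    // getAndSet(0) is atomic, so a concurrent increment lands either
    // in this batch or in the next one, never in neither.
    public Map<String, Long> drain() {
        Map<String, Long> batch = new HashMap<>();
        for (Map.Entry<String, AtomicLong> entry : pending.entrySet()) {
            long delta = entry.getValue().getAndSet(0);
            if (delta != 0) {
                batch.put(entry.getKey(), delta);
            }
        }
        return batch;
    }
}
```

With this, three bookings and one cancellation against the same show
between flushes would become a single decrement of 2. The obvious
trade-offs are that the stored TOTAL_AVAILABLE lags by up to one flush
interval, and that buffered deltas are lost if the client dies before
flushing.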

Just trying to see what possibilities exist to solve the constraints.

On Fri, Nov 18, 2011 at 11:04 AM, Michael Segel wrote:
> Not sure if you'd consider this a 'big data' problem.
> First, IMHO you're better off serving this out of a relational model.
> Having said that....
> 'Hot Row' as in reads isn't a bad thing since it's in cache.
> 'Hot Row' as in updates... not really a good thing since you have to lock the row to update it. So your updates going to the same row will kill performance.
> (Sorry but this really sounds like a school level homework question...)
> In terms of design in HBase...
> Since you've already stated that the row will get hit multiple times for the same seat. (reservations and cancellations)
> You have a couple of problems...
> 1) Lack of ACID control. This makes it harder to design a reservation system.
> 2) You will have inventory problems due to rows being locked while people are querying the system to purchase a seat.
> With respect to the schema, I would suggest that you rethink it.
> Just an example... The Rolling Stones selling out Cleveland Stadium for a Rock-N-Roll Hall of Fame concert. You have 100,000 seats in the stadium plus luxury boxes, then add field seats... read a lot of people.
> As it's already been pointed out... this row becomes very hot while tickets go on sale. It will become a bottleneck.
> Also, assuming your SHOW_ID is really a composite of (venue, show, date), you will want to further fragment your rows.
> Also you're probably going to want to split your data into two different tables and then write some ACID compliance at your APP level.
> Just a quick thought before I pop out for lunch...
>> Date: Fri, 18 Nov 2011 10:02:54 -0800
>> Subject: Re: Schema design question - Hot Key concerns