HBase >> mail # user >> Nosqls schema design

Re: Nosqls schema design

Re: how to think about schemas when coming from a SQL / Entity-Relationship background: there's a video of a talk I gave at HBaseCon this year on that subject, here:


That's not the only way to think about it (as Mike will point out, it's even sometimes less helpful than thinking about it without that baggage), but maybe it'll help answer your questions.


On Nov 8, 2012, at 9:17 AM, Nick maillard wrote:

Thanks for the answers.

I'm trying to really make sense of NoSQL, and HBase in particular. The software
part has a lot of loopholes and I'm still fighting off the compaction storm
issue, so right now I would not say HBase is fast when it comes to writing.

But my post was more about NoSQL schema thinking. After so long on SQL schemas
it takes a little time to stop thinking that way, not only in terms of schema
but also in terms of questions, or of interaction if you'd rather.
So, contrary to SQL, I cannot design a logical model for the data first and
figure out later what I'll want out of it.

In my case I stated 10 TB, but this is very likely to grow since it is only the
starting scenario. I believe a 30-minute latency before ingesting logs is not
an issue; however, the questions put to HBase must be answered in real time.

I have been trying to play with my questions and see how they can fit in a
row key and/or column families, but since they differ in nature and purpose I
ended up supposing they would be spread over a number of different HBase tables
in order to address the scope of the questions. One table for one to three
questions. The questions have joins and filters embedded in them.
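
To make the "one table per question" idea concrete, here is a minimal sketch
in plain Python, not HBase client code. It assumes a hypothetical question,
"latest log lines for a given host", and builds a row key for it; the names
row_key, SortedTable and scan_prefix, and the host/timestamp fields, are
illustrative assumptions and not part of any HBase API. The only HBase
property it relies on is that rows are stored sorted by row key bytes.

```python
import bisect
import struct

MAX_TS = 2**63 - 1  # assumption: used to reverse timestamps so newest sorts first


def row_key(host: str, ts_millis: int) -> bytes:
    """Hypothetical row key for the question 'latest log lines for a host'."""
    # The host prefix groups one host's rows together; the reversed,
    # big-endian timestamp makes newer entries sort first within that prefix.
    return host.encode("utf-8") + b"\x00" + struct.pack(">Q", MAX_TS - ts_millis)


class SortedTable:
    """Toy stand-in for a table whose rows are kept sorted by key bytes."""

    def __init__(self):
        self._keys = []   # sorted list of row keys
        self._rows = {}   # row key -> value

    def put(self, key: bytes, value: str) -> None:
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def scan_prefix(self, prefix: bytes, limit: int) -> list:
        """Return up to `limit` values whose key starts with `prefix`."""
        start = bisect.bisect_left(self._keys, prefix)
        out = []
        for key in self._keys[start:]:
            if not key.startswith(prefix) or len(out) >= limit:
                break
            out.append(self._rows[key])
        return out
```

With a key shaped like this, the question becomes a short prefix scan instead
of a join plus filter. A different question (say, by URL instead of by host)
would need its own key, and so probably its own table.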

My post was about getting your insight on how you would go about answering this
type of issue and what your schemas might be. Overall, how to switch from a SQL
vision to a NoSQL vision.
Using a coprocessor to create a couple of tables on the fly for all the
questions is an interesting approach. As for running MapReduce over the logs, I
am afraid the performance would be too slow. I was thinking of answering in
milliseconds if possible. But this might be me being new and not evaluating it
correctly.
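
As a rough illustration of answering at write time rather than with a batch
job, the sketch below updates one pre-aggregated "table" per question as each
log record is ingested, so a read becomes a single lookup. It is plain Python
with dictionaries standing in for tables; the record fields (host, url,
status, ts) and the two example questions are assumptions made up for the
sketch, and nothing here is coprocessor or HBase API code.

```python
from collections import defaultdict


class QuestionTables:
    """Toy model: one pre-aggregated lookup structure per question."""

    def __init__(self):
        # Question 1 (hypothetical): server errors per host per hour.
        self.errors_per_host_hour = defaultdict(int)
        # Question 2 (hypothetical): hits per URL per day.
        self.hits_per_url_day = defaultdict(int)

    def ingest(self, record: dict) -> None:
        """Update every question's table when a log record arrives."""
        hour = record["ts"] // 3600    # ts is assumed to be epoch seconds
        day = record["ts"] // 86400
        if record["status"] >= 500:
            self.errors_per_host_hour[(record["host"], hour)] += 1
        self.hits_per_url_day[(record["url"], day)] += 1

    def errors(self, host: str, hour: int) -> int:
        """Answer question 1 with a single key lookup."""
        return self.errors_per_host_hour.get((host, hour), 0)
```

The trade-off is more work per write and one structure per question, in
exchange for reads that do not scan the raw logs.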