Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
HBase >> mail # user >> Nosqls schema design


Nick maillard 2012-11-08, 09:00
Michael Segel 2012-11-08, 14:55
Ian Varley 2012-11-08, 13:46
Nick maillard 2012-11-08, 15:17
Re: Nosqls schema design
Hi Nick:

Your question is a good and tough one. I haven't found anything that helps
guide schema design in the NoSQL world. There are general concepts,
but none of them comes close to SQL schema design, where you can apply
a set of rules to guide your decisions.

The best presentation I have found on the general concepts of HBase
schema design is among the HBaseCon 2012 recordings at
http://www.cloudera.com/content/cloudera/en/resources/library/hbasecon/video-hbasecon-2012-hbasecon-2012.html;
search that page for "Schema Design". From this presentation, you can learn why it is
so difficult to suggest a design for your problem, and pick up some
best practices to start your own design.
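As one concrete illustration of designing from the questions rather than from a logical model (a common HBase practice, not necessarily what the presentation recommends), here is a minimal Python sketch of a composite row key for log data. It assumes, hypothetically, that a frequent question is "latest N events for host X". The key is a one-byte salt derived from the host (so different hosts spread across regions while one host's rows stay contiguous), the host name, a NUL separator, and a reversed timestamp so newer rows sort first. A sorted Python list stands in for HBase's byte-ordered rows; `row_key`, `key_prefix`, `scan_prefix`, `timestamp_of` and `NUM_SALT_BUCKETS` are hypothetical names, not HBase APIs.

```python
import hashlib
import struct
from bisect import bisect_left

NUM_SALT_BUCKETS = 16   # hypothetical; often sized to the number of regions
MAX_TS = 2**63 - 1      # reversing against this makes newer timestamps sort first


def key_prefix(host: str) -> bytes:
    """Salt byte + host + NUL separator; every row for `host` shares this prefix."""
    # Assumes host names never contain a NUL byte.
    salt = hashlib.md5(host.encode()).digest()[0] % NUM_SALT_BUCKETS
    return bytes([salt]) + host.encode() + b"\x00"


def row_key(host: str, ts_millis: int) -> bytes:
    """Composite key: prefix + big-endian reversed timestamp (8 bytes)."""
    return key_prefix(host) + struct.pack(">q", MAX_TS - ts_millis)


def timestamp_of(key: bytes) -> int:
    """Recover the original timestamp from the last 8 bytes of a key."""
    return MAX_TS - struct.unpack(">q", key[-8:])[0]


def scan_prefix(sorted_keys: list[bytes], prefix: bytes, limit: int) -> list[bytes]:
    """Emulate a prefix scan over byte-ordered row keys, stopping after `limit` rows."""
    out = []
    i = bisect_left(sorted_keys, prefix)  # first key >= prefix
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix) and len(out) < limit:
        out.append(sorted_keys[i])
        i += 1
    return out


if __name__ == "__main__":
    rows = [("web1", 1000), ("web1", 3000), ("web10", 5000), ("web1", 2000)]
    keys = sorted(row_key(h, t) for h, t in rows)
    latest = scan_prefix(keys, key_prefix("web1"), limit=2)
    print([timestamp_of(k) for k in latest])  # [3000, 2000]
```

The NUL separator keeps a scan for "web1" from picking up "web10" rows. Salting by host keeps single-host scans cheap but does not spread the writes of one very busy host; whether that trade-off is acceptable depends on the actual questions.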

HTH,

Jerry
On Thu, Nov 8, 2012 at 10:17 AM, Nick maillard <[EMAIL PROTECTED]> wrote:

> Thanks for the answers.
>
> I'm trying to really make sense of NoSQL, and HBase in particular. The
> software side has a lot of loopholes, and I'm still fighting off the
> compaction storm issue, so right now I would not say HBase is fast when it
> comes to writing.
>
> But my post was more about NoSQL schema thinking. After so long with SQL
> schemas, it takes a while to stop thinking that way, not only about the
> schema itself but also about the questions, or the interactions if you'd
> rather.
> Unlike SQL, I cannot design a logical data model first and figure out
> later what I'll want out of it.
>
> In my case I stated 10 TB, but this is very likely to grow since it is only
> the starting scenario. I believe a 30-minute latency before logs are
> ingested is not an issue; however, the questions to HBase must be answered
> in real time.
>
> I have been trying to play with my questions and see how they can fit into
> row keys and/or column families, but since they differ in nature and
> purpose, I ended up assuming they would need a number of different HBase
> tables to address the whole scope of questions: one table per one to three
> questions. The questions have joins and filters embedded in them.
>
> My post was about getting your insight on how you would go about answering
> this type of problem, and what your schemas might be. Overall, how do you
> switch from a SQL vision to a NoSQL vision?
> Using a coprocessor to create a couple of tables on the fly for all the
> questions is an interesting approach. As for running MapReduce over the
> logs, however, I am afraid the performance would be too slow: I was hoping
> to answer in milliseconds if possible. But this might be me being new and
> not evaluating correctly.
>
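One way to read the coprocessor suggestion is to move the work of the joins and filters from query time to write time: every ingested log record also updates small per-question tables whose row keys are exactly the question's lookup key, so a real-time answer becomes a single lookup instead of a MapReduce job. The Python sketch below assumes, hypothetically, a question like "how many 5xx errors did host X log in a given 30-minute window", with dictionaries standing in for HBase tables; `LogEvent`, `QueryTables`, and the 30-minute bucket size are illustrative choices, not HBase APIs. In a real cluster the fan-out might live in the ingest job or in an observer coprocessor.

```python
from collections import defaultdict
from dataclasses import dataclass

BUCKET_MILLIS = 30 * 60 * 1000  # hypothetical 30-minute aggregation window


@dataclass(frozen=True)
class LogEvent:
    host: str
    ts_millis: int
    status: int


class QueryTables:
    """Dicts/lists standing in for HBase tables (illustration only)."""

    def __init__(self) -> None:
        self.raw: list[LogEvent] = []  # the raw log table
        # Per-question table: (host, bucket_start) -> 5xx count.
        self.errors_per_bucket: defaultdict[tuple[str, int], int] = defaultdict(int)

    def ingest(self, batch: list[LogEvent]) -> None:
        """Write each event to the raw table and fan out to the derived table."""
        for ev in batch:
            self.raw.append(ev)
            if ev.status >= 500:
                bucket = ev.ts_millis - ev.ts_millis % BUCKET_MILLIS
                self.errors_per_bucket[(ev.host, bucket)] += 1

    def errors_in_window(self, host: str, ts_millis: int) -> int:
        """Answer the question with one keyed lookup; no scan or join at read time."""
        bucket = ts_millis - ts_millis % BUCKET_MILLIS
        return self.errors_per_bucket.get((host, bucket), 0)


if __name__ == "__main__":
    t = QueryTables()
    t.ingest([
        LogEvent("web1", 0, 500),
        LogEvent("web1", 60_000, 200),
        LogEvent("web1", 1_799_999, 503),
        LogEvent("web1", 1_800_000, 500),
    ])
    print(t.errors_in_window("web1", 900_000))    # 2
    print(t.errors_in_window("web1", 1_800_000))  # 1
```

The cost is write amplification and duplicated data: each question that needs a different lookup key typically means another derived table to maintain. Re-ingesting the same batch would also double-count here, so a real pipeline would need idempotent writes.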
Pamecha, Abhishek 2012-11-08, 19:09
Ian Varley 2012-11-08, 15:22