Re: Nosqls schema design
Ok...

First, if you're estimating that the raw data will be 10TB, expect to need more storage than that once you add indexes and denormalized structures.

The short answer to your question is yes, you can do it.

Longer answer...

You can build this on either a relational database or HBase/NoSQL. However, at this size you will be close to hitting the ceiling of an RDBMS, and you will be spending a fortune on licensing and hardware.

If you want to do this in terms of HBase, you can.

Most of the queries are straightforward; however, you will be duplicating data, since each access pattern typically gets its own table keyed for that query.
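
To make the duplication concrete, here is a rough sketch. It assumes two hypothetical row key layouts (page-first and user-first) that are not from the original question, and it uses sorted in-memory lists as stand-ins for HBase tables rather than a real HBase client. Each event is written twice so that both "users on page X since time T" and "all pages for a user" become prefix range scans.

```python
import bisect

# Hypothetical stand-ins for two HBase tables: sorted lists of row keys.
events_by_page = []   # row key layout: (page, timestamp, user, action)
events_by_user = []   # row key layout: (user, timestamp, page, action)


def record_event(user, page, action, ts):
    """Write the same event under both key layouts (this is the duplication)."""
    bisect.insort(events_by_page, (page, ts, user, action))
    bisect.insort(events_by_user, (user, ts, page, action))


def users_on_page_since(page, since_ts):
    """Range scan over the page-first table: rows for `page` with ts >= since_ts."""
    start = bisect.bisect_left(events_by_page, (page, since_ts))
    users = set()
    for key in events_by_page[start:]:
        if key[0] != page:      # left the page's key prefix, stop scanning
            break
        users.add(key[2])
    return users


def pages_for_user(user):
    """Prefix scan over the user-first table: every page the user touched."""
    start = bisect.bisect_left(events_by_user, (user,))
    pages = set()
    for key in events_by_user[start:]:
        if key[0] != user:      # left the user's key prefix, stop scanning
            break
        pages.add(key[2])
    return pages
```

The trade-off is the usual one: writes and storage double, but each question is answered by one contiguous scan instead of a full-table pass.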

The interesting query:
> - All users that have commented a page W and liked a page P.

This will require a map/reduce job to produce an answer. Well, maybe not if you're using secondary indexing techniques. In that case you look up the users who commented on W and the users who liked P, and the intersection of those two result sets gives you the final set of users.
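
A minimal sketch of that intersection, assuming a hypothetical secondary index keyed by (action, page) that maps to the set of users. A plain dict stands in for the index table here; the names are illustrative and not an HBase API.

```python
# Hypothetical secondary index: (action, page) -> set of users.
index = {}


def index_event(user, page, action):
    """Maintain the index entry for this (action, page) pair on every write."""
    index.setdefault((action, page), set()).add(user)


def commented_and_liked(page_w, page_p):
    """Users who commented on page_w AND liked page_p: two lookups, one intersection."""
    commenters = index.get(("comment", page_w), set())
    likers = index.get(("like", page_p), set())
    return commenters & likers
```

For example, after indexing a comment by "alice" on "W" and a like by "alice" on "P", `commented_and_liked("W", "P")` would contain "alice". With large result sets the intersection itself can get expensive, which is where the map/reduce route may still be the better choice.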

HTH
On Nov 8, 2012, at 3:00 AM, Nick maillard <[EMAIL PROTECTED]> wrote:

> Hi everyone
>
> I'm currently testing HBase/Hadoop in terms of performance but also in terms of
> applicability. After some experiments and reading, I'm wondering if HBase is a
> good fit for the need I'm testing.
>
> Say I had logs from websites recording users visiting a webpage, reading an
> article, liking a piece of data, commenting or even bookmarking.
> I would store these logs over a long period and for a lot of different websites,
> and I would like to use the data to answer these questions:
> - All users that have been to the webpage X in the last N days
> - All users that have liked and then bookmarked a page in a range of Y days.
> - All the pages that are commented X times in the last N days.
> - All users that have commented a page W and liked a page P.
> - All pages seen, liked or commented by a given user.
>
> As you see, this might be a very SQL way of thinking. As I understand it, since
> the questions differ in nature, I would need different tables to answer them.
> Am I correct? How could this be represented, and would SQL be a better fit?
> The data would be large, around 10 TB.
>
> regards
>
>