Re: HBase for real-time data aggregation
shashwat shriparv 2012-01-06, 19:23
In my experience, HBase is not a bad choice. The only problem is that you
will not get ready-made components. If you go with it, look at the indexing
options available for HBase. You can try HSearch and the Lily project for indexing and
On Fri, Jan 6, 2012 at 11:25 PM, prasenjit mukherjee
> I need to design a near real-time system where documents (with
> fields: id, keywords, timestamp) are added to the system. The
> requirement is to get the top-k keywords from the documents added to the
> system in the last x minutes. The typical document addition rate is around
> 100 documents/sec, which may increase in the future (hence the technology
> should be horizontally scalable).
> I am thinking of using HBase. For each document we can add a set of
> keys (one for each keyword in that doc) of the form timestamp_keyword.
> At query time we can run a map-reduce job over a key range (from
> ts1_* to ts2_*) to compute the keyword frequency for that range.
> Are there any better technologies for this use case, like MongoDB,
> Cassandra, Storm, etc.? The use case is primarily about aggregation.
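
The key design in the quoted message can be tried out without a cluster. The
following is only a rough sketch in plain Python, not HBase client code: a
sorted list stands in for the table, and SortedKeyStore, row_key,
add_document and top_k_keywords are hypothetical names made up for
illustration. It also assumes a zero-padded timestamp (so that string order
matches time order), a doc id appended to the key (so two documents with the
same timestamp and keyword do not overwrite each other), and keywords that
contain no underscore.

```python
import bisect
from collections import Counter


class SortedKeyStore:
    """Toy stand-in for a table whose row keys are kept in sorted order."""

    def __init__(self):
        self._keys = []

    def put(self, key):
        # Insert while keeping the key list sorted.
        bisect.insort(self._keys, key)

    def scan(self, start_key, stop_key):
        # Return all keys in [start_key, stop_key), like a key-range scan.
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        return self._keys[lo:hi]


def row_key(ts, keyword, doc_id):
    # Zero-pad the timestamp so lexicographic order equals numeric order.
    # Assumption: keywords contain no "_" (needed for the split below).
    return f"{ts:013d}_{keyword}_{doc_id}"


def add_document(store, doc_id, keywords, ts):
    # One key per distinct keyword in the document.
    for kw in set(keywords):
        store.put(row_key(ts, kw, doc_id))


def top_k_keywords(store, ts1, ts2, k):
    # Scan keys with ts1 <= timestamp <= ts2 and count keyword occurrences.
    start = f"{ts1:013d}_"
    stop = f"{ts2 + 1:013d}_"
    counts = Counter(key.split("_", 2)[1] for key in store.scan(start, stop))
    return counts.most_common(k)
```

For example, after adding a few documents with add_document, calling
top_k_keywords(store, ts1, ts2, 10) returns up to ten (keyword, count) pairs
for that time window. In a real deployment the counting step would be the
map-reduce job (or a client-side scan) over the same key range.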