M/R + timeRange filter will be unnecessarily slow and heavy on the cluster. If you can, lead the row key with the time; that way you can very quickly find any changes within an interval.

(But then you need to watch out for region hotspotting; you might need to prefix the row key with a few bits from a hash of it.)
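A minimal sketch of that salting idea. The bucket count, key layout, and class/method names here are my own assumptions for illustration, not something from this thread:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: a salted, time-leading row key.
// Layout: [1 salt byte][8-byte big-endian timestamp][entity id bytes]
public class SaltedTimeKey {
    static final int BUCKETS = 16; // assumed bucket count; tune to your region count

    static byte[] makeRowKey(long timestampMs, String entityId) {
        byte[] id = entityId.getBytes(StandardCharsets.UTF_8);
        // Salt derived from the entity id spreads writes across regions,
        // avoiding the hotspot a purely time-leading key would create.
        byte salt = (byte) Math.floorMod(entityId.hashCode(), BUCKETS);
        ByteBuffer buf = ByteBuffer.allocate(1 + 8 + id.length);
        buf.put(salt);            // region-spreading prefix
        buf.putLong(timestampMs); // time-leading within each bucket
        buf.put(id);
        return buf.array();
    }
}
```

The trade-off: a time-range scan must now issue one scan per bucket (each with the same time bounds after its salt byte) and merge the results, but writes no longer pile onto a single region.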

Also check out the OpenTSDB schema if you're storing time series type data.

________________________________
 From: Wei Tan <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Wednesday, January 29, 2014 2:40 PM
Subject: Re: larger HFile block size for very wide row?
 

Hi Ted and Vladimir, thanks!

I was wondering whether using an index is a good idea. My scan/get criterion is
something like "get all rows I inserted since the end of yesterday". I may
have to use MapReduce + a timeRange filter.

Lars and all, I will try to report back some performance data later.
Thanks for the help from you all.

Best regards,
Wei

Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan

From:   Ted Yu <[EMAIL PROTECTED]>
To:     "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>,
Date:   01/29/2014 04:37 PM
Subject:        Re: larger HFile block size for very wide row?
bq. table:family2 holds only row keys (no data) from table:family1.

Wei:
You can designate family2 as the essential column family so that family1 is
only brought into the heap when needed.
On Wed, Jan 29, 2014 at 1:33 PM, Vladimir Rodionov
<[EMAIL PROTECTED]>wrote:
