HBase, mail # user - Finding the latest updated rows


Re: Finding the latest updated rows
Michael Segel 2014-01-21, 12:14
Using the timestamp to find the last updated row is going to cause problems...

1) The timestamp will have to be the leading portion of your composite row key; otherwise you still end up performing a full table scan.
2) Hotspotting will occur, because every new write lands at the same end of the key space.
3) Does your row key change if you insert columns into an existing row?
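
To make points 1 and 2 concrete, here is a minimal sketch of a timestamp-leading row key. It assumes the reverse-timestamp layout (Long.MAX_VALUE minus the timestamp) described in the HBase book; the function names and the "row id" suffix are hypothetical, and plain Python byte strings stand in for HBase row keys.

```python
import struct

LONG_MAX = 2**63 - 1  # same value as Java's Long.MAX_VALUE

def reverse_ts_key(ts_millis: int, row_id: str) -> bytes:
    """Hypothetical layout: 8-byte big-endian (LONG_MAX - ts), then the row id."""
    return struct.pack(">q", LONG_MAX - ts_millis) + row_id.encode("utf-8")

def decode_ts(key: bytes) -> int:
    """Recover the original timestamp from the first 8 bytes of the key."""
    return LONG_MAX - struct.unpack(">q", key[:8])[0]

# Five writes with increasing timestamps (values are made up for illustration).
keys = [reverse_ts_key(1390300000000 + i, f"row{i}") for i in range(5)]

# HBase sorts row keys as raw bytes; sorted() on bytes does the same here.
ordered = sorted(keys)

# The newest write sorts first, so a scan that stops after one row finds it.
newest = ordered[0]
print(decode_ts(newest), newest[8:].decode("utf-8"))
```

Because each new key sorts in front of every earlier one, all fresh writes go to the same end of the key space, which is the hotspotting in point 2. And if the timestamp were not the leading part of the key, the byte ordering would no longer follow time, which is why point 1 says you would be back to a full scan.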

A better cheat would be to create a metadata (audit) table that has one row for each base table; whenever you insert into a base table, you also update its row in the audit table.
It would be a very small table, but because the coprocessor model is flawed, you could run into deadlocks when you attempt to maintain this table.
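
A rough sketch of the audit-table idea, using in-memory dictionaries as stand-ins for the base tables and the audit table. The class, method, and field names are all hypothetical, nothing here talks to HBase, and it ignores the fact that on a real cluster the two writes would not be atomic.

```python
import time

class AuditedStore:
    """In-memory stand-in for base tables plus one small audit table."""

    def __init__(self):
        self.base = {}   # (table, row_key) -> {column: value}
        self.audit = {}  # table -> {"last_row": row_key, "last_ts": millis}

    def put(self, table, row_key, columns, ts=None):
        """Write to the base table, then record the write in the audit table."""
        ts = int(time.time() * 1000) if ts is None else ts
        self.base.setdefault((table, row_key), {}).update(columns)
        prev = self.audit.get(table)
        # Only move the audit row forward; late-arriving older writes are ignored.
        if prev is None or ts >= prev["last_ts"]:
            self.audit[table] = {"last_row": row_key, "last_ts": ts}

    def latest(self, table):
        """One lookup in the audit table instead of a scan of the base table."""
        return self.audit.get(table)

store = AuditedStore()
store.put("users", "r1", {"name": "a"}, ts=100)
store.put("users", "r2", {"name": "b"}, ts=200)
store.put("users", "r1", {"email": "c"}, ts=150)  # column added to an existing row
print(store.latest("users"))
```

The base row key never changes when columns are added to an existing row, so this sidesteps point 3; the cost is a second write on every put.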

Or you may want to consider keeping that state in ZooKeeper and then flushing it to a table or something.

On Jan 21, 2014, at 1:55 AM, Joshi, Rekha <[EMAIL PROTECTED]> wrote:

> Hi William,
>
> The timestamp part of row key schema design caters to this. It is usually
> efficient, but your SLA may differ.
>
> http://hbase.apache.org/book.html#reverse.timestamp
>
> http://hbase.apache.org/book.html#schema.casestudies
>
> http://hbase.apache.org/book.html#timeseries
>
>
> Thanks
> Rekha
>
> On 21/01/14 9:36 AM, "William Kang" <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>> In HBase, the timestamp is set for each column, not for the entire row. If
>> I want to find the most recently updated rows (a newly put row, or an update
>> to only certain columns in some rows, etc.), is there an efficient way to do
>> it?
>>
>> Many thanks.
>>
>>
>> William
>
>

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental.
Use at your own risk.
Michael Segel
michael_segel (AT) hotmail.com