HBase, mail # user - hbase table as a queue.


Re: hbase table as a queue.
Gary Helmling 2011-07-19, 18:27
All excellent points here in terms of tuning!  For the higher-level question
about using a table as a queue, I just wanted to add in a link to the Lily
guys' rowlog library, since it does exactly that:

http://www.lilyproject.org/lily/about/playground/hbaserowlog.html
On Tue, Jul 19, 2011 at 9:26 AM, Daniel Einspanjer
<[EMAIL PROTECTED]> wrote:

> Cool.  Filed a task for us to work on that.
> https://bugzilla.mozilla.org/show_bug.cgi?id=672527
>
>
> On 7/19/11 12:05 PM, Stack wrote:
>
>> Set the region size very large (in trunk you can actually disable splitting).
>> St.Ack
>>
>> On Tue, Jul 19, 2011 at 8:26 AM, Daniel Einspanjer
>> <[EMAIL PROTECTED]>  wrote:
>>
>>> We use a queue table like this too and ran into the same problem.  How did
>>> you configure it such that it never splits?
>>>
>>> -Daniel
>>>
>>> On 7/16/11 4:24 PM, Stack wrote:
>>>
>>>> I learned Friday that our fellas on the frontend are using an hbase
>>>> table to do simple queuing.  They insert stuff to be processed by
>>>> distributed processes and when processes are done with the work,
>>>> they'll remove the processed element from the hbase table.   They are
>>>> queuing, processing, and removing millions of items a day.  Elements
>>>> were added on the end of the queue (FIFO).
>>>>
>>>> The issue to avoid was that over time, especially if it had been a while
>>>> between major compactions, the latency was going up.  Turns out, the table
>>>> had been splitting when the queue backed up.  Then a scan for new stuff to
>>>> process had to first traverse regions that had nought in them (the key
>>>> was time-based and the tail of the table had moved on past these first
>>>> regions).  This traversal, especially when there had been no major
>>>> compaction and so there were lots of deletes to process, was taking time
>>>> to get to the first row.
>>>>
>>>> To fix, we rid the table of its empty regions and made it so the table
>>>> would no longer split, so there is only ever one region in it.  This
>>>> should make it so we don't end up with empty regions to skip through
>>>> before we get to the first element in the table (we need the major
>>>> compaction running on a somewhat regular basis to temper latencies).
>>>> Will report back to the list if we find otherwise.
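
To make the effect described above concrete, here is a toy in-memory model, not HBase itself. It assumes only that a delete leaves a marker behind until a major compaction drops it. The class and method names (ToyQueueTable, first_live, major_compact) are made up for illustration, and empty regions are not modelled at all.

```python
from bisect import insort


class ToyQueueTable:
    """In-memory stand-in for a time-keyed queue table (hypothetical, not HBase)."""

    def __init__(self):
        self.cells = []          # sorted row keys, whether live or deleted
        self.tombstones = set()  # keys deleted but not yet compacted away

    def put(self, key):
        insort(self.cells, key)

    def delete(self, key):
        # A delete only marks the row; the cell stays until compaction.
        self.tombstones.add(key)

    def first_live(self):
        """Return (key, cells_skipped) for the first row that is not deleted."""
        skipped = 0
        for key in self.cells:
            if key not in self.tombstones:
                return key, skipped
            skipped += 1
        return None, skipped

    def major_compact(self):
        # Compaction physically drops the deleted cells and their markers.
        self.cells = [k for k in self.cells if k not in self.tombstones]
        self.tombstones.clear()


table = ToyQueueTable()
for ts in range(1000):   # enqueue 1000 items keyed by a time-like counter
    table.put(ts)
for ts in range(990):    # the oldest 990 are processed and deleted
    table.delete(ts)

before = table.first_live()
table.major_compact()
after = table.first_live()
print(before, after)
```

In this model the scan for the head of the queue steps over every deleted cell until a compaction runs, which is the latency creep reported above.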
>>>>
>>>> Do not use locks.  Doesn't scale.  Maybe update a cell when task is
>>>> taken out for processing.  If too much time elapses since last update,
>>>> maybe give it out again?
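
A minimal single-process sketch of that idea, for illustration only: each task carries a "claimed at" stamp instead of a lock, and a task whose stamp is older than a lease period is handed out again. LeaseQueue, lease_seconds and the injected clock are hypothetical names, and the dict stands in for the table. Against a real table with many clients the claim step would have to be atomic (for example via a check-and-put style call); this sketch does not attempt that.

```python
import time


class LeaseQueue:
    """Hand out tasks by stamping them; re-issue when the stamp goes stale."""

    def __init__(self, lease_seconds, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock
        self.claimed_at = {}  # task id -> time of last claim, None if unclaimed

    def add(self, task_id):
        self.claimed_at[task_id] = None

    def claim(self):
        """Return the first task that is unclaimed or whose lease has expired."""
        now = self.clock()
        for task_id, stamp in self.claimed_at.items():
            if stamp is None or now - stamp >= self.lease_seconds:
                self.claimed_at[task_id] = now  # the "update a cell" step
                return task_id
        return None

    def complete(self, task_id):
        # Finished work is removed from the queue entirely.
        self.claimed_at.pop(task_id, None)


# Usage with a fake clock so the timeout can be shown without sleeping.
now = [0.0]
q = LeaseQueue(lease_seconds=30, clock=lambda: now[0])
q.add("a")
q.add("b")
first = q.claim()    # "a" is handed out
second = q.claim()   # "b" is handed out
third = q.claim()    # nothing left while both leases are fresh
now[0] = 31.0        # the worker holding "a" never reported back
retry = q.claim()    # "a" is handed out again
q.complete("a")
```

The trade-off is that a slow worker and its replacement may both process the same task, so the work itself should be safe to repeat.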
>>>>
>>>> St.Ack
>>>>
>>>> On Sat, Jul 16, 2011 at 9:38 AM, Jack Levin <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hello, we are thinking about using an HBase table as a simple queue which
>>>>> will dispatch the work for a mapreduce job, as well as real-time
>>>>> fetching of data to present to the end user.  In simple terms, suppose you
>>>>> had a data source table and a queue table.  The queue table has a
>>>>> smaller set of Rows that point to Values, which in turn point to the
>>>>> Perma-Set table, which has a large collection of Rows (so Queue{Row,
>>>>> Value} -> Perma-Set{Row, Value}, or Q-Value -> P-Row).  Our goal is
>>>>> to look up which Rows to retrieve from the Perma-Set table by looking
>>>>> through the Queue.  Once the lookup into the Queue is done, the Row
>>>>> from the Queue must be deleted so that the same Perma-Set lookup is
>>>>> not done twice.  We expect many concurrent lookups to happen, so
>>>>> I assume the first thing we need to do is have the client that does
>>>>> the work acquire a lock on the Queue Row, process the work, then
>>>>> remove the Queue Row.
>>>>>
>>>>> Has anyone done something similar before?  Any gotchas we should be
>>>>> aware of?
>>>>>
>>>>> Thanks.
>>>>>
>>>>> -Jack
>>>>>
>>>>>