
HBase >> mail # user >> hbase table as a queue.


Re: hbase table as a queue.
All excellent points here in terms of tuning!  For the higher-level question
about using a table as a queue, I just wanted to add in a link to the Lily
guys' rowlog library, since it does exactly that:

http://www.lilyproject.org/lily/about/playground/hbaserowlog.html
On Tue, Jul 19, 2011 at 9:26 AM, Daniel Einspanjer <[EMAIL PROTECTED]> wrote:

> Cool. Filed a task for us to work on that.
> https://bugzilla.mozilla.org/show_bug.cgi?id=672527
>
>
> On 7/19/11 12:05 PM, Stack wrote:
>
>> Set the region size very large (in trunk you can actually disable splitting).
>> St.Ack
>>
>> On Tue, Jul 19, 2011 at 8:26 AM, Daniel Einspanjer
>> <[EMAIL PROTECTED]>  wrote:
>>
>>> We use a queue table like this too and ran into the same problem. How
>>> did you configure it so that it never splits?
>>>
>>> -Daniel
>>>
>>> On 7/16/11 4:24 PM, Stack wrote:
>>>
>>>> I learned Friday that our fellas on the frontend are using an HBase
>>>> table to do simple queuing. They insert stuff to be processed by
>>>> distributed processes, and when the processes are done with the work,
>>>> they remove the processed element from the HBase table. They are
>>>> queuing, processing, and removing millions of items a day. Elements
>>>> are added at the end of the queue (FIFO).
>>>>
>>>> The issue to avoid was that over time, especially when a long while
>>>> passed between major compactions, latency kept going up. It turned out
>>>> the table had been splitting whenever the queue backed up. A scan for
>>>> new stuff to process then had to first traverse regions that had
>>>> nothing in them (the key was time-based, and the tail of the table had
>>>> moved on past these first regions). This traversal, especially without
>>>> a recent major compaction and hence with lots of deletes to process,
>>>> was taking time to get to the first row.
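A toy model of the problem just described, assuming integer time-based keys and made-up split points (illustrative numbers, not measurements from this cluster). Once consumers have deleted everything below the current tail, a scan that starts at the beginning of the table must open every emptied region before it reaches the first live row; with a single region there is nothing to skip.

```python
from bisect import bisect_right


def build_regions(split_points, live_keys):
    """Assign live row keys to regions bounded by the given split points.

    Region i covers [split_points[i-1], split_points[i]); the first region
    starts at the beginning of the key space and the last is open-ended.
    """
    regions = [[] for _ in range(len(split_points) + 1)]
    for key in sorted(live_keys):
        regions[bisect_right(split_points, key)].append(key)
    return regions


def regions_scanned_before_first_row(regions):
    """Count how many regions a scan from the table start passes before a hit."""
    for visited, rows in enumerate(regions):
        if rows:
            return visited
    return len(regions)


# Earlier splits left boundaries at 1000..4000; consumers have already
# deleted everything below 4100, so only keys 4100-4199 are still queued.
live = range(4100, 4200)
after_splits = build_regions([1000, 2000, 3000, 4000], live)
single_region = build_regions([], live)
```

Here `regions_scanned_before_first_row(after_splits)` is 4 while `single_region` gives 0, which is the effect the fix below relies on.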
>>>>
>>>> To fix it, we rid the table of its empty regions and made it so the
>>>> table would no longer split, so there is only ever one region in it.
>>>> This should mean we don't end up with empty regions to skip through
>>>> before we get to the first element in the table (we still need major
>>>> compactions running on a somewhat regular basis to temper latencies).
>>>> Will report back to the list if we find otherwise.
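Even with a single region, deleted queue rows leave delete markers (tombstones) behind until a major compaction purges them, and a scan from the start still has to step over them. A simplified in-memory sketch of that effect, assuming a hypothetical `ToyRegion` class that stands in for one region (real HBase persists delete markers in its store files rather than in a dict):

```python
class ToyRegion:
    """In-memory stand-in for one region: deletes leave tombstones until compaction.

    Simplification for illustration only; HBase drops delete markers, together
    with the data they mask, during a major compaction.
    """

    def __init__(self):
        self.cells = {}  # row key -> True if live, False if tombstoned

    def put(self, key):
        self.cells[key] = True

    def delete(self, key):
        self.cells[key] = False  # tombstone: the key is still physically present

    def first_live(self):
        """Return (first live key or None, number of tombstones skipped)."""
        skipped = 0
        for key in sorted(self.cells):
            if self.cells[key]:
                return key, skipped
            skipped += 1
        return None, skipped

    def major_compact(self):
        """Drop tombstoned rows so later scans no longer have to skip them."""
        self.cells = {k: v for k, v in self.cells.items() if v}


region = ToyRegion()
for key in range(1000):
    region.put(key)
for key in range(990):  # consumers have processed and deleted the oldest 990
    region.delete(key)
before = region.first_live()  # must skip 990 tombstones
region.major_compact()
after = region.first_live()  # nothing left to skip
```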
>>>>
>>>> Do not use locks; they don't scale. Maybe update a cell when a task is
>>>> taken out for processing. If too much time elapses since the last
>>>> update, maybe give it out again?
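The timestamp-instead-of-lock idea amounts to a lease. A minimal in-memory sketch, assuming a hypothetical `LeaseQueue` class, a timeout chosen only for the example, and a single-threaded dict in place of the HBase table. With real concurrent workers the stamp would have to be written conditionally (HBase offers an atomic check-and-put on a row) so two workers cannot claim the same row, and a slow worker whose lease expires may see its task handed out again, so processing should tolerate duplicates.

```python
import time


class LeaseQueue:
    """Toy queue table where a worker claims a row by stamping a lease time.

    Hypothetical sketch: no locks; a row whose lease is older than
    lease_seconds is handed out again.
    """

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self.rows = {}  # row key -> time the row was last claimed (None = unclaimed)

    def enqueue(self, key):
        self.rows[key] = None

    def claim(self, now=None):
        """Return the oldest row that is unclaimed or whose lease has expired."""
        now = time.time() if now is None else now
        for key in sorted(self.rows):
            claimed_at = self.rows[key]
            if claimed_at is None or now - claimed_at > self.lease_seconds:
                self.rows[key] = now  # "update a cell when the task is taken out"
                return key
        return None

    def complete(self, key):
        """Worker finished: remove the row from the queue."""
        self.rows.pop(key, None)


queue = LeaseQueue(lease_seconds=30)
queue.enqueue("row-001")
task = queue.claim()  # a worker takes "row-001" and stamps its lease
queue.complete(task)  # ...and deletes it once the work is done
```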
>>>>
>>>> St.Ack
>>>>
>>>> On Sat, Jul 16, 2011 at 9:38 AM, Jack Levin <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hello, we are thinking about using an HBase table as a simple queue
>>>>> which will dispatch the work for a MapReduce job, as well as real-time
>>>>> fetching of data to present to end users. In simple terms, suppose you
>>>>> had a data source table and a queue table. The queue table has a
>>>>> smaller set of Rows that point to Values, which in turn point to the
>>>>> Perma-Set table, which has a large collection of Rows (so Queue{Row,
>>>>> Value} -> Perma-Set{Row, Value}, or Q-Value -> P-Row). Our goal is
>>>>> to look up which Rows to retrieve from the Perma-Set table by looking
>>>>> through the Queue. Once the lookup into the Queue is done, the Row
>>>>> from the Queue must be deleted so the same Perma-Set lookup is not
>>>>> done twice. We expect many concurrent lookups to happen, so I assume
>>>>> the first thing a client doing the work needs to do is acquire a lock
>>>>> on the Queue Row, process the work, then remove the Queue Row.
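The Q-Value -> P-Row indirection described above can be sketched with two plain dicts standing in for the two tables. This is a hypothetical single-consumer illustration (the function name `drain_queue` is made up); with many concurrent consumers, each entry would first need to be claimed, as discussed earlier in this thread.

```python
def drain_queue(queue_table, perma_set):
    """Resolve each queue entry to its Perma-Set row, then delete the entry.

    queue_table: dict mapping queue row key -> Perma-Set row key (Q-Value -> P-Row)
    perma_set:   dict mapping Perma-Set row key -> stored value
    """
    results = []
    # sorted() returns a new list, so deleting from the dict inside the loop is safe.
    for q_row in sorted(queue_table):
        p_row = queue_table[q_row]
        results.append((p_row, perma_set.get(p_row)))
        del queue_table[q_row]  # remove so the same lookup is not done twice
    return results


queue = {"q-0001": "p-42", "q-0002": "p-7"}
perma = {"p-42": "payload A", "p-7": "payload B"}
processed = drain_queue(queue, perma)
```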
>>>>>
>>>>> Has anyone done something similar before? Any gotchas we should be
>>>>> aware of?
>>>>>
>>>>> Thanks.
>>>>>
>>>>> -Jack
>>>>>
>>>>>