Kireet Reddy 2013-04-15, 13:09
Ted Yu 2013-04-15, 17:28
Kireet 2013-04-15, 18:15
Re: talk list table
bq. write performance would be lower

It means poorer performance: writes with Append are slower than writes with Put.

bq. I could batch them up application side

Please do that.
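
A rough sketch of what application-side batching could look like, in case it helps. It uses plain Java only and is not HBase code: `TaskBatcher`, `maxBatchSize`, and `sink` are hypothetical names made up for illustration. The assumption is that the sink would wrap a call to the `HTable.batch(actions, results)` method quoted further down, with one Append (or Put) per queued item.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper: buffers pending writes in the application and hands
 * them to a sink in groups. The sink is a stand-in; with HBase it would
 * presumably build one Append (or Put) per item and call HTable.batch(...).
 */
final class TaskBatcher<T> {
    private final int maxBatchSize;
    private final Consumer<List<T>> sink;
    private final List<T> pending = new ArrayList<>();

    TaskBatcher(int maxBatchSize, Consumer<List<T>> sink) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be >= 1");
        }
        this.maxBatchSize = maxBatchSize;
        this.sink = sink;
    }

    /** Queues one item and flushes automatically once the batch is full. */
    synchronized void add(T item) {
        pending.add(item);
        if (pending.size() >= maxBatchSize) {
            flush();
        }
    }

    /** Sends whatever is queued, if anything, as a single batch. */
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        // Copy first so the sink never sees a list that is mutated later.
        List<T> batch = new ArrayList<>(pending);
        pending.clear();
        // Note: if the sink throws, this batch is lost. A real version
        // would need retry or re-queue handling here.
        sink.accept(batch);
    }
}
```

Callers would also want to call `flush()` on a timer or at shutdown so that a partially filled batch does not sit in memory indefinitely.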

bq. I guess there is no way to turn that off?

That's right.
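
On the row key question from earlier in the thread (prefixing the timestamp with a random leading byte), here is a minimal sketch of how the keys might be built and then enumerated on the read side. It is plain Java with no HBase calls. The class name `SaltedTaskKeys`, the `buckets` parameter, and the layout (1 salt byte followed by an 8-byte big-endian timestamp) are assumptions for illustration, not something specified in the thread.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper for salted timestamp row keys. */
final class SaltedTaskKeys {
    private SaltedTaskKeys() {}

    /** Assumed layout: 1 salt byte, then the timestamp as 8 big-endian bytes. */
    static byte[] rowKey(byte salt, long timestampSeconds) {
        return ByteBuffer.allocate(1 + Long.BYTES)
                .put(salt)
                .putLong(timestampSeconds) // ByteBuffer is big-endian by default
                .array();
    }

    /** Writer side: pick a random salt in [0, buckets) to spread the writes. */
    static byte[] randomRowKey(int buckets, long timestampSeconds) {
        checkBuckets(buckets);
        byte salt = (byte) ThreadLocalRandom.current().nextInt(buckets);
        return rowKey(salt, timestampSeconds);
    }

    /** Reader side: one key per salt value, so a consumer can check them all. */
    static List<byte[]> allRowKeys(int buckets, long timestampSeconds) {
        checkBuckets(buckets);
        List<byte[]> keys = new ArrayList<>(buckets);
        for (int salt = 0; salt < buckets; salt++) {
            keys.add(rowKey((byte) salt, timestampSeconds));
        }
        return keys;
    }

    private static void checkBuckets(int buckets) {
        // A single salt byte can distinguish at most 256 buckets.
        if (buckets < 1 || buckets > 256) {
            throw new IllegalArgumentException("buckets must be in [1, 256]");
        }
    }
}
```

The trade-off is the one already raised in the question: writes for the same second land on up to `buckets` different rows, and a reader has to issue one get or scan per salt value to collect them.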

On Mon, Apr 15, 2013 at 11:15 AM, Kireet <[EMAIL PROTECTED]> wrote:

> Thanks for the reply. "write performance would be lower" -> this means
> better?
>
> Also I think I used the wrong terminology regarding batching. I meant to
> ask if it uses the client side write buffer. I would think not since the
> append() method returns a Result. I could batch them up application side I
> suppose. Append also seems to return the updated value. This seems like a
> lot of unnecessary I/O in my case since I am not immediately interested in
> the updated value. I guess there is no way to turn that off?
>
>
> On 4/15/13 1:28 PM, Ted Yu wrote:
>
>> I assume you would select HBase 0.94.6.1 (the latest release) for this
>> project.
>>
>> For #1, write performance would be lower if you choose to use Append (vs.
>> using Put).
>>
>> bq. Can appends be batched by the client or do they execute immediately?
>> This depends on your use case. Take a look at the following method in
>> HTable where you can send a list of actions (Appends):
>>
>>    public void batch(final List<? extends Row> actions, final Object[] results)
>>
>> For #2
>> bq. The other would be to prefix the timestamp row key with a random
>> leading byte.
>>
>> This technique has been used elsewhere and is better than the first one.
>>
>> Cheers
>>
>> On Mon, Apr 15, 2013 at 6:09 AM, Kireet Reddy <[EMAIL PROTECTED]> wrote:
>>
>>> We are planning to create a "scheduled task list" table in our HBase
>>> cluster. Essentially, we will define a table keyed by timestamp, and the
>>> row contents will be all the tasks that need to be processed within that
>>> second (or whatever time period). I am trying to do the "reasonably wide
>>> rows" design mentioned in the HBaseCon OpenTSDB talk. A couple of
>>> questions:
>>>
>>> 1. Should we use append or put to create tasks? Since these rows will not
>>> live forever, storage space is not a concern; read/write performance is
>>> more important. As concurrency increases, I would guess the row lock may
>>> become an issue with append? Can appends be batched by the client, or do
>>> they execute immediately?
>>>
>>> 2. I am a little worried about hotspots. This basic design may cause
>>> issues in terms of the table's performance. Many tasks will execute and
>>> reschedule themselves using the same interval, t + 1 hour for example, so
>>> many of the writes may all go to the same block. Also, we have a lot of
>>> other data, so I am worried it may impact performance of unrelated data if
>>> the region server gets too busy servicing the task list table. I can think
>>> of 2 strategies to avoid this. One would be to create N different tables
>>> and read/write tasks to them randomly. This may spread load across
>>> servers, but there is no guarantee HBase will place the tables on
>>> different region servers, correct? The other would be to prefix the
>>> timestamp row key with a random leading byte. Then, when reading from the
>>> task list table, consumers could scan from any/all possible values of the
>>> random byte + current timestamp to obtain tasks. Both strategies seem like
>>> they could spread out load, but at the cost of more work/complexity to
>>> read tasks from the table. Does either of those approaches make sense?
>>>
>>> On the read side, it seems like a similar problem exists in that all
>>> consumers will be reading rows based on the current timestamp. Is this
>>> good because the block will very likely be cached, or bad because the
>>> region server may become overloaded? I have a feeling the answer is going
>>> to be "it depends". :)
>>>
>>> I did see the previous posts on queues and the tips there - use zookeeper
>
Amit Sela 2013-04-20, 15:24
Otis Gospodnetic 2013-04-20, 23:10