talk list table (HBase user mailing list)

Kireet Reddy   2013-04-15, 13:09
Ted Yu         2013-04-15, 17:28
Kireet         2013-04-15, 18:15
Ted Yu         2013-04-15, 20:18
Amit Sela      2013-04-20, 15:24

Re: talk list table
+ http://blog.sematext.com/2012/12/24/hbasewd-and-hbasehut-handy-hbase-libraries-available-in-public-maven-repo/
if you use Maven and want to use HBaseWD.

Otis
--
HBASE Performance Monitoring - http://sematext.com/spm/index.html
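
As a rough illustration of the key-salting idea behind HBaseWD (the posts linked above and in Amit's message below): the class name, method names, and bucket count here are invented for the sketch; HBaseWD itself ships ready-made key distributors, so this is not its actual API.

    import org.apache.hadoop.hbase.util.Bytes;

    // Minimal sketch of the salting idea: spread sequential (e.g. timestamp)
    // row keys over N buckets by prepending one byte derived from the key.
    public class SaltedKeys {

        private static final int BUCKETS = 16; // illustrative bucket count

        // Prepend a deterministic one-byte bucket prefix to the original key.
        static byte[] toDistributedKey(byte[] originalKey) {
            int hash = Bytes.hashCode(originalKey) & 0x7fffffff;
            byte[] salted = new byte[originalKey.length + 1];
            salted[0] = (byte) (hash % BUCKETS);
            System.arraycopy(originalKey, 0, salted, 1, originalKey.length);
            return salted;
        }

        // Reads fan out: a time-range scan is issued once per bucket prefix
        // and the results are merged client side.
        static byte[] bucketStartKey(int bucket, byte[] originalStartKey) {
            byte[] start = new byte[originalStartKey.length + 1];
            start[0] = (byte) bucket;
            System.arraycopy(originalStartKey, 0, start, 1, originalStartKey.length);
            return start;
        }
    }

Writes with sequential keys then land on up to BUCKETS different regions instead of hammering a single one.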
On Sat, Apr 20, 2013 at 11:24 AM, Amit Sela <[EMAIL PROTECTED]> wrote:
> Hope I'm not too late here... regarding hot spotting with sequential keys,
> I'd suggest you read this Sematext blog -
> http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/
> They present a nice idea there for this kind of issue.
>
> Good Luck!
>
>
>
> On Mon, Apr 15, 2013 at 11:18 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
>> bq. write performance would be lower
>>
>> The above means poorer performance.
>>
>> bq. I could batch them up application side
>>
>> Please do that.
>>
>> bq. I guess there is no way to turn that off?
>>
>> That's right.
>>
>> On Mon, Apr 15, 2013 at 11:15 AM, Kireet <[EMAIL PROTECTED]> wrote:
>>
>> >
>> >
>> >
>> > Thanks for the reply. "write performance would be lower" -> this means
>> > better?
>> >
>> > Also I think I used the wrong terminology regarding batching. I meant to
>> > ask if it uses the client side write buffer. I would think not since the
>> > append() method returns a Result. I could batch them up application side
>> > I suppose. Append also seems to return the updated value. This seems
>> > like a lot of unnecessary I/O in my case since I am not immediately
>> > interested in the updated value. I guess there is no way to turn that
>> > off?
>> >
>> >
>> > On 4/15/13 1:28 PM, Ted Yu wrote:
>> >
>> >> I assume you would select HBase 0.94.6.1 (the latest release) for this
>> >> project.
>> >>
>> >> For #1, write performance would be lower if you choose to use Append
>> >> (vs. using Put).
>> >>
>> >> bq. Can appends be batched by the client or do they execute immediately?
>> >> This depends on your use case. Take a look at the following method in
>> >> HTable where you can send a list of actions (Appends):
>> >>
>> >>    public void batch(final List<? extends Row> actions,
>> >>                      final Object[] results)
>> >> For #2
>> >> bq. The other would be to prefix the timestamp row key with a random
>> >> leading byte.
>> >>
>> >> This technique has been used elsewhere and is better than the first one.
>> >>
>> >> Cheers
>> >>
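
A rough illustration of the batch-of-Appends approach Ted describes above, against the 0.94 client API. The table name, column family, qualifiers, and payloads are made up for illustration and are not the poster's actual schema.

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Append;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BatchedAppends {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "task_list"); // illustrative name
            try {
                long second = System.currentTimeMillis() / 1000L;
                byte[] rowKey = Bytes.toBytes(second);    // one row per second
                List<Append> actions = new ArrayList<Append>();
                for (int i = 0; i < 100; i++) {
                    Append append = new Append(rowKey);
                    append.add(Bytes.toBytes("t"),             // family
                               Bytes.toBytes("task-" + i),     // qualifier
                               Bytes.toBytes("payload-" + i)); // value
                    actions.add(append);
                }
                // One batched round trip instead of one RPC per Append;
                // results[i] holds the outcome of actions.get(i).
                Object[] results = new Object[actions.size()];
                table.batch(actions, results);
            } finally {
                table.close();
            }
        }
    }

Batching this way amortizes the per-call overhead, but all Appends to the same row still serialize on that row's lock on the region server.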
>> >> On Mon, Apr 15, 2013 at 6:09 AM, Kireet Reddy <[EMAIL PROTECTED]> wrote:
>> >>
>> >>> We are planning to create a "scheduled task list" table in our hbase
>> >>> cluster. Essentially we will define a table keyed by timestamp and then
>> >>> the row contents will be all the tasks that need to be processed within
>> >>> that second (or whatever time period). I am trying to do the "reasonably
>> >>> wide rows" design mentioned in the hbasecon opentsdb talk. A couple of
>> >>> questions:
>> >>>
>> >>> 1. Should we use append or put to create tasks? Since these rows will
>> >>> not live forever, storage space is not a concern; read/write performance
>> >>> is more important. As concurrency increases I would guess the row lock
>> >>> may become an issue in append? Can appends be batched by the client or
>> >>> do they execute immediately?
>> >>>
>> >>> 2. I am a little worried about hotspots. This basic design may cause
>> >>> issues in terms of the table's performance. Many tasks will execute and
>> >>> reschedule themselves using the same interval, t + 1 hour for example.
>> >>> So many of the writes may all go to the same block. Also, we have a lot
>> >>> of other data, so I am worried it may impact performance of unrelated
>> >>> data if the region server gets too busy servicing the task list table.
>> >>> I can think of 2 strategies to avoid this. One would be to create N
>> >>> different tables and read/write tasks to them randomly. This may spread
>> >>> load across servers,
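
For reference, the "one row per time bucket, wide row of tasks" layout described in the original message might look roughly like this with plain Puts. All table, family, and qualifier names are invented for illustration; the thread does not give the actual schema.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class TaskListTable {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "task_list"); // illustrative name
            try {
                // Row key = the second the task is due; qualifier = a task id.
                long dueSecond = System.currentTimeMillis() / 1000L + 3600; // t + 1 hour
                Put put = new Put(Bytes.toBytes(dueSecond));
                put.add(Bytes.toBytes("t"), Bytes.toBytes("task-42"),
                        Bytes.toBytes("serialized task payload"));
                table.put(put);

                // A worker reads everything due in a given second with one scan.
                Scan scan = new Scan(Bytes.toBytes(dueSecond),
                                     Bytes.toBytes(dueSecond + 1));
                ResultScanner scanner = table.getScanner(scan);
                for (Result row : scanner) {
                    System.out.println("tasks due: " + row.size());
                }
                scanner.close();
            } finally {
                table.close();
            }
        }
    }

Combining this with a random prefix byte, as in the earlier salting sketch, would spread consecutive seconds across regions at the cost of one scan per prefix when reading a time slice.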