Re: HBase Quality Of Service: large standard deviation in insert time while inserting same type of rows in HBase
I guess Sesame Street isn't global... ;-) Oh, and of course I f'd the joke by saying Grover and not Oscar, so it's my bad. :-( [Google Oscar the Grouch and you'll understand the joke that I botched.]
It's most likely GC and a mistuned cluster.
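Before tuning anything for GC, it helps to put numbers on the stalls. A minimal Python sketch follows. The function name, the 10 ms default budget, and the sample list are illustrative assumptions; the sample only echoes the magnitudes in the original post (2 ms, 3 ms, 1000 ms, 3000 ms) and is not a measurement from the OP's cluster.

```python
import math
import statistics


def summarize_latencies(latencies_ms, budget_ms=10.0):
    """Summarize per-insert latencies (milliseconds) against a latency budget."""
    ordered = sorted(latencies_ms)
    n = len(ordered)
    # Nearest-rank 99th percentile: smallest sample with >= 99% of samples at or below it.
    p99_index = max(0, math.ceil(0.99 * n) - 1)
    return {
        "count": n,
        "mean_ms": statistics.mean(ordered),
        "median_ms": statistics.median(ordered),
        "stdev_ms": statistics.pstdev(ordered),  # population standard deviation
        "p99_ms": ordered[p99_index],
        "max_ms": ordered[-1],
        "over_budget": sum(1 for x in ordered if x > budget_ms),
    }


# Hypothetical sample: mostly fast inserts with two long stalls.
sample = [2, 3, 2, 3, 1000, 2, 3, 3000, 2, 3]
summary = summarize_latencies(sample)
for name, value in summary.items():
    print(name, value)
```

If the over-budget inserts cluster in time, their timestamps can be lined up against the GC logs on the client and the region server to see whether the pauses match.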
The OP doesn't really go into detail, except to say that his cluster is tiny. Yes, size does matter, regardless of those rumors to the contrary... 3 DNs is kinda small. If he's splitting that often, then his region size is too small. Hot spotting and other things can impact performance, but not in the way he described.
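On the splitting point, and the presplit question quoted further down: if splits during the load are part of the problem, the table can be created with split points chosen up front. A rough Python sketch of picking evenly spaced split keys, assuming (the thread does not say this) that row keys start with a uniformly distributed, fixed-width lowercase hex prefix such as a hash. The function name and the key width are hypothetical.

```python
def hex_split_keys(num_regions, key_width=8):
    """Return num_regions - 1 evenly spaced split keys over a fixed-width hex key space.

    Assumes row keys begin with a uniformly distributed lowercase hex prefix
    of key_width characters (an illustrative assumption, not the OP's schema).
    """
    if num_regions < 2:
        return []  # a single region needs no split points
    key_space = 16 ** key_width
    return [
        format(i * key_space // num_regions, "0{}x".format(key_width))
        for i in range(1, num_regions)
    ]


# Example: split points for 27 regions, matching the region count in the original post.
keys = hex_split_keys(27)
print(len(keys), keys[0], keys[-1])
```

The resulting keys would then be handed to whatever table-creation call is in use. With sequential or skewed row keys, evenly spaced split points will not prevent hot spotting.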
Also, when you look at performance, look at reads, not writes. You can cache both, and writes are less important than reads. (Think about it.)
Since this type of conversation keeps popping up, it would be a good topic for Strata in NY. (Not too subtle of a hint to those who are picking topics...) Good cluster design is important, more important than people think.
Sent from a remote device. Please excuse any typos...
On Apr 25, 2012, at 12:08 AM, Mikael Sitruk <[EMAIL PROTECTED]> wrote:
> 1. writes are not blocked during compaction
> 2. compaction cannot have a constant time since the files/regions are
> getting bigger
> 3. besides the GC pauses (which seem to be the best candidate here) on
> either the client or RS (what are your settings BTW, and data size per
> insert), did you presplit your regions or is a split occurring during the
> 4. did you look at the logs? is there any operation that is taking too long
> there (in 0.92 you can configure and print any operation that will take
> long time)
> On Wed, Apr 25, 2012 at 4:58 AM, Michael Segel <[EMAIL PROTECTED]> wrote:
>> Have you thought about Garbage Collection?
>> Sent from my iPhone
>> On Apr 24, 2012, at 12:41 PM, "Skchaudhary" <[EMAIL PROTECTED]> wrote:
>>> I have an HBase cluster set-up. In that I have 3 Region Servers. There is a
>>> table which has 27 Regions equally distributed among 3 Region servers--9
>>> regions per region server.
>>> Region server 1 has ---region 1-9
>>> Region server 2 has ---region 10-18
>>> Region server 3 has ---region 19-27
>>> Now when I start a program which inserts rows in region 1 and region 5 (both
>>> under Region Server-1) alternately and on a continuous basis, I see that the
>>> insert time for each row is not constant or consistent---there is a lot of
>>> variance, or say the standard deviation of insert time is quite large. Sometimes
>>> it takes 2 ms to insert a row, sometimes 3 ms, sometimes 1000 ms and
>>> sometimes even > 3000 ms, even though the data size in rows is equal.
>>> I understand that due to flushing and compaction of Regions the writes are
>>> blocked---but then they should not be blocked for a larger span of time and the
>>> blockage time should be consistent for every flush/compaction (minor
>>> All in all, every time a flush or compaction occurs it should take nearly the same
>>> time for each compaction and flush.
>>> For our application we need a consistent quality of service and, if not
>>> perfect, at least we need well-visible boundary lines--like for each row
>>> insert it will take some 0 to 10 ms and not more than 10 ms (just an
>>> that even though minor compaction or flush occurs.
>>> Is there any setting/configuration which I should try?
>>> Any ideas of how to achieve it in Hbase.
>>> Any help would be really appreciated.
>>> Thanks in advance!!