Re: xceiver count, regionserver shutdown
Bryan Keller 2012-02-06, 23:18
Yes, insert pattern is random, and yes, the compactions are going through the roof. Thanks for pointing me in that direction. I am going to try increasing the region max filesize to 4gb (it was set to 512mb) and the memstore flush size to 512mb (it was 128mb). I'm also going to increase the heap to 16gb (right now it is 4gb).
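As a rough check on those numbers, the arithmetic below estimates how much memstore each region can hold before the regionserver is forced to flush. It is only a sketch under stated assumptions: the 40% global memstore share is assumed (believed to be the default for hbase.regionserver.global.memstore.upperLimit in this HBase generation), the class and method names are made up for illustration, and the region count is the roughly 200 per regionserver mentioned in the original post.

```java
class MemstoreBudget {
    /**
     * Heap share available to all memstores on one regionserver, in MB.
     * memstorePercent is an assumed setting (40 here), not a measured value.
     */
    static long globalMemstoreMb(long heapMb, long memstorePercent) {
        return heapMb * memstorePercent / 100;
    }

    /** Average memstore space per region if writes hit every region evenly. */
    static long perRegionMb(long heapMb, long memstorePercent, long regions) {
        return globalMemstoreMb(heapMb, memstorePercent) / regions;
    }

    /** How many regions could fill a whole flush-size memstore at the same time. */
    static long regionsAtFullFlush(long heapMb, long memstorePercent, long flushSizeMb) {
        return globalMemstoreMb(heapMb, memstorePercent) / flushSizeMb;
    }

    public static void main(String[] args) {
        long regions = 200; // per regionserver, from the original post
        System.out.println("4 GB heap, MB per region:  " + perRegionMb(4096, 40, regions));
        System.out.println("16 GB heap, MB per region: " + perRegionMb(16384, 40, regions));
        System.out.println("16 GB heap, regions at 512 MB flush: "
                + regionsAtFullFlush(16384, 40, 512));
    }
}
```

Under these assumptions, 200 randomly written regions share about 8 MB each on a 4 GB heap and about 32 MB each on a 16 GB heap. Both are well below either flush size, so forced flushes would still produce small files. Only around a dozen regions could reach a full 512 MB flush on a 16 GB heap.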
On Feb 6, 2012, at 1:33 PM, Jean-Daniel Cryans wrote:
> Ok this helps, we're still missing your insert pattern, but I
> bet it's pretty random considering what's happening to your cluster.
> I'm guessing you didn't set up metrics else you would have told us
> that the compaction queues are through the roof during the import, but
> at this point I'm pretty sure it's the case.
> To solve this your choices are:
> - Do bulk uploads instead of brute forcing it so that you would be
> entirely skipping those issues. See
> - Get that number of regions down to something more manageable; you
> didn't say how much memory you gave to HBase so I can't say how many
> exactly you need but it's usually never more than 20. Then set the
> memstore flush size and max file size accordingly. The goal here is to
> flush/compact as little as possible.
> - Keep your current setup, but slow down the insert rate so that data
> can be compacted over and over again without overrunning your region servers.
> - Use a more sequential pattern so that you hit only a few regions at
> a time. This is like the second solution but trying to make it work
> with your current setup. This might not be practical for you as it
> really depends on how easily you can sort your data source.
> Let us know if you need more help,
> On Mon, Feb 6, 2012 at 1:12 PM, Bryan Keller <[EMAIL PROTECTED]> wrote:
>> This is happening during heavy update. I have a "wide" table with around 4 million rows that have already been inserted. I am adding billions of columns to the rows. Each row can have 20+k columns.
>> I perform the updates in batch, i.e. I am using the HTable.put(List<Put>) API. The batch size is 1000 Puts. The columns being added are scattered, e.g. I may add 20 columns to 1000 different rows in each batch. Then in the next batch add 20 columns to 1000 more rows (which may be the same rows or different than the previous batch), and so forth.
>> BTW, I tried upping the "xcievers" parameter to 8192 but now I'm getting a "Too many open files" error. I have the file limit set to 32k.
>> On Feb 6, 2012, at 11:59 AM, Jean-Daniel Cryans wrote:
>>> The number of regions is the first thing to check, then it's about the
>>> actual number of blocks opened. Is the issue happening during a heavy
>>> insert? In this case I guess you could end up with hundreds of opened
>>> files if the compactions are piling up. Setting a bigger memstore
>>> flush size would definitely help... but then again if your insert
>>> pattern is random enough all 200 regions will have filled memstores so
>>> you'd end up with hundreds of super small files...
>>> Please tell us more about the context of when this issue happens.
>>> On Mon, Feb 6, 2012 at 11:42 AM, Bryan Keller <[EMAIL PROTECTED]> wrote:
>>>> I am trying to resolve an issue with my cluster when I am loading a bunch of data into HBase. I am reaching the "xciever" limit on the data nodes. Currently I have this set to 4096. The data node is logging "xceiverCount 4097 exceeds the limit of concurrent xcievers 4096". The regionservers eventually shut down. I have read the various threads on this issue.
>>>> I have 4 datanodes/regionservers. Each regionserver has only around 200 regions. The table has 2 column families. I have the region file size set to 500mb, and I'm using Snappy compression. This problem is occurring on HBase 0.90.4 and Hadoop 0.20.2 (both Cloudera cdh3u3).
>>>> From what I have read, the number of regions on a node can cause the xceiver limit to be reached, but it doesn't seem like I have an excessive number of regions. I want the table to scale higher, so simply upping the xceiver limit could perhaps get my table functional for now, but it seems it will only be a temporary fix.
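For the "more sequential pattern" option suggested in the thread, one possible approach is to buffer the scattered column updates on the client, merge those that target the same row, and send rows in key order so that each batch touches only a few regions. The sketch below shows only the grouping and ordering step using plain Java collections. The SortedBatcher and Cell names and the string-typed keys are hypothetical, and the actual write (for example building Put objects for HTable.put(List<Put>)) is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class SortedBatcher {
    /** One pending column update; plain strings stand in for HBase byte[] keys. */
    static final class Cell {
        final String row;
        final String column;
        final String value;

        Cell(String row, String column, String value) {
            this.row = row;
            this.column = column;
            this.value = value;
        }
    }

    /**
     * Merges updates by row. TreeMap keeps rows and columns in String order,
     * which only approximates HBase's byte-wise row ordering.
     */
    static TreeMap<String, TreeMap<String, String>> groupByRow(List<Cell> cells) {
        TreeMap<String, TreeMap<String, String>> grouped = new TreeMap<>();
        for (Cell c : cells) {
            // A later update to the same row and column replaces the earlier one.
            grouped.computeIfAbsent(c.row, k -> new TreeMap<>()).put(c.column, c.value);
        }
        return grouped;
    }

    /** Splits the sorted row keys into consecutive batches of at most batchSize rows. */
    static List<List<String>> rowBatches(
            TreeMap<String, TreeMap<String, String>> grouped, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String row : grouped.keySet()) {
            current.add(row);
            if (current.size() == batchSize) {
                batches.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

Because regions cover contiguous row-key ranges, consecutive batches of sorted rows tend to fall into the same few regions. Whether this is practical depends on how much of the input can be buffered before writing.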