Re: Extremely long flush times
Can you jstack while this happens? Anything weird in the datanode log?
Is the load high during that time?

Thx,

J-D
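
A minimal sketch of grabbing roughly the same information in-process with the
standard java.lang.management API, in case attaching jstack to the region
server process isn't convenient (the ThreadDumper class name is made up; it is
an alternative to, not a substitute for, the jstack output J-D asks for):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Rough in-process equivalent of `jstack -l <pid>`: print every thread's stack. */
public class ThreadDumper {
  public static void main(String[] args) {
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    // true, true: also report locked monitors and ownable synchronizers.
    for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
      // Note: ThreadInfo.toString() truncates stacks deeper than 8 frames.
      System.err.println(info);
    }
  }
}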

On Fri, Aug 10, 2012 at 3:12 PM, Espinoza,Carlos <[EMAIL PROTECTED]> wrote:
> We were able to replicate this behavior in a pseudo-distributed HBase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then creates a
> row with 60,000 columns and repeatedly updates it. Each column has an
> 18-byte qualifier and a 50-byte value. In our tests, when we ran the
> program, we usually didn't get beyond 15 updates before it would flush
> for a really long time. The rows being updated are about 4MB
> each (minus any HBase metadata).
>
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
>
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
>
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
>
> You should be able to just compile it and run it against a running
> HBase cluster.
> $ java TestTable
>
> Carlos
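
For readers who can't reach the pastebin, a minimal sketch of what such a
wide-row updater might look like against the 0.94 client API is below. It is
not the program Carlos posted: the WideRowTest class name, the "f" column
family, the row key, and the update count are made up, and the random-row
population step he describes is skipped; only the table name, the 60,000
columns, and the 18-byte qualifier / 50-byte value sizes come from his
description.

import java.util.Random;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

/** Sketch only: repeatedly rewrite one very wide row, as described above. */
public class WideRowTest {
  private static final byte[] FAMILY = Bytes.toBytes("f");

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();

    // Create the table if it isn't there yet.
    HBaseAdmin admin = new HBaseAdmin(conf);
    if (!admin.tableExists("MyTestTable")) {
      HTableDescriptor desc = new HTableDescriptor("MyTestTable");
      desc.addFamily(new HColumnDescriptor(FAMILY));
      admin.createTable(desc);
    }

    HTable table = new HTable(conf, "MyTestTable");
    Random rnd = new Random();
    byte[] row = Bytes.toBytes("wide-row");
    byte[] value = new byte[50];                       // 50-byte values

    // Each update rewrites the same row with 60,000 columns,
    // roughly 60,000 * (18 + 50) bytes ~= 4 MB of qualifiers + values.
    for (int update = 0; update < 50; update++) {
      Put put = new Put(row);
      for (int col = 0; col < 60000; col++) {
        rnd.nextBytes(value);
        // "qualifier-%08d" is 18 characters, i.e. an 18-byte qualifier.
        put.add(FAMILY, Bytes.toBytes(String.format("qualifier-%08d", col)), value);
      }
      long start = System.currentTimeMillis();
      table.put(put);
      System.out.println("update " + update + " took "
          + (System.currentTimeMillis() - start) + "ms");
    }
    table.close();
  }
}

With the default client settings auto-flush is on, so each table.put() above
goes out as a single multi-megabyte RPC against the same row.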
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of
> Jean-Daniel Cryans
> Sent: Monday, August 06, 2012 8:04 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Extremely long flush times
>
> Hi Ron,
>
> I think this is caused by GC; for example, there's 7 minutes between the
> last blocking message and the end of the flush:
>
> 2012-08-06 11:44:36,348 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC
> Server handler 42 on 9009' on region
> Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6.:
> memstore size 271.9m is >= than blocking 256.0m size
> 2012-08-06 11:51:28,139 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of
> ~155.4m/162905016, currentsize=116.5m/122172880 for region
> Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in
> 501337ms, sequenceid=1792264092, compaction requested=true
>
> During that time I would expect that clients got socket timeouts, tried
> to write again, and data would just pile up in the RPC queue.
>
> Do you have the gc log handy? Also did you set the zookeeper session
> timeout higher than the default of 3 minutes?
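
For reference, the timeout J-D mentions is the zookeeper.session.timeout
property, which the region servers read from hbase-site.xml; a sketch of
raising it (the 5-minute value is only an example, and the ZooKeeper quorum's
maxSessionTimeout has to allow whatever value is chosen):

<!-- hbase-site.xml (example value only) -->
<property>
  <name>zookeeper.session.timeout</name>
  <!-- milliseconds; the 0.94 default is 180000, i.e. the 3 minutes mentioned above -->
  <value>300000</value>
</property>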
>
> Thx,
>
> J-D
>
> On Mon, Aug 6, 2012 at 2:37 PM, Buckley,Ron <[EMAIL PROTECTED]> wrote:
>> We ran into this situation today on our production instance; I'm
>> wondering if anyone has seen it or knows a way around it.
>>
>>
>>
>> We were repeatedly updating the same 60,000-column-wide row. With
>> each update, we were updating nearly every column of the row. Each
>> update generated nearly 9 MB in the WAL.
>>
>> This worked OK most of the time. We saw frequent messages like this in
>> the region server log, indicating a 2.6 second flush time:
>>
>> 2012-08-06 11:42:52,316 INFO
>> org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush
>> of ~155.4m/162931400, currentsize=2.7k/2760 for region
>> Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in
>> 2610ms, sequenceid=1792263953, compaction requested=false
>>
>>
>>
>> However, sometimes we would see flush times of around 8-10 minutes.
>>
>> 2012-08-06 11:51:28,139 INFO
>> org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush
>> of ~155.4m/162905016, currentsize=116.5m/122172880 for region
>> Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in
>> 501337ms, sequenceid=1792264092, compaction requested=true
>>
>>
>>
>> During these long flushes, we would end up using all of the IPC handler
>> slots we had configured (50) on that region server. It looked to our
>> application like the region server was unavailable.
>>
>>
>>
>> After the long delay, the region server would continue running fine,
>> but then do it again a few minutes later.
>>
>>
>>
>> I thought it was most likely something with Garbage Collection, so I