Sync latency
Placido Revilla 2012-04-04, 09:30

Re: Sync latency
Hi Placido,

Sounds like it might be related to HDFS-2379. Try updating to Hadoop
1.0.1 or CDH3u3 and you'll get a fix for that.

You can verify by grepping for "BlockReport" in your DN logs: if the
pauses on the HBase side correlate with long block reports on the DNs,
the upgrade should fix it.
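
For reference, a minimal sketch of that check in Java. The exact wording of
the DataNode block-report message varies across Hadoop versions, so the
"msecs" regex below is an assumption, not the canonical 1.0.x format:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Scan a DataNode log and print BlockReport lines whose reported
    // duration exceeds a threshold, so their timestamps can be lined up
    // with the region server WARNs.
    public class SlowBlockReports {
        // matches any "<n> msec"/"<n> msecs" figure on a BlockReport line
        private static final Pattern MSECS = Pattern.compile("(\\d+)\\s*msecs?");

        public static void main(String[] args) throws IOException {
            long thresholdMs = 1000;  // flag block reports longer than 1 s
            BufferedReader in = new BufferedReader(new FileReader(args[0]));
            for (String line; (line = in.readLine()) != null; ) {
                if (!line.contains("BlockReport")) continue;
                long worstMs = 0;  // take the largest msec figure on the line
                Matcher m = MSECS.matcher(line);
                while (m.find()) {
                    worstMs = Math.max(worstMs, Long.parseLong(m.group(1)));
                }
                if (worstMs > thresholdMs) {
                    System.out.println(line);
                }
            }
            in.close();
        }
    }

Run as "java SlowBlockReports hadoop-datanode.log" on each DN and compare
the printed timestamps against the region server WARNs.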

-Todd

On Wed, Apr 4, 2012 at 2:30 AM, Placido Revilla
<[EMAIL PROTECTED]> wrote:
> Hi,
>
> I'm having a problem with sync latency on our HBase cluster. Our cluster is
> composed of 2 NN (and HBase master) machines and 12 DN (and HBase
> regionserver and thrift server) machines. Several times a day the cluster
> seems to halt all processing for several seconds, and these stalls line up
> with WARN logs like these:
>
> 13:23:55,285 WARN  [IPC Server handler 62 on 60020] wal.HLog IPC Server
> handler 62 on 60020 took 10713 ms appending an edit to hlog;
> editcount=150694, len~=58.0
> 13:23:55,286 WARN  [IPC Server handler 64 on 60020] wal.HLog IPC Server
> handler 64 on 60020 took 10726 ms appending an edit to hlog;
> editcount=319217, len~=47.0
> 13:23:55,286 WARN  [IPC Server handler 118 on 60020] wal.HLog IPC Server
> handler 118 on 60020 took 10741 ms appending an edit to hlog;
> editcount=373337, len~=49.0
> 13:23:55,286 WARN  [IPC Server handler 113 on 60020] wal.HLog IPC Server
> handler 113 on 60020 took 10746 ms appending an edit to hlog;
> editcount=57912, len~=45.0
> 15:39:38,193 WARN  [IPC Server handler 94 on 60020] wal.HLog IPC Server
> handler 94 on 60020 took 21787 ms appending an edit to hlog;
> editcount=2901, len~=45.0
> 15:39:38,194 WARN  [IPC Server handler 82 on 60020] wal.HLog IPC Server
> handler 82 on 60020 took 21784 ms appending an edit to hlog;
> editcount=29944, len~=46.0
> 16:09:38,201 WARN  [IPC Server handler 78 on 60020] wal.HLog IPC Server
> handler 78 on 60020 took 10321 ms appending an edit to hlog;
> editcount=163998, len~=104.0
> 16:09:38,203 WARN  [IPC Server handler 97 on 60020] wal.HLog IPC Server
> handler 97 on 60020 took 10205 ms appending an edit to hlog;
> editcount=149497, len~=60.0
> 16:09:38,203 WARN  [IPC Server handler 68 on 60020] wal.HLog IPC Server
> handler 68 on 60020 took 10199 ms appending an edit to hlog;
> editcount=318268, len~=63.0
> 16:09:38,203 WARN  [IPC Server handler 120 on 60020] wal.HLog IPC Server
> handler 120 on 60020 took 10211 ms appending an edit to hlog;
> editcount=88001, len~=45.0
> 16:09:38,204 WARN  [IPC Server handler 88 on 60020] wal.HLog IPC Server
> handler 88 on 60020 took 10235 ms appending an edit to hlog;
> editcount=141516, len~=100.0
>
> The machines in the cluster are pretty powerful (8 HT cores, 48 GB RAM, 6
> SATA 7200 RPM disks), so we are not hitting any hardware limit: CPU usage
> never exceeds 20% (5% on average), network bandwidth never exceeds 100 Mbps
> (on 1 Gbps links) and 10k packets/s per RS, RAM is 50% free (used for disk
> cache), and random IOPS stay well under 120/s (we should be able to sustain
> over 600). We have also monitored GC pauses (the region servers have 16 GB
> of heap) and see nothing longer than a couple of tens of milliseconds (and
> no concurrent sweeps longer than 5 or 6 seconds).
>
> The cluster runs Hadoop 1.0.0 and HBase 0.90.5 almost vanilla (just a
> couple of functionality patches), and the input channel is thrift, with
> autocommit disabled and periodic client flushes every 5 seconds. Our tables
> have the deferred log flush feature enabled with an optional interval of
> 1000 ms (see the sketch below). We are not running any MapReduce on this
> cluster, just concurrent HBase reads and writes on three tables, two of
> them with 64 regions and one with 1024 (a little high, I know), with keys
> spread almost perfectly at random across the cluster. With this setup we
> see peaks of 250k requests/s (about 22k requests per RS), an average of
> 100k requests/s over the day, and an average of 76 bytes per request (so
> about 150 Mbps in throughput), because our payload is pretty small (less

Todd Lipcon
Software Engineer, Cloudera
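
For concreteness, a minimal sketch of the setup Placido describes, written
against the 0.90-era Java client rather than thrift. The table name "events"
and family "d" are made up; deferred log flush, disabled autocommit, the
5-second client flush, and the 1000 ms optional interval
(hbase.regionserver.optionallogflushinterval in hbase-site.xml) come from
the mail:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class DeferredFlushWriter {
        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();

            // One-time setup: with deferred log flush, WAL edits are synced
            // by the region server timer (the 1000 ms optional interval)
            // rather than on every edit.
            HBaseAdmin admin = new HBaseAdmin(conf);
            HTableDescriptor desc = new HTableDescriptor("events");  // hypothetical name
            desc.addFamily(new HColumnDescriptor("d"));              // hypothetical family
            desc.setDeferredLogFlush(true);
            admin.createTable(desc);

            // Write path: autocommit disabled, flush the client-side write
            // buffer every 5 seconds, as described in the mail.
            HTable table = new HTable(conf, "events");
            table.setAutoFlush(false);
            long lastFlush = System.currentTimeMillis();
            for (long i = 0; i < 1000000; i++) {
                Put put = new Put(Bytes.toBytes(i));
                put.add(Bytes.toBytes("d"), Bytes.toBytes("v"),
                        Bytes.toBytes("~76-byte payload"));
                table.put(put);  // buffered client-side, not yet sent
                if (System.currentTimeMillis() - lastFlush >= 5000) {
                    table.flushCommits();
                    lastFlush = System.currentTimeMillis();
                }
            }
            table.flushCommits();
            table.close();
        }
    }

Note the trade-off: with deferred log flush a region server crash can lose
up to the optional interval's worth of edits; that is what buys you fewer
WAL syncs per edit.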

Rest of the thread:
Placido Revilla 2012-04-09, 08:47
Todd Lipcon 2012-04-09, 17:15
Placido Revilla 2012-04-10, 15:09