HBase >> mail # user >> Sync latency


Re: Sync latency
Hi Todd,

We don't see any problems in dmesg, neither from the disk controllers nor
from anything else. We have checked the controller status and it reports no
failures of any kind on the disks.

We really don't have a clue as to what might be happening.
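For anyone chasing the same symptoms, the dmesg check Todd suggests below can be sketched as a quick scan for common disk/controller error strings. The pattern list here is an assumption for illustration; the exact messages vary by driver, firmware, and kernel:

```shell
# Scan the kernel ring buffer for common disk/controller error patterns.
# The patterns are illustrative; real messages depend on driver and firmware.
dmesg | grep -iE 'scsi error|ata[0-9]+\.|i/o error|hard resetting|medium error' \
  || echo "no matching errors in dmesg"
```

Running this on every node (e.g. via ssh in a loop) makes it easy to spot a single machine with a flaky disk stalling the whole write pipeline.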

On Mon, Apr 9, 2012 at 7:15 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote:

> Hi Placido,
>
> Check dmesg for scsi controller issues on all the nodes? Sometimes
> dead/dying disks, or bad firmware can cause 30+ second pauses
>
> -Todd
>
> On Mon, Apr 9, 2012 at 1:47 AM, Placido Revilla
> <[EMAIL PROTECTED]> wrote:
> > Sorry, that's not the problem. In my logs block reporting never takes more
> > than 50 ms to process, even when I'm experiencing sync pauses of 30 seconds.
> >
> > The dataset is currently small (1.2 TB), as the cluster has only been
> > running live for a couple of months, and I have only slightly over 11K
> > blocks in total; that's why block reporting takes little time.
> >
> > On Thu, Apr 5, 2012 at 8:16 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
> >
> >> Hi Placido,
> >>
> >> Sounds like it might be related to HDFS-2379. Try updating to Hadoop
> >> 1.0.1 or CDH3u3 and you'll get a fix for that.
> >>
> >> You can verify by grepping for "BlockReport" in your DN logs - if the
> >> pauses on the hbase side correlate with long block reports on the DNs,
> >> the upgrade should fix it.
> >>
> >> -Todd
> >>
> >> On Wed, Apr 4, 2012 at 2:30 AM, Placido Revilla
> >> <[EMAIL PROTECTED]> wrote:
> >> > Hi,
> >> >
> >> > I'm having a problem with sync latency on our HBase cluster. Our
> >> > cluster is composed of 2 NN (and HBase master) machines and 12 DN (and
> >> > HBase regionservers and thrift servers). We are having several issues a
> >> > day where the cluster seems to halt all processing for several seconds,
> >> > and these times are aligned with some WARN logs:
> >> >
> >> > 13:23:55,285 WARN  [IPC Server handler 62 on 60020] wal.HLog IPC Server handler 62 on 60020 took 10713 ms appending an edit to hlog; editcount=150694, len~=58.0
> >> > 13:23:55,286 WARN  [IPC Server handler 64 on 60020] wal.HLog IPC Server handler 64 on 60020 took 10726 ms appending an edit to hlog; editcount=319217, len~=47.0
> >> > 13:23:55,286 WARN  [IPC Server handler 118 on 60020] wal.HLog IPC Server handler 118 on 60020 took 10741 ms appending an edit to hlog; editcount=373337, len~=49.0
> >> > 13:23:55,286 WARN  [IPC Server handler 113 on 60020] wal.HLog IPC Server handler 113 on 60020 took 10746 ms appending an edit to hlog; editcount=57912, len~=45.0
> >> > 15:39:38,193 WARN  [IPC Server handler 94 on 60020] wal.HLog IPC Server handler 94 on 60020 took 21787 ms appending an edit to hlog; editcount=2901, len~=45.0
> >> > 15:39:38,194 WARN  [IPC Server handler 82 on 60020] wal.HLog IPC Server handler 82 on 60020 took 21784 ms appending an edit to hlog; editcount=29944, len~=46.0
> >> > 16:09:38,201 WARN  [IPC Server handler 78 on 60020] wal.HLog IPC Server handler 78 on 60020 took 10321 ms appending an edit to hlog; editcount=163998, len~=104.0
> >> > 16:09:38,203 WARN  [IPC Server handler 97 on 60020] wal.HLog IPC Server handler 97 on 60020 took 10205 ms appending an edit to hlog; editcount=149497, len~=60.0
> >> > 16:09:38,203 WARN  [IPC Server handler 68 on 60020] wal.HLog IPC Server handler 68 on 60020 took 10199 ms appending an edit to hlog; editcount=318268, len~=63.0
> >> > 16:09:38,203 WARN  [IPC Server handler 120 on 60020] wal.HLog IPC Server handler 120 on 60020 took 10211 ms appending an edit to hlog; editcount=88001, len~=45.0
> >> > 16:09:38,204 WARN  [IPC Server handler 88 on 60020] wal.HLog IPC Server handler 88 on 60020 took 10235 ms appending an edit to hlog; editcount=141516, len~=100.0
> >> >
> >> > The machines in the cluster are pretty powerful (8 HT cores, 48 GB RAM, 6
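Todd's suggestion earlier in the thread, correlating the HLog append warnings with long DataNode block reports, can be sketched like this. The log path and glob are assumptions and vary by Hadoop version and distribution; adjust for your installation:

```shell
# List BlockReport lines from the DataNode logs so their timestamps and
# durations can be lined up against the regionserver's HLog append warnings.
# /var/log/hadoop is an assumed log location; adjust for your installation.
grep -h 'BlockReport' /var/log/hadoop/*datanode*.log | tail -n 20
```

If the reported durations are consistently small (as in this thread, under 50 ms), block reporting is unlikely to be the cause of multi-second sync pauses.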