Thanks for your reply.
> Did you use TableInputFormat in your MR job ?
No, I use a custom input format. It computes the same splits, but the input to each map task is the split itself, and the map task opens the HTable and reads that specific region on its own.
> Did you use the one from mapred or mapreduce ?
Everything related is from the mapreduce package.
> What version of HBase are you using ?
> Did you take a look at Ganglia to see if there is any bottleneck in your cluster ?
I didn't, but I did check CPU and disk usage with dstat -cdnm, and I observed no CPU, disk, or network I/O bottleneck.
> You mentioned a few changes upon config file shortly before this problem
> appeared, can you let us know which parameters you modified ?
Mainly I increased dfs.datanode.handler.count and hbase.regionserver.handler.count from their defaults to around 30, among other changes, and this was done on every node. I changed them back later. Hmm...
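For reference, overrides of this kind would look roughly like the sketch below. The value 30 stands in for "around 30", and the placement (dfs.datanode.handler.count in hdfs-site.xml, hbase.regionserver.handler.count in hbase-site.xml) is the usual layout, assumed here rather than copied from the actual files.

```xml
<!-- hdfs-site.xml (assumed location): number of DataNode server threads -->
<property>
  <name>dfs.datanode.handler.count</name>
  <value>30</value> <!-- illustrative value for "around 30" -->
</property>

<!-- hbase-site.xml (assumed location): number of RegionServer RPC handler threads -->
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>30</value> <!-- illustrative value for "around 30" -->
</property>
```

Removing both property blocks returns each setting to its default.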
> On Fri, Jan 4, 2013 at 7:37 PM, Liu, Raymond <[EMAIL PROTECTED]> wrote:
> > Hi
> > I have run into a weird issue with lagging map tasks:
> > I have a small hadoop/hbase cluster with 1 master node and 4
> > regionserver nodes, all with 16 CPUs and with map and reduce slots set
> > to 24. A few tables were created with regions distributed evenly across
> > the region servers (say 16 regions per region server). Each region also
> > has almost the same number of KVs of very similar size. All tables had
> > a major_compact run to ensure data locality.
> > I have an MR job which simply does a local region scan in every map task
> > (so 16 map tasks for each regionserver node).
> > In theory, every map task should finish in a similar amount of time.
> > But in practice, the map tasks for some regions on the same region server
> > always lag far behind, taking 150~250% of the average time of the other
> > map tasks. If this happened on a single region server for every table, I
> > might suspect a disk issue or some other cause that degrades the
> > performance of that region server.
> > But the weird thing is that, although for any single table almost all
> > the map tasks on the same single regionserver lag behind, the lagging
> > regionserver is different for different tables! And the regions and
> > region sizes are distributed evenly, which I have double checked many
> > times. (I even tried setting the replication to 4 to ensure every node
> > has a local copy of the data.)
> > Say for table 1, all map tasks on regionserver node 2 are slow, while for
> > table 2, maybe all map tasks on regionserver node 3 are slow. And with
> > table 1, it will always be regionserver node 2 which is slow,
> > regardless of cluster restarts, and the slowest map task will always be
> > the very same one. And it won't go away even if I do a major compact again.
> > So, could anyone give me a clue as to what might possibly lead
> > to this weird behavior? Any wild guess is welcome!
> > (BTW, I didn't encounter this issue a few days ago with the same table.
> > I did restart the cluster and make a few changes to the config files
> > during that period, but restoring the config files didn't help.)
> > Best Regards,
> > Raymond Liu