Did you use TableInputFormat in your MR job?
If so, did you use the one from the mapred package or the mapreduce package?
What version of HBase are you using?
Did you take a look at Ganglia to see if there is any bottleneck in your
You mentioned making a few changes to the config file shortly before this
problem appeared. Can you let us know which parameters you modified?
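
To narrow down which regionserver lags for which table, it may help to pull
the map task durations from the job history and group them by host. Below is
a rough sketch in Python, not a definitive tool: the function name
find_laggards, the (regionserver, task_id, seconds) tuple layout, and the
sample durations are all made up for illustration. It uses the median as the
baseline so that the slow tasks do not inflate the reference value, and it
flags anything above 150% of it, the lower bound you mentioned.

```python
from collections import defaultdict
from statistics import median


def find_laggards(tasks, threshold=1.5):
    """Group slow map tasks by regionserver.

    tasks: iterable of (regionserver, task_id, seconds) tuples
           (hypothetical layout, fill it from your own job history).
    threshold: a task is flagged when its duration exceeds
               threshold * median duration of all tasks.
    Returns {regionserver: [(task_id, ratio_to_median), ...]}.
    """
    tasks = list(tasks)
    if not tasks:
        return {}
    # The median is robust against the stragglers themselves.
    baseline = median(seconds for _, _, seconds in tasks)
    slow = defaultdict(list)
    for server, task_id, seconds in tasks:
        if seconds > threshold * baseline:
            slow[server].append((task_id, round(seconds / baseline, 2)))
    return dict(slow)


# Illustrative numbers only, not measurements from any real cluster.
sample = [
    ("rs1", "m_000000", 100.0),
    ("rs1", "m_000001", 104.0),
    ("rs2", "m_000002", 210.0),
    ("rs2", "m_000003", 190.0),
    ("rs3", "m_000004", 98.0),
    ("rs4", "m_000005", 102.0),
]

laggards = find_laggards(sample)
for server, entries in sorted(laggards.items()):
    print(server, entries)
```

Running this once per table and comparing the flagged hosts would show
whether the slow regionserver really changes from table to table.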
On Fri, Jan 4, 2013 at 7:37 PM, Liu, Raymond <[EMAIL PROTECTED]> wrote:
> I have run into a weird issue with lagging map tasks:
> I have a small Hadoop/HBase cluster with 1 master node and 4 regionserver
> nodes. All nodes have 16 CPUs, with map and reduce slots set to 24.
> A few tables were created with regions distributed evenly across the
> regionservers (say 16 regions per regionserver). Each region also has
> almost the same number of KVs of very similar size. All tables have had
> major_compact run to ensure data locality.
> I have an MR job which simply does a local region scan in every map task
> (so 16 map tasks for each regionserver node).
> In theory, every map task should finish in a similar amount of time.
> In practice, though, some regions on the same regionserver always lag
> behind a lot, taking 150~250% of the average time of the other map tasks.
> If this happened to a single regionserver for every table, I would suspect
> a disk issue or some other cause that degrades the performance of that
> regionserver.
> But the weird thing is that, although for each single table almost all the
> map tasks on the same single regionserver lag behind, the lagging
> regionserver is different for different tables! And the regions and region
> sizes are distributed evenly, which I have double checked many times.
> (I even tried setting the replica count to 4 to ensure every node has a
> local copy of the data.)
> Say for table 1, all map tasks on regionserver node 2 are slow, while for
> table 2, maybe all map tasks on regionserver node 3 are slow. With table 1,
> it will always be regionserver node 2 which is slow regardless of cluster
> restarts, and the slowest map task will always be the very same one. And it
> won't go away even if I do a major compact again.
> So, could anyone give me a clue as to what might lead to this weird
> behavior? Any wild guess is welcome!
> (BTW, I did not encounter this issue a few days ago with the same table.
> I did restart the cluster and make a few changes to the config file during
> that period, but restoring the config file didn't help.)
> Best Regards,
> Raymond Liu